Background technology
Huffman coding is a coding method, proposed by Huffman in 1952, that exploits the statistical properties of the source symbols; the code tree is built from the leaves up to the root. Huffman coding is one of the most widely used entropy coding methods today and one of the most basic and important coding techniques.
Figure 1 shows a simple implementation flow of Huffman coding, which comprises the following steps:
Step 101: Arrange the N source symbols in descending order of their probabilities of occurrence P_i (i = 1, 2, ..., N), so that P_1 ≥ P_2 ≥ ... ≥ P_N.
Step 102: Add the probabilities of the two least probable source symbols to form a combined probability, then re-sort this combined probability together with the probabilities of the remaining source symbols in descending order.
Step 103: Determine whether the combined probability equals 1. If so, go to step 104; otherwise return to step 102.
Step 104: Connect the source symbols with lines and assign code bits progressively from back to front. Each node has two branches: assign 0 to the branch with the larger probability and 1 to the branch with the smaller probability (or, alternatively, 1 to the larger and 0 to the smaller). After passing through several nodes, a terminal node, also called a leaf node, is reached.
Step 105: The sequence of 0s and 1s along the path from the root node to a leaf node, taken in order, is the codeword of the source symbol corresponding to that leaf node.
Figure 2 is a schematic diagram of the codeword structure of Huffman coding, in which white circles represent intermediate nodes (Internal Nodes) and gray circles represent leaf nodes (Result Nodes). As can be seen, the codeword length is variable. It follows from the coding flow above that the most probable source symbol receives the shortest codeword and the least probable source symbol receives the longest, which shortens the total code length.
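Steps 101 to 105 can be sketched in a few lines of Python. This is only an illustrative sketch, not the flow of Figure 1 verbatim: the function name `huffman_code`, the use of a heap instead of repeated re-sorting, and the example probabilities are all assumptions made for the illustration.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    tiebreak = count()  # unique counter so equal probabilities never compare the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: a single symbol
        return {sym: "0" for sym in probabilities}
    while len(heap) > 1:
        # Step 102: take the two least probable entries and merge them.
        p_small, _, small = heapq.heappop(heap)
        p_large, _, large = heapq.heappop(heap)
        # Step 104: 0 for the more probable branch, 1 for the less probable one.
        merged = {s: "0" + c for s, c in large.items()}
        merged.update({s: "1" + c for s, c in small.items()})
        heapq.heappush(heap, (p_small + p_large, next(tiebreak), merged))
    # Step 103: a single entry remains, whose probability is the total (1).
    return heap[0][2]

codes = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)
```

Prepending a bit at every merge yields each codeword in root-to-leaf order, which corresponds to step 105.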
At present, the binary tree search method is commonly used to decode such codewords. Its basic principle is as follows: starting from the root node, one bit is read from the code stream at a time, and a branch of the binary tree is selected according to whether the bit is 0 or 1. The node reached then determines whether the required codeword has been found or whether the search must continue; bits that have already been read can be discarded. In the worst case, the number of lookups required by the binary tree search method equals the length of the longest codeword in the Huffman code. After the codeword has been found, a preset mapping table between codewords and source symbols is consulted to obtain the source symbol corresponding to that codeword.
From the viewpoint of grouping, the binary search tree can be regarded as having a fixed segment length of 1: during decoding the code stream is searched bit by bit, that is, one bit is analyzed at a time. Huffman decoding by binary tree search can reach high decoding efficiency, but every intermediate node and every leaf node must be stored. As Figure 2 shows, each node corresponds to only one bit of a codeword, yet a storage unit of a certain length must be allocated for each node. The existing binary tree search method therefore consumes a relatively large amount of memory.
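The bit-by-bit search described above can be sketched as follows. The nested-dictionary tree, the function names and the three-codeword table are assumptions chosen for brevity; for simplicity the sketch also stores the symbol directly in the leaf rather than consulting a separate mapping table.

```python
def build_binary_tree(code_table):
    """Nested dicts: an inner node maps '0'/'1' to a child, a leaf is the symbol itself."""
    root = {}
    for codeword, symbol in code_table.items():
        node = root
        for bit in codeword[:-1]:
            node = node.setdefault(bit, {})
        node[codeword[-1]] = symbol
    return root

def decode_bitwise(bits, root):
    """Consume one bit per step; emit a symbol whenever a leaf is reached."""
    out, node = [], root
    for bit in bits:
        node = node[bit]                 # select the branch for this bit
        if not isinstance(node, dict):   # leaf reached: output and restart at the root
            out.append(node)
            node = root
    return out

demo_tree = build_binary_tree({"0": "a1", "10": "a2", "11": "a3"})
print(decode_bitwise("010110", demo_tree))
```

Each decoded codeword costs as many lookups as it has bits, and every prefix of every codeword needs its own node, which is the storage cost the text refers to.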
Embodiment
To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
An embodiment of the invention groups all codewords in the Huffman code table into segments of variable length, generates nodes from the resulting variable-length codeword segments, and assembles these nodes into a variable-length grouped Huffman search tree. The code stream produced by Huffman coding is then searched according to this variable-length grouped Huffman search tree.
The storage structure of a node of the variable-length grouped Huffman search tree of the embodiment is shown in Figure 3. Suppose each node occupies 16 bits. The first bit is the leaf flag, which indicates whether the node is a leaf node; for example, the value 1 indicates a leaf node and the value 0 an intermediate node, although the opposite convention may equally be used. The next three bits give the number of bits in the codeword segment of the next-level nodes, called the segment length (Segmentation Length) of the next-level nodes. The remaining 12 bits store either the base address of the next-level nodes or the offset of that base address; the offset may be relative to the address of the root node or relative to the base address of the parent level. Both are referred to below simply as the base address. The number of bits used to store the segment length is not limited to the value above and can be adjusted as required. The total storage length of a node is likewise not limited to 16 bits and may take other values, for example 24, 32 or 64 bits. The stored content of a leaf node comprises the leaf flag, the length of the leaf node's codeword segment, and the source symbol. The source symbol is thus obtained directly when the search ends, without a further lookup in the mapping table between codewords and source symbols, which improves decoding efficiency.
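As an illustration of the 16-bit layout described above, the following sketch packs and unpacks a node word. The bit positions (flag in the top bit, then the 3-bit length, then 12 payload bits), the helper names, and the use of the 12-bit field to hold a symbol index in a leaf are assumptions; the text only fixes the field widths for the 16-bit example.

```python
LEAF_SHIFT = 15       # 1-bit flag: 1 = leaf node, 0 = intermediate node (assumed convention)
LEN_SHIFT = 12        # 3-bit length field
LEN_MASK = 0x7
PAYLOAD_MASK = 0xFFF  # 12 bits: next-level base address, or symbol index in a leaf

def pack_node(is_leaf, length, payload):
    """Pack one node into a 16-bit integer."""
    if not 0 <= length <= LEN_MASK:
        raise ValueError("length does not fit in 3 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 12 bits")
    return (int(is_leaf) << LEAF_SHIFT) | (length << LEN_SHIFT) | payload

def unpack_node(word):
    """Return (is_leaf, length, payload) from a 16-bit node word."""
    return (bool((word >> LEAF_SHIFT) & 1),
            (word >> LEN_SHIFT) & LEN_MASK,
            word & PAYLOAD_MASK)

inner = pack_node(False, 3, 0x004)   # intermediate node: next-level length 3, base address 4
leaf = pack_node(True, 1, 9)         # leaf node: segment length 1, symbol index 9
print(hex(inner), hex(leaf))
```

With a 3-bit field the largest directly representable length is 7; a wider field would be needed for longer segments, in line with the remark that the field widths can be adjusted.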
The embodiment of the invention comprises the following two basic steps:
A. Group the codewords in the Huffman code table into variable-length segments, generate nodes from the resulting variable-length codeword segments, and assemble the nodes into a variable-length grouped Huffman search tree. Many variable-length grouping methods are possible, so the resulting variable-length grouped Huffman search tree is not necessarily unique.
For example, one simple variable-length grouping method is as follows: the segment length of the first level equals the length of the shortest codeword in the code table, and the segment length of each intermediate level is 3. The last level of each branch has a segment length of 1, 2 or 3, depending on the number of codeword bits remaining. This constitutes one grouping method.
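The simple grouping rule in the example can be sketched per codeword as below. The function name `split_codeword` is hypothetical, and the sketch assumes that all codewords under one parent have the same number of remaining bits at the last level, which holds for the example code table used later but not for every Huffman code.

```python
def split_codeword(codeword, first_len, mid_len=3):
    """Split a codeword into segments: first_len bits, then groups of mid_len, then the tail."""
    segments = [codeword[:first_len]]
    rest = codeword[first_len:]
    while rest:
        segments.append(rest[:mid_len])   # the final slice is shorter when fewer bits remain
        rest = rest[mid_len:]
    return segments

print(split_codeword("101100", first_len=1))
print(split_codeword("11011", first_len=1))
```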
The variable-length grouped Huffman search tree is then generated in memory, specifically through the following steps:
a. Allocate memory for the nodes of each level. Nodes that share the same parent node occupy contiguous memory, and the amount of memory depends on the segment length of the current level. For example, if the segment length is 2, a 2-bit codeword segment has 4 possible values (00, 01, 10 and 11), so with 16 bits per node at least four 16-bit storage units must be allocated; if the segment length is 8, at least 256 16-bit storage units must be allocated.
b. Assign a leaf flag to each codeword segment obtained by grouping. Determine whether the codeword segment is the last segment of a complete codeword; if so, the leaf flag is true, otherwise it is false. The two states are represented by a single bit (0 or 1).
For a codeword segment whose leaf flag is false, the leaf flag, the segment length of the next-level nodes and their base address are stored in the memory of the corresponding node. For a codeword segment whose leaf flag is true, the leaf flag, the codeword segment length and the source symbol are stored in the memory of the corresponding node. If a node is an empty node, its memory is still reserved but left empty.
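Steps a and b can be sketched as a recursive allocation into one flat list. For readability the sketch stores tuples rather than packed 16-bit words, uses list indices as addresses, and picks the next-level length as the smaller of 3 and the shortest remaining codeword tail; these choices and the name `build_tree` are assumptions, not the only way to realise the steps.

```python
TABLE_1 = {"0": "a1", "1000": "a2", "1001": "a3", "1010": "a4",
           "101100": "a5", "101101": "a6", "101111": "a7", "1100": "a8",
           "11010": "a9", "11011": "a10", "1110": "a11", "1111": "a12"}

def build_tree(code_table, mid_len=3):
    """Return a flat list of nodes; index 0 is the root, None marks an empty node."""
    first_len = min(len(c) for c in code_table)
    memory = [None]                                   # slot 0 is reserved for the root

    def alloc(group, seg_len):
        # Step a: one contiguous block of 2**seg_len slots for all children of a parent.
        base = len(memory)
        memory.extend([None] * (1 << seg_len))
        children = {}
        for rest, sym in group.items():
            children.setdefault(rest[:seg_len], {})[rest[seg_len:]] = sym
        for seg, sub in children.items():
            addr = base + int(seg, 2)                 # segment value = offset from the base
            if "" in sub:                             # Step b: last segment of a codeword
                memory[addr] = ("leaf", seg_len, sub[""])
            else:
                next_len = min(mid_len, min(len(r) for r in sub))
                memory[addr] = ("inner", next_len, alloc(sub, next_len))
        return base

    memory[0] = ("inner", first_len, alloc(dict(code_table), first_len))
    return memory

memory = build_tree(TABLE_1)
for addr, node in enumerate(memory):
    print(addr, node)
```

For the example code table this allocates 17 slots, one of which stays empty because no codeword maps to it.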
B. Search the code stream according to the variable-length grouped Huffman search tree to obtain the source symbol corresponding to each codeword.
The flow for decoding the code stream produced by Huffman coding according to the variable-length grouped Huffman search tree is shown in Figure 4 and comprises the following steps:
Step 401: Take the root node as the current node and obtain the segment length and base address of the first-level nodes from the content stored in the root node.
Step 402: Take from the input Huffman code stream a codeword segment whose length equals the segment length of the current level, and use this codeword segment together with the base address of the current level to locate the current node among the nodes of that level.
The value of the codeword segment serves as an offset from the base address. For example, if the codeword segment is "011", the first node after the base address of the current level corresponds to the codeword segment "001", the second to "010", and the third is the current node. Likewise, if the codeword segment is "1101", the 13th node stored after the base address of the current level is the current node.
Step 403: Determine whether the current node is a leaf node. If so, go to step 405; otherwise go to step 404.
Step 404: Obtain the segment length and base address of the next-level nodes from the content stored in the current node, then return to step 402.
Step 405: Output the source symbol stored in the leaf node as the decoding result.
For a continuous Huffman code stream, steps 401 to 405 are repeated until all codewords have been decoded.
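Steps 401 to 405 can be sketched as a loop over a flat node table. The tuple layout ("inner", next-level length, base address) or ("leaf", segment length, symbol), the small hand-written table `DEMO` and the error handling are assumptions made so that the sketch is self-contained.

```python
# Hypothetical table for the code 0->x, 100->p, 101->q, 111->s (110 is unused).
DEMO = [
    ("inner", 1, 1),                                       # root: first level is 1 bit wide
    ("leaf", 1, "x"), ("inner", 2, 3),                     # children of the root
    ("leaf", 2, "p"), ("leaf", 2, "q"), None, ("leaf", 2, "s"),
]

def decode(bits, memory):
    """Decode a bit string by repeated segment lookups in the flat node table."""
    out, pos = [], 0
    while pos < len(bits):
        _, seg_len, base = memory[0]                  # step 401: start from the root
        while True:
            segment = bits[pos:pos + seg_len]         # step 402: take seg_len bits
            if len(segment) < seg_len:
                raise ValueError("code stream ends inside a codeword")
            pos += seg_len
            node = memory[base + int(segment, 2)]     # segment value is the offset
            if node is None:
                raise ValueError("codeword not in the code table")
            kind, seg_len, payload = node
            if kind == "leaf":                        # steps 403 and 405
                out.append(payload)
                break
            base = payload                            # step 404: descend one level
    return out

print(decode("01000111", DEMO))
```

Each inner iteration consumes a whole segment with one table lookup, so the number of lookups per codeword equals its number of segments rather than its number of bits.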
The decoding process of the invention is further illustrated below with a concrete example. Suppose a set of source symbols is Huffman coded, yielding the Huffman codewords shown in Table 1:
Codeword | Length | Source symbol |
0 | 1 | a1 |
1000 | 4 | a2 |
1001 | 4 | a3 |
1010 | 4 | a4 |
101100 | 6 | a5 |
101101 | 6 | a6 |
101111 | 6 | a7 |
1100 | 4 | a8 |
11010 | 5 | a9 |
11011 | 5 | a10 |
1110 | 4 | a11 |
1111 | 4 | a12 |
Table 1
The codewords in Table 1 are grouped into variable-length segments: the segment length of the first level is the shortest codeword length in the code table, namely 1; the segment length of the single intermediate level is 3; and the last level of each branch has a segment length of 1 or 2, depending on the number of codeword bits remaining. One resulting variable-length grouped Huffman search tree is shown in Figure 5, in which the diamond represents the root node, white ellipses represent intermediate nodes, gray ellipses represent leaf nodes, and the black ellipse represents an empty node.
Memory is then allocated for the nodes of each level, with the following storage structures:
Node 501 is the root node, so its leaf flag is set to 0. Its next-level nodes 502 and 503 have a segment length of 1, so the next-level segment length is set to 1, and the next-level base address is ADD502, the storage address of node 502. The storage structure of node 501 is therefore:
Leaf flag: 0 | Next-level segment length: 1 | Next-level base address: ADD502 |
Node 502 is a leaf node, so its leaf flag is set to 1. Its codeword segment length is 1 and its decoded Huffman symbol is a1, so the storage structure of node 502 is:
Leaf flag: 1 | Codeword segment length: 1 | Source symbol: a1 |
Node 503 is an intermediate node. Its next-level segment length is 3 and its next-level base address is ADD504, the address of node 504, so the storage structure of node 503 is:
Leaf flag: 0 | Next-level segment length: 3 | Next-level base address: ADD504 |
……
Node 516 is a leaf node. Its codeword segment length is 1 and its decoded Huffman symbol is a9, that is:
Leaf flag: 1 | Codeword segment length: 1 | Source symbol: a9 |
……
Suppose the binary code stream produced by Huffman coding is 10011110110110010110010100.... During decoding with the variable-length grouped Huffman search tree of Figure 5, the root node first gives a next-level segment length of 1 and a next-level base address of ADD502. The first bit of the code stream is 1, so 1 is added to this base address, which locates node 503. Node 503 gives a next-level segment length of 3 and a next-level base address of ADD504. The 2nd to 4th bits of the code stream are 001, so 1 is added to ADD504, which locates node 505. The first codeword, 1001, has thus been found, and the corresponding decoded symbol is a3. Repeating this search for the following codewords decomposes the code stream into: 1,001; 1,110; 1,101,1; 0; 0; 1,011,00; 1,010; 0; .... Here a comma marks a segment boundary within a codeword and a semicolon marks a codeword boundary. Decoding therefore yields the symbol sequence a3 a11 a10 a1 a1 a5 a4 a1 ....
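The walk-through above can be reproduced with a hard-coded node table. The table layout, the names `MEMORY` and `trace`, and the mapping of list indices 0 to 16 onto node numbers 501 to 517 are assumptions inferred from the node numbers mentioned in the text (Figure 5 itself is not reproduced here); error handling for the empty node is omitted.

```python
I, L = "inner", "leaf"
MEMORY = [
    (I, 1, 1),                                               # 501 (root), base = index of 502
    (L, 1, "a1"), (I, 3, 3),                                 # 502, 503
    (L, 3, "a2"), (L, 3, "a3"), (L, 3, "a4"), (I, 2, 11),    # 504 to 507
    (L, 3, "a8"), (I, 1, 15), (L, 3, "a11"), (L, 3, "a12"),  # 508 to 511
    (L, 2, "a5"), (L, 2, "a6"), None, (L, 2, "a7"),          # 512 to 515 (514 is empty)
    (L, 1, "a9"), (L, 1, "a10"),                             # 516, 517
]

def trace(bits):
    """Return the decoded symbols and the comma/semicolon decomposition of the stream."""
    symbols, groups, pos = [], [], 0
    while pos < len(bits):
        _, seg_len, base = MEMORY[0]
        segments = []
        while True:
            seg = bits[pos:pos + seg_len]
            pos += seg_len
            segments.append(seg)
            kind, seg_len, payload = MEMORY[base + int(seg, 2)]
            if kind == L:
                symbols.append(payload)
                break
            base = payload
        groups.append(",".join(segments))
    return symbols, "; ".join(groups)

symbols, decomposition = trace("10011110110110010110010100")
print(decomposition)
print(" ".join(symbols))
```

Running the sketch on the 26 bits given in the text yields the same decomposition and symbol sequence as the walk-through.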
As Figure 5 shows, nodes at different levels may have different segment lengths, and so may nodes at the same level. For example, nodes 512 and 517 belong to the same level, but the segment length of node 512 is 2 bits whereas that of node 517 is 1 bit. Nodes at the same level that share the same parent node, however, all have the same segment length, for example nodes 512 to 515. The next-level base address stored in their parent node 507 is the address of node 512, and nodes 512 to 515 are stored contiguously, so any of them can be located from this base address.
The codeword 101110 is empty, that is, it does not exist in the code table. Reaching it indicates that a bit error may have occurred in the code stream or that an error has occurred during decoding.
If the conventional binary tree search were applied to the above code, the node structure would be as shown in Figure 6. The tree in Figure 5 has 17 nodes in total (including the root node and the empty node), whereas the tree in Figure 6 has 23 nodes (not counting the root node). If each node requires a storage unit of the same size in both methods, the method of Figure 5 needs 6 fewer storage units than that of Figure 6; in this embodiment the variable-length grouped Huffman decoding method thus saves 6/23 ≈ 26.1% of the memory.
As for the number of comparisons, the binary tree has 6 levels of nodes, whereas the variable-length grouped Huffman search tree has 3. For the codeword 1001, for example, the variable-length grouped Huffman search tree needs only 2 decisions to decode it, whereas the binary search tree needs 4. Although the scheme of the invention must additionally obtain the grouping information, storing both the results and the intermediate information in the nodes allows decoding to be completed efficiently.
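The node counts and lookup counts quoted above can be checked from Table 1 alone. The sketch below counts binary-tree nodes as distinct non-empty codeword prefixes (root excluded, as in the text) and counts grouped-tree slots including the root and empty slots; the function names and the grouping rule (first level 1 bit, then the smaller of 3 and the shortest remaining tail) are assumptions consistent with the example.

```python
CODEWORDS = ["0", "1000", "1001", "1010", "101100", "101101", "101111",
             "1100", "11010", "11011", "1110", "1111"]     # Table 1

def binary_tree_nodes(codewords):
    """Non-root nodes of the bitwise tree = distinct non-empty prefixes."""
    return len({c[:i] for c in codewords for i in range(1, len(c) + 1)})

def grouped_tree_nodes(codewords, mid_len=3):
    """Allocated slots of the grouped tree, counting the root and empty slots."""
    total = 1                                              # the root
    pending = [(list(codewords), min(map(len, codewords)))]
    while pending:
        group, seg_len = pending.pop()
        total += 1 << seg_len                              # one block of 2**seg_len siblings
        children = {}
        for rest in group:
            if len(rest) > seg_len:                        # codeword continues below this level
                children.setdefault(rest[:seg_len], []).append(rest[seg_len:])
        for sub in children.values():
            pending.append((sub, min(mid_len, min(map(len, sub)))))
    return total

def lookups(codeword, first_len=1, mid_len=3):
    """Segment lookups needed for one codeword in the grouped tree."""
    return 1 + -(-(len(codeword) - first_len) // mid_len)  # 1 + ceiling division

binary = binary_tree_nodes(CODEWORDS)
grouped = grouped_tree_nodes(CODEWORDS)
saving = (binary - grouped) / binary
print(binary, grouped, f"{saving:.1%}")
print(len("1001"), "bitwise decisions vs", lookups("1001"), "segment lookups")
```

Note that the comparison follows the text in counting the root only for the grouped tree, so the percentage is reproduced as stated rather than recomputed on a different basis.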
The variable-length grouped Huffman decoding device of the embodiment is shown in Figure 7 and comprises a grouping module 701, a code table information searching module 702 and a search module 703, wherein:
Code table information searching module 702 stores the information relating to the variable-length grouped Huffman search tree, including the information of each of its nodes. For an intermediate node, the node information comprises the leaf flag, the next-level segment length and the next-level base address information; for a leaf node, it comprises the leaf flag, the codeword segment length and the decoded Huffman symbol. The base address information may be the physical address of the first node of the next level or an offset of that physical address. Code table information searching module 702 may further include a generation unit for generating the variable-length grouped Huffman search tree, that is, for allocating memory for each node of the tree according to the segment length of each level and storing the information of each node in its memory.
Code table information searching module 702 also includes a leaf judging unit for determining whether the current node is a leaf node or an intermediate node. For an intermediate node, the generation unit sets the node's leaf flag to false and stores the next-level segment length and base address in the node's memory. For a leaf node, the generation unit sets the node's leaf flag to true and stores the codeword segment length and the corresponding source symbol in the node's memory.
Code table information searching module 702 sends the information relating to the variable-length grouped Huffman search tree to grouping module 701 and to search module 703.
Grouping module 701 receives the Huffman code stream and groups it according to the next-level segment length supplied by search module 703, obtaining the codeword segment that currently needs to be searched. On the first pass, the Huffman code stream is grouped according to the segment length indicated by the root node of the variable-length grouped Huffman search tree, as supplied by code table information searching module 702. The resulting codeword segment is sent to search module 703.
Search module 703 receives the codeword segment from grouping module 701 and the node information of the variable-length grouped Huffman search tree from code table information searching module 702, and uses the physical address or physical address offset in the node information to search for and locate the node corresponding to the codeword segment.
Search module 703 further includes a judging unit and an output unit. The judging unit determines from the leaf flag of the located node whether the node is a leaf node and, if so, notifies the output unit to extract the source symbol stored in the node and output it. The output unit outputs the source symbol.
If the judging unit determines that the node is not a leaf node, search module 703 sends the next-level segment length of that node to grouping module 701.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.