Summary of the invention
The present invention is directed to the AVS entropy coder and provides an efficient hardware-accelerated implementation. Its object is to provide a parallel entropy-like coding method and a hardware-accelerated AVS entropy-like coding device that raise encoder speed enough to achieve real-time high-definition coding.
The technical scheme of the present invention is as follows:
A parallel entropy-like coding method, the steps of which are:
1) the quantized coefficients of each transform block are grouped and input to and output from two register groups in parallel, as follows:
a) in one clock, one group of quantized coefficients is stored into a register group;
b) after the current transform block has been input group by group into this register group, starting from the next clock the quantized coefficients are output from this register group group by group in zig-zag order, while from the same clock the subsequent groups of quantized coefficients are input to the other register group;
c) steps a) and b) are repeated, realizing a ping-pong operation between the two register groups;
2) run-length coding is used to compute, in the same clock, the coded coefficients (run, level) of each group of quantized coefficients, and the EOB symbol is marked on the first non-zero quantized coefficient of the current transform block; here level is the absolute value of the quantized coefficient, and run is the number of zeros between the quantized coefficient and the previous non-zero quantized coefficient;
3) the coded coefficients (run, level) are output in reverse order;
4) the code table of the current quantized coefficient is selected according to the maximum level among the output coded coefficients;
5) the code table of the EOB of this transform block is selected according to the maximum level of all coded coefficients in the current transform block;
6) the code tables selected in steps 4) and 5) are converted to bit-width tables by the logic decisions of Golomb coding, and the numbers of coded bits of all coefficients of the current transform block and of the EOB are summed.
The transform block is an 8x8 block, whose quantized coefficients are divided by row into 8 groups of data.
The level value of a coded coefficient is the absolute value of the quantized coefficient. The run value of a coded coefficient is computed as follows: a variable base0 is set for assigning the run value, a variable base1 records the number of adjacent zeros at the end of each group of quantized coefficients, and a counter counter records the row number within the transform block. When counter = 0, base0 = 0; when base1 != 8, the base0 of the next group of quantized coefficients equals base1.
A parallel entropy-like coding device, comprising, connected in sequence, a data input dump module, a run-length coding module, a reverse-order matrix module, a code table selection module, a table look-up module, a Golomb coding module, and a module that sums the bit numbers of a transform block;
The data input dump module handles each group of quantized coefficients input in parallel; it comprises two storage matrices that realize the ping-pong operation of data input and output;
The run-length coding module computes, in the same clock, the coded coefficients (run, level) of each group of quantized coefficients supplied by the register group;
The reverse-order matrix module outputs the coded coefficients (run, level) of the transform block in reverse order;
The code table selection module determines, from the coded coefficients (run, level) currently output in reverse order, the code table of the current quantized coefficient and the code table of the EOB of this transform block;
The table look-up module maps the selected code table to the corresponding codeword;
The Golomb coding module converts the codeword to a bit width, yielding the bit-width table corresponding to the code table;
The module that sums the bit numbers of a transform block computes, from the bit-width table, the number of coded bits of all quantized coefficients of the current transform block.
The run-length coding module comprises a sign-bit comparator, a counter, and several input data comparators that compare each group of input data with zero in the same clock. The input data comparators are connected to an EOB comparator and to a comparator for the number of adjacent zeros at the end of the group; the EOB comparator is connected to a selector, and the comparator for the number of adjacent zeros at the end of the group is connected through a selector to a run value comparator; the counter is connected to both selectors.
The circuit connections of the code table selection module are as follows: several input data magnitude comparators, which compare the input data of the transform block, are connected to a circuit that determines the maximum before each coefficient and to a selector; the output of the selector is connected to an EOB multiplexer and to the circuit that determines the maximum before each coefficient; that circuit is connected to several parallel single-stage multiplexers; the selector and the EOB multiplexer are each connected to the output of the same counter; the outputs of the input data magnitude comparators connected to the selector are also connected to the EOB multiplexer, and the output of the EOB multiplexer is connected to a code table selection comparator.
The circuit connections of the table look-up module are as follows: a code table type gating switch is connected to the code table gating switches; each code table gating switch is connected through a comparator to a codeword gating switch and then to a multiplexer; the code table gating switches and the multiplexer are each connected to a code table number input line; the comparator and the multiplexer are each connected to the input data. For the code tables inter_VLC0, inter_VLC1, inter_VLC2, inter_VLC3, intra_VLC0, intra_VLC1, intra_VLC2, intra_VLC3, chroma_VLC0, chroma_VLC1 and chroma_VLC2, the selection signal of the multiplexer refers to the level value of each code table; for the code tables inter_VLC4, inter_VLC5, inter_VLC6, intra_VLC4, intra_VLC5, intra_VLC6, chroma_VLC3 and chroma_VLC4, it refers to the run value.
The circuit connections of the Golomb coding module are as follows: a cascade comparator that takes the code table number and code table type as input is connected through shifter 1 to adder 1; an input of adder 1 is connected to the level absolute value output, and its output is connected to bitwise comparator 1. The cascade comparator is connected through shifter 2 to adder 2; the inputs of adder 2 are connected to the level sign bit output and, through selector 1 and shifter 3, to the run value output, and its output is connected to bitwise comparator 2. Bitwise comparator 1 and bitwise comparator 2 are connected through adder 3 to shifter 4, and the two outputs of shifter 4 and the cascade comparator are connected through a subtracter to an input of selector 2. The cascade comparator output connected to shifter 2 is also connected to a subtracter, whose input is connected through shifter 5 to the codeword output and whose output is connected to an input of selector 2. An input of selector 2 is connected to the codeword output.
The parallel AVS entropy-like coding method of the present invention comprises:
1) 8-way parallel input and output of quantized data: to further raise the speed of the hardware design, the coefficients of a quantized 8x8 block are loaded into registers over 8 clocks, 8 coefficients per clock; the 8 coefficients, input in natural order, are stored at the corresponding positions of the current register group according to the Zig-Zag scanning order;
2) Ping-pong operation of two register groups: the data of an 8x8 block are loaded into one register group in 8 clocks, so the subsequent operations start from the 9th clock and output the results for 8 coefficients each time. To keep the pipeline running smoothly, at the 9th clock the data of another 8x8 block begin to be loaded into a second register group of identical structure. From the 9th clock on, outputting the coefficients of the previous 8x8 block proceeds simultaneously with loading the coefficients of the current 8x8 block, so data input and output are continuous, with the two register groups alternating in a ping-pong fashion;
3) Zig-Zag scanning and computation of (run, level) in run-length coding: the coefficients of the input 8x8 block are stored in the register group in Zig-Zag scanning order, so run-length coding computes the (run, level) of each coefficient directly in order. However, data input and output are 8-way parallel here, i.e. 8 coefficients are processed at a time, and the run of a coefficient depends not only on the current 8 coefficients but also on the last non-zero coefficient among the preceding 8. When processing the (run, level) information of the current 8 coefficients, the number of zeros following the last non-zero coefficient among them must therefore be recorded as the basis for computing the run values of the next 8 coefficients;
4) Code table selection according to the maximum of the coded coefficients: in a software entropy coder, every code table switch is based on the level value of the previous coded coefficient, i.e. the code table of the current coefficient can be selected only after the code table of the previous coefficient has been determined. The hardware design of the entropy-like coder of the present invention processes 8 coefficients in parallel; following the software method would take 8 clocks to handle one group of 8 parallel coefficients. Therefore, while preserving the function of the software method, the present invention modifies the code table switching rule: the maximum of the coded coefficients before the current coefficient serves as the basis for switching, so 8 comparators can select the code tables of 8 coefficients in one clock. The hardware function of the entropy-like coder of the present invention is fully consistent with the software implementation.
5) The code table for coding the EOB is determined by the maximum of all coefficients of the 8x8 block: for each 8x8 block an EOB is added before the first non-zero coefficient, and the EOB corresponds one-to-one with the code table. In the AVS entropy coder the coefficients are coded in reverse order, from back to front, and the code table of each coefficient is determined by the adjacent non-zero coefficient behind it; the EOB code table is determined in the same way as that of a coefficient, so it is determined by the first non-zero coefficient. In fact, by the method of 4), the EOB code table is determined by the maximum of all coefficients of the 8x8 block.
6) A direct relation is established between the maximum coefficient of the 8x8 block and the number of bits for coding the EOB: based on the one-to-one mapping between code table and EOB, the hardware design of the entropy-like coder of the present invention relates the maximum coefficient of the 8x8 block directly to the EOB bit number, without determining the EOB through the corresponding code table.
7) Conversion between code table and bit-width table: the bit-width calculation of the entropy coder is realized by logic decisions. For the code table selected for each coefficient, the exact codeword is not needed, only its bit width; the present invention therefore converts every code table to a bit-width table, mapping each codenumber in the code table directly to the number of bits needed to code that codenumber.
8) Handling of special cases in the code tables: according to the AVS standard, the codenumber value found in the original appendix code table requires further processing: if the original level is negative, codenumber = codenumber + 1. The number of bits needed for the codenumber found in the code table can therefore differ between positive and negative values, although in most cases the bit width is the same, with only a few exceptions. These special cases can be handled by analyzing every code table one by one, which yields the bit-width table of every code table.
The AVS entropy-like coder hardware device of the present invention comprises: a data input dump module, a run-length coding module, a reverse-order matrix module, a code table selection module, a table look-up module, a Golomb coding module, and a module that sums the bit numbers of a transform block.
1) The data input dump module handles the coefficients input in parallel; it stores the input coefficients according to the Zig-Zag scanning order, so that after the dump the coefficients are arranged in line-by-line scanning order.
2) In hardware, the data input dump module of the present invention uses 2 register groups of identical structure to achieve ping-pong operation and hence pipelined processing of 8x8 blocks.
3) The run-length coding module of the present invention processes the parallel input data of 1) and obtains the (run, level) of each coefficient. Because the (run, level) values of the input 8x8 block coefficients are interrelated, the currently input coefficients need the run information of the previously processed coefficients; the hardware design therefore adds a record of the run from the previous group of 8 parallel input coefficients, which serves as the reference for computing the run of the current 8 parallel input data.
4) To meet the needs of the hardware design, the code table selection module of the present invention modifies the rule of the standard, which selects the code table of the current coded coefficient according to the level value of the previous coded coefficient; instead, it determines the code table of the current coefficient according to the maximum of the coefficients before the current coded coefficient in the current 8x8 block. The code table selections of the individual coefficients thus no longer depend on one another, so 8 parallel input coefficients can be handled in one clock.
5) Likewise, the code table selection module of 4) need not wait until the first non-zero coefficient has been processed to select the EOB code table; it only needs to compute the maximum of all coefficients to determine the code table of the EOB.
6) The reverse-order module of the present invention reverses the order of the coded coefficients of a transform block, as the AVS standard requires. The basic method is to use an 8-state counter counter to control data input and reverse-order output (as shown in Figure 12).
7) The table look-up module of the present invention obtains the codeword of the current coefficient from the code table determined by the code table selection module. In the precoding stage, however, the actual codeword of each coefficient is not needed, only the bit width of the codeword. From the properties of Golomb codewords, the bit-width table corresponding to every code table is easily obtained, but some special points need handling (for example, at certain positions the sign of the data affects the bit width of the codeword, so the sign bit is used as a flag).
8) As described in 6), the special points in converting a code table to a bit-width table arise from the different handling of positive and negative values.
9) When the AVS entropy-like coder hardware device of the present invention is used for precoding, the code tables are reduced to bit-width tables, and the bit-width table look-up can be implemented in hardware entirely with logic decisions.
10) The Golomb coding module of the present invention mainly converts the bit widths obtained in 7) according to the different Golomb orders, and also computes the number of bits required for escape events.
11) The module of the present invention that sums the bit numbers of a transform block adds up the numbers of coded bits of all coefficients of the transform block and of the EOB.
Advantageous effects of the present invention:
Conventional entropy coding must complete every stage of the entropy coding process, and the stages can only run serially; the efficiency is low, which makes real-time coding hard to achieve in HD video processing. With the parallel entropy-like coding device of the present invention, processing is 8 times as fast as the conventional method, which greatly improves the performance of the whole encoder. In addition, the specific hardware implementation optimizes the VLC tables, which occupy the most resources, greatly reducing area; implementing them with logic decisions saves a large amount of hardware resources.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
To exploit the advantages of hardware parallel processing while minimizing hardware design cost and resolving the conflicts that the algorithm creates for hardware design, the present invention proposes several equivalent algorithmic schemes: parallel zig-zag scanning; parallel processing of coefficients in run-length coding; elimination of the dependence between coefficients in code table selection; the basis for EOB code table selection; simplification of the table look-up operation in the precoding module; and logic implementation of the table look-up operation in hardware.
The principle of the present invention is shown in Figure 2. The hardware-based entropy-like coding method and device of the present invention comprise: two register groups operated in ping-pong fashion for the parallel input of image residual data; a run-length coding module that performs zig-zag scanning on the image residual coefficients; a code table selection module that selects the code table used to code each coded coefficient; a table look-up module that maps the coded coefficient (run, level), once its code table is selected, to a codeword; and a Golomb coding module that maps the codenum (codeword) obtained by table look-up to a bit width (bit number). First, the residual coefficients produced by the modules preceding entropy coding (motion estimation and compensation, transform and quantization) are input in parallel and stored in one of the two register groups in a predetermined order: each coefficient is placed at the position given by the zig-zag scan to be performed, so that the subsequent run-length coding can fetch the data from the registers row by row in natural order. Run-length coding produces the (level, run) of each coefficient. A code table is then selected for each (level, run) according to the value of level; AVS 2D-VLC has 19 code tables for coefficient coding, and the code table of each coefficient is determined by its own level and the levels of the already coded coefficients. Finally, the codeword obtained from the code table is converted, according to the structure and order of the Golomb codeword, to the bit width that would be written to the bitstream. Each module involved in this process is described in detail below.
Fig. 3 is the pipeline block diagram of the entropy-like coder of the present invention, which comprises the following submodules:
1. Data storage allocation:
After transform and quantization the data are 12 bits wide. The data of each 8x8 block enter the registers one group of 8 coefficients at a time and are stored in the register group according to the zig-zag scan structure shown in Fig. 4. To keep the 8-way parallel pipeline running smoothly, this module uses two 8x8 register groups and switches between them every 8 clocks. The internal transfer structure of the storage matrix is described below.
Because the data are input 8 in parallel, the data are stored into the register group in zig-zag order at storage time, which avoids ordering conflicts in run-length coding; arranging the data this way also solves the parallel processing of the data well. As shown in Figure 4, the zig-zag order is 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, ..., corresponding to register matrix positions 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... respectively. Taking one of the matrices as an example, the clock schedule of the data flow is as follows. At the first clock, the 8 input data are stored at addresses 0, 1, 8, 16, 9, 2, 3, 10; at the second clock, the input data are stored at 17, 24, 32, 25, 18, 11, 4, 5; ...; at the 8th clock, the input data are stored at 53, 60, 61, 54, 47, 55, 62, 63. At the 9th clock the input switches to the other matrix, and the current matrix begins to output data: 0, 1, 8, 16, 9, 2, 3, 10; at the 10th clock it outputs 17, 24, 32, 25, 18, 11, 4, 5; ...; at the 16th clock it outputs 53, 60, 61, 54, 47, 55, 62, 63.
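The address groups listed for each clock can be reproduced with a short software model. The following is only a sketch: it generates the 8x8 zig-zag order, splits it into groups of 8, and shows which of the two register groups is written and read at a given clock. The function names (zigzag_order, groups_per_clock, bank_roles) are hypothetical and are not taken from the hardware design.

```python
def zigzag_order(n=8):
    """Raster indices of an n x n block listed in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + column of one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                      # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        for r in rows:
            order.append(r * n + (s - r))
    return order


def groups_per_clock(order, width=8):
    """Split the scan order into the groups handled in successive clocks."""
    return [order[i:i + width] for i in range(0, len(order), width)]


def bank_roles(clock, clocks_per_block=8):
    """Assumed ping-pong model: (bank written, bank read) at a 1-based clock."""
    write_bank = ((clock - 1) // clocks_per_block) % 2
    read_bank = 1 - write_bank if clock > clocks_per_block else None
    return write_bank, read_bank


groups = groups_per_clock(zigzag_order())
print(groups[0])                      # addresses named for the first clock
print(groups[7])                      # addresses named for the eighth clock
print(bank_roles(1), bank_roles(9))   # writing switches banks at the 9th clock
```

In this model the bank written during clocks 1 to 8 becomes the bank read during clocks 9 to 16, which mirrors the ping-pong switching described above.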
2. Run-Length Coding:
The run-length coding algorithm structure is shown in Figure 5. This module run-length codes the numbers in the register group: it scans each number of the matrix and produces a (run, level) pair for each non-zero coefficient, and it marks the EOB symbol on the first non-zero coefficient, so every block that is not all zero (cbp non-zero) has one EOB symbol.
level is the absolute value of the input coefficient and can be obtained directly. The key is to obtain run. Because the data are input in parallel, a reference variable for run, called base0, is introduced. Suppose the data are input 8 in parallel and denoted {a0, a1, a2, a3, a4, a5, a6, a7}; base0 is the number of adjacent zeros preceding a0, so if a0 is non-zero, the run of a0 equals base0. Once the reference for a0 is known, the run values of a0, a1, a2, a3, a4, a5, a6 and a7 in the same row can all be obtained, so the data can be processed in parallel, one row per clock.
base0 is computed as follows:
Taking 8-way parallel input as an example, a counter counter running from 0 to 7 is first needed to indicate that the data input during this span of clocks belong to the same block. Clearly, when counter = 0, base0 = 0 (counter = 0 when the first group of 8 data is input, which corresponds to the first clock; the value of counter identifies the clock). Otherwise base0 varies, so another variable base1 is defined: base1 is the number of adjacent zeros at the end of the 8 input data of the row. For example, for the 8 input numbers 0, 12, 3, 0, 0, 4, 0, 0, base1 = 2; for 1, 0, 3, 0, 0, 0, 3, 7, base1 = 0; for 0, 0, 0, 0, 0, 0, 0, 0, base1 = 8. In summary, base0 is determined as follows: when counter = 0, base0 = 0; when counter != 0, if base1 != 8, the base0 of the next row equals base1.
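A minimal software sketch of the base1 and base0 bookkeeping, assuming rows of 8 coefficients. The all-zero case (base1 = 8, base0 = 8 + base0) follows the rule stated for the third pipeline stage of this module. The function names are hypothetical.

```python
def trailing_zeros(row):
    """base1: number of adjacent zeros at the end of one group of coefficients."""
    count = 0
    for value in reversed(row):
        if value != 0:
            break
        count += 1
    return count


def base0_per_row(block_rows):
    """base0 seen by each row of a block, starting from base0 = 0 at counter = 0."""
    base0 = 0
    result = []
    for row in block_rows:
        result.append(base0)
        base1 = trailing_zeros(row)
        # A row with a non-zero coefficient restarts the count at base1;
        # an all-zero row extends the pending run by the row width.
        base0 = base1 if base1 != len(row) else base0 + len(row)
    return result


print(trailing_zeros([0, 12, 3, 0, 0, 4, 0, 0]))   # 2
print(trailing_zeros([1, 0, 3, 0, 0, 0, 3, 7]))    # 0
print(trailing_zeros([0] * 8))                     # 8
```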
As shown in Figure 6, the hardware structure of the whole module can be roughly divided into a 4-stage pipeline; the 4 stages are described below taking 8-way parallel input as an example:
The first stage mainly consists of several parallel comparators and a counter. 8 comparators produce, for each input coefficient, an indication of whether it is zero, denoted sign_zero[0~7]; for example, if the 8 flags produced are 0, 1, 0, 1, 0, 0, 1, 1, then sign_zero = 01010011. The counter counts the beats of valid input data; every 8 beats are the data of one 8x8 block.
The second stage mainly comprises two parallel find-1 logic blocks. The find-1 logic from sign_zero[7] to sign_zero[0] obtains the first non-zero coefficient of each row and marks it 1, the others 0; this gives EOB[7:0] for each row. Clearly the EOB[7:0] of a row contains at most one 1, or is all zero. The find-1 logic from sign_zero[0] to sign_zero[7] obtains the number of zeros after the last non-zero coefficient of the row, denoted base1.
The third stage mainly comprises two selectors. One selector, controlled by the counter, gates the EOB: within one block, once the first non-zero coefficient has appeared, all later coefficients are marked 0, which guarantees that each transform block has at most one EOB equal to 1. The other selector constructs the run value of the first coefficient of each row, denoted base0[5:0]: if base1 != 8, base0 <= base1; if base1 = 8, base0 = 8 + base0.
The fourth stage mainly comprises 8 comparators that obtain the (run, level) of each coefficient. Note that conventional zig-zag coding codes only non-zero coefficients; because an 8-way parallel structure is used here, an input coefficient equal to zero is recorded as (run = 0, level = 0) and also takes part in coding, but the number of bits it needs is 0. The third stage gives the run value base0 preceding the first coefficient of each row; combined with sign_zero[7:0], the comparators readily yield the (run, level) of each coefficient of the row.
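The per-row result of these four stages can be modelled in software as follows. This is a sequential sketch of values the hardware produces in parallel, not a description of the circuit; run_level_group is a hypothetical name, and EOB marking is left out.

```python
def run_level_group(row, base0):
    """Return the (run, level) pairs of one group and the base0 for the next group."""
    pairs = []
    zeros = base0                      # zeros seen since the previous non-zero coefficient
    for value in row:
        if value == 0:
            pairs.append((0, 0))       # zero coefficients are kept but cost 0 bits
            zeros += 1
        else:
            pairs.append((zeros, abs(value)))
            zeros = 0
    return pairs, zeros                # zeros equals base1, or base0 + 8 for an all-zero row


base0 = 0                              # counter = 0 at the start of a block
for row in ([0, 12, 3, 0, 0, 4, 0, 0], [0] * 8, [-1, 0, 3, 0, 0, 0, 3, 7]):
    pairs, base0 = run_level_group(row, base0)
    print(pairs, base0)
```

Carrying the returned value into the next call plays the role of the base0 register between clocks.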
3. Code table selection
According to the AVS standard, the variable maxAbsLevel (the level of largest absolute value) is initialized to 0, and the first non-zero coefficient selects code table VLC0. The absolute value of each coefficient (abslevel) is then compared with maxAbsLevel: if abslevel is greater than maxAbsLevel, a code table switch occurs, otherwise the code table is unchanged; after a switch, the value of abslevel is assigned to maxAbsLevel. This repeats until every coefficient has been processed.
For the encoder, the code table of the next coefficient to be coded depends on the code table selected for the previous coefficient and on the maxAbsLevel at that point. Because a parallel structure is used here, such an algorithm is clearly hard to implement in hardware. Analysis yields an important conclusion: the code table selected for the current coefficient is determined solely by the maximum of the already coded coefficients, so the code table of a coefficient can be obtained as soon as the maximum before that coefficient is known. The code table selection algorithm based on this conclusion is shown in Figure 7.
The algorithm steps are as follows (taking 8-way parallel input as an example):
Step 1: the 8 coefficient level values (level[0~7]) are compared in turn to obtain 8 maxima (denoted max[0~6] and maxlevel1), defined as max0 = max{level0}, max1 = max{level0, level1}, max2 = max{level0, level1, level2}, ..., max6 = max{level0~6}, maxlevel1 = max{level0~7}; here maxlevel1 is the largest value in the input row;
Step 2: the maximum of the data already input in the 8x8 block, denoted maxlevel0, is determined. Clearly, when counter = 0, maxlevel0 = 0; when counter != 0, maxlevel0 and maxlevel1 are compared and the greater becomes maxlevel0;
Step 3: the parameter values for selecting the code tables, denoted tab_value0~7, are determined. By the conclusion above, the code table selected for the current coefficient is determined solely by the maximum of the already coded coefficients. Thus, for the 8 input coefficients level0~7, the code table of level0 is determined by tab_value0 = maxlevel0; the code table of level1 by tab_value1 = max{maxlevel0, max0}; ...; and the code table of level7 by tab_value7 = max{maxlevel0, max6};
Step 4: the corresponding code table is selected according to the AVS standard.
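Steps 1 to 3 can be sketched in software as below. The function table_parameters is a hypothetical helper; the mapping from each tab_value to an actual AVS code table (step 4) depends on thresholds in the standard and is deliberately not reproduced here.

```python
def table_parameters(levels, maxlevel0):
    """Compute tab_value0..7 for one group and the updated maxlevel0.

    levels    -- absolute values level0..level7 of the current group
    maxlevel0 -- maximum level of the groups already input in this block (0 for the first group)
    """
    prefix_max = []
    current = 0
    for level in levels:
        current = max(current, level)
        prefix_max.append(current)     # max0 .. max6, and finally maxlevel1
    # tab_value0 depends only on earlier groups; tab_value_i also on level0..level(i-1).
    tab_values = [maxlevel0] + [max(maxlevel0, m) for m in prefix_max[:-1]]
    new_maxlevel0 = max(maxlevel0, prefix_max[-1])
    return tab_values, new_maxlevel0


tab_values, maxlevel0 = table_parameters([3, 0, 5, 1, 0, 0, 2, 7], 0)
print(tab_values, maxlevel0)
tab_values, maxlevel0 = table_parameters([1, 9, 0, 0, 0, 0, 0, 0], maxlevel0)
print(tab_values, maxlevel0)
```

Each tab_value depends only on maxlevel0 and the inputs of the current group, never on another tab_value, which is why the hardware can evaluate all eight in the same clock.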
In addition, according to the standard the EOB code table is determined by the level of the last coded non-zero coefficient; by the analysis above, however, the EOB code table is in fact determined by the largest value in the 8x8 block.
The hardware pipeline structure of the whole module, shown in Figure 8, is divided into 4 stages:
The first stage mainly comprises 56 comparators and the counter counter[2:0]. To obtain max0~7[11:0] within one clock, each input datum is compared with the other 7, which requires 56 comparators in total. counter[2:0] indicates the beat of the input data; every 8 beats form one 8x8 block.
The second stage comprises a selector controlled by counter[2:0]. When the first row of data is input, maxlevel0[11:0] equals 0; otherwise it equals the greater of maxlevel0 and maxlevel1.
The third stage, the circuit that determines the maximum before each coefficient, comprises 7 parallel comparators. max0~6[11:0] are each compared with maxlevel0[11:0], giving the table look-up parameters tab_value1~7 of the coefficients; the table look-up parameter of the first coefficient is determined by maxlevel0[11:0] alone, i.e. tab_value0[11:0] = maxlevel0[11:0].
The fourth stage comprises 8 parallel single-stage multiplexers, an EOB multiplexer, and an EOB code table selection comparator.
4. Table look-up
The main function of this module is to query the code table to determine the codeword (codenumber) value. The most common hardware implementation stores the code tables in ROM and queries them by address; the code table ROM storage is shown in Figure 9.
The code table to be queried for each (run, level) was obtained in part 3. The multiplexer driven by the code table selection then addresses the ROM to reach the corresponding code table, and the codenumber value in that table is selected according to level.
The biggest difference between 2D-VLC entropy coding and precoding is that the former must generate the bitstream from the codenumber, whereas the latter needs only the number of bits the codenumber would occupy in the bitstream. From the codenumbers in a code table and the bit widths of the Golomb codewords, the bit-width table corresponding to each code table can be obtained. The mapping between code table and bit-width table is illustrated with intra_VLC4 as an example.
Table 1. AVS standard appendix table intra_VLC4
According to the standard, intra_VLC4 is coded with second-order Golomb codes; from the codenumber in the code table and the Golomb coding rule, the corresponding number of information bits can be found. The converted table is as follows:
Table 2. intra_VLC4 bit-width table
For the escape event, the number of information bits can be set to 7 as a flag. For the two special cases in Table 2, (run=1, level=2) and (run=0, level=6), two different values occur, because according to the standard the codenumber value found in the original appendix code table is further processed as follows: if the original level is negative, codenumber = codenumber + 1. The codenumbers of these two special cases are 11 and 27 respectively; if level is positive the numbers of information bits are 3 and 4 respectively, and if level is negative they are 4 and 5 respectively. The other code tables are converted by the same method. Observation shows that the number of information bits in the code tables takes only the values 1~6 and 7, so simple logic can replace the ROM for precoding. Figure 10 uses intra_VLC4 as an example to illustrate how the number of information bits of a (run, level) is obtained: the inputs are the (run, level) of a coefficient, the table type, and the corresponding code table, and a series of logic decisions yields the number of information bits of that coefficient.
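The two special cases can be checked numerically with the information-bit rule given in the Golomb coding section of this document, M = floor(log2(codenum + 2^k)), with k = 2 for intra_VLC4. The sketch below is only an arithmetic check; the function names are hypothetical.

```python
def info_bits(codenum, k):
    """floor(log2(codenum + 2**k)), computed with integer arithmetic."""
    return (codenum + (1 << k)).bit_length() - 1


def info_bits_signed(codenum, k, level_is_negative):
    """Apply the codenumber + 1 adjustment for a negative level before counting bits."""
    if level_is_negative:
        codenum += 1
    return info_bits(codenum, k)


for codenum in (11, 27):
    positive = info_bits_signed(codenum, 2, False)
    negative = info_bits_signed(codenum, 2, True)
    print(codenum, positive, negative)
```

The bit count changes only when codenumber + 1 + 2^k reaches a power of two, which is why most table entries have the same bit width for both signs.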
Based on the above analysis, the table-lookup hardware can be divided into a three-stage pipeline, as shown in Figure 10:
The first stage consists of a cascade of the code-table-type gating switch and the code-table gating switch, which together locate one unique code table. That is, only one code table is gated at any time.
The second stage consists of several parallel comparators, which determine the codeword from each coefficient (run, level). In the hardware design, the table lookup is implemented with comparators. It should be noted that the comparison object varies from one code table to another. For example, inter_VLC0 can use run as the comparison object, while inter_VLC6 can use level; in all cases the aim is to keep the number of comparators as small as possible.
The third stage consists mainly of a gating switch and a MUX. The gating switch is the same as in the preceding gating logic and selects one of the 3 paths. The select signal of the MUX alternates with the comparison signal: for the 11 tables inter_VLC0, inter_VLC1, inter_VLC2, inter_VLC3, intra_VLC0, intra_VLC1, intra_VLC2, intra_VLC3, chroma_VLC0, chroma_VLC1 and chroma_VLC2, the select signal sel follows the level value; for the other tables, sel follows the run value.
Through the above three stages, the number of information bits of each (run, level) pair is obtained. When this number equals 7, the pair is an escape event.
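The choice of the MUX select source and the escape flag can be sketched as follows. This is only an illustration of the rule stated above, not a model of the circuit in Figure 10; the names SEL_BY_LEVEL, ESCAPE_FLAG, sel_reference and is_escape are hypothetical.

```python
# Tables whose MUX select signal follows the level value, as listed in the text.
SEL_BY_LEVEL = {
    "inter_VLC0", "inter_VLC1", "inter_VLC2", "inter_VLC3",
    "intra_VLC0", "intra_VLC1", "intra_VLC2", "intra_VLC3",
    "chroma_VLC0", "chroma_VLC1", "chroma_VLC2",
}

# An information-bit count of 7 marks an escape event.
ESCAPE_FLAG = 7


def sel_reference(table_name: str) -> str:
    """Return which input ("level" or "run") drives the sel signal of a table."""
    return "level" if table_name in SEL_BY_LEVEL else "run"


def is_escape(info_bit_count: int) -> bool:
    """True when the lookup result flags an escape event."""
    return info_bit_count == ESCAPE_FLAG


print(sel_reference("inter_VLC0"))  # one of the 11 listed tables
print(sel_reference("inter_VLC6"))  # not listed, so sel follows run
print(is_escape(7), is_escape(4))
```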
5. Golomb coding
The function of this module is to obtain the number of bits of each (run, level) pair. The information bits of a Golomb code obey the following rule:
$$M = \lfloor \log_2(\mathrm{codenumber} + 2^{k}) \rfloor,$$
where k is the order of the Golomb code, codenumber is the value to be coded, and M is the bit width of the information bits obtained by coding. The number of bits required to code the codenumber of each coefficient is therefore:
$$M_{\mathrm{stream}} = 2M + 1 - k$$
For an escape event, run is still coded with the Golomb order of the current code table, while level is coded with the Golomb order prescribed by the standard.
The hardware implementation of the whole module is carried out in the following steps, as shown in Figure 11:
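The two formulas above can be cross-checked against an explicit codeword. The following is a minimal sketch, assuming the usual k-th order exponential-Golomb construction (M - k leading zeros followed by the M + 1 bit binary form of codenumber + 2^k); the function names are hypothetical.

```python
def info_bit_width(codenumber: int, k: int) -> int:
    """M = floor(log2(codenumber + 2**k))."""
    return (codenumber + (1 << k)).bit_length() - 1


def stream_bits(codenumber: int, k: int) -> int:
    """Number of bits needed to code codenumber: M_stream = 2M + 1 - k."""
    m = info_bit_width(codenumber, k)
    return 2 * m + 1 - k


def codeword(codenumber: int, k: int) -> str:
    """k-th order exponential-Golomb codeword as a bit string (assumed form)."""
    m = info_bit_width(codenumber, k)
    # (M - k) zero prefix, then codenumber + 2**k written in M + 1 bits.
    return "0" * (m - k) + format(codenumber + (1 << k), "b")


# Cross-check the length formula against the constructed codeword.
for k in range(4):
    for n in range(200):
        assert len(codeword(n, k)) == stream_bits(n, k)

print(stream_bits(11, 2), codeword(11, 2))
print(stream_bits(27, 2), codeword(27, 2))
```

With k = 2, codenumber 11 needs 5 bits and codenumber 27 needs 7 bits, so a precoder only has to accumulate these counts and never has to build the codewords themselves.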
Step 1: comprises a cascade comparator, 3 shifters and a selector. According to the AVS standard, the cascade comparator derives the Golomb orders for coding run and for the escape event from the input signals table_type[1:0] and table_num[2:0]. Two of the shifters perform the shifts by the Golomb orders of run and level. The third shifter shifts run[5:0]. The selector chooses the constant that is added when run is coded in an escape event: 60 when sign is positive, otherwise 59.
Step 2: comprises two adders, which compute C_level = level + (1 << esc_rank) and C_run = 2*run + (59 or 60) + (1 << rank).
Step 3: comprises two parallel comparators, which obtain the information-bit widths for coding level and run. Because hardware cannot take a logarithm directly, this function is implemented by logic that searches from the most significant bit down to the least significant bit for the first 1.
Step 4: comprises one adder, which computes M_level[3:0] + M_run[3:0].
Step 5: comprises two shifters and two adders (subtractors), which obtain the number of coded bits for the escape event and for the non-escape event respectively.
Step 6: comprises a selector. When value[2:0] equals 7, the coefficient is an escape event and bits_esc[5:0] is output; otherwise bits[3:0] is output.
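The six steps can be followed in software with the sketch below. It is a behavioural illustration only, not the circuit of Figure 11. The exact arithmetic of step 5 is not spelled out in the text, so the sketch assumes that M_stream = 2M + 1 - k is applied to run and to level separately and the two results are summed for an escape event. The function names, the 16-bit search width and the boolean sign_positive are hypothetical.

```python
def leading_one_position(x: int, width: int = 16) -> int:
    """Scan from the MSB down to the LSB and return the index of the first 1.

    This equals floor(log2(x)) and stands in for the comparators of step 3.
    """
    for bit in range(width - 1, -1, -1):
        if (x >> bit) & 1:
            return bit
    raise ValueError("input must be non-zero and fit in the given width")


def coefficient_bits(value: int, run: int, level: int, sign_positive: bool,
                     rank: int, esc_rank: int) -> int:
    """Coded bit count of one (run, level) pair.

    value    : information-bit count from the table lookup (7 flags escape)
    rank     : Golomb order of the current code table
    esc_rank : Golomb order used for level in an escape event
    """
    # Step 1: the selector picks the escape constant (60 if positive, else 59).
    const = 60 if sign_positive else 59
    # Step 2: two adders.
    c_level = level + (1 << esc_rank)
    c_run = 2 * run + const + (1 << rank)
    # Step 3: information-bit widths by leading-one search.
    m_level = leading_one_position(c_level)
    m_run = leading_one_position(c_run)
    # Step 4: one adder.
    m_sum = m_level + m_run
    # Step 5 (assumed form): shift left by one, then add and subtract.
    bits_esc = (m_sum << 1) + 2 - rank - esc_rank
    bits = (value << 1) + 1 - rank
    # Step 6: the selector outputs the escape or the non-escape count.
    return bits_esc if value == 7 else bits


# Non-escape example: 3 information bits with a second-order code.
print(coefficient_bits(value=3, run=1, level=2, sign_positive=True, rank=2, esc_rank=1))
# Escape example with illustrative orders rank=2 and esc_rank=1.
print(coefficient_bits(value=7, run=0, level=1, sign_positive=False, rank=2, esc_rank=1))
```

In the escape example, C_run = 63 gives M_run = 5 and C_level = 3 gives M_level = 1, so the run part costs 9 bits and the level part 2 bits, 11 bits in total under the stated assumption.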