US20130044810A1 - 2-bin parallel decoder for advanced video processing - Google Patents

Publication number
US20130044810A1
Authority
US
United States
Prior art keywords
bit
decoding
context
bin
decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/657,431
Inventor
I-pieng Peter Kao
Jack Benkual
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Video Systems Inc
Original Assignee
MStar Semiconductor Inc Taiwan
Digital Video Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MStar Semiconductor Inc Taiwan, Digital Video Systems Inc
Priority to US13/657,431
Publication of US20130044810A1
Assigned to DIGITAL VIDEO SYSTEMS INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MSTAR SEMICONDUCTOR, INC.

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • FIG. 5 shows a general block diagram for a 2-bit CABAC decoder.
  • the init_Ctx 501 is the Context to be initialized, and the pair init_pState[5:0] 502 and init_valMPS 503 holds the associated initial values. These are calculated on demand, calculated sequentially at the beginning of slice decoding, or loaded from main memory.
  • FIG. 6 depicts a CABAC pState and valMPS On Demand Initialization and Fetching Implementation. The initialization depends on SliceQPy 601 and cabac_init_idc 602 .
  • SliceQPy is the slice quantization factor and varies from 0 to 51.
  • the cabac_init_idc is determined at picture level and can vary from 0 to 2.
  • the values m 603 and n 604 are fetched from a table maintained per Context and for each of the three values of cabac_init_idc.
  • the function Clip3(I,u,v) 605 clips the value v to the range [I,u].
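  • The initialization described above can be sketched as follows. The text here only names the inputs (SliceQPy, the table values m and n, and Clip3); the exact formula below is the initialization rule of the H.264 standard as recalled here, so treat it as an assumption rather than a description of FIG. 6. The function names and the sample m, n values in the usage note are hypothetical.

```python
def clip3(lo: int, hi: int, v: int) -> int:
    """Clip v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def init_context(m: int, n: int, slice_qp_y: int) -> tuple:
    """Return (pState, valMPS) for one Context.

    Assumption: follows the H.264 rule
    preCtxState = Clip3(1, 126, ((m * Clip3(0, 51, SliceQPy)) >> 4) + n).
    Python's >> on negative integers is an arithmetic shift, which is
    the behavior this sketch relies on.
    """
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp_y)) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0   # most probable bin value is 0
    return pre_ctx_state - 64, 1       # most probable bin value is 1
```

    For example, with the hypothetical values m = 20, n = -15 and SliceQPy = 26, the sketch yields pState = 46 and valMPS = 0.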
  • FIG. 7 shows a possible implementation for context based pState and valMPS initialization, fetching and updating. It also provides the RangeLPS values for the current pState and the most probable next pState, using the current qRange 701 and assuming that the next qRange remains the same or is decremented by one.
  • a single port memory 702 holds the pState[5:0] and valMPS for all Context indices.
  • some Contexts are specially organized so that multiple bins can be evaluated at a time. These are related to 4×4 block residual coefficients and account for most decoded bins. These Contexts are the pair of significant_coeff_flag and last_significant_coeff_flag, which indicate which of the 16 coefficients have a non-zero value, and the coeff_abs_value_minus1, which encodes the actual non-zero value. Since the previously decoded values are used to decide the next context index, we will decode the rest of the contexts one bin at a time to reduce complexity.
  • the Context state is written in large chunks (128 bits in our implementation). These are fetched from the external Main Memory by the DMA engine, not shown here for industrial art reasons. For high speed, these fetches could be done in a dual-channel, interleaved fashion.
  • the Sequencer controls the writes and the address increments. All contexts are written at the beginning of each slice as soon as the SliceQPy value is decoded by other means (e.g. the CAVLC option of the H.264).
  • the Sequencer starts fetching the Context state one bin at a time according to the syntax element sequence.
  • the Sequencer de-asserts the valid_2St 703 output to indicate that only one state and RangeLPS0 704 are valid.
  • the Sel4 705 and Sel0 706 control signals are used to select the current state output.
  • the selMPS0 707 input indicates that the valMPS value was decoded as the current bin and it is used to determine the updated state values to be written back into the Context State Memory.
  • the qRange[1:0] input is used to select one of the four possible RangeLPS0[7:0] values as the primary output.
  • the Context State Memory and the Sequencer are organized to fetch the current state as well as the next state such that two bins can be decoded in one cycle. Starting with the first coefficient in the zigzag scan order, if a zero is decoded, the last_significant_coeff_flag is skipped and the significant_coeff_flag of the next coefficient is decoded. Otherwise, the last_significant_coeff_flag is decoded for that coefficient.
  • FIG. 8 shows the sequencer decision making on the Sel1 708 control and the incrementing of the running pointer to states corresponding to significant_coeff_flag (even pointer values) and last_significant_coeff_flag (odd pointer values).
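  • The significance-flag sequencing described above can be sketched in software as follows. This is a minimal illustration, not the FIG. 8 hardware: the bins are assumed to be already decoded and supplied as an iterable, the pointer layout (2*i for significant_coeff_flag, 2*i + 1 for last_significant_coeff_flag) is a hypothetical reading of the even/odd pointer description, and the rule that the final coefficient is inferred to be significant when no last flag was seen is taken from the H.264 standard rather than from this text.

```python
def decode_significance_map(bins, num_coeff=16):
    """Walk the significant/last flag sequence for one block.

    Returns (significant, pointer_trace): a 0/1 list per coefficient
    and the sequence of state pointers that were visited.
    """
    bins = iter(bins)
    significant = [0] * num_coeff
    pointer_trace = []
    for i in range(num_coeff - 1):
        pointer_trace.append(2 * i)        # even pointer: significant_coeff_flag
        if next(bins) == 0:
            continue                       # zero decoded: last flag is skipped
        significant[i] = 1
        pointer_trace.append(2 * i + 1)    # odd pointer: last_significant_coeff_flag
        if next(bins) == 1:
            return significant, pointer_trace
    # Assumption (H.264 behavior): the final coefficient is inferred significant.
    significant[num_coeff - 1] = 1
    return significant, pointer_trace
```

    Note that after a zero the next pointer is always the following even value, which is what lets the hardware prefetch the next state while the current bin is still being evaluated.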
  • the control inputs used are:
  • a counter that starts at 0 and saturates at 3 is maintained to count the number of coefficients decoded so far with an absolute value equal to one and the number of coefficients greater than one.
  • An absolute value equal to one is encoded as a single bin of “zero”.
  • An absolute value greater than one has at least two bins; the first bin is always a “one”.
  • the context number and the counter are incremented at the beginning while decoding “0” value bins indicating coefficients with absolute value minus 1 equal to zero.
  • the second and subsequent bins for that symbol use the counter value counting the number of coefficients encountered so far that had a value greater than one. From then on, the first bin is decoded using a context indicating that at least one coefficient with an absolute value greater than one has been decoded.
  • FIG. 9 shows how the context number (counter) is incremented when the first bin decodes to 0, indicating that a coefficient with absolute value equal to one has been decoded.
  • FIG. 10 shows the Context state transitioning in more detail.
  • the E states count the number of coefficients with abs value equal to one, and the G and B states count the number of coefficients greater than one for the first and other bins respectively.
  • the transition condition is the decoded bin value 0 or 1.
  • FIG. 11 shows the basic block diagram for the arithmetic section capable of decoding up to two bins.
  • First RangeMPS0 1101 is evaluated by subtracting the input RangeLPS0 1102 from the content of the current Range register.
  • the RangeMPS0 is subtracted from the current Offset to determine selMPS0 1103 .
  • the selMPS0 selects the Offset0 and Range0 values that, when a single bin is decoded, are used as un-normalized inputs to the Renorm sub-block.
  • the input valid_2st 1104 , the most significant bit of RangeMPS0 (set when the value is at least 256), and the qRange1 1106 result are also used to determine dec2bin 1105 , which indicates that 2 bins are decoded.
  • RangeMPS0 is used to evaluate the Offset1, Range1, and valMPS1 controlled by the qRange1 result. Simultaneous decoding of more than 2 bins can be achieved similarly by following the most probable paths and assuming that qRange stays the same or decrements by one.
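  • The decision logic of the 2-bin arithmetic section can be sketched in software as follows. This is a sequential illustration under stated assumptions, not the FIG. 11 datapath: the hardware evaluates both candidate second-bin paths in parallel and selects one, whereas the sketch simply indexes the table with the resulting qRange. The table TOY_RANGE_TAB_LPS is a hypothetical two-state stand-in for the real RangeLPS table, Context state updates are omitted, and the final "two most probable bins" branch mirrors the single-bin most-probable path, which is an assumption here.

```python
# Hypothetical toy table indexed [qRange][pState]; NOT the H.264 table.
TOY_RANGE_TAB_LPS = [[128, 6], [160, 7], [192, 8], [224, 9]]


def renorm(rng, offset, bits):
    """Shift Range left until >= 256, pulling one stream bit into Offset per shift."""
    while rng < 256:
        rng <<= 1
        offset = (offset << 1) | next(bits)
    return rng, offset


def decode_up_to_two_bins(rng, offset, ctx0, ctx1, tab, bits):
    """ctx0 and ctx1 are (pState, valMPS) for the current and the predicted next Context.

    Returns (decoded_bins, next_range, next_offset).
    """
    p0, mps0 = ctx0
    p1, mps1 = ctx1
    q0 = (rng >> 6) & 3
    range_lps0 = tab[q0][p0]
    range_mps0 = rng - range_lps0
    if offset >= range_mps0:
        # First bin is the least probable symbol: only one bin is decoded.
        rng, offset = renorm(range_lps0, offset - range_mps0, bits)
        return [1 - mps0], rng, offset
    q1 = (range_mps0 >> 6) & 3
    if range_mps0 < 256 or q1 not in (q0, q0 - 1):
        # Renorm is needed or qRange is not predictable: one bin only.
        rng, offset = renorm(range_mps0, offset, bits)
        return [mps0], rng, offset
    range_lps1 = tab[q1][p1]            # q1 == q0 or q1 == q0 - 1 here
    range_mps1 = range_mps0 - range_lps1
    if offset >= range_mps1:
        # Second bin is the least probable symbol.
        rng, offset = renorm(range_lps1, offset - range_mps1, bits)
        return [mps0, 1 - mps1], rng, offset
    # Assumption: both bins are most probable symbols.
    rng, offset = renorm(range_mps1, offset, bits)
    return [mps0, mps1], rng, offset
```

    Because a second bin is attempted only when no Renorm is required after the first one, the result matches what two sequential single-bin decodes would produce for the same inputs.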
  • FIG. 12 shows an actual implementation of the arithmetic section as a one-bit slice of the 9 bits that form the Range and Offset Registers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A critical phase of video processing is the decoding of bit streams coming from standards-based, heavily compressed sources. Entropy-coded streams can be decoded effectively by adopting parallelism to speed up the process. Reasonable assumptions make it possible, for example, to process multiple bits at a time for the Context-based Adaptive Binary Arithmetic Coding (CABAC) algorithm. In particular, an arithmetic section reduces the timing-critical path to a single carry propagation, while decoding is accelerated for only two sequence elements at a time by calculating and maintaining the most probable bit values. As a result, the accelerated path, which relies on predetermined probability outcomes, achieves parallelism at low cost.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a Continuation Application and claims priority to U.S. application Ser. No. 11/755,698, filed May 30, 2007, entitled “2-BIN PARALLEL DECODER FOR ADVANCED VIDEO PROCESSING,” which claims priority to U.S. Provisional Application No. 60/815,749, filed Jun. 21, 2006, entitled “DVD Decoder Solution”, all of which are incorporated herein by reference.
  • FIELD OF THE INVENTION
  • The present invention relates to adopting parallelism to provide decoding for industrial standard based video processing.
  • DESCRIPTION OF THE RELATED ART
  • A number of industry standards describe how video or movie processing should be carried out. Vendors' systems typically conform to these standards, such as the universal plug-and-play (UPnP) standard and the upcoming, all-encompassing MPEG-4 standard, as well as older existing standards, such as the VHS format, to provide audio/visual (AV) capability. MPEG-4 Part 10 (formally, ISO/IEC 14496-10) is a digital video codec standard noted for achieving very high data compression. The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard are technically identical. The standard is also known as AVC, for Advanced Video Coding, or JVT, for Joint Video Team, as it is a collective partnership effort by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG).
  • Even with current compression technology, a standard 2-hour movie may take from 1 to 4 hours to download from the Internet. This time can grow when processing is interrupted by errors. Therefore, a need arises for a method and system of quickly processing AV content. The invention of this patent, based on the provisional patent application entitled “DVD DECODER SOLUTION”, describes a method that adopts parallelism to accelerate the decoding stage of the video processing algorithm.
  • SUMMARY OF THE INVENTION
  • The H.264 standard plays a crucial role in providing video compression for standard Internet Definition, High Definition, Full High Definition, as well as Mobile Content. It reduces the transmission rate needed for a required resolution and frame rate.
  • A typical video processing sequence for DVD or broadcast includes encoding and decoding phases. The encoding consists of spatial and temporal prediction, transform, quantization, scanning, and variable length coding or arithmetic coding, also called entropy coding. In H.264, Context-based Adaptive Binary Arithmetic Coding (CABAC) is used. The decoding phase consists of decoding, post-processing, and error recovery. This invention addresses the decoding phase. Specifically, parallelism is adopted to accelerate decoding.
  • In accordance with one aspect of the invention, an efficient parallel algorithm is provided for decoding multiple bits at a time from a bitstream encoded using the CABAC algorithm. The implementation of the algorithm assumes that the Most Probable bit values will be decoded for all bits except the last bit decoded. It is further assumed that the next quantized range used to decode the next bit is highly predictable from the current range value and the current probability state. We further assert that the decoding of only two sequence elements needs to be accelerated to speed up the decoding process significantly. We claim that these assumptions hold for most encoded bit streams across a large variety of motion pictures, and that our implementation is as fast as the single-bit sequential implementation when the above assumptions do not hold.
  • In one embodiment, the implementation of the arithmetic section using carry save adders is unique, reducing the timing-critical adder paths to a single carry propagation. Our solutions are also applicable to other similar sequential problems that need to be accelerated. Since the result of decoding a given bit influences the decoding of the next bit, devising a parallel or pipelined implementation of this algorithm is particularly challenging.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 depicts a CABAC 1-bit Decode Implementation.
  • FIG. 2 shows the 1-bit Decode Decision Flow Chart.
  • FIG. 3 depicts a CABAC Renorm Implementation.
  • FIG. 4 shows the Renorm Flow Chart.
  • FIG. 5 illustrates the CABAC 2-bit Decode Block Diagram.
  • FIG. 6 depicts a CABAC pState and valMPS On Demand Initialization and Fetching Implementation.
  • FIG. 7 depicts a CABAC pState and valMPS Initialization and Fetching for Up to 2 bin Decode Implementation.
  • FIG. 8 presents the CABAC significant_coeff_flag sequence.
  • FIG. 9 presents the CABAC coeff_abs_value_minus1 sequence.
  • FIG. 10 shows the CABAC Coeff_abs_value_minus1 Context State transitioning.
  • FIG. 11 depicts the CABAC 2-bin Arithmetic Decoder Section.
  • FIG. 12 depicts the CABAC 2-bin 1-bit slice Arithmetic Evaluator.
  • DETAILED DESCRIPTION
  • In the area of entropy coding in video processing, CABAC has several beneficial characteristics. These include the use of adaptive probability models for most symbols, the exploitation of symbol correlations by using contexts, a simple and fast adaptation mechanism, and a fast binary arithmetic codec based on table look-ups and shifts only. The average bit-rate saving of CABAC over CAVLC is around 10 to 15 percent, and CABAC is adopted by H.264 as part of the standard.
  • The decoder decision algorithm for CABAC is a sequential algorithm that decodes one bit at a time. For High Definition video resolution where pictures are encoded using a high bit rate, decoding one bit at a time is found to be too slow for some movie sequences. In our solution, we take advantage of the calculated probabilities of the decoded bit values and only accelerate the paths for the most probable outcomes. Since the arithmetic decoders already calculate these probabilities and maintain the probable decoded bit values, the decision-making overhead for our parallel implementation is relatively small.
  • FIG. 1 depicts a 1-bit decode implementation for the CABAC option specified by the JVT. The decision algorithm for the decoder is shown in FIG. 2. The inputs to the decoding process are the coded bit stream 101, Range 102, Offset 103, pState 104, and valMPS 105. The outputs are the next values for Range, Offset, the new values of the pair of pState, valMPS for the current Context, and the decoded bin value.
  • The Range and Offset are two 9-bit values that are used to determine the decoded bin. The Range changes for every bin that gets decoded. The Offset changes when an LPS (Least Probable Symbol) is decoded or when the Range becomes smaller than the Offset. When the Range becomes less than 256, a renormalization process 106, 201 shifts new bits from the encoded bit stream into the least significant bits of the Offset and shifts the Range left (multiplies it by 2) until it is greater than or equal to 256. FIG. 3 depicts a CABAC Renorm Implementation. FIG. 4 shows the Renorm Flow Chart. At the beginning of the Slice decoding process, Range is initialized to the value of 510 and Offset to the next 9 bits from the encoded stream.
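  • The renormalization and initialization just described can be sketched as follows. This is a minimal software illustration, not the FIG. 3 hardware; the function names and the use of a Python iterator as the bit source are assumptions made for the example, and the Range passed in is assumed to be positive.

```python
def renorm(rng, offset, bits):
    """Shift Range left until it is >= 256.

    Each shift also shifts Offset left and pulls one new bit from the
    encoded stream into its least significant position.
    """
    while rng < 256:
        rng <<= 1
        offset = (offset << 1) | next(bits)
    return rng, offset


def init_decoder(bits):
    """Start of slice: Range = 510, Offset = next 9 bits of the stream."""
    offset = 0
    for _ in range(9):
        offset = (offset << 1) | next(bits)
    return 510, offset
```

    For instance, a Range of 100 needs two shifts to reach 400, consuming two stream bits, while a Range that is already at least 256 consumes none.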
  • The pState and valMPS are associated with a Context 107. A Context distinguishes a sequence element that is coded through a binarization process, a bit position, and alternatives depending on past decoding history. Context numbers vary from 0 to 398. In some cases the Context number for the next bin depends on the result of the currently decoded bin. In other cases the same Context is used to decode a number of consecutive bins of a sequence element. The pState is a quantized probability of decoding a bin whose value equals valMPS. The pState varies between 0 and 63, where 0 represents a probability close to 50% and 63 a probability close to 100%. The valMPS represents the most probable bin value, either 0 or 1.
  • The first step of the Arithmetic Decoding consists of determining the value RangeLPS 108. This is achieved by a table lookup using the pState input and bits 7 and 6 of the Range input. The RangeLPS is subtracted from Range to form the value of RangeMPS 109. The Offset is compared 202 against RangeMPS. If RangeMPS is larger than Offset, valMPS is decoded, RangeMPS becomes the Range value, and Offset remains unchanged prior to Renormalization. A transMPS table 110 lookup determines the new pState value. Otherwise, if Offset is greater than or equal to RangeMPS, the inverse of valMPS is decoded. In this unlikely outcome, the next Offset value (before Renorm) is obtained by subtracting RangeMPS from Offset, and the RangeLPS is used as the next Range value prior to Renorm. If the current pState is zero, the valMPS for the current Context is flipped. The new pState value is obtained by a transLPS table look-up.
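  • The single-bin decision described above can be sketched as follows. The three tables are passed in as parameters; the toy tables defined in the block are hypothetical two-state stand-ins and are not the RangeLPS, transMPS or transLPS tables of the standard. Renormalization is left to the caller, so the returned Range and Offset are the un-normalized values.

```python
# Hypothetical toy tables for illustration only (2 states instead of 64).
TOY_RANGE_TAB_LPS = [[128, 100], [160, 120], [192, 140], [224, 160]]  # [qRange][pState]
TOY_TRANS_MPS = [1, 1]
TOY_TRANS_LPS = [0, 0]


def decode_bin(rng, offset, p_state, val_mps, range_tab_lps, trans_mps, trans_lps):
    """Decode one bin; returns (bin, range, offset, new_p_state, new_val_mps)."""
    q_range = (rng >> 6) & 3                      # bits 7 and 6 of Range
    range_lps = range_tab_lps[q_range][p_state]
    range_mps = rng - range_lps
    if offset < range_mps:
        # Most probable symbol: Offset is unchanged, Range shrinks to RangeMPS.
        return val_mps, range_mps, offset, trans_mps[p_state], val_mps
    # Least probable symbol: the inverse of valMPS is decoded.
    new_val_mps = 1 - val_mps if p_state == 0 else val_mps
    return 1 - val_mps, range_lps, offset - range_mps, trans_lps[p_state], new_val_mps
```

    Returning the updated state rather than writing it into a Context memory keeps the sketch free of side effects; a real decoder would write the new pState and valMPS back to the current Context.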
  • Since RangeLPS is a number between 240 and 2, with smaller values for higher pState, the need for Renorm is certain if a Least Probable Symbol is decoded but diminishes with larger pState. The quantization of Range (bits 7 and 6) also affects the RangeLPS value by reducing it when the Range is smaller.
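  • The statement above can be checked with a line of arithmetic. In this sketch, $k$ is a symbol introduced here for the number of Renorm shifts and $\mathrm{Range}'$ for the un-normalized next Range; neither appears in the original text.

```latex
\text{LPS decoded: } \mathrm{Range}' = \mathrm{RangeLPS} \le 240 < 256
\;\Rightarrow\; k = 8 - \lfloor \log_2 \mathrm{RangeLPS} \rfloor \ge 1,
\qquad
\text{MPS decoded: } \mathrm{Range}' = \mathrm{Range} - \mathrm{RangeLPS} \ge 256
\iff \mathrm{RangeLPS} \le \mathrm{Range} - 256 .
```

    For example, RangeLPS = 240 needs one shift and RangeLPS = 2 needs seven, while on the most probable path a small RangeLPS (large pState) leaves Range at or above 256 more often, so Renorm is skipped.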
  • The H.264 Standard uses CABAC as an option to encode the macroblock picture frame information. The average encoded input bit stream rate is 20 Mbits/sec for High Definition resolution, but peak rates can be much higher. Furthermore, CABAC uses a relatively simple binarization scheme, relying mostly on the ability of arithmetic coding to encode a long sequence of highly predictable binary symbols using a relatively small number of encoded bits. This means that when arithmetic coding is very efficient, it will decode most probable bins without needing to modify the Offset or read more encoded bits from the input bit-stream through Renorm. When a least probable symbol is decoded or Renorm is required, the input bit-stream rate will limit the performance requirement of the CABAC engine. Therefore, to reduce buffering and to deal effectively with the high throughput of decoded bins, a scheme that can decode more than a single bin per cycle is very desirable.
  • There are multiple challenges in speeding up CABAC decoding. First, multiple sequential steps need to be carried out to decode a single bin. This makes it difficult to implement the decoding in a single clock cycle at relatively high frequencies. Second, in most cases the resulting previous state and decoded symbol are used to decode the next symbol. This makes traditional speed-up techniques like pipelining and parallel execution very hard to implement. For example, part of the macroblock encoded syntax elements is a sequence of bits that represent the significant_coeff_flag, followed by the last_significant_coeff_flag in case a value of “1” was decoded as the corresponding significant_coeff_flag on the previous cycle. Since each encoded bit has a different Context number, the resultant decoded bit value determines the Context number to be used for the next bin and, consequently, the pState and valMPS values that are directly involved in evaluating the next bin decoding. In addition, the Range value (quantized using its bits 7 and 6) resulting from the previous bin decoding is needed to fetch the next RangeLPS value that is essential for the next bin decoding.
  • By predicting that multiple Most Probable Symbols will be decoded in a row, the following simplifications can be made:
  • 1. We assume that Offset[i] < RangeMPS[i], leading to the selection of valMPS[i] for the ith cycle. Also, the next pStateIdx for Context[i] is easily evaluated.
  • 2. No Renorm will be needed on the ith cycle.
  • 3. valMPS[i] will be used to determine the next Context number, Context[i+1].
  • 4. We assume that RangeLPS[i] will be a small value, so that RangeMPS[i], which becomes Range[i+1], changes little compared to Range[i]. More specifically, we assume that bits 7 and 6 of RangeMPS[i] are either equal to or one less than bits 7 and 6 of Range[i].
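  • Assumptions 2 and 4 above can be expressed as a small predicate. This is an illustrative sketch; the function names and the returned selector (0 when the quantized range is unchanged, 1 when it drops by one) are hypothetical and only indicate which of two speculatively fetched RangeLPS candidates would be used for the second bin.

```python
def q_range(value: int) -> int:
    """Quantized range: bits 7 and 6 of a 9-bit Range value."""
    return (value >> 6) & 3


def second_bin_is_predictable(range_i: int, range_lps_i: int):
    """Check whether a second bin can be decoded speculatively.

    Returns (True, delta) when no Renorm is needed and the quantized
    range stays the same (delta = 0) or drops by one (delta = 1);
    otherwise returns (False, None).
    """
    range_mps = range_i - range_lps_i
    if range_mps < 256:
        return False, None                 # Renorm would be required first
    delta = q_range(range_i) - q_range(range_mps)
    if delta in (0, 1):
        return True, delta
    return False, None
```

    With Range = 400 and RangeLPS = 10 the quantized range is unchanged; with Range = 390 and RangeLPS = 10 it drops by one; with Range = 500 and RangeLPS = 200 the change is too large and only one bin would be decoded.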
  • For example, to decode 2 bins at a time, the following operations are needed:
  •  1. qRange[i] = (Range[i] 
    Figure US20130044810A1-20130221-P00001
     6) & 3.
     2. pState[i] = table_context_pState[Context[i]].
     3. valMPS[i] = table_context_valMPS[Context[i]].
     4. Context[i+1] = evaluate_next_context[valMPS[i]].
     5. pState[i+1] = table_context_pState[Context[i+1]].
     6. valMPS[i+1] = table_context_valMPS[Context[i+1]].
     7. RangeLPS[i] = rangeTablLPS[qRange[i]][pState[i]].
     8. Range[i+1] = RangeMPS[i] = Range[i] − RangeLPS[i].
     9. RangeLPS0[i+1] = rangeTablLPS[qRange[i]][pState[i+1]].
    10. RangeLPS1[i+1] = rangeTablLPS[qRange[i] − 1] [pState[i+1]].
    11. Range0[i+2] = RangeMPS0[i+1] = Range[i] − RangeLPS[i] − RangeLPS0[i+1].
    12. Range1[i+2] = RangeMPS1[i+1] = Range[i] − RangeLPS[i] − RangeLPS1[i+1].
    13. OffsetLPS[i] = Offset − RangeMPS[i].
    14. OffsetLPS0[i+1] = Offset − RangeMPS0[i+1].
    15. OffsetLPS1[i+1] = Offset − RangeMPS1[i+1].
    16. qRange[i+1] = (RangeMPS[i] >> 6) & 3.
    17. IF (OffsetLPS[i] >= 0) THEN
    a. Only one bin of value !valMPS[i] is decoded as the i-th bin.
    b. {Offset[i+1], Range[i+1]} = Renorm({OffsetLPS[i], RangeLPS[i]}).
    c. IF (pState[i] == 0) THEN valMPS[Context[i]] = !valMPS[i].
    d. pState[Context[i]] = transLPS[pState[i]].
    18. ELSE IF (RangeMPS[i] < 256 || (qRange[i+1] != qRange[i] && qRange[i+1] != qRange[i]−1)) THEN
    a. Only one bin of value valMPS[i] is decoded as the i-th bin.
    b. {Offset[i+1], Range[i+1]} = Renorm({Offset[i], RangeMPS[i]}).
    c. pState[Context[i]] = transMPS[pState[i]].
    19. ELSE IF (OffsetLPS0[i+1] >= 0) THEN
    a. Two bins are decoded: {valMPS[i], !valMPS[i+1]}.
    b. IF (qRange[i+1] == qRange[i]) THEN
    i. {Offset[i+2], Range[i+2]} = Renorm({OffsetLPS0[i+1], RangeLPS0[i+1]}).
    ii. ELSE {Offset[i+2], Range[i+2]} = Renorm({OffsetLPS1[i+1], RangeLPS1[i+1]}).
    c. IF (pState[i+1] == 0) THEN valMPS[Context[i+1]] = !valMPS[i+1].
    d. pState[Context[i]] = transMPS[pState[i]].
    e. pState[Context[i+1]] = transLPS[pState[i+1]].
    20. ELSE
    a. Two bins are decoded: {valMPS[i], valMPS[i+1]}.
    b. IF (qRange[i+1] == qRange[i]) THEN
    i. {Offset[i+2], Range[i+2]} = Renorm({Offset, RangeMPS0[i+1]}).
    ii. ELSE {Offset[i+2], Range[i+2]} = Renorm({Offset, RangeMPS1[i+1]}).
    c. pState[Context[i]] = transMPS[pState[i]].
    d. pState[Context[i+1]] = transMPS[pState[i+1]].
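  • The branch structure of steps 17 through 20 can be sketched in Python as follows. This is a minimal illustration and not the hardware of the embodiment: the function name decode_up_to_two_bins and its argument names are hypothetical, the three RangeLPS values are passed in by the caller instead of being read from rangeTablLPS, Renorm and the pState/valMPS updates are left out, and the second-bin test uses whichever candidate matches qRange[i+1], which is one reading of steps 19 and 20.

```python
def decode_up_to_two_bins(range_i, offset, range_lps, range_lps0_next, range_lps1_next):
    """Classify one decode cycle following steps 17-20 (hypothetical helper).

    Returns (bins, offset_out, range_out), where bins is a tuple of "MPS"/"LPS"
    labels and the offset/range pair is the un-normalized input to Renorm.
    """
    q_range = (range_i >> 6) & 3                 # step 1
    range_mps = range_i - range_lps              # step 8
    offset_lps = offset - range_mps              # step 13
    q_range_next = (range_mps >> 6) & 3          # step 16

    # Step 17: the first bin is the least probable symbol.
    if offset_lps >= 0:
        return ("LPS",), offset_lps, range_lps

    # Step 18: the first bin is the MPS, but the second-bin prediction fails
    # (a Renorm is needed, or qRange moved by more than one).
    if range_mps < 256 or q_range_next not in (q_range, q_range - 1):
        return ("MPS",), offset, range_mps

    # Pick the speculative RangeLPS that matches the actual next qRange.
    if q_range_next == q_range:
        range_lps_next = range_lps0_next         # step 9 candidate
    else:
        range_lps_next = range_lps1_next         # step 10 candidate
    range_mps_next = range_mps - range_lps_next  # steps 11-12
    offset_lps_next = offset - range_mps_next    # steps 14-15

    # Step 19: the second bin is the least probable symbol.
    if offset_lps_next >= 0:
        return ("MPS", "LPS"), offset_lps_next, range_lps_next
    # Step 20: both bins are most probable symbols.
    return ("MPS", "MPS"), offset, range_mps_next
```

  • With Range[i] = 400, Offset = 360, RangeLPS[i] = 20 and a qRange[i]−1 candidate of 30, for example, the sketch reports an MPS followed by an LPS in the same cycle, because RangeMPS[i] = 380 stays at or above 256 and its bits 7 and 6 are one less than those of Range[i].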
  • FIG. 5 shows a general block diagram for a 2-bin CABAC decoder. When decoding of a new slice starts, asserting the reset signal initializes the Context-based Probability States. The init_Ctx 501 is the Context that will be initialized, and the pair init_pState[5:0] 502 and init_valMPS 503 are the associated initial values. These are either calculated on demand, calculated sequentially at the beginning of slice decoding, or loaded from main memory. FIG. 6 depicts a CABAC pState and valMPS On Demand Initialization and Fetching Implementation. The initialization depends on SliceQPy 601 and cabac_init_idc 602. SliceQPy is the slice quantization factor and varies from 0 to 51. The cabac_init_idc is determined at picture level and can vary from 0 to 2.
  • When large external DRAM or Flash based memory is available, it is advantageous to pre-calculate the pState and valMPS for all possible values of SliceQPy and cabac_init_idc and store them in the external memory. When the SliceQPy is decoded, the data is fetched for all contexts and stored in the lookup tables. This can be done 128 bits at a time, with negligible performance impact. The following algorithm is used for initialization:
  • 1. preCtxState = Clip3(1, 126, ((m*SliceQPy >> 4) + n)).
    2. IF (preCtxState <= 63) THEN
    a. pStateIdx = 63 − preCtxState.
    b. valMPS = 0.
    3. ELSE
    a. pStateIdx = preCtxState − 64.
    b. valMPS = 1.
  • The values m 603 and n 604 are fetched from a table maintained per Context and for each of the three values of cabac_init_idc. The function Clip3(l,u,v) 605 clips the value v to the range [l,u].
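  • The initialization steps above can be sketched in Python as follows. The function names clip3, init_context and precompute_states are hypothetical, and the m and n values in the usage note are arbitrary illustrations and are not taken from the per-Context tables. The sketch relies on Python's right shift of a negative integer being an arithmetic (floor) shift.

```python
def clip3(lo, hi, v):
    """Clip v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def init_context(m, n, slice_qp_y):
    """Return (pStateIdx, valMPS) for one context, following steps 1-3."""
    pre_ctx_state = clip3(1, 126, ((m * slice_qp_y) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0
    return pre_ctx_state - 64, 1


def precompute_states(m, n, qp_values=range(52)):
    """Tabulate the initial state of one context for every SliceQPy value."""
    return [init_context(m, n, qp) for qp in qp_values]
```

  • For instance, with m = 20, n = −15 and SliceQPy = 26, preCtxState evaluates to 17, giving pStateIdx = 46 and valMPS = 0. A table built with precompute_states for each Context and each cabac_init_idc corresponds to the pre-calculated data that would be stored in external memory.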
  • FIG. 7 shows a possible implementation for context based pState and valMPS initialization, fetching and updating. It also provides the RangeLPS values for the current pState and the most probable next pState, using the current qRange 701 and assuming that the next qRange remains the same or is decremented by one.
  • A single port memory 702 holds the pState[5:0] and valMPS for all Context indices. Some Contexts are specially organized so that multiple bins can be evaluated at a time. These are related to 4×4 block residual coefficients and account for most decoded bins. These Contexts are the pair of significant_coeff_flag and last_significant_coeff_flag, which indicate which of the 16 coefficients have a non-zero value, and the coeff_abs_value_minus1, which encodes the actual non-zero value. Since the previously decoded values are used to decide the next context index, we will decode the rest of the contexts one bin at a time to reduce complexity.
  • During initialization, the Context state is written in large chunks (128 bits in our implementation). These are fetched from the external Main Memory by the DMA engine not shown here for industrial art reasons. For high speed, this could be done in a dual-channel, interleaved fashion. The Sequencer controls the writes and the address increments. All contexts are written at the beginning of each slice as soon as the SliceQPy value is decoded by other means (e.g. the CAVLC option of the H.264). At the end of initialization the Sequencer starts fetching the Context state one bin at a time according to the syntax element sequence. The Sequencer de-asserts the valid2St 703 output to indicate that only one state and RangeLPS0 704 are valid. The Se14 705 and Sel0 706 control signals are used to select the current state output. The selMPS0 707 input indicates that the valMPS value was decoded as the current bin, and it is used to determine the updated state values to be written back into the Context State Memory. The qRange[1:0] input is used to select one of the four possible RangeLPS0[7:0] values as the primary output.
  • When the next syntax element to be decoded is the pair significant_coeff_flag and last_significant_coeff_flag, the Context State Memory and the Sequencer are organized to fetch the current state as well as the next state, such that two bins can be decoded in one cycle. Starting with the first coefficient in the zigzag scan order, if a zero is decoded the last_significant_coeff_flag is skipped and the significant_coeff_flag of the next coefficient is decoded. Otherwise, the last_significant_coeff_flag is decoded for that coefficient. If a “one” value is decoded for the last_significant_coeff_flag, the rest of the coefficients are considered to be zero and the coeff_abs_value_minus1 values are decoded in reverse order, starting from the last of the significant coefficients.
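  • The flag sequencing described above can be sketched in Python as follows. The function name decode_significance_map is hypothetical, the bins argument stands in for the output of the arithmetic decoder, and any special handling of the final scan position is omitted; the sketch only shows the order in which the two flags are consumed.

```python
def decode_significance_map(bins, num_coeffs=16):
    """Return (positions of significant coefficients, number of bins consumed).

    `bins` is any iterable of already-decoded 0/1 bin values.
    """
    it = iter(bins)
    significant = []
    consumed = 0
    for pos in range(num_coeffs):
        sig = next(it)              # significant_coeff_flag for this position
        consumed += 1
        if sig == 0:
            continue                # last_significant_coeff_flag is skipped
        significant.append(pos)
        last = next(it)             # last_significant_coeff_flag
        consumed += 1
        if last == 1:
            break                   # remaining coefficients are zero
    return significant, consumed
```

  • For the bin sequence 0, 1, 0, 1, 1 the sketch marks positions 1 and 2 as significant and stops after five bins, since the last_significant_coeff_flag of position 2 is one.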
  • FIG. 8 shows the sequencer decision making on the Sel1 708 control and the incrementing of the running pointer to states corresponding to significant_coeff_flag (even pointer values) and last_significant_coeff_flag (odd pointer values). The control inputs used are:
      • the last_signif 801 which indicates if the current pointer is odd, pointing to a last_significant_coeff_flag state,
      • valMPS0 802 is the current state output as the most probable symbol value,
      • selMPS0 803 is the result of the current bin decoding indicating that the most probable symbol was selected,
      • dec2bin 804 input indicates that 2 bins have been decoded on this cycle,
      • valMPS1 805 is the state output for the next state following the most probable symbol path,
      • selMPS1 806 is the result of the second bin decoding indicating that valMPS1 was selected.
  • When last_significant_coeff_flag is decoded with a “one” then the next context sequence is to decode the coeff_abs_value_minus1.
  • When decoding the coeff_abs_value_minus1 the sequencing is done as follows:
  • A counter that starts at 0 and saturates at 3 is maintained to count the number of coefficients decoded so far with an absolute value equal to one and the number of coefficients greater than one.
  • An absolute value equal to one is encoded as single bin of “zero”. An absolute value greater than one has at least two bins, the first bin is always a “one”.
  • The context number and the counter are incremented at the beginning while decoding “0” value bins indicating coefficients with absolute value minus 1 equal to zero. When a “1” bin is decoded as the first bin, the second and subsequent bins for that symbol use the counter value counting the number of coefficients encountered so far that had a value greater than one. From then on, the first bin is decoded using a context indicating that at least one coefficient with an absolute value greater than one has been decoded.
  • To decode two or more bins at a time, the valMPS path is followed. For example, at the start the first bin is decoded using the known context that there have been 0 decoded coefficients with absolute value equal to one. Of the two possible next contexts, one resulting from decoding a zero bin (incrementing the context for 1 decoded coefficient with absolute value equal to one) and the other from decoding a one bin (indicating that a coefficient greater than one has been decoded), we assume, for example, that valMPS=0 has been decoded, which increments the counter by one.
  • FIG. 9 shows how the context number (counter) is incremented depending on the first bin value, where a decoded 0 indicates that a coefficient with absolute value equal to one has been decoded. FIG. 10 shows the Context state transitioning in more detail. The E states count the number of coefficients with abs value equal to one, and the G and B states count the number of coefficients greater than one for the first and other bins respectively. The transition condition is the decoded bin value 0 or 1.
  • The arithmetic section 504 to decode multiple bins is more straightforward. FIG. 11 shows the basic block diagram for the arithmetic section capable of decoding up to two bins. First, RangeMPS0 1101 is evaluated by subtracting the input RangeLPS0 1102 from the content of the current Range register. The RangeMPS0 is subtracted from the current Offset to determine selMPS0 1103. The selMPS0 signal selects the Offset0 and Range0 values that are used as un-normalized inputs to the Renorm sub-block when a single bin is decoded. The input valid2st 1104, the most significant bit of RangeMPS0 (indicating a value of 256 or greater), and the qRange1 1106 result are also used to determine dec2bin 1105, which indicates that 2 bins are decoded. In parallel, RangeMPS0 is used to evaluate the Offset1, Range1, and valMPS1 values, controlled by the qRange1 result. Decoding more than 2 bins simultaneously can be achieved similarly, by following the most probable paths and assuming that qRange stays the same or decrements by one.
  • FIG. 12 shows an actual implementation of the arithmetic section as a one-bit slice out of the 9 bits that form the Range and Offset Registers. Some of the signal names have been shortened for conciseness and are described below:
      • R[i]: Bit i of Range[8:0].
      • F[i]: Bit i of Offset[8:0].
      • RL0[i]: Bit i of RangeLPS0[7:0], bit 8 is zero.
      • RL10[i]: Bit i of RangeLPS10[7:0], bit 8 is zero. Used if the second bin is decoded using the same qRange as the first bin.
      • RL11[i]: Bit i of RangeLPS11[7:0], bit 8 is zero. Used if the second bin is decoded using the qRange of the first bin minus one.
      • HDS: Performs a single bit subtract of the single bit operands, both ways, producing the partial sum and carry (borrow) bits. With (cab, cba, s)=HDS(a,b), s=!(a^b), cab=a&!b, cba=!a&b, where cab is the borrow for the a−b operation and cba is the borrow for the b−a operation. We also have s=!(cab||cba), where a^b=a&!b || !a&b.
      • RMs0[i]: Partial sum bit i for the subtraction of Range and RangeLPS. It is used to evaluate RangeMPS0 and OffsetLPS0.
      • RM0c[i]: Carry in for the i-th bit evaluation of Range−RangeLPS.
      • MR0c[i]: Carry in for the i-th bit evaluation of RangeLPS−Range.
      • RM0c[i+1]: Carry out from the i-th bit evaluation of Range−RangeLPS.
      • MR0c[i+1]: Carry out from the i-th bit evaluation of RangeLPS−Range.
      • CPA: Performs a fast carry propagate adder for a single bit. When (co, s)=CPA(a,b,ci), we have t=a^b; s=(ci ? !t : t); co=(ci ? a||b : a&b).
      • RM0cp[i]: Carry in for the i-th bit evaluation of RangeMPS=Range−RangeLPS. RM0cp[7] is also used to determine qRange1.
      • RM0cp[i+1]: Carry out from the i-th bit evaluation of RangeMPS=Range−RangeLPS.
      • FA: Performs a one bit Full Adder function similar to CPA; all inputs are assumed to arrive at the same time.
      • FL0c[i]: Carry in for the i-th bit evaluation of Offset+RangeLPS−Range.
      • FL0c[i+1]: Carry out from the i-th bit evaluation of Offset+RangeLPS−Range.
      • FL0cp[i]: Carry in for the i-th bit evaluation of OffsetLPS=Offset+RangeLPS−Range. FL0cp[8] is used to determine SelMPS0.
      • FL0cp[i+1]: Carry out from the i-th bit evaluation of OffsetLPS=Offset+RangeLPS−Range.
      • FLs0[i]: i-th bit of the partial sum of Offset+RangeLPS−Range.
      • FL0[i]: i-th bit of OffsetLPS=Offset+RangeLPS−Range.
      • SelMPS0: Indicates that the Most Probable Symbol value valMPS is selected as the decoded bin.
      • R0[i]: i-th bit of Range0[8:0], the next un-normalized range value for a single bin decoding.
      • F0[i]: i-th bit of Offset0[8:0], the next un-normalized offset value for a single bin decoding.
      • DFS: Performs one-bit dual full subtract operation. When (ce, cf, se, sf)=DFS(a,b,e,f) we have sf=t^f; ce=(a||!b)&e || (a&!b); cf=(a||!b)&f || (a&!b); t=a^!b.
      • DFA: Performs one-bit dual full addition. When (ce, cf, se, sf)=DFA(a,b,e,f) we have sf=t^f; ce=(a||b)&e || (a&b); cf=(a||b)&f || (a&b); t=a^b.
      • RM10c[i]: Carry in for the i-th bit evaluation of Range−RangeLPS0−RangeLPS10.
      • RM11c[i]: Carry in for the i-th bit evaluation of Range−RangeLPS0−RangeLPS11.
      • RM10c[i+1]: Carry out from the i-th bit evaluation of Range−RangeLPS0−RangeLPS10.
      • RM11c[i+1]: Carry out from the i-th bit evaluation of Range−RangeLPS0−RangeLPS11.
      • FL10c[i]: Carry in for the i-th bit evaluation of Offset−Range+RangeLPS0+RangeLPS10.
      • FL11c[i]: Carry in for the i-th bit evaluation of Offset−Range+RangeLPS0+RangeLPS11.
      • FL10c[i+1]: Carry out from the i-th bit evaluation of Offset−Range+RangeLPS0+RangeLPS10.
      • FL11c[i+1]: Carry out from the i-th bit evaluation of Offset−Range+RangeLPS0+RangeLPS11.
      • RM10cp[i]: Carry in for the i-th bit evaluation of RangeMPS10=Range−RangeLPS0−RangeLPS10.
      • RM11cp[i]: Carry in for the i-th bit evaluation of RangeMPS11=Range−RangeLPS0−RangeLPS11.
      • RM10cp[i+1]: Carry out from the i-th bit evaluation of RangeMPS10=Range−RangeLPS0−RangeLPS10.
      • RM11cp[i+1]: Carry out from the i-th bit evaluation of RangeMPS11=Range−RangeLPS0−RangeLPS11.
      • FL10cp[i]: Carry in for the i-th bit evaluation of OffsetLPS10=Offset−Range+RangeLPS0+RangeLPS10. FL10cp[8] is used to evaluate SelMPS10.
      • FL11cp[i]: Carry in for the i-th bit evaluation of OffsetLPS11=Offset−Range+RangeLPS0+RangeLPS11. FL11cp[8] is used to evaluate SelMPS11.
      • FL10cp[i+1]: Carry out from the i-th bit evaluation of OffsetLPS10=Offset−Range+RangeLPS0+RangeLPS10.
      • FL11cp[i+1]: Carry out from the i-th bit evaluation of OffsetLPS11=Offset−Range+RangeLPS0+RangeLPS11.
      • SelMPS10: Indicates that the Most Probable Symbol value valMPS10 is selected as the second decoded bin when qRange1 is zero.
      • SelMPS11: Indicates that the Most Probable Symbol value valMPS11 is selected as the second decoded bin when qRange1 is one.
      • SelqRange1: Indicates that the qRange for the second bin that consists of Range0[8:7] is equal to Range[8:7]−1. RM0cp[8] is used to determine this condition.
      • dec2bin: Indicates that two bins are decoded, requiring SelMPS0 to be true, RangeMPS0>256, qRange1 equal to qRange0 or equal to qRange0−1.
      • unR[i]: i-th bit of the next un-normalized Range[8:0].
      • unF[i]: i-th bit of the next un-normalized Offset[8:0].
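  • The single-bit cells named in the list above (HDS, CPA and DFA) can be modeled in Python as follows. This is a behavioral sketch only; the lowercase function names are hypothetical, and the se output of dfa is assumed by symmetry with sf since the text gives only sf. The closing loops check that the Boolean expressions behave as ordinary one-bit arithmetic.

```python
from itertools import product


def hds(a, b):
    """Half dual subtract: borrows for a-b and b-a plus the partial sum bit."""
    cab = a & (b ^ 1)      # borrow out of a - b
    cba = (a ^ 1) & b      # borrow out of b - a
    s = (a ^ b) ^ 1        # s = !(a ^ b), as given in the HDS definition
    return cab, cba, s


def cpa(a, b, ci):
    """Carry propagate adder cell: the late carry-in only drives two muxes."""
    t = a ^ b
    s = (t ^ 1) if ci else t
    co = (a | b) if ci else (a & b)
    return co, s


def dfa(a, b, e, f):
    """Dual full add: one (a, b) pair evaluated against two carry-ins e and f."""
    t = a ^ b
    ce = ((a | b) & e) | (a & b)
    cf = ((a | b) & f) | (a & b)
    se = t ^ e             # assumed by symmetry with sf; not stated in the text
    sf = t ^ f
    return ce, cf, se, sf


# Exhaustive checks over all single-bit inputs.
for a, b in product((0, 1), repeat=2):
    cab, cba, s = hds(a, b)
    assert s == ((cab | cba) ^ 1)          # s = !(cab || cba)
for a, b, ci in product((0, 1), repeat=3):
    co, s = cpa(a, b, ci)
    assert 2 * co + s == a + b + ci        # CPA is a one-bit full add
```

  • Precomputing both the carry-in-is-zero and carry-in-is-one results, and selecting between them when the carry arrives, is what keeps the carry path of the CPA cell short; the dual cells apply the same idea to two candidate carry chains at once.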
  • Although illustrative embodiments have been described in detail herein with reference to the accompanying figures, it is to be understood that the invention is not limited to those precise embodiments. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. As such, many modifications and variations will be apparent to practitioners skilled in this art.
  • Accordingly, it is intended that the scope of the invention be defined by the following Claims and their equivalents.

Claims (17)

1. A method of decoding a video bit stream using an efficient parallel algorithm, the method comprising:
inputting and initializing from a previous video processing context;
decoding multiple bits at a time of the previous video processing context, wherein the decoding is performed on most probable bit values; and
receiving the decoded multiple bits and providing an output bit stream for further video processing.
2. The method of claim 1, wherein the video bit stream includes:
video compression techniques to encode using Advanced Video Coding (AVC) Context-based Adaptive Binary Arithmetic Coding (CABAC) algorithm.
3. The method of claim 1, wherein a next quantized range used to decode a next bit is highly predictable using a current range value and a current probability state.
4. The method of claim 1, wherein only two sequence elements need to be accelerated to speed up significantly the decoding process.
5. The method of claim 1, wherein the rate of bits produced is never slower than the single bit sequential method and implementation.
6. The method of claim 1, wherein the initialization and fetching is done for up to 2 bin at a time for AVC CABAC pState and valMPS which are pre-calculated and stored in a DRAM or Flash memory.
7. The method of claim 1, further including:
utilizing an arithmetic section using carry save adders, reducing to single carry propagation for the timing critical adder paths.
8. The method of claim 2, further comprising:
utilizing an AVC CABAC 2-bin 1-bit slice Arithmetic Evaluator.
9. A system for decoding a video bit stream using an efficient parallel algorithm, the system comprising:
a main memory for inputting and initializing from a previous video processing context:
a decoder for decoding multiple bits of the previous video processing context at a time, wherein the decoding is performed on most probable bit values; and
a sequencer for receiving the decoded multiple bits and providing an output bit stream for further video processing.
10. The system of claim 9, wherein the video bit stream includes:
video compression techniques to encode using Advanced Video Coding (AVC) Context-based Adaptive Binary Arithmetic Coding (CABAC) algorithm.
11. The system of claim 9, wherein the decoder further:
decodes Most Probable bit values for all bits except a last bit decoded.
12. The system of claim 9, wherein a next quantized range used to decode the next bit is highly predictable using a current range value and a current probability state.
13. The system of claim 9, wherein only two sequence elements need to be accelerated to speed up significantly the decoding process.
14. The system of claim 9, wherein output bit stream bit rate of bits produced is never slower than a single bit sequential method and implementation.
15. The system of claim 9, wherein the inputting and initialization is performed for up to 2 bin at a time for AVC CABAC pState and valMPS which are pre-calculated and stored in a DRAM or Flash memory.
16. The system of claim 9, further including:
an arithmetic section using carry save adders, reducing to single carry propagation for timing critical adder paths.
17. The system of claim 9, further comprising:
an AVC CABAC 2-bin 10-bit slice Arithmetic Evaluator.
US13/657,431 2006-06-21 2012-10-22 2-bin parallel decoder for advanced video processing Abandoned US20130044810A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/657,431 US20130044810A1 (en) 2006-06-21 2012-10-22 2-bin parallel decoder for advanced video processing

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US81574906P 2006-06-21 2006-06-21
US11/755,698 US8306125B2 (en) 2006-06-21 2007-05-30 2-bin parallel decoder for advanced video processing
US13/657,431 US20130044810A1 (en) 2006-06-21 2012-10-22 2-bin parallel decoder for advanced video processing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/755,698 Continuation US8306125B2 (en) 2006-06-21 2007-05-30 2-bin parallel decoder for advanced video processing

Publications (1)

Publication Number Publication Date
US20130044810A1 true US20130044810A1 (en) 2013-02-21

Family

ID=40931670

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/755,698 Active 2031-06-08 US8306125B2 (en) 2006-06-21 2007-05-30 2-bin parallel decoder for advanced video processing
US13/657,431 Abandoned US20130044810A1 (en) 2006-06-21 2012-10-22 2-bin parallel decoder for advanced video processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/755,698 Active 2031-06-08 US8306125B2 (en) 2006-06-21 2007-05-30 2-bin parallel decoder for advanced video processing

Country Status (1)

Country Link
US (2) US8306125B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130243102A1 (en) * 2011-01-14 2013-09-19 Ntt Docomo, Inc. Method and apparatus for arithmetic coding and termination
CN107534772A (en) * 2015-05-19 2018-01-02 联发科技股份有限公司 The method and device of context adaptive binary arithmetic coding based on multilist
US10419772B2 (en) 2015-10-28 2019-09-17 Qualcomm Incorporated Parallel arithmetic coding techniques
US10992312B2 (en) 2014-12-10 2021-04-27 Samsung Electronics Co., Ltd. Semiconductor device and operating method of matching hardware resource to compression/decompression algorithm

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9648325B2 (en) 2007-06-30 2017-05-09 Microsoft Technology Licensing, Llc Video decoding implementations for a graphics processing unit
US8542748B2 (en) 2008-03-28 2013-09-24 Sharp Laboratories Of America, Inc. Methods and systems for parallel video encoding and decoding
US7932843B2 (en) * 2008-10-17 2011-04-26 Texas Instruments Incorporated Parallel CABAC decoding for video decompression
BR112012009448A2 (en) * 2009-10-20 2022-03-08 Fraunhofer Ges Forschung Audio encoder, audio decoder, method for encoding audio information, method for decoding audio information and computer program that uses iterative interval size reduction
US8416104B2 (en) * 2010-04-23 2013-04-09 Certicom Corp. Method and apparatus for entropy decoding
US8421655B2 (en) * 2010-04-23 2013-04-16 Certicom Corp. Apparatus for parallel entropy encoding and decoding
US20120014429A1 (en) * 2010-07-15 2012-01-19 Jie Zhao Methods and Systems for Parallel Video Encoding and Parallel Video Decoding
US20120014433A1 (en) * 2010-07-15 2012-01-19 Qualcomm Incorporated Entropy coding of bins across bin groups using variable length codewords
US8520740B2 (en) * 2010-09-02 2013-08-27 International Business Machines Corporation Arithmetic decoding acceleration
US8344917B2 (en) 2010-09-30 2013-01-01 Sharp Laboratories Of America, Inc. Methods and systems for context initialization in video coding and decoding
US9313514B2 (en) 2010-10-01 2016-04-12 Sharp Kabushiki Kaisha Methods and systems for entropy coder initialization
US8798139B1 (en) * 2011-06-29 2014-08-05 Zenverge, Inc. Dual-pipeline CABAC encoder architecture
US9258565B1 (en) 2011-06-29 2016-02-09 Freescale Semiconductor, Inc. Context model cache-management in a dual-pipeline CABAC architecture
US20130003858A1 (en) * 2011-06-30 2013-01-03 Vivienne Sze Simplified Context Selection For Entropy Coding of Transform Coefficient Syntax Elements
AU2012285851B2 (en) 2011-07-15 2015-11-12 Ge Video Compression, Llc Sample array coding for low-delay
US20130114667A1 (en) * 2011-11-08 2013-05-09 Sony Corporation Binarisation of last position for higher throughput
US9681133B2 (en) * 2012-03-29 2017-06-13 Intel Corporation Two bins per clock CABAC decoding
TWI517681B (en) * 2013-06-19 2016-01-11 晨星半導體股份有限公司 Decoding method and decoding apparatus for avs system
CN103974066B (en) * 2014-05-14 2017-02-01 华为技术有限公司 Video coding method and device
CN112929659B (en) * 2015-10-13 2023-12-26 三星电子株式会社 Method and apparatus for encoding or decoding image
CN110324625B (en) * 2018-03-31 2023-04-18 华为技术有限公司 Video decoding method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010005235A1 (en) * 1996-12-21 2001-06-28 Sony Corporation Comb filter and a video apparatus
US20020163961A1 (en) * 2001-03-20 2002-11-07 Koninklijke Philips Electronics N.V. Low-cost high-speed multiplier/accumulator unit for decision feedback equalizers
US20020184594A1 (en) * 2001-05-30 2002-12-05 Jakob Singvall Low complexity convolutional decoder
US20030219072A1 (en) * 2002-05-14 2003-11-27 Macinnis Alexander G. System and method for entropy code preprocessing
US20040123228A1 (en) * 2002-08-27 2004-06-24 Atsushi Kikuchi Coding apparatus, coding method, decoding apparatus, and decoding method
US20060126744A1 (en) * 2004-12-10 2006-06-15 Liang Peng Two pass architecture for H.264 CABAC decoding process
US20070091747A1 (en) * 2005-10-25 2007-04-26 Teac Corporation Optical disk apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7262722B1 (en) * 2006-06-26 2007-08-28 Intel Corporation Hardware-based CABAC decoder with parallel binary arithmetic decoding

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130243102A1 (en) * 2011-01-14 2013-09-19 Ntt Docomo, Inc. Method and apparatus for arithmetic coding and termination
US10070127B2 (en) * 2011-01-14 2018-09-04 Ntt Docomo, Inc. Method and apparatus for arithmetic coding and termination
US10992312B2 (en) 2014-12-10 2021-04-27 Samsung Electronics Co., Ltd. Semiconductor device and operating method of matching hardware resource to compression/decompression algorithm
CN107534772A (en) * 2015-05-19 2018-01-02 联发科技股份有限公司 The method and device of context adaptive binary arithmetic coding based on multilist
EP3269141A4 (en) * 2015-05-19 2018-09-12 MediaTek Inc. Method and apparatus for multi-table based context adaptive binary arithmetic coding
US10225555B2 (en) 2015-05-19 2019-03-05 Mediatek Inc. Method and apparatus for multi-table based context adaptive binary arithmetic coding
US10742984B2 (en) 2015-05-19 2020-08-11 Mediatek Inc. Method and apparatus for multi-table based context adaptive binary arithmetic coding
CN111614957A (en) * 2015-05-19 2020-09-01 联发科技股份有限公司 Entropy coding and decoding method and device for image or video data
US10419772B2 (en) 2015-10-28 2019-09-17 Qualcomm Incorporated Parallel arithmetic coding techniques

Also Published As

Publication number Publication date
US8306125B2 (en) 2012-11-06
US20090196355A1 (en) 2009-08-06

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: DIGITAL VIDEO SYSTEMS INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MSTAR SEMICONDUCTOR, INC.;REEL/FRAME:036289/0601

Effective date: 20150715