TW200809689A - Decoding system and graphics processing unit - Google Patents

Decoding system and graphics processing unit

Info

Publication number
TW200809689A
Authority
TW
Taiwan
Prior art keywords
decoding
module
docket
unit
variable length
Prior art date
Application number
TW96120728A
Other languages
Chinese (zh)
Other versions
TWI354239B (en)
Inventor
Hussain Zahid
Brothers John
Huy Bui Duc
Original Assignee
Via Tech Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Tech Inc
Publication of TW200809689A
Application granted
Publication of TWI354239B

Landscapes

  • Image Generation (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

Various embodiments of decoding systems and methods are disclosed. One system embodiment, among others, comprises a software programmable core processing unit having a variable length decoding (VLD) unit configured to execute a shader, the shader configured to selectively implement decoding of a video stream coded based on a plurality of different coding methods to provide a decoded data output, wherein the decoding is implemented using a combination of software and hardware.

Description

Description of the invention

Technical field

The present invention relates to data processing systems and, more particularly, to programmable graphics processing systems and methods.

Prior art

Computer graphics is the technique of using a computer to generate pictures, images, or other graphical or pictorial information. Today, many graphics systems are implemented through interfaces such as Microsoft's Direct3D and OpenGL, which provide control over multimedia hardware (e.g., a graphics accelerator or graphics processing unit (GPU)) on a computer running a particular operating system (e.g., Microsoft Windows). The generation of pictures or images is commonly called rendering, and the details of these operations are implemented mainly by the graphics accelerator. In general, in three-dimensional (3D) computer graphics, the geometry that represents the surfaces (or volumes) of objects in a scene is converted into pixels (picture elements), stored in a frame buffer, and then displayed on a display device. Each object or group of objects has specific visual properties related to surface appearance (e.g., material, reflectance, shape, texture), which can be defined as the rendering context of that object or group of objects.

Consumers continue to demand more control and more features in games and other multimedia products that use computer graphics, more realistic images, and better processing speed and power consumption. Many standards have been developed that produce better-quality images using fewer bits.

Client's Docket No.: S3U06-0013-TW; TT's Docket No.: 0608-A41246twf.doc/NikeyChen

One of these standards, H.264 (also known as ISO Moving Picture Experts Group (MPEG)-4 Part 10), is a high-compression digital video codec standard. Compared with an MPEG-2 encoder, an H.264-compliant codec uses only about one third of the bits to encode video while maintaining similar video quality. The H.264 specification provides two types of entropy coding: context-adaptive binary arithmetic coding (CABAC) and context-adaptive variable length coding (CAVLC).

To meet these continually changing needs, many different pure-software or pure-hardware solutions have been proposed. However, the known approaches all lead to high inventory, rapid obsolescence of technology, and a lack of flexibility in design.

Summary of the invention

The present invention discloses decoding systems and methods for the multithreaded, parallel compute core of a graphics processing unit. The invention provides a system comprising a software programmable core processing unit having a variable length decoding (VLD) unit configured to execute a shader. The shader selectively performs decoding of a video stream to output decoded data, wherein the video stream is coded according to context-adaptive binary arithmetic coding (CABAC), context-adaptive variable length coding (CAVLC), Exp-Golomb, MPEG-2, or VC-1, and the decoding is performed using a combination of software and hardware.

The invention provides another system comprising a graphics processing unit coupled to a host processor and memory. The graphics processing unit includes a graphics processor having a software programmable core processing unit that includes one or more execution units. The one or more execution units include execution unit data path hardware, which includes a variable length decoding unit configured to execute a shader. The shader selectively performs decoding of a coded video stream according to CABAC, CAVLC, Exp-Golomb, MPEG-2, or VC-1 to provide a decoded data output.

Detailed description

To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Embodiments:

The present invention discloses various embodiments of decoding systems and methods (referred to collectively herein as the decoding system). In one embodiment, the decoding system is embedded in one or more execution units of the programmable, multithreaded, parallel compute core of a graphics processing unit (GPU). Decoding functionality is implemented using a combination of software and hardware. That is, video decoding is accomplished within the context of GPU programming, with hardware implemented in the GPU data path. For example, in one embodiment, decoding operations or methods are implemented by a shader (e.g., a vertex shader) with an extended instruction set, by the execution unit data path of the GPU, and by additional hardware for the automatic management of a bitstream buffer.

This contrasts with existing systems, which are either purely hardware-based or purely software-based solutions and therefore suffer from some of the problems described in the prior art section.

The decoding system described herein can decode information that was coded using a plurality of entropy coding techniques. It can decode CABAC and CAVLC according to the well-known International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264 standard, and it can also decode according to the MPEG-2 and VC-1 standards. Various embodiments of the decoding system operate in one of a plurality of modes, each corresponding to one of the standards mentioned above, based on the execution of one or more instruction sets received (e.g., via a known mechanism such as preloading, or through cache misses) from GPU frame buffer memory or from memory associated with a host processor (e.g., a host central processing unit (CPU)). The hardware can be reused to support multiple decoding standards (i.e., according to the selected mode). The selected mode also affects how the context memory is initialized, used, and/or updated.

Depending on the active decoding mode, the decoding system can use Exp-Golomb coding, Huffman-like coding (e.g., CAVLC, MPEG-2, and VC-1), and/or arithmetic coding (e.g., CABAC). The entropy decoding methods are carried out by extending the instruction set of one or more execution units and by providing additional hardware that automatically manages the bitstream and implements the context models used in CAVLC decoding and CABAC decoding. In one embodiment, the entropy coding tables are implemented using various memory tables or other data structures (e.g., read-only memory (ROM) tables).
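As an illustration of one of the entropy codes named above, unsigned Exp-Golomb decoding can be sketched in software. This is a minimal sketch and not the VLD hardware or its shader instructions: the `BitReader` class and the `read_ue` function are hypothetical names, and the bit counter is only a software analogue of an automatically managed bitstream buffer.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object and counts bits consumed."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # number of bits consumed so far

    def read_bit(self) -> int:
        # Running past the end of the buffer is reported as an error.
        if self.pos >= 8 * len(self.data):
            raise EOFError("bitstream exhausted")
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ue(reader: BitReader) -> int:
    """Decode one unsigned Exp-Golomb code word."""
    leading_zeros = 0
    while reader.read_bit() == 0:
        leading_zeros += 1
    # Code word layout: leading_zeros zeros, a 1, then leading_zeros info bits.
    return (1 << leading_zeros) - 1 + reader.read_bits(leading_zeros)
```

For example, the bytes `0xA6 0x40` hold the code words `1`, `010`, `011`, and `00100`, which decode to 0, 1, 2, and 3 and consume 12 bits.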

In addition, the automatic bitstream buffer offers several advantages. For example, once the direct memory access (DMA) engine of the bitstream buffer knows the location (address) of the bitstream, the bitstream is managed automatically and no further instructions are needed. This contrasts with conventional microprocessor/digital signal processor (DSP) systems, in which bitstream management represents a large overhead. Furthermore, by tracking the number of bits consumed, the bitstream buffer mechanism can detect and handle a corrupted bitstream.

Another advantage of embodiments of the decoding system is that instruction latency is minimized. For example, because CABAC decoding is highly sequential and is not easily spread across multiple threads, various embodiments use a type of forwarding mechanism (e.g., register forwarding) to reduce the effective dependency latency. To explain further, a limitation of many deeply pipelined, multithreaded processors is that they cannot execute an instruction from the same thread in every cycle. Some systems use general forwarding, which compares the operand address of the previous result with the operand addresses of the current instruction and, when they match, uses the previous result as the operand. Traditionally, general forwarding requires complex comparison and multiplexing logic. Some embodiments of the decoding system instead use a different type of forwarding, in which bits in the instruction (e.g., 2 bits in total, 1 bit per operand) encode whether to use the result of the previous computation (e.g., held in an internal register) or the data of the source operand. In this way, overall latency is reduced and the efficiency of the processor pipeline is improved.
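The instruction-encoded forwarding described above can be modeled in a few lines. This is a sketch under assumed names and an assumed bit layout: `FWD_SRC0`, `FWD_SRC1`, `execute`, and the tuple-based instruction format are hypothetical, since the text only says that one bit per operand selects between the previous result and the source operand.

```python
FWD_SRC0 = 0b01  # hypothetical flag: operand 0 takes the previous result
FWD_SRC1 = 0b10  # hypothetical flag: operand 1 takes the previous result


def execute(program, registers):
    """Run (op, dst, src0, src1, fwd) tuples with instruction-encoded forwarding.

    No register addresses are compared: the two fwd bits alone decide whether
    an operand comes from the register file or from the internal result latch.
    """
    ops = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}
    prev_result = 0  # internal register holding the last computed value
    for op, dst, src0, src1, fwd in program:
        a = prev_result if fwd & FWD_SRC0 else registers[src0]
        b = prev_result if fwd & FWD_SRC1 else registers[src1]
        prev_result = ops[op](a, b)
        registers[dst] = prev_result
    return registers
```

With `r0 = 10` and `r1 = 3`, an `add` followed by a `sub` whose first operand has its forwarding bit set computes 13 - 3 = 10 rather than 10 - 3 = 7, even though the instruction still names `r0` as its first source.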

FIG. 1 is a block diagram of an embodiment of a graphics processing system 100 in which embodiments of the decoding systems and methods are implemented. In some embodiments, graphics processing system 100 may be a computer system. Graphics processing system 100 may include a display device 102 driven by a display interface unit (DIU) 104 and local memory 106 (which may include, e.g., a display buffer, frame buffer, texture buffer, command buffer, etc.). Local memory 106 may also be referred to interchangeably as a frame buffer or storage unit. Local memory 106 is coupled to graphics processing unit (GPU) 114 via one or more memory interface units (MIU) 110. In one embodiment, MIU 110, GPU 114, and DIU 104 are all coupled to a bus interface unit (BIU) 118 that is compatible with Peripheral Component Interconnect Express (PCI-E). In one embodiment, BIU 118 may use a graphics address remapping table (GART), although other memory mapping mechanisms may also be used. GPU 114 includes decoding system 200, which is described below. Although decoding system 200 is shown as a component within GPU 114, in some embodiments decoding system 200 may also include one or more additional or different components of the graphics processing system 100 shown.

BIU 118 is coupled to a chipset 122 (e.g., a north bridge chipset) or a switch. Chipset 122 includes interface electronics that strengthen signals from central processing unit (CPU) 126 (also called the host processor) and that separate signals going to and from system memory 124 from signals going to and from input/output (I/O) devices (not shown). Although the PCI-E bus protocol is mentioned, some embodiments may use other connections and/or communication methods between the host processor and GPU 114 (e.g., PCI, a proprietary high-speed bus, etc.).

System memory 124 also includes driver software 128, which uses CPU 126 to send instruction sets or commands to registers in GPU 114.

In some embodiments, additional graphics processing units may be coupled to the components of FIG. 1 through chipset 122 via the PCI-E bus protocol. In one embodiment, graphics processing system 100 may include all of the components shown in FIG. 1, or fewer components and/or components different from those shown in FIG. 1. Further, some embodiments may use additional components, such as a south bridge chipset coupled to chipset 122.

FIG. 2 is a block diagram of a processing environment that implements an embodiment of decoding system 200. In particular, GPU 114 includes graphics processor 202. Graphics processor 202 includes a multiple execution unit (EU) compute core 204 (also called a software programmable core processing unit). In one embodiment, compute core 204 includes decoding system 200 (also called the VLD unit) embedded in an execution unit data path (EUDP), where the execution unit data path is distributed among one or more execution units. Graphics processor 202 also includes an execution unit pool (EUP) control and vertex/stream cache unit 206 (referred to herein as EU pool control unit 206) and a graphics pipeline 208 with fixed-function logic (e.g., including a triangle set-up unit (TSU), a span-tile generator (STG), etc.), which is described below.

Compute core 204 comprises a pool of multiple execution units that meets the computing requirements of the shader tasks of different shader programs, including vertex shaders, geometry shaders, and/or pixel shaders, which process data for graphics pipeline 208. Since, in one embodiment, the functions of decoding system 200 are carried out by a shader executing on compute core 204, an embodiment of the graphics processor is described first, followed by specific embodiments of decoding system 200.

Decoding system 200 can be implemented in hardware, software, firmware, or a combination thereof. In the preferred embodiment, decoding system 200 is implemented in hardware and software, including any one or a combination of the following well-known technologies: discrete logic circuits having logic gates that perform logic functions on data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), a state machine, etc.

FIGS. 3 and 4 are block diagrams of selected components of an embodiment of graphics processor 202. As noted above, one embodiment of decoding system 200 can be a shader within graphics processor 202 that has an extended instruction set and additional hardware components, so an embodiment of graphics processor 202 and the corresponding processing are described below. Although FIGS. 3 and 4 do not show every component involved in graphics processing, the components shown are sufficient for a person skilled in the art to understand the functions and architecture of such a graphics processor. Referring to FIG. 3, at the center of the programmable processing environment is compute core 204, which includes decoding system 200 and processes various instructions. Different types of shader programs, such as vertex, geometry, and pixel shader programs, can be executed on or mapped to compute core 204.

Compute core 204, being a multi-issue processor, can process multiple instructions within a single clock cycle.

As shown in FIG. 3, the relevant components of graphics processor 202 include compute core 204, texture filtering unit 302, pixel packer 304, command stream processor 306, write-back unit 308, and texture address generator 310. FIG. 3 also includes EU pool control unit 206, which also includes a vertex cache and/or a stream cache. For example, as shown in FIG. 3, texture filtering unit 302 provides texel data to compute core 204 (inputs A and B). In some embodiments, the texel data is 512-bit data.

Pixel packer 304 provides pixel shader inputs to compute core 204 (inputs C and D), also in 512-bit data format. In addition, pixel packer 304 requests pixel shader tasks from EU pool control unit 206, which provides an assigned execution unit number and thread number to pixel packer 304. Pixel packer 304 and texture filtering unit 302 are known in the art and are not described further here. Although FIG. 3 shows the pixel and texel packets as 512-bit data packets, the packet size can be changed in some embodiments, depending on the performance characteristics required of graphics processor 202.

Command stream processor 306 provides triangle vertex indices to EU pool control unit 206. In the embodiment of FIG. 3, the indices are 256 bits wide. EU pool control unit 206 assembles the vertex shader inputs from the stream cache and sends the data to compute core 204 (input E).

EU pool control unit 206 also assembles the geometry shader inputs and sends them to compute core 204 (input F). EU pool control unit 206 also controls execution unit input 402 and execution unit output 404 (FIG. 4). In other words, EU pool control unit 206 controls the respective input flows to and output flows from compute core 204.

After processing, compute core 204 provides pixel shader outputs (outputs J1 and J2) to write-back unit 308. The pixel shader outputs include color information, such as red/green/blue/alpha (RGBA) information, as is known in the art. The pixel shader output may be provided as two 512-bit data streams. Other embodiments may use other bit widths.

Similar to the pixel shader outputs, compute core 204 outputs texture coordinates (outputs K1 and K2), which include UVRQ information, to texture address generator 310. Texture address generator 310 issues a texture descriptor request to L2 cache 408 of compute core 204 (input X), and L2 cache 408 of compute core 204 outputs the texture descriptor data (output W) to texture address generator 310. Texture address generator 310 and write-back unit 308 are known in the art and are not described further here. Again, although UVRQ and RGBA are shown as 512-bit data, this parameter can vary between embodiments. In the embodiment of FIG. 3, the bus is divided into two 512-bit channels, each holding the 128-bit RGBA color values and the 128-bit UVRQ texture coordinates for four pixels.
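The channel width quoted above follows from simple arithmetic: four pixels at 128 bits each fill one 512-bit channel. The sketch below packs four RGBA values into such a channel. The choice of four 32-bit floating-point components per pixel and little-endian byte order is an assumption made for illustration, not something the text specifies, and the function names are hypothetical.

```python
import struct

PIXELS_PER_CHANNEL = 4
COMPONENTS_PER_PIXEL = 4   # R, G, B, A
BITS_PER_COMPONENT = 32    # assumed: one 32-bit float per component


def pack_rgba_channel(pixels):
    """Pack four (r, g, b, a) tuples into one 512-bit (64-byte) channel."""
    if len(pixels) != PIXELS_PER_CHANNEL:
        raise ValueError("a channel holds exactly four pixels")
    flat = [component for pixel in pixels for component in pixel]
    return struct.pack("<16f", *flat)


def unpack_rgba_channel(channel):
    """Inverse of pack_rgba_channel: returns four (r, g, b, a) tuples."""
    flat = struct.unpack("<16f", channel)
    return [tuple(flat[i:i + 4]) for i in range(0, 16, 4)]
```

The same layout would apply to a channel of UVRQ texture coordinates, with the four components read as U, V, R, and Q instead of R, G, B, and A.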

Graphics pipeline 208 provides fixed-function graphics processing. In response to a command from driver software 128, such as a command to draw a triangle, vertex information passes through the vertex shader logic in compute core 204 to perform vertex transformations. In particular, objects are transformed from object space into work space and/or screen space as triangles. The triangles pass from compute core 204 to the triangle set-up unit in graphics pipeline 208, which assembles primitives and performs known tasks such as bounding box generation, culling, edge function generation, and triangle-level rejection. The triangle set-up unit passes data to the span and tile generation unit in graphics pipeline 208, which provides tile generation functionality. Data objects are thereby divided into tiles (e.g., 8x8, 16x16, etc.) and passed to other fixed-function units that perform depth (e.g., z-value) processing, such as high-level rejection of z-values (e.g., where fewer bits are used than in similar processing at a lower level). The z-values are then passed back to the pixel shader logic in compute core 204 so that pixel shading can be performed based on the received texture and pipeline data. Compute core 204 outputs the processed values to destination units located in graphics pipeline 208. The destination units perform alpha testing and stencil testing before values in the various caches need to be updated.

Note that 512-bit vertex cache spill data is also transferred between L2 cache 408 of compute core 204 and EU pool control unit 206. In addition, two 512-bit vertex cache writes are output from compute core 204 (outputs M1 and M2) to EU pool control unit 206 for further processing.
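The division into tiles mentioned above can be pictured with a short sketch. It enumerates the fixed-size tiles that overlap a primitive's bounding box. The function name `tiles_for_bbox` and the inclusive pixel-coordinate convention are assumptions, and a real span and tile generator would also test each tile against the triangle's edge functions rather than only its bounding box.

```python
def tiles_for_bbox(x_min, y_min, x_max, y_max, tile_size=8):
    """Yield (tile_x, tile_y) origins of tiles overlapping an inclusive bbox."""
    first_col = x_min // tile_size
    last_col = x_max // tile_size
    first_row = y_min // tile_size
    last_row = y_max // tile_size
    # Walk the tiles row by row, left to right.
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            yield (col * tile_size, row * tile_size)
```

A bounding box spanning pixels (5, 5) to (18, 9) touches six 8x8 tiles but only two 16x16 tiles, which shows how the tile size trades granularity against the number of tiles handed to later stages.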

FIG. 4 shows additional components of compute core 204 and related components. Compute core 204 includes execution unit pool 412. In one embodiment, execution unit pool 412 includes one or more execution units (referred to collectively as execution units 420). Each execution unit 420 can process multiple instructions within a single clock cycle. Execution unit pool 412 can therefore, at its peak, process multiple threads simultaneously or substantially simultaneously. Although FIG. 4 shows eight execution units 420 (labeled EU0 through EU7), the number of execution units is not limited to eight and may be greater or smaller in some embodiments. At least one execution unit, for example execution unit 420a, includes an embodiment of the decoding system, as described further below.

Compute core 204 also includes memory access unit (MXU) 406, which is coupled to L2 cache 408 via memory interface arbiter 410. L2 cache 408 receives vertex cache spill data (input G) from EU pool control unit 206 and provides vertex cache spill data (output H) to EU pool control unit 206. In addition, L2 cache 408 receives texture descriptor requests (input X) from texture address generator 310 and, in response to a received request, provides texture descriptor data (output W) to texture address generator 310.

Memory interface arbiter 410 provides a control interface to local video memory (e.g., the frame buffer or local memory 106). Bus interface unit 118 provides an interface to the system, for example through a PCI-E bus. Memory interface arbiter 410 and bus interface unit 118 provide the interface between memory and L2 cache 408. In some embodiments, L2 cache 408 is coupled to memory interface arbiter 410 and bus interface unit 118 via memory access unit 406.

Client’s Docket No,: S3U06-0013-TW TT’s Docket No:0608-A41246twf.doc/NikeyChen 16 200809689 仲裁器410與匯流排介面單元11δ。記憶體存取單元傷 將從L2快取記憶體顿以及其他區塊得 址轉換成實際記憶體位址。 吸妝位 記憶體介面仲裁器對L2快取記憶㈣8提供兮己情 蹲存取(例如讀出/寫入存取)、指%常數/資料/紋理喊 取、直接汜憶體存取(例如載入/儲存)、暫存存取的索引、 暫存态&出以及頂點快取記憶體内容溢出等。 …Client's Docket No,: S3U06-0013-TW TT’s Docket No: 0608-A41246twf.doc/NikeyChen 16 200809689 Arbiter 410 and bus interface unit 11δ. The memory access unit will be converted from the L2 cache memory and other block addresses to the actual memory address. The makeup memory interface arbitrator provides access to the L2 cache (4) 8 for access (eg read/write access), % constant/data/texture shouting, direct access (eg Load/storage), index of temporary access, temporary storage & output, and vertex cache memory overflow. ...

The computing core 204 further includes an execution unit input 402 and an execution unit output 404, which respectively provide inputs to the execution unit set 412 and receive outputs from the execution unit set 412. The execution unit input 402 and the execution unit output 404 may be crossbars, buses, or other known input and output mechanisms.

The execution unit input 402 receives the vertex shader input (input E) and the geometry shader input (input F) from the execution unit set control unit 206 and provides this information to the execution unit set 412 for processing by the execution units 420. In addition, the execution unit input 402 receives the pixel shader input (inputs C and D) and the texel packets (inputs A and B) and passes these packets to the execution unit set 412 for processing by the execution units 420. The execution unit input 402 also receives information from the L2 cache 408 (L2 read) and provides it to the execution unit set 412 as needed.

In the embodiment of Figure 4, the execution unit output 404 is divided into an even output 404a and an odd output 404b. Like the execution unit input 402, the execution unit output 404 may be a crossbar, a bus, or another known architecture. The even output 404a handles the outputs of the even execution units 420a, 420c, 420e, and 420g, while the odd output 404b handles the outputs of the odd execution units 420b, 420d, 420f, and 420h. Together, the two outputs 404a and 404b receive the outputs of the execution unit set 412, such as UVRQ and RGBA. These outputs, among others, may be directed back to the L2 cache 408, output from the computing core 204 to the write-back unit 308 through outputs J1 and J2, or output to the texture address generator 310 through outputs K1 and K2.

The execution flow of the execution unit set 412 generally comprises several levels: a rendering context level, a thread or task level, and an instruction or execution level. At any given time, two rendering contexts may be permitted in each execution unit 420, with the contexts distinguished by a one-bit flag or another mechanism. The context information is passed from the execution unit set control unit 206 before tasks belonging to that context begin. Context-level information may include the shader type, the number of input/output registers, the instruction start address, the output mapping table, the vertex identifier, and the constants in the respective constant buffer. Each execution unit 420 of the execution unit set 412 can hold multiple tasks or threads at the same time (for example, 32 threads in some embodiments). In one embodiment, each thread fetches instructions according to a program counter.

The execution unit set control unit 206 acts as a global scheduler for tasks and uses a data-driven approach (for example, vertex, pixel, and geometry packets at its input) to assign appropriate threads in the execution units 420. For example, the execution unit set control unit 206 assigns a thread to an empty thread slot in the respective execution unit 420 of the execution unit set 412.


Data supplied by the vertex cache or by another component or module (depending on the shader type) is placed in a common register buffer, after which execution of the thread can begin.

In general, an embodiment of the graphics processor 202 uses programmable vertex, geometry, and pixel shaders. Rather than implementing the functions or operations of these components as separate fixed-function units with different designs and instruction sets, the operations are executed by the set of execution units 420 with a unified instruction set. Apart from the execution unit 420a (which includes the decoding system 200 and therefore has additional functionality), the execution units 420 are identical in design and configurable for programmed operation. In one embodiment, each execution unit 420 can perform multithreaded operations simultaneously. As the vertex shader, geometry shader, and pixel shader generate various shader tasks, the tasks are delivered to the respective execution units 420 to be carried out. In one embodiment, the decoding system 200 may be implemented using a vertex shader, with some modifications and/or differences from the other execution units 420. For example, one difference between an execution unit that includes the decoding system 200 (such as the execution unit 420a) and the other execution units (such as the execution unit 420b) is that data for the decoding system 200 is received from the memory access unit 406 through the connection 413 and the execution unit input 402, in part because the decoding system 200 manages one or more corresponding internal buffers.

As individual tasks are generated, the execution unit set control unit 206 assigns those tasks to available threads in the various execution units 420. As tasks complete, the execution unit set control unit 206 further manages the release of the associated threads.

In this regard, the execution unit set control unit 206 is responsible for assigning vertex shader, geometry shader, and pixel shader tasks to threads of the various execution units 420, and it also keeps a record of the associated tasks and threads. Specifically, the execution unit set control unit 206 maintains a resource table (not shown) of the threads and memory of all execution units 420. It knows which threads have been assigned tasks and are occupied, which threads have been released after thread termination, how many common register file memory registers are occupied, and how much free space is available in each execution unit.

Accordingly, when a task is assigned to an execution unit (for example, the execution unit 420a), the execution unit set control unit 206 marks the thread as busy and subtracts the register file footprint of that thread from the total available common register file memory. The footprint is set or determined by the states of the vertex shader, geometry shader, and pixel shader, and each shader stage may have a different footprint size. For example, a vertex shader thread may require 10 common register file registers, while a pixel shader thread may require fewer.

When a thread completes its assigned work, the execution unit 420 running the thread sends a signal to the execution unit set control unit 206. The execution unit set control unit 206 then updates its resource table to mark the thread as free and adds the thread's common register file space back to the available space. When all threads are busy or all common register file memory has been allocated (or the remaining register space cannot accommodate an additional thread), the execution unit 420 is considered full, and the execution unit set control unit 206 does not assign any additional or new threads to that execution unit.

Each execution unit 420 also contains a thread controller that is responsible for managing the threads of that execution unit.
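The bookkeeping that the execution unit set control unit 206 performs on its resource table can be illustrated with a small software model. The following Python sketch is only an illustration under stated assumptions, not the hardware design: the class and method names, the register capacity, and the footprint values are hypothetical (the vertex shader value of 10 follows the example in the text; the pixel shader value is an arbitrary smaller number).

```python
from dataclasses import dataclass, field

# Hypothetical per-thread register footprints. Only the vertex value (10)
# comes from the example in the text; the pixel value is an assumption.
FOOTPRINT = {"vertex": 10, "pixel": 5}


@dataclass
class ExecutionUnitEntry:
    """One resource-table row for a single execution unit (names are hypothetical)."""
    thread_slots: int = 32       # thread count mentioned for some embodiments
    crf_registers: int = 128     # assumed capacity, used only for illustration
    busy: dict = field(default_factory=dict)  # thread id -> registers charged

    def free_registers(self) -> int:
        return self.crf_registers - sum(self.busy.values())

    def is_full(self, footprint: int) -> bool:
        # Full when every thread is busy or the remaining register space
        # cannot accommodate one more thread of the requested footprint.
        return len(self.busy) >= self.thread_slots or self.free_registers() < footprint

    def assign(self, shader: str):
        """Mark a free thread busy and subtract its footprint; None if the unit is full."""
        footprint = FOOTPRINT[shader]
        if self.is_full(footprint):
            return None
        thread_id = next(t for t in range(self.thread_slots) if t not in self.busy)
        self.busy[thread_id] = footprint
        return thread_id

    def release(self, thread_id: int) -> None:
        """Mark the thread free and add its registers back to the available space."""
        del self.busy[thread_id]
```

In this model a unit with 20 registers accepts two vertex shader threads, refuses a third thread of any kind, and accepts a pixel shader thread again once one of the vertex threads has been released.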

In this regard, in at least one embodiment, the execution unit set control unit 206 can prevent the geometry shader and the pixel shader from running at the same time as the vertex shader while the vertex shader is executing functions of the decoding system 200.

Having described the graphics processor 202 and the computing core 204, Figure 5A shows an embodiment of the execution unit 420a, which includes an execution unit data path 512 in which the decoding system 200 is embedded. Specifically, Figure 5A is a block diagram of the execution unit 420a. In one embodiment, the execution unit 420a includes an instruction cache controller 504, a thread controller 506 coupled to the instruction cache controller 504, a buffer 508 (for example, a constant buffer), a common register file (CRF) 510, an execution unit data path (EUDP) 512 coupled to the thread controller 506, the buffer 508, and the common register file 510, an execution unit data path first-in first-out (FIFO) buffer 514, a predicate register file (PRF) 516, a scalar register file (SRF) 518, a data output controller 520, and a thread task interface 524. As described above, the execution unit 420a receives input from the execution unit input 402 and provides output to the execution unit output 404.

The thread controller 506 provides the control functions of the execution unit 420a, including managing each thread and making decisions such as how threads are to be executed. The execution unit data path 512 includes the decoding system 200, described further below, and generally includes functionality for performing various calculations, with logic such as floating-point and integer arithmetic logic units (ALUs), shift logic, and other logic circuitry.

The data output controller 520 moves completed data to components coupled to the execution unit output 404, such as the vertex cache of the execution unit set control unit 206, the write-back unit 308, and so on. The execution unit data path 512 sends "end of task" information to the data output controller 520 to notify it that a task is complete. The data output controller 520 includes storage for completed tasks (for example, 32 entries) and a plurality of write ports. It selects tasks from the storage, reads all output data items from the register locations in the common register file 510 that are specified by the shader rendering context, and sends the data to the execution unit output 404. The thread task interface 524 sends the identifiers of tasks completed in the execution unit 420a to the execution unit set control unit 206. In other words, a task identifier notifies the execution unit set control unit 206 that a thread resource in a particular execution unit (for example, the execution unit 420a) is available for a new task assignment.

In one embodiment, the buffer 508 (for example, a constant buffer) may be divided into 16 blocks, each block having 16 slots and each slot holding a 128-bit horizontal vector constant. A shader accesses a constant buffer slot using an operand and an index. The index may be, for example, a temporary register holding an unsigned integer or an immediate unsigned integer constant.

The instruction cache controller 504 is an interface block to the thread controller 506. When there is a thread controller read request (for example, to fetch executable shader code from instruction memory), the instruction cache controller 504 preferably performs a hit/miss test by looking up a tag table (not shown). For example, a hit occurs when the requested instruction is in the cache of the instruction cache controller 504, and a miss occurs when the requested instruction must be fetched from the L2 cache 408 or the memory 106. On a hit, the instruction cache controller 504 grants the request if there is no request from the execution unit input 402, because the instruction cache has only one read/write port and the execution unit input 402 has higher priority. On a miss, the instruction cache controller 504 grants the request when the cache has a replaceable block and there is space for the pending request in the execution unit data path FIFO buffer 514. In one embodiment, the cache of the instruction cache controller 504 has 32 sets with 4 blocks per set. Each block carries a 2-bit status signal that indicates one of three states: invalid, loading, or valid. A block is "invalid" before it is loaded with L2 data, becomes "loading" while it waits for L2 data, and becomes "valid" once the L2 data has been loaded.

The predicate register file 516 is read and written through the execution unit data path 512. The execution unit input 402 serves as the interface for data entering the execution unit 420a. In one embodiment, the execution unit input 402 includes an 8-entry FIFO buffer to buffer incoming data. The execution unit input 402 can also pass data to the instruction cache of the instruction cache controller 504 and to the constant buffer 508.
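The grant rules and block states described for the instruction cache controller 504 can be summarized in a few lines of code. The following Python sketch is a hedged illustration only: the function names, parameter names, and event strings are hypothetical, and the numeric encoding of the 2-bit status is an assumption, since the text names the three states but not their bit patterns.

```python
from enum import Enum


class BlockState(Enum):
    # The 2-bit status distinguishes three states; these encodings are assumed.
    INVALID = 0   # block has not been loaded with L2 data
    LOADING = 1   # block is waiting for L2 data
    VALID = 2     # L2 data has been loaded


def grant_request(hit: bool, eu_input_requesting: bool,
                  replaceable_block: bool, fifo_has_space: bool) -> bool:
    """Decide whether a thread controller read request is granted."""
    if hit:
        # The single read/write port goes to the execution unit input first.
        return not eu_input_requesting
    # On a miss, a replaceable block and room in the pending-request FIFO are needed.
    return replaceable_block and fifo_has_space


def next_state(state: BlockState, event: str) -> BlockState:
    """Advance a block's state; the event names are hypothetical."""
    if state is BlockState.INVALID and event == "request_l2":
        return BlockState.LOADING
    if state is BlockState.LOADING and event == "l2_data_loaded":
        return BlockState.VALID
    return state
```

The sketch leaves out the tag lookup itself and the replacement policy, neither of which is specified in the text.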

The execution unit input 402 also maintains shader contexts.

The execution unit output 404 serves as the interface for data sent from the execution unit 420a to the execution unit set control unit 206, the L2 cache 408, and the write-back unit 308.

In one embodiment, the execution unit output 404 includes a 4-entry FIFO buffer that receives arbitrated requests and buffers data for the execution unit set control unit 206. The execution unit output 404 provides several functions, including arbitration among instruction cache read requests, data output write requests, and execution unit data path read/write requests.

The common register file (CRF) 510 stores input, output, and temporary data. In one embodiment, the common register file 510 comprises eight banks of a 128x128-bit register file, with one-read, one-write (1R1W) ports and one read/write (1RW) port. The 1R1W ports are used by the execution unit data path 512 for read and write accesses initiated by instruction execution. Banks 0, 2, 4, and 6 are shared among even-numbered threads, and banks 1, 3, 5, and 7 are shared among odd-numbered threads. The thread controller 506 pairs instructions from different threads and ensures that there are no read or write bank conflicts on the common register file memories.

The 1RW port is used by the execution unit input 402 and the data output controller 520 to load the initial thread input data and to write the final thread output to the data buffers of the execution unit set control unit, the L2 cache 408, or other modules. The execution unit input 402 and the execution unit output 404 share this single read/write I/O port.

In one embodiment, writes have higher priority than reads on the shared read/write port. The 512 bits of input data go to four different banks to avoid conflicts when data is loaded into the common register file 510. A 2-bit channel index is passed together with the data and an aligned base address to specify the starting bank of the input data. For example, if the starting channel index is 1 and the thread-based bank offset is assumed to be 0, the first 128 bits counting from the least significant bit (LSB) are loaded into bank 1, the next 128 bits into bank 2, and so on, with the last 128 bits loaded into bank 0. Note that the two least significant bits of the thread ID are used to generate the bank offset, which randomizes the starting bank location of each thread.

The CRF register index together with the thread ID can be used to construct a unique logical address for tag matching when data is read from or written to the common register file 510. For example, the address can be aligned to 128 bits, the width of a CRF bank. A unique 13-bit address can be constructed by combining the 8-bit CRF register index with the 5-bit thread ID. Each line has a tag and holds two 512-bit entries (words). Each word is stored across four banks, and the two least significant bits of the CRF index are added to the bank offset of the current thread to produce the bank selection.

The tag matching scheme lets the registers of different threads share the common register file 510 so that the memory is used efficiently, because the execution unit set control unit 206 tracks the memory usage of the common register file 510 and ensures that enough space is available before a new task is scheduled to the execution unit 420a.
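The address and bank arithmetic described for the common register file 510 can be made concrete with a short example. The following Python sketch is an illustration under assumptions that the text does not settle: the thread ID is placed in the upper 5 bits of the 13-bit address (the text only says the two fields are combined), sums wrap modulo 4 across the four banks of a word, and all function names are hypothetical.

```python
BANKS_PER_WORD = 4  # each 512-bit word is spread across four 128-bit banks


def bank_offset(thread_id: int) -> int:
    """Per-thread bank offset taken from the two LSBs of the thread ID."""
    return thread_id & 0b11


def logical_address(thread_id: int, crf_index: int) -> int:
    """13-bit tag-matching address; placing the thread ID in the high bits is an assumption."""
    assert 0 <= thread_id < 32 and 0 <= crf_index < 256
    return (thread_id << 8) | crf_index


def bank_select(thread_id: int, crf_index: int) -> int:
    """Two LSBs of the CRF index plus the thread's bank offset (assumed to wrap modulo 4)."""
    return ((crf_index & 0b11) + bank_offset(thread_id)) % BANKS_PER_WORD


def input_bank_order(start_channel: int, thread_bank_offset: int = 0) -> list:
    """Banks that receive the four 128-bit pieces of a 512-bit input, starting at the LSB."""
    return [(start_channel + thread_bank_offset + i) % BANKS_PER_WORD
            for i in range(BANKS_PER_WORD)]
```

With a starting channel index of 1 and a bank offset of 0, `input_bank_order(1)` yields banks 1, 2, 3, 0, which matches the loading order given in the text.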

The destination CRF index is checked against the total CRF register size of the current thread. Input data is expected to be present in the common register file 510 before the thread controller 506 starts the thread and shader execution begins. After thread execution ends, the data output controller 520 reads the output data from the common register file 510.

Having described an embodiment of the execution unit 420a whose execution unit data path 512 contains an embodiment of the decoding system 200, Figure 5B shows an embodiment of the execution unit data path 512. The execution unit data path 512 includes a register file 526, a multiplexer 528, a vector floating-point unit 532, a vector integer arithmetic logic unit 534, a special-purpose unit 536, a multiplexer 538, a register file 540, and the decoding system 200. The decoding system 200 includes one or more variable length decoding (VLD) units 530 and can therefore decode one or more streams. For example, a single VLD unit 530 can decode a single stream, two VLD units 530 (one shown in dashed lines, with its connections omitted for brevity) can decode two streams simultaneously, and so on. For purposes of illustration, the description below addresses the operation of the decoding system 200 with a single VLD unit 530, with the understanding that the same principles extend to more than one VLD unit.

As shown, the execution unit data path 512 includes several parallel data paths corresponding to the VLD unit 530, the vector floating-point unit 532, the vector integer arithmetic logic unit 534, and the special-purpose unit 536, each of which performs the corresponding operation according to the instruction being executed.


The register file 526 receives the operands and, in one embodiment, may correspond to the common register file 510, the predicate register file 516, and/or the scalar register file 518 shown in Figure 5A. Note that additional operands may be used in some embodiments. The operation (function) signal line 542 provides the medium through which each of the units 530-536 receives operation signals. The immediate signal line 544, coupled to the multiplexer 528, carries an immediate value encoded in the instruction for use by the units 530-536 in integer operations on small integer values. An instruction decoder (not shown) supplies the operands, the operation (function) signals, and the immediate signals. The multiplexer 538 at the end of the data paths (which may include a write-back stage) selects the result of the data path that was selected and sends it to the destination in the register file 540. The output register file 540 contains the destination, which may be the same component as the register file 526 or a different register. Note that in embodiments where the source and destination registers are the same component, the instruction provides bits with source and destination selects that the multiplexer uses to route data to or from the appropriate register file.

The execution unit 420a can therefore be viewed as a multi-stage pipeline (for example, a 4-stage pipeline with four arithmetic logic units), and decoding operations occur within the four execution stages. Stalls are implemented as needed to allow decoding threads to execute. For example, a stall may be inserted in the execution stages when a bitstream buffer underflows, while waiting for context memory to be initialized, while waiting for the bitstream to be loaded into the FIFO buffer and the sREG register (explained below), and/or when the processing time has exceeded a predetermined threshold.
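The stall conditions listed for the decoding pipeline amount to a logical OR of four checks. The following Python sketch expresses that reading; it is an assumption-laden illustration rather than the actual control logic, and the field names, the use of a cycle count as the time unit, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecodeStatus:
    """Snapshot of the conditions named in the text (field names are hypothetical)."""
    bitstream_buffer_underflow: bool = False
    context_memory_initialized: bool = True
    bitstream_loaded: bool = True   # FIFO buffer and sREG register have been filled
    elapsed_cycles: int = 0         # time unit is an assumption; the text says only "time"


def must_stall(status: DecodeStatus, cycle_threshold: int) -> bool:
    """Return True when any of the four stall conditions holds."""
    return (status.bitstream_buffer_underflow
            or not status.context_memory_initialized
            or not status.bitstream_loaded
            or status.elapsed_cycles > cycle_threshold)
```

A default `DecodeStatus()` does not stall; setting any one field to its stalling value, or letting `elapsed_cycles` exceed the threshold, does.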

As described above, in some embodiments the decoding system 200 can decode two bitstreams simultaneously using a single execution unit 420a. For example, according to an extended instruction set, the decoding system 200 can use two data paths (for example, by adding another VLD unit 530) to decode two streams simultaneously, although more or fewer streams may be decoded at one time (and more or fewer data paths used accordingly). Some embodiments of the decoding system 200 are not limited to simultaneous decoding when multiple streams are involved. Furthermore, in some embodiments a single VLD unit 530 can perform multiple simultaneous decoding of streams.

In embodiments where the decoding system 200 uses two data paths, two threads can run at the same time. For example, in a two-stream decoding embodiment, the number of threads is limited to two: a first thread (for example, thread 0) is assigned to a first bank of the decoding system 200 (that is, the VLD unit 530), and a second thread (for example, thread 1) is assigned to a second bank of the decoding system 200 (for example, the VLD unit shown in dashed lines in Figure 5B). In some embodiments, two or more threads can run on a single bank. In some embodiments, although the decoding system 200 is shown embedded in the execution unit data path 512, it may include additional components, such as logic in the execution unit set control unit 206. In the description that follows, the VLD unit 530 and the decoding system 200 are used interchangeably, with the understanding that the decoding system 200 may include one or more VLD units 530.

Having described the architecture underlying the decoding system 200, the individual decoding system modes are described below.

Client’s Docket No.: S3U06-0013-TW TT5s Docket No:0608-A41246twf.doc/NikeyChen 28 200809689 ,令1犯1-(:丁乂(設置解碼系綵200為CABAC處理模式)、 指令INIT—CAVLC (設置解碼系統200為cAVLC處理模 式)、指令INIT—MPEG2 (設置解碼系統2〇〇為MPEG-2 處理模式),以及指令INIT一VC1 (設置解碼系統200為 VC-1/WMV9奉理模式)。在部分實施例中,經由指令 ! INIT—AVS可^供頭外的初始化’其可初始化音頻視頻標準 (audio video standard,AVS )位元流編碼。對 EXP-Golomb 系統而言,在CABAC以及CAVLC編碼下使用 ® EXP-Golomb編碼符號’因此指令iNIT CTX以及指令 INIT—CAVLC下載£乂?-〇〇1〇11113系統的位元流。其中,不 需要對EXP-Golomb系統進行初始。舉例來說,對要被編 碼的符號而言,在位元流(例如在片段標頭位準的位元設 定)所接收之計算編碼旗標會顯示符號為Exp_G〇1〇mb編 碼、CABAC編石馬以及CAVLC編碼。當使用找?-〇〇1〇11^ 編碼時’執行下列所提出之適當的EXP-Golomb編碼指令。 _ 雖然這些模式會影響編碼引擎的實施,其亦會影響初始、 使用以及更新§己憶體的方法’進一步描述於後。 麥考第5C圖,第5C圖係顯示可變長度解碼單元53〇 之功能方塊圖,用以根據所選擇之模式完成任何複數解碼 操作之一。可變長度解碼單元530包括可變長度解碼邏輯 電路550,其中可變長度解碼邏輯電路55〇耦接於由srEG 串流緩衝器/DMA引擎562 (於此亦稱為DMa引擎模組) 所組成之位元流缓衝器管理以及鄰近内容記憶體 (neighborhood context memory,NCM) 564 (亦稱為内容Client's Docket No.: S3U06-0013-TW TT5s Docket No:0608-A41246twf.doc/NikeyChen 28 200809689, let 1 commit 1 - (: Ding Wei (set decoding system color 200 for CABAC processing mode), command INIT-CAVLC ( The decoding system 200 is set to the cAVLC processing mode), the instruction INIT-MPEG2 (the decoding system 2 is set to the MPEG-2 processing mode), and the instruction INIT-VC1 (the decoding system 200 is set to the VC-1/WMV9 processing mode). In some embodiments, via instruction! INIT-AVS can be used for initialization outside the header 'which can initialize the audio video standard (AVS) bitstream encoding. For the EXP-Golomb system, in CABAC and CAVLC The code uses the ® EXP-Golomb coded symbol 'so the instruction iNIT CTX and the instruction INIT-CAVLC to download the bit stream of the system. In this case, there is no need to initialize the EXP-Golomb system. For example For the symbol to be encoded, the computed coding flag received in the bit stream (eg, the bit set at the slice header level) will display the symbol Exp_G〇1〇mb, CABAC, and CABAC. CAVLC coding. 
When Exp-Golomb coding is used, the appropriate Exp-Golomb decoding instructions described below are executed. These modes affect not only the operation of the decoding engine but also how the memories are initialized, used, and updated, as described further below.

Referring to FIG. 5C, a functional block diagram of the VLD unit 530 is shown. The VLD unit 530 performs one of a plurality of decoding operations according to the selected mode. The VLD unit 530 includes VLD logic 550, which is coupled to bitstream buffer management, consisting of the SREG stream buffer/DMA engine 562 (also referred to herein as the DMA engine module), and to a neighborhood context memory (NCM) 564 (also referred to as the context memory). The VLD unit 530 also includes one or more registers 566. These include registers that store control information from the execution unit 420 ("CONTROL", e.g., control signals from the decoder of the execution unit that select a module of the VLD logic 550), decoding operands related to the selected mode (e.g., "SRC1" and "SRC2"), and forwarding registers (e.g., "F1" and "F2"). The SREG stream buffer/DMA engine 562 includes an SREG register 562a and a bitstream buffer 562b, explained further below.

In one embodiment, the VLD logic 550 includes the modules (also referred to as logic) shown in FIG. 5C. The VLD logic 550 comprises hardware, including registers and/or Boolean or arithmetic logic, for executing instructions and performing decoding according to the selected mode. More specifically, the VLD logic 550 includes a read-NCM (READ_NCM) module 568, an inspect-string (INPSTR) module 570, a read (READ) module 572, a count-leading-ones (CLO) module 574, a count-leading-zeros (CLZ) module 576, an MPEG module 578, a CABAC module 580, a CAVLC module 582, and an Exp-Golomb module 584 coupled to the CLZ module 576. The CLZ module 576 and the CLO module 574 include instructions used in decoding MPEG-2 and VC-1 bitstreams. Regarding the Exp-Golomb module 584, an Exp-Golomb symbol is coded as a number of leading zeros followed by a 1, followed by a number of bits equal to the number of zeros. The CLZ module 576 detects the number of leading zeros, shifts out those bits plus one more bit, and records the number of leading zeros. The Exp-Golomb module 584 reads that number of trailing bits and performs a calculation according to the Exp-Golomb mode to determine the value.

The read-NCM module 568 includes logic for generating an address and requesting a memory read operation. In the memory read operation, a fixed number of bits is read from the context memory 564 and the data is output to a destination register. The read-NCM instruction reads 32 bits of data from the context memory 564 and returns the value read, via the multiplexer 586, to a destination register of the execution unit 420a. CABAC and CAVLC decoding do not use the read-NCM instruction. For other variable length decoding operations (e.g., VC-1, MPEG-4 ASP (DivX)), however, the context memory 564 can be used to hold variable length decoding tables, and the read-NCM module can be used to read values from those tables.
The READ module 572 includes logic to read the SREG register 562a, extract a specified number of bits from the most significant bit (MSB) portion of the SREG register 562a, zero-extend the value, and place it in a register. In other words, the READ module 572 performs a read operation that reads a specified number of bits, removes them from the SREG register 562a, and returns the value as an unsigned number to the destination register. The INPSTR module 570 reads a fixed number of bits from the SREG register 562a without removing any bits from it (i.e., without changing the pointer position) and returns the value as an unsigned number to the destination register.

Each of the modules 568-584 is coupled to the multiplexer 586, which selects among them according to the respective command. In one embodiment, the output of the multiplexer 586 is provided to a destination register for further processing. The outputs of the modules 569-582 are also provided to the multiplexer 586, which, in response to a command, selects one of those outputs and provides it as input to the SREG register 562a. During their respective operations, data from the forwarding, control, and operand registers 566 is provided for use by the CABAC module 580 and the CAVLC module 582. The Exp-Golomb module 584 is enabled on receipt of a control signal (labeled EXP_GOLOMB_OP in FIG. 5C).

The Exp-Golomb module 584 receives input from the CLZ module 576 and provides its output to the multiplexer 586. The CABAC module 580 and the CAVLC module 582 can use the context memory 564.
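The bit-level operations described above can be sketched in software as follows. This is a minimal illustration only: the class and method names are hypothetical, register widths and buffer refills are not modeled, and the value computation for unsigned Exp-Golomb codes (2^zeros - 1 + suffix) is the standard H.264 formula rather than something this description spells out.

```python
class BitReader:
    """Illustrative software model of the SREG bit operations (hypothetical names)."""

    def __init__(self, data: bytes):
        # Keep the stream as an MSB-first string of '0'/'1' for clarity, not speed.
        self.bits = "".join(f"{byte:08b}" for byte in data)
        self.pos = 0  # current bit pointer

    def inpstr(self, n: int) -> int:
        """INPSTR-like peek: next n bits as an unsigned value, pointer unchanged."""
        chunk = self.bits[self.pos:self.pos + n]
        return int(chunk, 2) if chunk else 0

    def read(self, n: int) -> int:
        """READ-like operation: return the next n bits and remove them from the stream."""
        value = self.inpstr(n)
        self.pos += n
        return value

    def clz(self) -> int:
        """CLZ-like operation: count zeros before the next 1 bit, pointer unchanged."""
        count = 0
        while self.pos + count < len(self.bits) and self.bits[self.pos + count] == "0":
            count += 1
        return count

    def exp_golomb_ue(self) -> int:
        """Decode one unsigned Exp-Golomb symbol: zeros, a 1, then 'zeros' suffix bits."""
        zeros = self.clz()
        self.pos += zeros + 1      # shift out the leading zeros plus the marker 1
        suffix = self.read(zeros)  # trailing bits, as many as there were zeros
        return (1 << zeros) - 1 + suffix


reader = BitReader(bytes([0b00101000]))
print(reader.exp_golomb_ue())  # the code word 00101 is consumed here
```

Separating the non-consuming peek from the consuming read mirrors the INPSTR/READ split in the text; the Exp-Golomb routine then only needs the leading-zero count and a plain read.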

For all modes other than the CABAC and CAVLC modes, the READ instruction reads n bits from the SREG register 562a and returns the value read, via the multiplexer 586, to a destination register of the execution unit 420a. For modes other than the CABAC and CAVLC modes, the context memory 564 is used to hold the top and left context values, which are read automatically as part of the decoding process. These and other components of the VLD unit 530 are described further below in connection with the different modes. Note that in some embodiments, the VLD logic 550 may include fewer (or more) than all of the modules and/or multiplexers shown.

Having described the general functionality of the VLD unit 530, the operation of the VLD unit 530 configured in the different modes is described further below.

CABAC Decoding

CABAC decoding is briefly explained below, followed by a description of some embodiments of the decoding system. In general, the CABAC decoding process of the H.264 standard can be described as comprising parsing of the coded bitstream for a first syntax element, initialization of the context variables and the decoding engine for the first syntax element of a slice, and binarization. Then, for each binary value (bin), a decoding process is performed that includes obtaining a context model and decoding the bin of the respective syntax element, until a match with a meaningful codeword is obtained. More specifically, the decoding system 200 decodes syntax elements, where each syntax element may represent, among other things, parameters pertaining to a macroblock, which are used to represent a particular field or frame of an image or video. Each syntax element may comprise a sequence of one or more binary symbols, each having a value of 0 or 1. The decoding system controls the output bit length according to the probability of occurrence of the input binary symbols.

A CABAC encoder provides efficient coding when some symbols (referred to as dominant symbols) are more likely to occur than others, because such symbols can be coded with a smaller bit-to-symbol ratio. The encoder continually updates the frequency statistics of the incoming data and adjusts the arithmetic coding and the context models accordingly. The binary symbol with the higher probability is called the most probable symbol (MPS), and the other symbol is the least probable symbol (LPS). A binary symbol is associated with its context model, and each context model corresponds to an LPS probability and an MPS value.

To decode each binary symbol, the decoding system 200 determines or receives a corresponding range, offset, and context model. The context model is selected from a plurality of possible context models according to the type of symbol and the context determined from the spatial neighborhood (e.g., the current macroblock or previously decoded neighboring macroblocks). A context identifier can be determined from the context model, and is in turn used to obtain the MPS value and the current state of the decoding engine for use in the decoding process. The range represents an interval that narrows each time a bin is decoded.

The interval is divided into two subranges, corresponding to the MPS value and the LPS value respectively. The LPS subrange is computed by multiplying the range by the LPS probability specified by the given context model. The MPS subrange is computed by subtracting the LPS subrange from the range. The offset is the criterion for deciding the decoded bin, and is typically initialized by taking the first 9 bits from the coded bitstream. For a given bin decode and context model, if the offset is less than the MPS subrange, the bin value is the MPS value, and the range used for the next decode is set to the MPS subrange. Otherwise, the bin is determined to be the LPS, its value is the inverse of the MPS value contained in the associated context model, and the next range is set to the LPS subrange. The result of the decoding process is a sequence of decoded bins, which is evaluated to determine whether the sequence matches a meaningful codeword.

Having described in general terms how the operation of the decoding system relates to CABAC decoding, the following description sets forth the various components of the decoding system in the context of CABAC decoding, with the understanding that variations consistent with the practical application are contemplated.
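The range and offset decision described above can be illustrated with a small sketch. It is deliberately simplified and rests on stated assumptions: the function name and the floating-point probability are hypothetical, whereas the H.264 decoding engine looks the LPS subrange up in a table indexed by the probability state, updates that state, and renormalizes the range after each bin. The subtraction of the MPS subrange from the offset on the LPS path follows the H.264 decoding process and is not spelled out in the text above.

```python
def decode_bin(rng: int, offset: int, p_lps: float, val_mps: int):
    """Decode one bin; returns (bin value, next range, next offset).

    rng      current interval width (codIRange in H.264 terms)
    offset   current offset into the interval (codIOffset)
    p_lps    LPS probability from the context model (assumed float, for illustration)
    val_mps  value of the most probable symbol, 0 or 1
    """
    lps_subrange = int(rng * p_lps)    # range multiplied by the LPS probability
    mps_subrange = rng - lps_subrange  # remainder of the interval belongs to the MPS

    if offset < mps_subrange:
        # Offset falls in the MPS part: the bin is the MPS, interval shrinks to it.
        return val_mps, mps_subrange, offset

    # Offset falls in the LPS part: the bin is the inverse of the MPS value.
    # Re-basing the offset is the H.264 behavior (an addition to the description above).
    return 1 - val_mps, lps_subrange, offset - mps_subrange


print(decode_bin(256, 100, 0.25, 1))  # offset inside the MPS subrange
print(decode_bin(256, 200, 0.25, 1))  # offset inside the LPS subrange
```

Each call narrows the interval to one of the two subranges, which is why the range shrinks with every decoded bin and must eventually be renormalized in a full implementation.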
As those skilled in the art will appreciate, many of the terms used below are taken from the H.264 specification and, for brevity, are not explained further unless doing so aids understanding of the described processes and/or components.

FIGS. 6A through 6F are block diagrams illustrating embodiments of the decoding system and related components. The decoding system 200 is shown with a single VLD unit 530 (the CABAC unit 530 used in FIGS. 6A through 6F is interchangeable with the decoding system 200), so in these embodiments the decoding system can decode a single bitstream. The same principles apply to a decoding system 200 with additional VLD units, which can decode multiple (e.g., two) streams simultaneously. Briefly, FIG. 6A is a block diagram of select components of the decoding system 200, and FIG. 6B is a functional block diagram of the select components shown in FIG. 6A plus other components. FIGS. 6C and 6E are block diagrams illustrating the context memory functionality of the decoding system 200, and FIG. 6D is a block diagram of an exemplary mechanism used in decoding a macroblock. Although the following description is given in the context of macroblock decoding, the principles set forth herein can be applied to decoding of various kinds of blocks.

Referring to FIG. 6A, the VLD unit 530a includes the CABAC logic module 580 and a memory module 650. In one embodiment, the CABAC logic module 580 comprises three modules: a binarization (BIND) module 620, a get-context (GCTX) module 622, and a binary arithmetic decoding (BARD) engine 624. The BARD engine 624 further includes a state index (pStateIdx) register, an MPS value (valMPS) register 604, a code length range (codIRange) register 606, and a code length offset (codIOffset) register 608. The VLD unit 530a further includes the memory module 650, which comprises the context memory 564 (also referred to as the macroblock neighborhood context (mbNeighCtx) memory or the context memory array), a local register 612, a global register 614, and the SREG stream buffer/DMA engine 562 (also referred to as the DMA engine module, described further below), as well as other registers not shown. In one embodiment, the context memory 564 comprises an array structure as shown in FIG. 6C, explained later. The memory module 650 also includes a binary string (binstring) register 616.

The interface between the VLD unit 530a and the execution unit 420a comprises a destination (DST) bus 628, two source buses SRC1 632 and SRC2 630, a shared and thread information bus 634, and a stall/reset bus 636. Data on the destination bus 628 may be communicated directly or indirectly (e.g., through an intermediate cache, registers, buffers, or memory) to a video processing unit internal or external to the graphics processing unit 114. The data on the destination bus 628 may be in one of a plurality of formats, including Microsoft's DX API format or other formats. Such data may include coefficients, macroblock parameters, motion information, and/or IPCM samples, among other data.
The VLD unit 530a also includes a memory interface having an address bus 638 and a data bus 640. Using the address supplied on the address bus 638, the memory interface accesses the bitstream data, which is received on the data bus 640. In one embodiment, the data on the data bus 640 may include an unencoded video stream comprising various signal parameters, among other data and formats. In some embodiments, load-store operations can be used to access the bitstream data. Before describing the various components of the VLD unit 530a, the overall operation of the execution unit 420a with respect to CABAC decoding is briefly described.


Depending on the slice type, the driver software 128 (FIG. 1) loads a CABAC shader to the execution unit 420a. The CABAC shader uses standard instructions plus the BIND, GCTX, and BARD instructions to decode the bitstream. Because the context tables used by the VLD unit 530a may change according to the slice type, loading occurs for each slice. In one embodiment, the first instructions executed by the CABAC shader, before any other instructions are issued, include the INIT_CTX instruction and the INIT_ADE instruction. These two instructions cause the CABAC unit 530a to begin decoding a CABAC bitstream and load the bitstream into a first-in first-out (FIFO) buffer, starting from a pointer from which stream decoding is managed automatically. Both instructions are described later.

With regard to parsing the bitstream, the bitstream is received on the data bus 640 of the memory interface and then buffered by the SREG stream buffer/DMA engine 562. Bitstream decoding is provided from the slice data parsing stage. That is, a bitstream (e.g., a NAL bitstream) includes one or more pictures, each of which is divided into a picture header and a number of slices. A slice generally corresponds to a sequence of macroblocks.
In one embodiment, logic external to the VLD unit 530a parses the NAL bitstream, decodes the slice header, and passes a pointer to the location of the slice data (e.g., where the slice begins). The hardware (plus software) is capable of parsing the H.264 bitstream from the picture level. In one embodiment, however, CABAC coding is present only at the slice data and macroblock levels. In general, the driver software 128 processes the bitstream from the slice data level, because this is what applications and the API provide. The pointer to the slice data location includes the address of the first byte of the slice data (e.g., RBSPbyteAddress) and a bit offset indicator (e.g., one or more bits) that indicates the position of the start or header of the bitstream (e.g., sREGptr). Initialization of the bitstream is explained later. In some embodiments, the external processing may be implemented by a host processor (e.g., the central processing unit 126 of FIG. 1) to provide picture-level decoding and slice header decoding. In some embodiments, owing to the programmable nature of the decoding system 200, decoding can be performed at any level.

Referring to FIGS. 5C and 6A, the SREG stream buffer/DMA engine 562 receives the SRC1 and SRC2 values on the buses 632 and 630, respectively, along with data corresponding to the forwarding registers and the control registers. The SREG stream buffer/DMA engine 562 includes the internal bitstream buffer 562b, which in one embodiment may comprise a 32-bit register and eight 128-bit (8×128) registers. The SREG stream buffer/DMA engine 562 is initialized by the initialization instructions issued by the driver software, as described above. Once initialized, the internal buffer of the SREG stream buffer/DMA engine 562 is managed automatically.
The SREG stream buffer/DMA engine 562 keeps track of the position of the bits being parsed. In one embodiment, the SREG stream buffer/DMA engine 562 uses two storage structures: a fast 32-bit flip-flop register and a slower 512-bit or 1024-bit memory. The bitstream is consumed in bits. The SREG register 562a operates on bits, while the bitstream buffer 562b operates on bytes, which saves power. In general, instructions operate on the SREG register 562a and consume a few bits (e.g., 1 to 3 bits). When more than one byte of data has been consumed from the SREG register 562a, data is transferred (in byte chunks) from the bitstream buffer 562b to the SREG register 562a, and the buffer pointer is decremented by the number of bytes transferred. When the DMA of the SREG stream buffer/DMA engine 562 detects that 256 bits or more have been consumed, it fetches 256 bits from memory to refill the bitstream buffer 562b. The VLD unit 530a thus implements a simple circular buffer (256-bit chunks × 4) to track the bitstream buffer 562b and provide refills. In some embodiments, a single buffer may be used, although a circular buffer of that kind requires more complex pointer arithmetic to keep pace with the memory.

The operation of the internal buffer 562b is set up with an initialization instruction, referred to as the INIT_BSTR instruction. In one embodiment, the INIT_BSTR instruction, along with the other instructions described below, is issued by the driver software 128. Given the byte address and bit offset of the bitstream location, the INIT_BSTR instruction loads the data into the internal bitstream buffer 562b and starts the management process.
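The two-level arrangement described above, with a small bit-addressed register fed in whole bytes from a larger buffer, can be modeled roughly as follows. This is a sketch under assumptions: the class, its attributes, and the refill policy (move a byte as soon as eight bit positions are free) are hypothetical, and the 256-bit DMA fetches from memory are not modeled.

```python
from collections import deque


class SregModel:
    """Toy model: a 32-bit register (like 562a) refilled bytewise from a buffer (like 562b)."""

    SREG_BITS = 32

    def __init__(self, stream: bytes):
        self.buffer = deque(stream)  # byte-oriented bitstream buffer
        self.sreg = 0                # register contents, right-aligned
        self.valid = 0               # number of unread bits currently held in sreg
        self.bytes_moved = 0         # bytes transferred from the buffer so far
        self._refill()

    def _refill(self) -> None:
        # Transfer whole bytes while there is room for one and the buffer has data.
        while self.SREG_BITS - self.valid >= 8 and self.buffer:
            self.sreg = (self.sreg << 8) | self.buffer.popleft()
            self.valid += 8
            self.bytes_moved += 1

    def read(self, n: int) -> int:
        """Consume n bits from the most significant end of the unread data."""
        if n > self.valid:
            raise ValueError("not enough bits available")
        shift = self.valid - n
        value = (self.sreg >> shift) & ((1 << n) - 1)
        self.sreg &= (1 << shift) - 1  # drop the bits just consumed
        self.valid -= n
        self._refill()
        return value


model = SregModel(bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x12, 0x34]))
print(hex(model.read(4)), hex(model.read(4)), hex(model.read(8)))
```

Keeping the bit arithmetic confined to the small register while the larger buffer only ever moves whole bytes reflects the bit/byte split that the text credits with saving power.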
For each call to process slice data, an instruction of the following form is issued:

    INIT_BSTR offset, RBSPbyteAddress

The INIT_BSTR instruction is issued to load data into the internal buffer 562b of the SREG stream buffer/DMA engine 562. The SRC2 register supplies the byte address (RBSPbyteAddress), and the SRC1 register supplies the bit offset. This gives the following general instruction format:

    INIT_BSTR SRC2, SRC1

where SRC1 and SRC2, in this and other instructions, correspond to values in the internal registers 566, although they are not limited to these registers. In one embodiment, the bitstream data is accessed using 256-bit aligned memory fetches, and the internal buffer 562b is refilled as the SREG register 562a consumes data. In other words, the internal buffer 562b of the SREG stream buffer/DMA engine 562 acts as a modulo-3 circular buffer whose buffer registers are written and then passed to the 32-bit SREG register 562a of the SREG stream buffer/DMA engine 562. In one embodiment, the data in the bitstream buffer 562b is byte-aligned before any other operation on these registers or buffers begins. The alignment is performed with an alignment instruction, referred to as the ABST instruction. The ABST instruction aligns the data in the bitstream buffer 562b; the alignment bits (e.g., stuffing bits) are eventually discarded during the decoding process.
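As a rough illustration of the address arithmetic implied above, the sketch below maps a byte address and bit offset onto a 256-bit aligned fetch and computes how many bits a byte alignment would discard. The function names and the exact mapping are assumptions made for illustration; the description does not state how the hardware combines these values.

```python
FETCH_BYTES = 32  # one 256-bit aligned memory fetch, per the text


def init_bstr(byte_address: int, bit_offset: int):
    """Return (aligned fetch address, bits to skip inside the first fetched chunk).

    byte_address plays the role of RBSPbyteAddress (SRC2) and bit_offset the
    role of the bit offset (SRC1); both roles are assumed for this sketch.
    """
    aligned = byte_address - (byte_address % FETCH_BYTES)
    skip_bits = (byte_address - aligned) * 8 + bit_offset
    return aligned, skip_bits


def abst_padding(bit_position: int) -> int:
    """Number of bits an ABST-like alignment discards to reach the next byte boundary."""
    return (-bit_position) % 8


print(init_bstr(0x1005, 3))  # start address rounded down to a 32-byte boundary
print(abst_padding(13))      # bits left in the current byte
```

Rounding the start address down and skipping the leading bits lets every fetch stay aligned, at the cost of discarding part of the first chunk.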

The buffered data is fed into the 32-bit register 562a of the SREG stream buffer/DMA engine 562. The CABAC module 580, together with the READ module 572, can use the READ instruction to read data from the SREG register 562a. For example, in the H.264 specification some symbols are fixed-length coded, and their values are obtained by executing a READ instruction for the given number of bits, with the result zero-extended to the size of the register. The READ instruction has the following format:

    READ DST, SRC1

where DST corresponds to the output or destination register. In one embodiment, the SRC1 register contains an unsigned integer value n. The READ instruction reads n bits from the SREG register 562a. When 256 bits of data have been consumed from the 32-bit register 562a (e.g., by decoding one or more syntax elements), a fetch is started automatically to obtain another 256 bits of data, which are written to the registers of the internal buffer 562b and subsequently passed to the SREG register 562a for consumption.

In some embodiments, if a predetermined number of bits or bytes of the data in the SREG register 562a corresponding to a symbol decode has been consumed and the internal buffer 562b has not received any further data, the CABAC module 580 can assert a stall via the stall/reset bus 636 so that another thread (e.g., a thread unrelated to the CABAC decoding process), such as a vertex shader operation, can execute.

The DMA engine of the SREG stream buffer/DMA engine 562 reduces the total buffering needed to compensate for memory latency (which, for example, can exceed three hundred cycles in some graphics processing units). As the bitstream is consumed, a request can be made to stream in additional bitstream data. If the bitstream data runs low and the bitstream buffer 562b is at risk of underflow (e.g., given the number of cycles a signal takes to travel from the VLD unit 530a to the processor pipeline), a stall signal can be passed to the processor pipeline to suspend operation until the awaited data arrives in the bitstream buffer 562b.

In addition, the SREG stream buffer/DMA engine 562 inherently has the ability to handle corrupted bitstreams.
For example, due to a bit stream error, it may not be detected at the end of the segment. This detection error can result in a complete decoding error and the use of bits in subsequent patterns or fragments. The SREG Stream Buffer/DMA Engine 562 records the number of bits used. When the number of bits used is greater than the preset limit (which can be changed for each segment), the processing is terminated and an abnormal signal is sent to the processor (for example, the host processor). The receiver' processor executes the code to try to reply from the error.凊Please refer to the 6A and 6B pictures at the same time.

The following further describes the functions of the variable length decoding unit 530a, in particular the decoding engine (e.g., the BARD engine or module 624) and the initialization of the context variables. At the start of a slice, and before the syntax elements of the first macroblock are decoded, the context states and the binary arithmetic decoding module 624 are initialized. In one embodiment, the driver software 128 issues an INIT_CTX instruction and an INIT_ADE instruction to perform this initialization.

The INIT_CTX instruction starts the CABAC decoding mode and initializes one or more context tables (e.g., stored remotely or in on-chip memory such as ROM). The INIT_CTX instruction may be executed according to the following instruction format:

INIT_CTX SRC2, SRC1

For the INIT_CTX instruction, depending on bit position, operand SRC1 may hold one or more of the following values: cabac_init_idc, mbPerLine, constrained_intra_pred_flag, NAL_unit_type (NUT), and MbaffFlag. Note that constrained_intra_pred_flag, NAL_unit_type (NUT), and MbaffFlag correspond to known H.264 macroblock parameters. Likewise, depending on bit position, operand SRC2 holds the values SliceQPY and mbAddrCurr. By way of further explanation, in one embodiment executing the INIT_CTX instruction (i.e., initializing the CABAC context tables) requires the cabac_init_idc and SliceQPY (e.g., quantization) parameters. However, initializing the whole CABAC engine takes three instructions, namely INIT_BTSR, INIT_CTX, and INIT_ADE, so the available bits in SRC1 and SRC2 (e.g., 64 bits in total, or 32 bits each) can carry other parameters used for the CABAC neighbor context. The two source registers SRC1 and SRC2 664 may therefore contain the following values:

SRC1[15:0] = cabac_init_idc
SRC1[23:16] = mbPerLine
SRC1[24] = constrained_intra_pred_flag
SRC1[27:25] = NAL_unit_type (NUT)
SRC1[28] = MbaffFlag
SRC1[31:29] = undefined
SRC2[15:0] = SliceQPY
SRC2[31:16] = mbAddrCurr
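The SRC1 and SRC2 bit assignments listed above can be illustrated with a small packing helper. This is a minimal sketch, not the driver's actual code: the function names are hypothetical, all fields are treated as unsigned (an assumption, since the text does not state signedness), and the undefined bits SRC1[31:29] are simply left at 0.

```python
def _field(name, value, width, shift):
    """Range-check an unsigned field and move it to its bit position."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value << shift


def pack_init_ctx_src1(cabac_init_idc, mb_per_line,
                       constrained_intra_pred_flag, nal_unit_type, mbaff_flag):
    """Build SRC1 for INIT_CTX; bits 31:29 (undefined) stay 0."""
    return (_field("cabac_init_idc", cabac_init_idc, 16, 0)
            | _field("mbPerLine", mb_per_line, 8, 16)
            | _field("constrained_intra_pred_flag", constrained_intra_pred_flag, 1, 24)
            | _field("NAL_unit_type", nal_unit_type, 3, 25)
            | _field("MbaffFlag", mbaff_flag, 1, 28))


def pack_init_ctx_src2(slice_qpy, mb_addr_curr):
    """Build SRC2 for INIT_CTX from SliceQPY and mbAddrCurr."""
    return (_field("SliceQPY", slice_qpy, 16, 0)
            | _field("mbAddrCurr", mb_addr_curr, 16, 16))


def bits(word, hi, lo):
    """Return word[hi:lo], both ends inclusive."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)
```

For example, bits(src1, 23, 16) recovers mbPerLine from a packed SRC1 word, mirroring the SRC1[23:16] entry in the list.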

The value of SliceQPY is used to initialize a state machine (not shown) in the bitstream buffer 562b.

Although various known picture and slice parameters have been discussed above, some additional remarks on the parameters of the variable length decoding unit 530a are provided here. In one embodiment, cabac_init_idc is defined for slices that are not coded as I pictures or switching I pictures (SI). In other words, cabac_init_idc is defined only for P, SP, and B slices, and takes a default value when I and SI slices are received. For example, when approximately 460 contexts are initialized (e.g., for I and SI slices), cabac_init_idc can be set to 3 (since, according to the H.264 specification, cabac_init_idc can only take the values 0 to 2), so that the 2 bits can indicate that the slice is I or SI.

The variable length decoding unit 530a also uses the INIT_CTX instruction to initialize the local register 612 and the array structure or elements of the macroblock neighbor context (mbNeighCtx) memory 564, including the registers that hold information about neighboring macroblocks. Referring to FIG. 6C, in one embodiment the mbNeighCtx memory 564 is shown at the top of the figure. In one embodiment, the macroblock-based neighbor context memory is arranged as a memory array that stores data for one row of macroblocks. As shown, the mbNeighCtx memory 564 includes array elements mbNeighCtx[0, 1, ..., i+1, ..., 119] (labeled 601), each of which stores data for one of the 120 macroblocks in a row (e.g., corresponding to 1920x1080 pixels for HDTV). The mbNeighCtxCurrent register 603 stores the macroblock currently being decoded, and the mbNeighCtxLeft register 605 stores the previously decoded neighboring (left) macroblock. Pointers 607a, 607b, and 607c (shown as arrows in FIG. 6C) point to the registers 603 and 605 and to the array element 601. When the current macroblock is decoded, the decoded data is stored in the mbNeighCtxCurrent register 603. Given the contextual nature of CABAC decoding, the current macroblock is decoded using information gathered from previously decoded macroblocks: the left macroblock, stored in the mbNeighCtxLeft register 605 and pointed to by pointer 607b, and the top macroblock, stored in array element [i] and pointed to by pointer 607c.

Continuing with the initialization instructions, the INIT_CTX instruction initializes the top and left pointers 607c and 607b for the macroblocks adjacent to the current macroblock (e.g., elements of the mbNeighCtx memory 564 array). For example, the left pointer 607b may be set to 0 and the top pointer 607c may be set to 1. The INIT_CTX instruction also updates the global register 614.
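The one-row neighbor context arrangement described above can be modeled in a few lines. This is a simplified sketch under assumptions that are not taken from the disclosure: the class NeighborContext and its methods are hypothetical, the pointers 607a to 607c are replaced by a single column index, and the handling of the left neighbor at a row boundary is a guess made only for illustration. The 120-element row follows from 1920 luma samples divided by the 16-sample macroblock width.

```python
MB_SIZE = 16  # luma samples per macroblock edge in H.264


def macroblocks_per_row(picture_width):
    """Number of macroblocks needed to cover one picture row."""
    return (picture_width + MB_SIZE - 1) // MB_SIZE


class NeighborContext:
    """Illustrative model of mbNeighCtx memory 564 plus registers 603 and 605."""

    def __init__(self, picture_width=1920):
        self.row = [None] * macroblocks_per_row(picture_width)  # elements 601
        self.current = None  # models mbNeighCtxCurrent register 603
        self.left = None     # models mbNeighCtxLeft register 605
        self.col = 0         # column of the macroblock being decoded

    def neighbors(self):
        """Return (left, top) data for the macroblock about to be decoded."""
        return self.left, self.row[self.col]

    def cwrite(self, decoded):
        """Store the finished macroblock so later macroblocks can use it."""
        self.current = decoded
        self.row[self.col] = decoded  # becomes the top neighbor one row later
        self.left = decoded           # assumption: current becomes left
        self.col += 1
        if self.col == len(self.row):
            self.col = 0
            self.left = None          # assumption: no left neighbor at column 0
```

Only one row of macroblock data is ever kept, which is why the array needs 120 elements rather than storage for the whole picture.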

Regarding initialization of the context tables: in response to a call to the INIT_CTX instruction, the variable length decoding unit 530a builds one or more context tables, also referred to as CTX_TABLE. In one embodiment, CTX_TABLE may be a 4x460x16-bit table (8 bits for m and another 8 bits for n, both signed values) or another data structure. Each entry of the context table contains a pStateIdx value and a valMPS value, which are accessed through the state index (pStateIdx) register 602 and the most probable symbol value (valMPS) register 604.

The INIT_ADE instruction initializes the binary arithmetic decoding module 624, also referred to as the decoding engine. In one embodiment, the INIT_ADE instruction is called after the INIT_BTSR instruction completes. On executing the INIT_ADE instruction, the variable length decoding unit 530a sets up two registers, the codIRange register 606 and the codIOffset register 608, with the following values:

codIRange = 0x01FE
codIOffset = ZeroExtend(READ(#9), #16)

Thus, in one embodiment, these variables may be 9-bit values. For codIOffset, 9 bits are read from the bitstream buffer 562b and zero-extended for storage in the 16-bit codIOffset register 608. Some embodiments may use other values. The binary arithmetic decoding module 624 uses the values stored in registers 606 and 608 to decide whether to output 0 or 1, and these values are updated after each bin is decoded.

In addition to initializing the codIRange register 606 and the codIOffset register 608, the INIT_ADE operation also initializes the binstring register 616.
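The INIT_ADE register setup given above (codIRange = 0x01FE, codIOffset = ZeroExtend(READ(#9), #16)) can be sketched as follows. The BitReader class and the init_ade function are hypothetical helpers written for this illustration; the sketch assumes the bitstream is read most significant bit first, as in H.264, and models only the initialization, not the arithmetic decoding itself.

```python
class BitReader:
    """Minimal MSB-first bit reader standing in for bitstream buffer 562b."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        """Return the next nbits bits as an unsigned integer."""
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def init_ade(reader):
    """Model of INIT_ADE: return (codIRange, codIOffset)."""
    cod_i_range = 0x01FE                    # 510, a 9-bit value
    cod_i_offset = reader.read(9) & 0xFFFF  # 9 bits, zero-extended to 16 bits
    return cod_i_range, cod_i_offset
```

Because READ(#9) yields at most 511, the upper 7 bits of the 16-bit codIOffset register are always 0 immediately after initialization.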

In one embodiment, the binstring register 616 may be a multi-bit register that receives the output bits from the binary arithmetic decoding module 624. Registers of other sizes may be used in some embodiments.

The binary arithmetic decoding module 624 is also initialized when a macroblock is coded as I_PCM data. As is known, I_PCM data comprises raw pixel data to which, according to the H.264 specification, no transform or prediction model has been applied. For example, I_PCM can be used for lossless coding applications.

Having described the architecture and instructions related to parsing the bitstream and initializing the various decoding system components, the following describes obtaining the model information and context, and one or more procedures for decoding based on the model and context. In general, the variable length decoding unit 530a obtains, via the binarization module 620 and the BIND instruction, all possible binarizations for the syntax element (SE) being parsed, or at least enough of them to obtain the model information. The variable length decoding unit 530a further obtains the context of the given syntax element via the get context module 622 and the GCTX instruction, and performs arithmetic decoding based on the context and model information via the binary arithmetic decoding module 624 and the BARD instruction. In effect, calling the GCTX/BARD instructions and outputting one bit to the binstring register 616 is repeated in a loop until a meaningful codeword matching the given syntax element is found. In one embodiment, after each bin is decoded, the corresponding decoded bit is supplied to the binstring register 616, and the binstring register is read back to the get context module 622, until a match is found.

The decoding system architecture using a single variable length decoding unit 530a is explained in more detail below with reference to FIGS. 6A and 6B.
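The GCTX/BARD loop described above (obtain a context, decode one bin, append it to the bin string, and test for a matching codeword) can be sketched as follows. This is a toy model under stated assumptions: decode_syntax_element and its parameters are hypothetical names, the get_ctx and decode_bin callables are stand-ins for the GCTX and BARD instructions rather than real implementations, and the binarization is assumed to be prefix-free so that the first match is the intended codeword.

```python
def decode_syntax_element(binarization, get_ctx, decode_bin, max_bins=32):
    """Decode bins one at a time until the bin string matches a codeword.

    binarization: dict mapping a bin string such as "110" to a value
    get_ctx:      callable(bin_idx, binstring) -> context index (GCTX stand-in)
    decode_bin:   callable(ctx_idx) -> 0 or 1 (BARD stand-in)
    """
    binstring = ""  # models binstring register 616
    for bin_idx in range(max_bins):
        # The bin string decoded so far is fed back when choosing the context.
        ctx_idx = get_ctx(bin_idx, binstring)
        binstring += str(decode_bin(ctx_idx))
        if binstring in binarization:
            return binarization[binstring], binstring
    raise ValueError("no codeword matched within max_bins bins")
```

As a usage example, with a unary binarization {"0": 0, "10": 1, "110": 2} and a bin decoder that yields 1, 1, 0 in turn, the loop runs three times and returns the value 2.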

The binarization module 620 is enabled by a BIND instruction issued by the driver software 128. In one embodiment, the BIND instruction has the following format:

BIND DST, Imm, SRC1

where DST corresponds to the destination register 652, Imm corresponds to a 16-bit immediate value, and SRC1 corresponds to the input register SRC1. The inputs to the BIND operation are the syntax element (carried in the 16-bit immediate value Imm) and the context block category (ctxBlockCat). The syntax element may be of any syntax element type defined by the H.264 specification (e.g., MBTypeInI, MBSkipFlagB, IntraChromaPredMode, etc.). Calling the BIND instruction causes the driver software 128 to read the syntax element from a table (or other data structure) stored in memory (e.g., on-chip memory or remote memory) and to obtain a syntax element index (SEIdx). The syntax element index is used to access other tables or data structures to obtain the macroblock parameters described below.

In one embodiment, the destination register 652 is a 32-bit register with the following format: bits 0-8 (ctxIdxOffset), bits 16-18 (maxBinIdxCtx), bits 21-23 (ctxBlockCat), bits 24-29 (ctxIdxBlockCatOffset), and bit 31 (bypass flag). These values (e.g., ctxIdxOffset, maxBinIdxCtx, and so on) are passed to the get context module 622 for use in context modeling. In this embodiment, any undefined or reserved bits may be 0. ctxIdxBlockOffset can be obtained from a table or other data structure stored in remote or on-chip memory by matching the syntax element index and the context block category.
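The 32-bit format of the destination register 652 described above can be expressed as a field table with pack and unpack helpers. This is an illustrative sketch only: the names BIND_DST_FIELDS, pack_bind_dst, and unpack_bind_dst are hypothetical, and the reserved bits (9-15, 19-20, and 30) are left at 0 as the text suggests.

```python
# (high bit, low bit) of each field in destination register 652
BIND_DST_FIELDS = {
    "ctxIdxOffset":         (8, 0),
    "maxBinIdxCtx":         (18, 16),
    "ctxBlockCat":          (23, 21),
    "ctxIdxBlockCatOffset": (29, 24),
    "bypassFlag":           (31, 31),
}


def pack_bind_dst(**values):
    """Build the 32-bit word; unspecified fields and reserved bits stay 0."""
    word = 0
    for name, value in values.items():
        hi, lo = BIND_DST_FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack_bind_dst(word):
    """Split a 32-bit word back into its named fields."""
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, (hi, lo) in BIND_DST_FIELDS.items()}
```

Keeping the layout in one table means the same description drives both directions, which avoids the pack and unpack code drifting apart.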

Table 1 shows the table contents of one non-limiting embodiment:

Table 1

codeNum (k)   coded_block_pattern
              Intra_4x4   Inter
 0            47           0
 1            31          16
 2            15           1
 3             0           2
 4            23           4
 5            27           8
 6            29          32
 7            30           3
 8             7           5
 9            11          10
10            13          12
11            14          15
12            39          47
13            43           7
14            45          11
15            46          13
16            16          14
17             3           6
18             5           9
19            10          31
20            12          35
21            19          37
22            21          42
23            26          44
24            28          33
25            35          34
26            37          36
27            42          40
28            44          39
29             1          43
30             2          45
31             4          46
32             8          17
33            17          18
34            18          20
35            20          24


If an undefined context block category is received, the variable length decoding unit 530a may treat the undefined parameter as 0, so that ctxIdxBlockOffset is considered to have a value of 0.

Calling the BIND instruction also causes a reset signal (Rst_Signal) to be output from the binarization module 620 to the binary arithmetic decoding module 624, as explained below.

To illustrate the inputs and outputs of the binarization module 620, a brief description according to at least one embodiment follows. The binarization module 620 fetches the syntax element, and a known syntax element index (SEIdx) is provided by software. Using the syntax element index, the binarization module 620 performs a table lookup to obtain the corresponding values of maxBinIdxCtx, ctxIdxOffset, and bypassFlag. These looked-up values are temporarily stored in the predefined bit allocations of the destination register 652. In addition, using the syntax element index and the context block category, the binarization module 620 performs a second table lookup (e.g., in remote memory or on-chip memory) to obtain the ctxIdxBlockOffset value. The value from the second lookup is also temporarily stored in the destination register 652. The determined values are thus used to build the destination register 652, which is output as a 32-bit value.

For some syntax elements, additional information (beyond the syntax element and the context block category) may be used to begin the H.264 decoding operations. For example, for macroblock parameters such as SigCoeffFlag and lastSigCoeffFlag, the value stored in array element maxBinIdxCtx[1] of the mbNeighCtx memory 564 and the input context block category value are used to determine whether the macroblock is field coded or frame coded. In some embodiments, even though they are different syntax elements, the same syntax element number is used for these flags, which are then distinguished using mb_field_decoding_flag (the mbNeighCtx[1] field).

In addition to the functions of the binarization module 620 described above, note that in FIG. 6B the binarization module 620 may be combined with the binIdx register 654, the multiplexer unit 656, and/or the forwarding registers F1 and F2. As for the binIdx register 654 and the multiplexer unit 656, the multiplexer unit 656 provides the output SRC1 (e.g., the value in register SRC1) to the get context module 622 according to its various inputs.

Regarding the forwarding register labeled F1, when a BIND (or GCTX) instruction produces a result, the result can be written to a destination register (e.g., the destination register 652 and/or the forwarding register F1). A forwarding flag in a given instruction indicates whether the instruction and the corresponding module (e.g., the get context module 622 or the binary arithmetic decoding module 624) use the forwarding registers F1 and F2. The notations for the forwarding registers are F1 (i.e., use the forwarded source 1 value, which in one embodiment may be indicated by bit 26 of the instruction) and F2 (i.e., use the forwarded source 2 value, which in one embodiment may be indicated by bit 27 of the instruction). For the get context module 622 and the binary arithmetic decoding module 624, data can be forwarded to the respective inputs, as described below.

Having described the binarization module 620 and the related procedures, the following describes how the get context module 622 obtains the context and the bin index for a given model under the GCTX instruction. Briefly, the inputs to the get context module 622 include maxBinIdxCtx, binIdx, and ctxIdxOffset, as described below. The get context module 622 uses the ctxIdxOffset and binIdx values to compute the value of ctxIdx (an output representing the context index). An exemplary format of the GCTX instruction is:

GCTX DST, SRC2, SRC1

where SRC1 corresponds to the value output by the multiplexer unit 656 and stored in register SRC1, SRC2 corresponds to the value output by the destination register 652 and stored in register SRC2, and DST corresponds to the destination register. In one embodiment, the registers hold the following values:

SRC1[7:0] = binIdx. When the current syntax element includes codedBlockPattern, the value of SRC1 (output from the multiplexer unit 656 and input to the get context module 622) may be the value of the binIdx register 654.

SRC1[15:8] may be levelListIdx (when computing sigCoeffFlag or lastSigCoeffFlag) or mbPartIdx (when computing Ref_Idx, or binIdx for the coded block pattern). When the syntax element is sigCoeffFlag or lastSigCoeffFlag, the multiplexer unit 656 can be used to pass levelListIdx.

SRC1[16] may contain the iCbCr flag; when its value is 0, the block is a Cb chroma block. SRC1[16] may alternatively contain the L0/L1 value, which is 0 for L0. As persons skilled in the art will appreciate from this disclosure, L0/L1 are the picture reference lists used for motion-compensated prediction (L0 = list0, L1 = list1).

SRC1[21:20] = mbPartitionMode
SRC2[8:0] = ctxIdxOffset
SRC2[18:16] = maxBinIdxCtx
SRC2[23:21] = ctxBlockCat
SRC2[29:24] = ctxIdxBlockOffset
SRC2[31] = bypassFlag

DST holds the output of the get context module 622 and has the following values:

DST[15:0] = ctxIdx
DST[23:16] = binIdx
DST[27:24] = mbPartIdx
DST[29:28] = mbPartitionMode
DST[30] = L0

The get context module 622 can also interact with the forwarding registers. When forwarding registers are used, the instruction takes the form GCTX.F1.F2, where F1 and F2 indicate that forwarding registers are used, i.e., there are 2 bits for this purpose in the instruction encoding. If one or both forwarding flags are absent, the corresponding forwarding register is not used. When these bits are set (e.g., to 1), the forwarding register value (an internally generated value) is used; otherwise, the source register value is used. The forwarding registers thus also give the compiler a hint about the earliest time at which an instruction can be issued. When forwarding is not used, the instruction may incur the read-after-write latency of the given source register.

For the GCTX instruction, when the reset signal (Rst_Signal) is set, the value of SRC1 is 0. When F1 is set, SRC1 is the binIdx value from inside the get context module 622 incremented by 1; otherwise, SRC1 is the binIdx value from the execution unit register. The output of the binarization module 620 can be used as the forwarded SRC2 value for the GCTX and BARD instructions. For the latter instruction, no BIND instruction is issued until the BARD instruction has used the forwarding register.
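The selection of the binIdx value carried in SRC1 for the GCTX instruction, as just described, amounts to a small multiplexer. The sketch below is a hypothetical software rendering of that rule: the function name and parameter names are illustrative, and giving the reset signal priority over the F1 flag is an assumption, since the text lists the reset case first but does not state the priority explicitly.

```python
def select_gctx_binidx(reset, f1, internal_binidx, eu_register_binidx):
    """Choose the binIdx value presented to get context module 622.

    reset:               Rst_Signal is set (e.g., following a BIND)
    f1:                  forwarding flag F1 is set in the instruction
    internal_binidx:     binIdx held inside the get context module
    eu_register_binidx:  binIdx read from the execution unit register
    """
    if reset:
        return 0                    # start of a new syntax element
    if f1:
        return internal_binidx + 1  # forwarded value: advance to the next bin
    return eu_register_binidx       # no forwarding: use the source register
```

With forwarding, consecutive GCTX instructions can step through bins 0, 1, 2, and so on without waiting for a register write and read-back between them.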
By way of further explanation, the reset signal and the F1 forwarding signal are combined into one signal (e.g., a 2-bit signal) {F1, reset}, which indicates whether the SRC1 value input to the get context module 622 is the binIdx value or the forwarded value. Another effect of the reset signal is to clear and reset the binstring register 616 and to reset the binIdx register 654 to 0.

Continuing with the get context module 622 and obtaining the context information, in one embodiment the information shown in Tables 2 and 3 below corresponds to the values of the mbNeighCtx memory 564 structure and of the mbNeighCtxCurrent register 603, respectively. The mbNeighCtxCurrent register 603 holds the decoded output for the current macroblock. At the end of processing the current macroblock, a CWRITE instruction is issued, which copies the information from the mbNeighCtxCurrent register 603 to the corresponding location in the mbNeighCtx memory 564 array. The copied information is subsequently used as the top neighbor value.

Table 2

Parameter                 Size (bits)   Bit position
transform_size_8x8_flag   1             0
mb_field_decode_flag      1             1
mb_skip_flag              1             2
Intra_chroma_pred_mode    2             4:3
mb_type                   3             7:5
codedBlockPatternLuma     4             11:8
codedBlockPatternChroma   2             13:12
codedFlagY                1             14
codedFlagCb               1             15
codedFlagCr               1             16
codedFlagTrans            8             24:17
refIdx                    8             32:25
predMode                  4             36:33

Table 3

Parameter                 Size (bits)   Bit position
transform_size_8x8_flag   1             0
mb_field_decode_flag      1             1
mb_skip_flag              1             2
Intra_chroma_pred_mode    2             4:3
mbQpDeltaGT0              1             88
codedBlockPatternLuma     4             11:8
codedBlockPatternChroma   2             13:12
codedFlagY                1             14
codedFlagCb               1             15
codedFlagCr               1             16
codedFlagTrans            24            87:64
refIdx                    16            52:37
predMode                  8             60:53
mb_type                   3             63:61

Client’s Docket No.: S3U06-0013-TW TT^s Docket No:0608-A41246twf.doc/NikeyChen 54 200809689 在一實施例中,參數codedFlagTrans被分為三部分。 舉例來說,開始的4位元係有關於内容區塊種類為〇或是 1,而上面的4位元係有關於内容區塊種類為3或是4。上 面的4位元更可分為兩部分,較低的2位元給iCbCr二0而 其他2位元給iCbCr=l。參數predMode (預測模式)具有 下列三選項之一:predLO = 0、edL 1 二 1 以及 NiPred = 2。Client's Docket No.: S3U06-0013-TW TT^s Docket No: 0608-A41246twf.doc/NikeyChen 54 200809689 In an embodiment, the parameter codedFlagTrans is divided into three parts. For example, the first 4-bit system has a content block type of 〇 or 1, and the upper 4-bit system has a content block type of 3 or 4. The upper 4 bits can be further divided into two parts. The lower 2 bits give iCbCr 2 and the other 2 bits give iCbCr=l. The parameter predMode has one of the following three options: predLO = 0, edL 1 2, and NiPred = 2.

FIG. 6D shows an embodiment of the structure of the parameter refIdx referenced in Table 2 and Table 3. Note that refIdx is the index into the reference picture list used for picture reconstruction. This structure allows both memory and logic to be optimized. As shown, the structure used for computing syntax elements includes the top row 609 of macroblocks, the macroblock partitions 611 (four partitions, as shown), the L0/L1 values 613, and, for each L0/L1 value, a stored bit Gt0 (greater than 0) 615 and a stored bit Gt1 (greater than 1) 617. Typically, the top neighboring macroblock 609 needs to be accessed, and specifically its bottom row, which in one embodiment is divided into 4x4 blocks, yielding four mbPartitions 611. For each mbPartition 611, information about the L0/L1 value 613 is needed, but not the actual value: it is only necessary to determine whether each L0 and L1 value is greater than 0 and whether it is greater than 1. In one embodiment, this determination is captured by storing the two bits Gt0 615 and Gt1 617, which are used in computing the syntax elements.

To summarize the structure used for computing syntax elements, two optimizations are made. In the first, only 2 bits are kept (although a reference value is conventionally larger), because no more bits are needed for the variable length decoding unit 530a to decode the syntax elements. The fully decoded values are kept in execution unit registers or in memory (for example, the L2 cache). In the second, only four elements are maintained (for example, two on the top and two on the left). The four elements are recycled, and the final values are written by the CWRITE instruction to the neighbor storage held in memory. As a result, only 16 bits are kept in the mbNeighCtxCurrent register 603, and only 8 bits are kept in the mbNeighCtxLeft register 605 and in the top mbNeighCtx elements 601 of the array 564. Computation logic is also saved, because the full arithmetic on decoded reference values is replaced by Boolean operations on a few bits.

The mb_type encoding is shown in Table 4 below.
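The two-bit reduction of refIdx described above can be illustrated with a short sketch. The names `compress_ref_idx` and `pack_flags`, and the bit layout used when packing several partitions into one word, are assumptions made for illustration and are not taken from the hardware description; the sketch only shows that two Boolean flags per value are enough to separate the cases 0, 1, and greater than 1.

```python
from typing import List, Tuple


def compress_ref_idx(ref_idx: int) -> Tuple[int, int]:
    """Reduce a reference index to two flags: (Gt0, Gt1)."""
    gt0 = 1 if ref_idx > 0 else 0  # value is greater than 0
    gt1 = 1 if ref_idx > 1 else 0  # value is greater than 1
    return gt0, gt1


def pack_flags(ref_indices: List[int]) -> int:
    """Pack (Gt0, Gt1) for each value into 2 bits per entry.

    Assumed layout (hypothetical): entry k occupies bits [2k+1:2k],
    with Gt0 in the lower bit and Gt1 in the upper bit.
    """
    word = 0
    for k, ref_idx in enumerate(ref_indices):
        gt0, gt1 = compress_ref_idx(ref_idx)
        word |= (gt0 | (gt1 << 1)) << (2 * k)
    return word
```

With this layout, four partition values fit in 8 bits regardless of how large the original reference indices are, which is the kind of saving the text attributes to the first optimization.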

    mb_type    Name
    4'b000     SI
    4'b001     I_4x4 or I_NxN
    4'b010     I_16x16
    4'b011     I_PCM
    4'b100     P_8x8
    4'b101     B_8x8
    4'b110     B_Direct_16x16
    4'b111     Others

    Table 4

Additional registers not shown in FIG. 6B may be used, such as mbPerLine (for example, 8 bits, unsigned), mb_qp_delta (8 bits, signed), and mbAddrCurr (16 bits, the current macroblock address). For mbAddrCurr, a 1920x1080 array is supported; although this requires only 13 bits, some embodiments use 16 bits to simplify 16-bit arithmetic.

Values from the registers described above are also stored in the global register 614. The values held in the global register duplicate those held in the registers, which eases hardware design. In one embodiment, the global register 614 comprises a formatted 32-bit register containing fields for mbPerLine, mbAddrCurr, and mb_qp_delta, in addition to fields for NUT, MBAFF_FLAG, and chroma_format_idc.

The INSERT instruction can be used to update the individual fields of the global register 614. An exemplary format of the INSERT instruction is:

    INSERT DST, #Imm, SRC1

In one embodiment, #Imm is a 10-bit number in which the lower 5 bits give the width of the data and the upper 5 bits give the position at which the data is inserted. The input parameters are as follows:

    Mask  = NOT(0xFFFFFFFF << #Imm[4:0])
    Data  = SRC1 & Mask
    SDATA = Data << #Imm[9:5]
    SMask = Mask << #Imm[9:5]

The output DST can be expressed as:

    DST = (DST & NOT(SMask)) | SDATA

Note that some fields (for example, NUT (NAL_UNIT_TYPE), C (constrained_intra_pred_flag), MBAFF_FLAG, mbPerLine, and mbAddrCurr) can also be written or initialized in the global register 614 using the INIT_CTX instruction.

In one embodiment, the local register 612 comprises a 32-bit register with fields for b, mb_qp_delta, numDecodAbsLevelEq1, and numDecodAbsLevelGt1. These fields can be updated using the INSERT instruction. The local register 612 is also initialized so that b = 0, mb_qp_delta = 0, numDecodAbsLevelEq1 = -1, and numDecodAbsLevelGt1 = 0. The instruction used to provide initialization may use the following format:

    CWRITE SRC1

where SRC1[15:0] = mbAddrCurr. CWRITE SRC1 updates the mbAddrCurr field of the global register 614. The additional functions provided by the CWRITE instruction are described after the following brief description of the neighbor element structure and its decoding.
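The INSERT field update defined by the Mask, Data, SDATA, and SMask expressions above can be sketched in software as follows. This is a minimal model of those expressions on 32-bit values, not a description of the hardware; the helper `make_imm` and the function name `insert_field` are assumptions introduced for the example.

```python
MASK32 = 0xFFFFFFFF  # keep all intermediate values within 32 bits


def make_imm(width: int, pos: int) -> int:
    """Build the 10-bit #Imm: width in bits [4:0], position in bits [9:5]."""
    return ((pos & 0x1F) << 5) | (width & 0x1F)


def insert_field(dst: int, imm: int, src1: int) -> int:
    """Model of INSERT DST, #Imm, SRC1 on 32-bit registers."""
    width = imm & 0x1F                    # #Imm[4:0]
    pos = (imm >> 5) & 0x1F               # #Imm[9:5]
    mask = ~(MASK32 << width) & MASK32    # Mask  = NOT(0xFFFFFFFF << width)
    data = src1 & mask                    # Data  = SRC1 & Mask
    sdata = (data << pos) & MASK32        # SDATA = Data << pos
    smask = (mask << pos) & MASK32        # SMask = Mask << pos
    return (dst & ~smask & MASK32) | sdata  # DST = (DST & NOT(SMask)) | SDATA
```

For instance, `insert_field(0x12345678, make_imm(8, 16), 0xAB)` replaces only the 8-bit field starting at bit 16 and leaves the other bits of the destination unchanged.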

In CABAC decoding, syntax values are predicted and modeled from the neighboring macroblocks. The following describes how embodiments of the variable length decoding unit 530a determine the left and top neighboring macroblocks and whether those macroblocks are actually available. As described above, the decoding process uses neighboring values (for example, from the macroblock or block above and to the left). In one embodiment, the binary arithmetic decoding engine 624 evaluates the following equation, using the current macroblock number and the number of macroblocks in a line (mbPerLine), to compute the address of the top macroblock and to determine whether the left and top macroblocks are available.

For example, to determine whether a neighboring macroblock (for example, the left neighbor) exists (i.e., is valid), an operation such as mbCurrAddr % mbPerLine can be performed and the result checked for 0. In one embodiment, the following calculation is performed:

    a = (mbCurrAddr % mbPerLine) = mbCurrAddr - floor(mbCurrAddr / mbPerLine) × mbPerLine
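The availability check above can be sketched as follows. The sketch assumes plain raster-order macroblock addressing starting at address 0, with no FMO or MBAFF handling; the rule used for top availability (the current address is at least one full line) is an assumption added for illustration, since the text gives only the left-neighbor test explicitly. The function name and return layout are likewise hypothetical.

```python
from typing import Optional, Tuple


def neighbor_info(mb_curr_addr: int, mb_per_line: int) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (a, left_addr, top_addr); an address is None when unavailable."""
    # a = mbCurrAddr - floor(mbCurrAddr / mbPerLine) * mbPerLine
    a = mb_curr_addr - (mb_curr_addr // mb_per_line) * mb_per_line

    # Left neighbor exists only when the macroblock is not first in its line.
    left_addr = mb_curr_addr - 1 if a != 0 else None

    # Assumption: the top neighbor exists once a full line has been decoded.
    top_addr = mb_curr_addr - mb_per_line if mb_curr_addr >= mb_per_line else None

    return a, left_addr, top_addr
```

The value `a` also serves as the column index of the current macroblock within its line, so it can be reused to index per-column neighbor storage.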

Note that mbCurrAddr is the position of the current macroblock corresponding to the binary symbol being decoded, and mbPerLine is the number of macroblocks in a given row. The calculation above is implemented with one division, one multiplication, and one subtraction.

To further illustrate the decoding mechanism implemented by the binary arithmetic decoding engine 624, refer to FIG. 6E, which shows a picture to be decoded (16x8 macroblocks, mbPerLine = 16). When macroblock 35 is being decoded (mbCurrent is labeled 35, and macroblock 36 has not yet been fully decoded), data is needed from the previously decoded top macroblock (labeled 19) and left macroblock (labeled 34). The information for the top macroblock is obtained from mbNeighCtx[i], where i = mbCurrent % mbPerLine. In this example, i = 35 % 16 = 3. After the current macroblock has been decoded, the CWRITE instruction can be used to update mbNeighCtxLeft 605 and mbNeighCtx[i] 601 in the array.

As another example, consider the following:

    mbCurrAddr ∈ [0 : maxMB - 1]

where maxMB is 8192 and mbPerLine = 120. In one embodiment, the division is implemented by multiplying by (1/mbPerLine), which is looked up in a table held in on-chip memory (for example, a 120x11-bit table). When mbCurrAddr is 13 bits wide, a 13x11-bit multiplier can be used. In one embodiment, the upper 13 bits of the result of this multiplication are kept, and a 13x7-bit multiplication is then performed, of which the lower 13 bits are kept.

Finally, a 13-bit subtraction is performed to determine "a". The whole sequence of operations takes 2 cycles; the result is stored for use in other operations and is recomputed whenever the mbCurrAddr value changes.

In some embodiments the modulo operation is not performed; instead, shader logic in the execution unit supplies a first mbAddrCurr value aligned to the first line of the slice. For example, the shader logic may perform the calculation mbAddrCurr = absoluteMbAddrCurr - n * mbPerLine. Because some H.264 Flexible Macroblock Ordering (FMO) modes have very complex neighbor structures, the left/top availability for those modes can be computed by an additional shader of the decoding system 200 and loaded into one or more registers of the variable length decoding unit 530a. Offloading this work from the variable length decoding unit 530a reduces hardware complexity while still allowing all H.264 modes to be enabled for symbol decoding.

The CWRITE instruction copies the appropriate fields from mbNeighCtxCurrent 603 to mbNeighCtxTop[] 601 and mbNeighCtxLeft[] (for example, the left macroblock in array 564). Which mbNeighCtxTop[] 601 and mbNeighCtxLeft[] data are written depends on whether mbaffFrameFlag (MBAFF) is set and on whether the current and previous macroblocks are field or frame coded. When (mbAddrCurr % mbPerLine == 0), mbNeighCtxLeft 605 is marked as unavailable (for example, it is initialized to 0). The contents of the mbNeighCtx memory 564, the local register 612, and the global register 614 can be moved using the CWRITE instruction. For example, the CWRITE instruction moves the relevant contents of the neighbor context memory 564 to the left and top blocks of the i-th macroblock (for example, mbNeighCtx[i], the current macroblock), and also clears the mbNeighCtxCurrent register 603. As described above, the top pointer 607c and the left pointer 607b are associated with the neighbor context memory 564. After a CWRITE instruction, the top index is incremented by 1 and the contents of the current macroblock are moved to the top and left positions in the array. This mechanism reduces the number of read/write ports needed on the memory array.

The contents of the neighbor context memory 564, the local register 612, and the global register 614 can be updated using the INSERT instruction, as described above. For example, the current macroblock can be written with an INSERT instruction (for example, INSERT $mbNeighCtxCurrent_1, SRC1). This operation does not affect the top pointer 607c or the left pointer 607b (i.e., it writes only to the current position).

INSERT instructions and updates from the binary arithmetic decoding module 624 are written to the mbNeighCtxCurrent array element 601 of the neighbor context memory 564. The left pointer 607b points to the element of the memory 564 that is the same as the adjacent array element (adjacent to mbNeighCtx 601), i.e., mbNeighCtx[i-1].

Having described how context and model information are obtained, the following discusses the binary arithmetic decoding module 624 and arithmetic decoding based on that information. The binary arithmetic decoding module 624 operates under the BARD instruction. An exemplary format of the BARD instruction is:

    BARD DST, SRC2, SRC1

The BARD instruction provides a binary arithmetic decoding operation in which each decoded bin produces a single-bit output. The input parameters are:

    SRC1 = binIdx/ctxIdx, the output of the get context module 622, and
    SRC2 = bypassFlag, the output of the binarization module 620.

When forwarding registers are used, an exemplary format is BARD.F1.F2, which indicates the forwarding registers. If a corresponding forwarding flag is absent, the forwarding register is not used. The binary arithmetic decoding module 624 also receives a reset signal, as described above. In particular, once the reset signal is received, the binary arithmetic decoding module 624 holds the reset state until the first BARD instruction is received, after which the reset signal is cleared.

In operation, the binary arithmetic decoding module 624 receives, from the get context module 622, the context index (ctxIdx) value and a pointer (binIdx) to the current bit parsing position in the bitstream being decoded. The module uses the offset and range values from the offset register 608 and the range register 606 to track the current interval state (offset, offset + range) of the decoding engine. It uses the context index value to access the context table (CTX_TABLE), which in turn provides the current probability state pStateIdx and the most probable symbol (MPS) value. pStateIdx is then used to read (for example, from a table stored remotely or in on-chip memory) the least probable symbol (LPS) sub-range value, the next MPS value, and the next LPS probability value.

Based on the MPS state and the next range and probability information, the binary arithmetic decoding module 624 computes the MPS value of the current binary symbol. The module outputs the binary symbol (a bit or bin value, for example b0, b1, ..., bn) to the binstring register 616. The process is then repeated for the next bin, with the same or a different context, as shown by the feedback connection from the binstring register 616 to the get context module 622. The binary arithmetic decoding module 624 updates the offset and range values, and the probability state, according to the selected symbol value. It also writes the current MPS and probability state to the context table for use by later contexts.

Note, regarding the use of forwarding registers F1 and F2, that an instruction may or may not incur latency when forwarding is signaled. For example, forwarding from the binarization module 620 to the get context module 622 incurs no latency, and the GCTX instruction can be issued in the next cycle. Forwarding from the get context module 622 to the binary arithmetic decoding module 624 consumes cycles: if the GCTX instruction is issued in cycle j, the BARD instruction can be issued in cycle (j+5). In the absence of useful instructions, the latency slots are filled with up to 4 NOPs. Forwarding from the binarization module 620 to the binary arithmetic decoding module 624 incurs no latency. For forwarding from the binary arithmetic decoding module 624 to the get context module 622, if the BARD instruction is issued in cycle j, the GCTX instruction can be issued in cycle (j+5). Forwarding from the binary arithmetic decoding module 624 to the binarization module 620 incurs no latency if a second binstring is kept and there is switching between the binary arithmetic decoding module 624 and the binarization module 620.
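The per-bin decoding step described above (look up the LPS sub-range from the probability state, compare the offset against the remaining range, update the state, then renormalize) can be sketched as below. This is a software sketch that follows the general structure of H.264 context-based binary arithmetic decoding; it is not the hardware's implementation. The class and function names are assumptions, and the three lookup tables are supplied by the caller rather than reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Context:
    p_state_idx: int  # probability state index (pStateIdx)
    val_mps: int      # current most probable symbol (0 or 1)


@dataclass
class Engine:
    range_: int  # width of the current interval
    offset: int  # position of the code value inside the interval


def decode_bin(eng: Engine, ctx: Context,
               range_tab_lps: Sequence[Sequence[int]],
               trans_idx_mps: Sequence[int],
               trans_idx_lps: Sequence[int],
               read_bit: Callable[[], int]) -> int:
    """Decode one bin and update the engine and context in place."""
    q = (eng.range_ >> 6) & 3                  # quantized range index
    r_lps = range_tab_lps[ctx.p_state_idx][q]  # LPS sub-range
    eng.range_ -= r_lps                        # MPS sub-range
    if eng.offset >= eng.range_:
        # Least probable symbol path.
        bin_val = 1 - ctx.val_mps
        eng.offset -= eng.range_
        eng.range_ = r_lps
        if ctx.p_state_idx == 0:
            ctx.val_mps = 1 - ctx.val_mps      # MPS flips at state 0
        ctx.p_state_idx = trans_idx_lps[ctx.p_state_idx]
    else:
        # Most probable symbol path.
        bin_val = ctx.val_mps
        ctx.p_state_idx = trans_idx_mps[ctx.p_state_idx]
    # Renormalize: shift in bitstream bits until the range is at least 256.
    while eng.range_ < 256:
        eng.range_ <<= 1
        eng.offset = (eng.offset << 1) | read_bit()
    return bin_val
```

Passing the tables in as parameters keeps the sketch independent of any particular table contents; a real decoder would use the tables defined by the H.264 specification.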

Thus, by keeping the second binstring, BARD-to-BARD instruction issue is possible for bypass cases without incurring latency.

CAVLC Decoding

Having described the variable length decoding unit 530a used for CABAC decoding, the following describes a CAVLC embodiment of the decoding system, also referred to as the variable length decoding unit 530b, as shown in FIG. 7A. Before the CAVLC architecture is described, the H.264 CAVLC process is briefly described in the context of the variable length decoding unit 530b.

As is known, the CAVLC process encodes the level (for example, the magnitude) of the signal for a macroblock or a part of it, and how often that level repeats (for example, the run, i.e., how many cycles), so that each bit need not be coded individually. This information is received and parsed from the bitstream, and the buffer is refilled as the information is consumed by the decoding engine of the variable length decoding unit 530b. The variable length decoding unit 530b reverses the encoding process and reconstructs the signal by extracting the macroblock information, comprising level and run coefficients, from the received bitstream. Accordingly, the variable length decoding unit 530b receives macroblock information from the bitstream buffer 562b and parses the stream to obtain level and run coefficient values, which are temporarily stored in a level array and a run array, respectively. The level and run arrays are read out, for example as the pixels of a 4x4 block within the macroblock, and are then cleared for the next block. The whole macroblock can be constructed in software from the 4x4 building blocks in accordance with the H.264 standard.

Having described the general operation of decoding macroblock information, the following description presents the various components of the variable length decoding unit 530b in the context of the CAVLC decoding process, with the understanding that variations consistent with accepted practice are contemplated. Those skilled in the art will recognize that many of the terms used below (for example, labels of various parameters) come from the H.264 specification and, for brevity, are not explained further unless doing so helps in understanding the processes or components being described.

FIG. 7A is a block diagram showing an embodiment of the variable length decoding unit 530b. A single variable length decoding unit 530b is shown, which is used in embodiments that decode a single bitstream. The same principles apply to a decoding system 200 with additional variable length decoding units, which can decode multiple (for example, two) streams simultaneously. Briefly, FIG. 7A shows selected components of the variable length decoding unit 530b, and FIG. 7B shows the table structure used for CAVLC decoding. Although the description below is given in the context of macroblock decoding, the principles presented apply to decoding of various block types.
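The level/run reconstruction outlined above can be sketched in software. The sketch follows the general H.264 CAVLC convention, in which nonzero levels are parsed in reverse scan order together with a total zero count and a run of zeros before each coefficient; the function name, parameter names, and array layout are assumptions made for illustration and do not describe the layout of the hardware level and run arrays.

```python
from typing import List


def expand_level_run(levels: List[int], runs_before: List[int],
                     total_zeros: int, block_size: int = 16) -> List[int]:
    """Rebuild scan-ordered coefficients of one 4x4 block.

    levels:      nonzero coefficients, highest-frequency first.
    runs_before: zeros preceding each coefficient, for every entry of
                 `levels` except the last (lowest-frequency) one.
    total_zeros: number of zeros located before the last nonzero
                 coefficient in scan order.
    """
    coeffs = [0] * block_size
    total_coeff = len(levels)
    if total_coeff == 0:
        return coeffs

    # Resolve the run for every coefficient; the lowest-frequency
    # coefficient takes whatever zeros are left over.
    zeros_left = total_zeros
    runs = []
    for i in range(total_coeff - 1):
        run = runs_before[i] if zeros_left > 0 else 0
        runs.append(run)
        zeros_left -= run
    runs.append(zeros_left)

    # Place coefficients from lowest to highest frequency.
    pos = -1
    for i in range(total_coeff - 1, -1, -1):
        pos += runs[i] + 1
        coeffs[pos] = levels[i]
    return coeffs
```

The result is still in scan order; mapping it to the 4x4 raster positions of the block is a separate step that is not shown here.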

The variable length decoding unit 530b is used to parse the bitstream, initialize the decoding hardware and the register/memory structures, and perform level-run decoding. Each of these functions of the H.264 CAVLC decoding process is described further below. Regarding bitstream buffer operation, the SREG stream buffer/DMA engine 562 is shared between CABAC and CAVLC operation, so for brevity its description is not repeated except for the operational differences between the CABAC and CAVLC modes noted below. The CABAC and CAVLC decoding embodiments both use the same context memory 564, but the fields (for example, the structure) differ, as described later. Where operation of the context memory 564 for CAVLC is similar to that described above for CABAC, the description is likewise not repeated. The global register 614 and the local register 612 are also used, and their description is not repeated either.

Referring to FIG. 7A, the variable length decoding unit 530b comprises several hardware modules, including a coefficient token module (coeff_token) 710, a level code module (CAVLC_LevelCode) 712, a level module (CAVLC_Level) 714, a level 0 module (CAVLC_L0) 716, a zero level module (CAVLC_ZL) 718, a run module (CAVLC_Run) 720, a level array (Level Array) 722, and a run array (Run Array) 724. The decoding system also includes the SREG stream buffer/DMA engine 562, the global register 614, the local register 612, and the neighbor context memory 564 described above.

The interface between the variable length decoding unit 530b and the execution unit 420a is the same as in the CABAC embodiment described above: one or more destination buses with corresponding registers (for example, destination registers) and two source buses with corresponding registers (SRC1, SRC2, and so on).

Typically, depending on the slice type, the driver software 128 (FIG. 1) prepares the CAVLC shader and loads it into the execution unit 420a. The CAVLC shader uses the standard instruction set plus additional instructions, including coeff_token, CAVLC_LevelCode, CAVLC_Level, CAVLC_L0, CAVLC_ZL, and CAVLC_Run, to decode the bitstream. Further instructions include READ_LRUN and CLR_LRUN, which read and clear the level array 722 and the run array 724.

In one embodiment, the first instructions executed by the CAVLC shader, before any others are issued, are the INIT_CTX and INIT_ADE instructions. These two instructions initialize the variable length decoding unit 530b for decoding a CAVLC bitstream and load the bitstream into a first-in, first-out (FIFO) buffer, from which point stream decoding is managed automatically; both instructions are described below. The variable length decoding unit 530b is thus used to parse the bitstream, initialize the decoding hardware and the register/memory structures, and perform level-run decoding, and each of these functions of the H.264 CAVLC decoding process is described further below.

Regarding instructions for parsing the bitstream: in addition to the READ and INIT_BSTR instructions described earlier for the CABAC process, which are shared with the CAVLC process, two further bitstream access instructions are more relevant to the CAVLC process, namely the INPSTR instruction (corresponding to the check-string module 570) and the INPTRB instruction (loaded into the variable length decoding logic 550 of FIG. 5C). The INPSTR and INPTRB instructions are not restricted to CAVLC operation (for example, they can be used in other processes such as CABAC, VC-1, and MPEG). They are used to detect whether a particular pattern (for example, a data start or end pattern) occurs in a slice, macroblock, and so on, allowing the bitstream to be read ahead without advancing it. In one embodiment, the instruction sequence consists of INPSTR, then INPTRB, then READ. An exemplary format of the INPSTR instruction is:

INPSTR DST 其中,在一實施例中,檢查位元流並傳回SREG暫存器562aINPSTR DST wherein, in one embodiment, the bit stream is checked and passed back to the SREG register 562a

the most significant 16 bits in the lower 16 bits of the destination register. The upper 16 bits of the destination register contain the sREGbitptr value. No data is removed from the SREG register 562a by this operation. The INPSTR instruction may be implemented according to the following exemplary pseudocode:

    MODULE INPSTR (DST)
        OUTPUT [31:0] DST;
        DST = {ZE (sREGbitptr), sREG [msb: msb-15]};
    ENDMODULE

Another instruction for parsing the bitstream is the INPTRB instruction, which inspects the raw byte sequence payload (RBSP) trailing bits (e.g., of the byte-aligned bitstream). The INPTRB instruction provides a read of the bitstream buffer 562b. An exemplary format of the INPTRB instruction is:

    INPTRB DST

In the INPTRB operation, no bits are removed from the SREG register 562a. When the most significant bits of the SREG register 562a contain, for example, 100, the SREG register 562a contains the RBSP stop bit, and the remaining bits in the byte are alignment zero bits. The INPTRB instruction may be implemented according to the following exemplary pseudocode:

    MODULE INPTRB (DST)
        OUTPUT DST;
        REG [7:0] P;
        P = sREG [msb: msb-7];
        sp = sREGbitptr;
        T [7:0] = (P >> sp) << sp;
        DST [1] = (T == 0x80)? 1 : 0;
        DST [0] = !(CVLC_BufferBytesRemaining > 0);
    ENDMODULE

The READ instruction provides alignment of the data in the bitstream buffer 562b.

Having described the additional bitstream buffer operations of the variable length decoding (VLD) unit 530b, the initialization of CAVLC operation is now described, in particular the initialization of the memory, the register structures, and the decoding engine (e.g., CAVLC module 582). At the start of a slice, and before decoding the syntax elements corresponding to the first macroblock, the register structures, global register 614, local register 612, and the CAVLC module 582 are initialized. In one embodiment, the driver software 128 issues an INIT_CAVLC instruction to perform the initialization. An exemplary format of the INIT_CAVLC instruction is:

    INIT_CAVLC SRC2, SRC1

where SRC2 comprises the number of bytes to be decoded in the slice data. This value is written to the internal CVLC_BufferBytesRemaining register. SRC1 has the following layout:

    SRC1 [15:0]  = mbAddrCurr;
    SRC1 [23:16] = mbPerLine;
    SRC1 [24]    = constrained_intra_pred_flag;
    SRC1 [27:25] = NAL_unit_type (NUT);
    SRC1 [29:28] = chroma_format_idc (one embodiment uses a chroma_format_idc value of 1, corresponding to the 4:2:0 format, although some embodiments may use other sampling schemes); and
    SRC1 [31:30] = undefined.

With the INIT_CAVLC instruction, the values in SRC1 are written to the corresponding fields of the global register 614. In addition, the value in SRC2 is written to an internal register set up by the INIT instructions (e.g., the CVLC_BufferBytesRemaining register). The CVLC_BufferBytesRemaining register is used to recover from any corrupted bitstream, as described above. For example, the VLD unit 530b (e.g., the SREG stream buffer/DMA engine 562) keeps track of the buffered bits in the bitstream of the slice being parsed. As the bitstream is consumed, the VLD unit 530b counts down and updates the CVLC_BufferBytesRemaining value. A value below 0 indicates a buffer or bitstream error; in that case processing is terminated and control returns to the application, or to the driver software 128, to handle recovery.
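The INIT_CAVLC operand layout and the byte countdown described above can be illustrated with a minimal Python sketch. It is a model for explanation only, not the hardware implementation: the function names unpack_init_cavlc_src1 and consume_bytes are hypothetical, and the countdown is assumed to decrement by the number of bytes consumed.

```python
def unpack_init_cavlc_src1(src1):
    """Split a 32-bit SRC1 word into the fields listed for INIT_CAVLC."""
    return {
        "mbAddrCurr": src1 & 0xFFFF,                         # bits 15:0
        "mbPerLine": (src1 >> 16) & 0xFF,                    # bits 23:16
        "constrained_intra_pred_flag": (src1 >> 24) & 0x1,   # bit 24
        "NAL_unit_type": (src1 >> 25) & 0x7,                 # bits 27:25
        "chroma_format_idc": (src1 >> 28) & 0x3,             # bits 29:28
    }                                                        # bits 31:30 undefined


def consume_bytes(bytes_remaining, n):
    """Model of the CVLC_BufferBytesRemaining countdown (assumed per-byte).

    Returns the updated count and an error flag that is set when the
    count drops below zero, i.e. a buffer or bitstream error.
    """
    bytes_remaining -= n
    return bytes_remaining, bytes_remaining < 0


# Example word: 4:2:0 chroma, NAL unit type 5, flag set, 120 MBs per line.
src1 = (1 << 28) | (5 << 25) | (1 << 24) | (120 << 16) | 241
fields = unpack_init_cavlc_src1(src1)
```

In this model, an error flag returned by consume_bytes corresponds to the point at which processing would be terminated and control handed back for recovery.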
The INIT_CAVLC instruction also initializes the various storage structures of the VLD unit 530b, including the neighboring context memory 564, the mbNeighCtxLeft register 605, and the mbNeighCtxCurrent register 603, in a manner similar in some respects to the CABAC process described previously. Given the contextual nature of CAVLC decoding, the current macroblock is decoded according to information gathered by the CAVLC_TOTC instruction when previous macroblocks were decoded, namely the left macroblock, stored in the mbNeighCtxLeft register 605 and pointed to by pointer 607b, and the top macroblock, stored in array element [i] 601 and pointed to by pointer 607c. The INIT_CAVLC instruction is used to initialize the top pointer 607c and the left pointer 607b and to update the global register 614.

To determine whether a neighboring macroblock (e.g., the left neighbor) is present (i.e., valid), an operation (e.g., mbCurrAddr % mbPerLine) may be performed by the CAVLC_TOTC instruction. This is similar to the same procedure performed in the CABAC embodiment and is therefore not described again.

As in the CABAC process described above, the contents of the neighboring context memory 564 can be removed using the CWRITE instruction, and the contents of the neighboring context memory 564, the local register 612, and the global register 614 can be updated using the INSERT instruction, which can also be used for writes to the mbNeighCtxCurrent register 603.
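The neighbor-availability test mentioned above (mbCurrAddr % mbPerLine) can be illustrated with a short Python sketch. The interpretation is an assumption made here for illustration: macroblock addresses are taken to run in raster order, a remainder of 0 is taken to mean the first column of a row (no left neighbor), and slice boundaries are ignored. The function name left_neighbor_available is hypothetical.

```python
def left_neighbor_available(mb_curr_addr, mb_per_line):
    """Return True when the macroblock has a left neighbor in its row.

    Assumes raster-order addressing: a remainder of 0 means the
    macroblock sits in the first column, so nothing lies to its left.
    """
    return mb_curr_addr % mb_per_line != 0


# With 120 macroblocks per line, address 240 starts the third row.
first_in_row = left_neighbor_available(240, 120)
second_in_row = left_neighbor_available(241, 120)
```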
The structure of the data maintained in the neighboring context memory 564 can be described as follows:

    mbNeighCtxCurrent[01:00] : 2'b : mbType
    mbNeighCtxCurrent[65:02] : 4'b : TC[16]
    mbNeighCtxCurrent[81:66] : 4'b : TCC[cb][4]
    mbNeighCtxCurrent[97:82] : 4'b : TCC[cr][4]

When the CWRITE instruction is executed, the mbNeighCtx[] neighbor values are updated and the mbNeighCtxCurrent register 603 is then initialized.

Having described the context memory structures initialized by the VLD unit 530b and their initialization, the following describes how the VLD unit 530b (in particular, the CAVLC_TOTC instruction) uses the neighboring context information to compute the total coefficients (TotalCoeff, TC), which is subsequently used to determine which CAVLC table should be used to decode a symbol. In general, CAVLC decoding uses the variable length decoding tables described in the H.264 specification (referred to herein as CAVLC tables), where the CAVLC table used to decode each symbol is selected according to the context of previously decoded symbols. That is, a different CAVLC table may apply to each symbol. Fig. 7B shows the basic table structure, which is a variable-size two-dimensional array. An array of tables is provided (each table may be dedicated to a particular symbol), and each symbol is Huffman coded. The Huffman codes are stored in a table with the following structure:

    struct Table {
        unsigned head;
        struct table {
            unsigned val;
            unsigned shv;
        } table[];
    } Table[];

The following describes a matching method based on unique prefix coding (the MatchVLC function). In general, a CAVLC table comprises a variable-length part and a fixed-length part. Matching can be simplified by performing a number of fixed-size indexed lookups. In the MatchVLC function, a READ operation may be performed that does not remove bits from the SREG register 562a. This READ operation therefore differs from the READ instruction described above for the bitstream buffer 562b. In the MatchVLC function described below, a number of bits (fixL) is copied from the bitstream buffer 562b and then looked up in the specified table.
Each entry in the specified table has a specific format (e.g., a value and a size in bits). The size of the entry is used to advance the bitstream.

    FUNCTION MatchVLC(Table, maxIdx)
        INPUT Table;
        INPUT maxIdx;
        Idx1 = CLZ(sREG);                       // count number of leading zeros
        Idx1 = (Idx1 > maxIdx)? maxIdx : Idx1;
        fixL = Table[Idx1].head;
        SHL(sREG, Idx1+#1);                     // shift buffer Idx1 + 1 bits left
        Idx2 = (!fixL)? 0 : READ(fixL);
        (val, shv) = Table[Idx1][Idx2];
        SHL(sREG, shv);
        return val;
    ENDFUNCTION

Fig. 7B is a block diagram of an exemplary two-dimensional array with the table structure above, used to describe the MatchVLC function in the context of CAVLC decoding. The example is taken from Table 9-5 of the H.264 standard for nC = -1 and is as follows:

    coeff_token  TrailingOnes  TotalCoeff  Head  Value  Shift
    1            1             1           0     33     0
    01           0             0           0     0      0
    001          2             2           0     66     0
    000100       0             2           2     2      2
    000101       3             3                 99     2
    000110       1             2                 34     2
    000111       0             1                 1      2
    000010       0             4           1     4      1
    000011       0             3                 3      1
    0000010      2             3           1     67     1
    0000011      1             3                 35     1
    00000010     2             4           1     68     1
    00000011     1             4                 36     1
    0000000      3             4           0     100    0

In pseudocode, the table above can be expressed as follows:

    Table9-5[8] = {
        0, {{33, 0}},
        0, {{0, 0}},
        0, {{66, 0}},
        2, {{2, 2}, {99, 2}, {34, 2}, {1, 2}},
        1, {{4, 1}, {3, 1}},
        1, {{67, 1}, {35, 1}},
        1, {{68, 1}, {36, 1}},
        0, {{100, 0}}
    };

With this table structure, the MatchVLC function described above can be used to implement CAVLC decoding. In the MatchVLC function, a count-leading-zeros operation is performed on the bitstream to access the table for a given syntax element. The MatchVLC function performs the count-leading-zeros operation (e.g., in some embodiments, using the count-leading-zeros module 576 and the read module 572) and, if the number of leading zeros is greater than the maximum value of Idx, returns maxIdx (which handles the case 0000000 shown in the table of Fig. 7B). Another advantage of the MatchVLC function and the table structure is that multiple instructions are not needed to handle these cases. They are handled by the following MatchVLC segments: Idx1 = CLZ(sREG), which counts the number of leading zeros, and Idx1 = (Idx1 > maxIdx)? maxIdx : Idx1. The bits that have been used are then removed by the following MatchVLC segment: SHL(sREG, Idx1+#1). The header of the sub-array is read by the following MatchVLC segments: fixL = Table[Idx1].head and Idx2 = (!fixL)? 0 : READ(fixL), which convey the maximum number of bits to be read speculatively. The number of leading zeros may be the same while the size of the trailing bits varies. Thus, in one embodiment, a CASEX-type case statement may be implemented (using more memory, but with a simpler code structure).

The actual value is read from the table using (val, shv) = Table[Idx1][Idx2] and SHL(sREG, shv), which also reveals how many bits are actually used by the syntax element. These bits are removed from the bitstream, and the value of the syntax element is returned to the destination register.
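The MatchVLC lookup over the Table9-5 layout described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the hardware implementation: the bitstream is held as a string of '0' and '1' characters instead of the SREG register, the non-consuming READ is modeled by slicing, and each table value is assumed to pack TrailingOnes * 32 + TotalCoeff (an inference from the listed values). The sketch also assumes that the clamped all-zero row (0000000) consumes only maxIdx bits, because that code has no terminating 1. The names match_vlc and decode_all are hypothetical.

```python
# Rows are indexed by the number of leading zeros. Each row is
# (head, [(val, shv), ...]), where head is the number of extra index bits.
TABLE_9_5 = [
    (0, [(33, 0)]),
    (0, [(0, 0)]),
    (0, [(66, 0)]),
    (2, [(2, 2), (99, 2), (34, 2), (1, 2)]),
    (1, [(4, 1), (3, 1)]),
    (1, [(67, 1), (35, 1)]),
    (1, [(68, 1), (36, 1)]),
    (0, [(100, 0)]),
]
MAX_IDX = len(TABLE_9_5) - 1


def match_vlc(bits, table=TABLE_9_5, max_idx=MAX_IDX):
    """Decode one code from the front of bits; return (val, remaining bits)."""
    clz = len(bits) - len(bits.lstrip("0"))     # count leading zeros
    idx1 = min(clz, max_idx)                    # clamp to the last row
    # Drop the zero prefix and its terminating 1. The clamped all-zero
    # row has no terminating 1, so only the zeros are dropped (assumption).
    consumed = idx1 + 1 if clz < max_idx else idx1
    rest = bits[consumed:]
    fix_len, sub_table = table[idx1]
    # Peek fix_len bits without consuming them (models the READ operation).
    idx2 = int(rest[:fix_len], 2) if fix_len else 0
    val, shv = sub_table[idx2]
    return val, rest[shv:]                      # consume shv bits


def decode_all(bits):
    """Decode every code in bits into (TrailingOnes, TotalCoeff) pairs."""
    out = []
    while bits:
        val, bits = match_vlc(bits)
        out.append((val >> 5, val & 31))        # assumed value packing
    return out


tokens = decode_all("000101" + "1" + "0000000" + "01")
```

The input concatenates four codes from the table, including the all-zero escape row, so the sketch exercises the clamped branch as well as the rows with and without extra index bits.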
With the VLC matching method and the table layout described, the description returns to Fig. 7A and the CAVLC decoding engine or process (e.g., CAVLC module 582). Once the bitstream has been loaded and the decoding engine, memory structures, and registers have been initialized, the coefficient token module 710 is enabled by the driver software 128 issuing a CAVLC_TOTC instruction. In one embodiment, the CAVLC_TOTC instruction has the following exemplary format:

    CAVLC_TOTC DST, S1

where S1 and DST comprise an input register and an internal output register, respectively, with the following exemplary format:

    SRC1 [3:0]   = blkIdx
    SRC1 [18:16] = blkCat
    SRC1 [24]    = iCbCr

The remaining bits are undefined. The output of CAVLC_TOTC has the following format:

    DST [31:16] = TrailingOnes
    DST [15:0]  = TotalCoeff

Thus, as shown, the coefficient token module 710 receives information corresponding to mbCurrAddr, mbType, an indication of whether a chroma channel is being processed (e.g., iCbCr), and blkIdx (e.g., the block index, since a picture may be divided into many blocks). For a given macroblock accessed from the bitstream buffer 562b, blkIdx conveys whether an 8x8 pixel block or a 4x4 pixel block is being processed at the given location. This information is provided by the driver software 128. The coefficient token module 710 comprises a lookup table. From the inputs to the lookup table of the coefficient token module 710 described above, the number of trailing ones (TrailingOnes) and the number of non-zero coefficients (TotalCoeff) are obtained. TrailingOnes conveys how many 1s are in a row, and TotalCoeff conveys how many run/level pair coefficients are in the chunk of data extracted from the bitstream. TrailingOnes and TotalCoeff are provided to the CAVLC level module 714 and the zero level module 718, respectively. TrailingOnes is also provided to the level0 module 716, which corresponds to the first level (e.g., the direct current (DC) value) extracted from the bitstream buffer 562b.

The level module 714 keeps track of the suffix length of the symbol (e.g., the number of trailing ones) and, in combination with the level code (levelCode), computes the level value (level[Idx]), which is then stored in the level array 722 and the run array 724. The level module 714 operates under the CAVLC_LVL instruction, which has the following format:

    CAVLC_LVL DST, S2, S1

where:

    S1  = Idx (16-bit);
    S2  = suffixLength (16-bit); and
    DST = suffixLength (16-bit).

The suffix length (suffixLength) conveys the size of the codeword. Input from the driver software 128 provides information specifying the suffix length. In addition, in one embodiment, DST and S2 may be chosen to be the same register, because the suffixLength value is updated.

Note further that forwarding registers (e.g., registers holding data generated internally by a given module), such as F1 and F2, may also be used. Forwarding flags in a given instruction indicate whether the instruction and the corresponding module use the forwarding registers. The symbols F1 (i.e., use the value of forwarded source 1, which in one embodiment may be indicated by bit 26 of the instruction) and F2 (i.e., use the value of forwarded source 2, which in one embodiment may be indicated by bit 27 of the instruction) denote the forwarding registers.
When the forwarding registers are used, the CAVLC_LVL instruction may have the following exemplary format:

    CAVLC_LVL.F1.F2 DST, SRC2, SRC1

where, when either F1 or F2 is set (e.g., asserted), the specified forwarded source is taken as the input. In the case of the level module 714, forwarding register F1 corresponds to the level index (level[Idx]) produced by the level module 714, which is incremented in an increment module and input to multiplexer 730. Likewise, forwarding register F2 corresponds to the suffix length (suffixLength), which is produced by the level module 714 and input to multiplexer 728. The other inputs to multiplexer 730 and multiplexer 728 include execution unit register inputs (labeled EU in Fig. 7A), as described below.

Another input to the level module 714 is the level code provided by the level code module 712. The combined operation of the level code module 712 and the level module 714 decodes the level value (a level being the value of a transform coefficient before scaling). The level code module 712 is enabled by an instruction with the following exemplary format:

    CAVLC_LC SRC1

where SRC1 = suffixLength (16-bit). When forwarding register F1 is used, the instruction may be expressed as follows:

    CAVLC_LVL.F1 SRC1

where, if F1 is set, the forwarded SRC1 is taken as the input. As shown in Fig. 7A, when F1 is set (e.g., F1 = 1), the level code module 712 takes the forwarded SRC1 value (e.g., the suffix length from the level module 714) as input; otherwise (e.g., F1 = 0), the input is taken from an execution unit register.

Returning to the level module 714, the suffix length input may be forwarded by the level module 714 via multiplexer 728, or provided from an execution unit register through multiplexer 728.
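The operand selection performed by the multiplexers under the F1 and F2 forwarding flags can be sketched as follows. This is a simplified software model, not the circuit of Fig. 7A: the bit positions 26 and 27 follow the one embodiment described above, while the function names mux and level_inputs and the argument names are hypothetical.

```python
F1_BIT = 26  # forwarding flag for source 1 (Idx), per the embodiment above
F2_BIT = 27  # forwarding flag for source 2 (suffixLength)


def mux(instr_word, flag_bit, forwarded, eu_value):
    """Pick the forwarded value when the flag bit is set, else the EU value."""
    return forwarded if (instr_word >> flag_bit) & 1 else eu_value


def level_inputs(instr_word, eu_idx, eu_suffix, fwd_idx, fwd_suffix):
    """Model the Idx and suffixLength inputs of the level module."""
    # Multiplexer 730: the forwarded level index is incremented first.
    idx = mux(instr_word, F1_BIT, fwd_idx + 1, eu_idx)
    # Multiplexer 728: the forwarded suffix length is used unchanged.
    suffix_length = mux(instr_word, F2_BIT, fwd_suffix, eu_suffix)
    return idx, suffix_length


no_forwarding = level_inputs(0, 0, 1, 5, 2)
both_forwarded = level_inputs((1 << F1_BIT) | (1 << F2_BIT), 0, 1, 5, 2)
```

With neither flag set, both operands come from the execution unit registers; with both set, the module reuses its own previous outputs, which avoids a round trip through the register file between successive level decodes.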
In addition, the Idx input may likewise be forwarded by the level module 714 via multiplexer 730 (incremented by the increment module or, in some embodiments, incremented automatically without an increment module), or provided from an execution unit register through multiplexer 730. The level module 714 also receives the level code input directly from the level code module 712. In addition to its outputs to the forwarding registers, the level module 714 provides the level index (level[Idx]) output to the level array 722.

As mentioned above, TrailingOnes is output to the level0 module 716. The level0 module 716 is enabled by the following instruction:

    CAVLC_LVL0 SRC

where SRC = trailingOnes(coeff_token). The output of the level0 module 716 comprises the level index (level[Idx]), which is provided to the level array 722. Coefficient values are coded as a sign and a magnitude. The level0 module 716 provides the sign of the coefficient. The magnitude from the CAVLC level module 714 and the sign from the level0 module 716 are combined and written to the level array 722, with the level index (level[Idx]) specifying the location of the write. In one embodiment, the coefficients are held in a 4x4 matrix for a sub-block (a block being 8x8), but not in raster order. The array is later converted to a 4x4 matrix. In other words, the decoded coefficient levels and runs are not in raster format. From the level-run data, the 4x4 matrix can be reconstructed (although in zigzag scan order) and then rearranged into a 4x4 matrix in raster order.

The TotalCoeff output of the coefficient token module 710 is provided to the zero level module 718. The zero level module 718 is enabled by the following instruction:

    CAVLC_ZL DST, SRC1

where SRC1 = maxNumCoeff (16-bit) and DST = ZerosLeft (16-bit). The value of maxNumCoeff is given by the H.264 standard and is passed as a source value in the instruction. In other words, maxNumCoeff is set by software. In some embodiments, maxNumCoeff may be stored in hardware. The transform coefficients are coded in a (level, run) format, which relates to the number of coefficients (levels) coded as zero. The zero level module 718 provides two outputs, ZerosLeft and Reset (reset = 0), which are provided to multiplexer 740 and multiplexer 742, respectively. Multiplexer 740 also receives forwarding register F2 from the run module 720. Multiplexer 742 receives forwarding register F1 from the run module 720 after it has been incremented (via an increment module or, in some embodiments, by other means).

The run module 720 receives the ZerosLeft and Idx inputs from multiplexer 740 and multiplexer 742, respectively, and provides the run index (Run[Idx]) output to the run array 724. As described above, because run-length coding is used for further compression, the coefficients are coded in a (level, run) format.
For example, suppose you have the following values 1〇12 12 15 19 1 1 1 0 〇0 0 0 0 10, then you can encode it into (10,0)(12,1)(15,0)(19,0 )(1,2) (0,5) ( 1,0) (〇,〇). This code word is usually shorter. The index is the corresponding index of the level index. The run module 720 can be enabled via the following instructions: CAVLC - RUN DST, S2, S1, where DST and S2 can be selected to be the same register since the ZerosLeft value is updated. Therefore, the demonstration of the CAVLC-RUN instruction has no sign value as follows: SI = Idx(16-bit),

Client’s Docket No.: S3U06-0013-TW TT’s Docket No:0608-A41246twf.doc/NikeyChen 80 200809689 S2 - ZerosLeft(16-bit) ^ DST = Zerosleft( 16-bit)。 參考第7A圖,轉發暫存器被使用,其中CAVLC_RUN 指令可得到下列格式:Client’s Docket No.: S3U06-0013-TW TT’s Docket No: 0608-A41246twf.doc/NikeyChen 80 200809689 S2 - ZerosLeft(16-bit) ^ DST = Zerosleft( 16-bit). Referring to Figure 7A, the forward register is used, and the CAVLC_RUN instruction can be obtained in the following format:

I CAVLC.F1.F2: DST,SRC2, SRC1 ,其中,當不是F1就是F2被設定時,則適當的轉發來源 被當成輸入。 , 關於兩暫存器暫列,位準陣列722對應於位準,而運 行陣列724對應於運行。在一實施例中,各陣列包含個 元素。對位準陣列722而言,各元素的大小包括16位元具 正負號的值,而對運行陣列724而言,其值為4位元且不 具正負諕。使用下列指令分別從位準陣列722以及運行陣 列724讀取位準值以及運行値。I CAVLC.F1.F2: DST, SRC2, SRC1, where, when F1 is not set or F2 is set, the appropriate forwarding source is treated as input. With respect to the two register registers, the level array 722 corresponds to the level and the operational array 724 corresponds to the run. In one embodiment, each array contains individual elements. For level array 722, the size of each element includes a 16-bit signed value, while for array 724, the value is 4 bits and is not positive or negative. The level values and run 値 are read from level array 722 and run array 724, respectively, using the following instructions.

READ LRUN ’其中’在-實施例中,DST包括四個128位元連續的新 時暫存器(例如:執行單元暫時或是共用暫存器)。上二 操作讀取可變長度解碼單元53G _位準暫存^及_ 暫存器,鋪存至目標暫存器。#此運行被讀出並儲J 暫時暫存器時,運行值被轉換成i 6位元不具正負號: 舉例來說’前兩個暫存轉持16個16位元的位準 陣列儲存第- 16個係數),而第三以及第四暫存器維^READ LRUN 'where' In the embodiment, the DST includes four 128-bit consecutive new scratchpads (eg, execution unit temporary or shared registers). The second operation reads the variable length decoding unit 53G_bit temporary storage and the _ register, and deposits it to the target register. # When this operation is read and stored in the J temporary register, the running value is converted to i 6 bits without sign: For example, 'the first two temporary storage switches 16 16-bit level array storage - 16 coefficients), while the third and fourth register dimensions ^

Client’s Docket No.: S3U06-0013-TW TT^s Docket No:0608-A41246twf.doc/NikeyChe] 81 200809689 16個16位元的運行值。當超過16個係數時,其被解碼至 記憶體。在一實施例中,以下列順序寫入值:在第一暫存 器中,最低有效16位元包含LEVEL[0]值,而位元16-31 包含LEVEL[1]值等,直到位元112_127包含level[7]值。 接著’對第二暫存器I對而言,最低有效16位元包貪 LEVEL[8]等。相同的方法應用在RUN值。 : 根據下列示範指令格式,可使用CLR—LRUN指令來清 除位準陣列722以及運行陣列724的暫存器。 上述可變長度解碼單元53〇b的軟體(著色程序)以及 硬體操作(例如模組),特別是CAVLC模組582,可使用 下列偽碼來描述。Client’s Docket No.: S3U06-0013-TW TT^s Docket No:0608-A41246twf.doc/NikeyChe] 81 200809689 16 16-bit running values. When there are more than 16 coefficients, it is decoded to the memory. In one embodiment, the values are written in the following order: in the first register, the least significant 16 bits contain the LEVEL[0] value, and the bits 16-31 contain the LEVEL[1] value, etc. until the bit 112_127 contains the level[7] value. Then, for the second register I pair, the least significant 16-bit packet is LEVEL[8] and so on. The same method is applied to the RUN value. The CLR_LRUN instruction can be used to clear the level array 722 and the registers that run the array 724, according to the following exemplary instruction format. The software (shading program) and hardware operations (e.g., modules) of the above variable length decoding unit 53A, particularly the CAVLC module 582, can be described using the following pseudo code.

    residual_block_cavlc( coeffLevel, maxNumCoeff ) {
        CLR_LEVEL_RUN
        coeff_token
        if( TotalCoeff( coeff_token ) > 0 ) {
            if( TotalCoeff( coeff_token ) > 10 && TrailingOnes( coeff_token ) < 3 )
                suffixLength = 1
            else
                suffixLength = 0
            CAVLC_level0();
            for( i = TrailingOnes( coeff_token ); i < TotalCoeff( coeff_token ); i++ ) {
                CAVLC_levelCode( levelCode, suffixLength );
                CAVLC_level( suffixLength, i, levelCode )
            }
            CAVLC_ZerosLeft( ZerosLeft, maxNumCoeff )
            for( i = 0; i < TotalCoeff( coeff_token ) - 1; i++ ) {
                CAVLC_run( i, ZerosLeft )
            }
            READ_LEVEL_RUN( level, run )
            run[ TotalCoeff( coeff_token ) - 1 ] = zerosLeft
            coeffNum = -1
            for( i = TotalCoeff( coeff_token ) - 1; i >= 0; i-- ) {
                coeffNum += run[ i ] + 1
                coeffLevel[ coeffNum ] = level[ i ]
            }
        }
    }
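As an illustration of the register packing and of the final loop of the pseudocode, the following Python sketch unpacks eight 16-bit lanes from a 128-bit register value and then scatters the levels into a coefficient array using the run lengths. The function names, the use of a Python integer to stand in for a 128-bit register, and the signed two's-complement interpretation of each lane are assumptions made for this sketch; they are not taken from the hardware description.

```python
def unpack_lanes(register: int, lanes: int = 8, width: int = 16) -> list[int]:
    """Split a register value into fixed-width lanes, lane 0 in the low bits."""
    mask = (1 << width) - 1
    values = []
    for lane in range(lanes):
        raw = (register >> (lane * width)) & mask
        # Assumption: each lane holds a two's-complement signed value.
        if raw >= 1 << (width - 1):
            raw -= 1 << width
        values.append(raw)
    return values


def place_coefficients(level: list[int], run: list[int], max_num_coeff: int) -> list[int]:
    """Mirror the last loop of the pseudocode: walk level/run from the
    highest index down and write each level at its running position."""
    coeff_level = [0] * max_num_coeff
    coeff_num = -1
    for i in range(len(level) - 1, -1, -1):
        coeff_num += run[i] + 1
        coeff_level[coeff_num] = level[i]
    return coeff_level


if __name__ == "__main__":
    # LEVEL[0] = 3, LEVEL[1] = -2, LEVEL[2] = 1 packed into the low lanes.
    reg = 3 | (0xFFFE << 16) | (1 << 32)
    print(unpack_lanes(reg)[:3])
    print(place_coefficients([3, -2, 1], [1, 0, 2], 8))
```

With levels [3, -2, 1] and runs [1, 0, 2], the loop writes 1 at position 2, -2 at position 3, and 3 at position 5, leaving the other positions at zero.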

MPEG Decoding

Having described the decoding system 200 for CABAC decoding (variable length decoding unit 530a, via the CABAC module 580) and CAVLC decoding (variable length decoding unit 530b, via the CAVLC module 582), an MPEG embodiment of the decoding system 200 is described next, referred to herein as variable length decoding unit 530c. The variable length decoding unit 530c operates according to the operations performed by the MPEG module 578 (shown in FIG. 5C). For brevity, features shared with the CABAC and CAVLC embodiments (including the bitstream buffer and the corresponding instructions) are omitted except where noted below. The INIT instruction places the variable length decoding unit 530 in MPEG mode, and a mix of the READ, INPSTR, INPTRB (explained above), and VLC_MPEG2 instructions is used to decode an MPEG-2 bitstream. The shader program determines which method to use. The MPEG-2 bitstream has a fully deterministic grammar, and the shader code implements the method for deciphering that grammar.

In one embodiment, for MPEG-2 processing, the tables are implemented as Huffman decoding in the MatchVLC_X functions described below. Two instructions are therefore loaded into the MPEG module 578: the INIT_MPEG2 instruction and the VLC_MPEG2 instruction. The INIT_MPEG2 instruction loads the bitstream and places the variable length decoding unit 530 in MPEG-2 mode. In this mode, the global register 614 holds a value indicating whether the first coefficient is a direct current (DC) coefficient. In MPEG-2 there are one or more streams that are identical but are interpreted differently depending on whether they are DC or AC. Rather than creating another instruction, a bit loaded into the VLD_globalRegister.InitDC register is used. Note that the register corresponding to the global register 614 (e.g., mapped to the global register 614, such as globalregister[0]) is also used in the CABAC and CAVLC modes, but is interpreted differently (and therefore labeled differently) in MPEG-2 mode. At the start of a macroblock, the value (the bit in the VLD_globalRegister.InitDC register) is initialized to 1. When the MatchVLC_3 function is used, it checks whether the bit in the VLD_globalRegister.InitDC register is 1 or 0. If it is 1, the bit is changed to 0 for decoding the subsequent discrete cosine transform (DCT) symbols of the given macroblock. The value is set by the shader and reset internally. Physically, the VLD_globalRegister.InitDC bit is a flag that conveys whether the DCT symbol being decoded is the first DCT symbol of the given macroblock.

The MPEG module 578 decodes using a very specific grammar of symbols, where the symbols are decoded using a limited number of Huffman tables. The grammar is parsed in the shader, and specific symbol values are obtained using the VLC_MPEG2 instruction, whose #Imm16 value selects the particular Huffman table to be used to decode a particular symbol.

Before describing the different elements of the variable length decoding unit 530c, a brief description follows of the hardware and software structures used to implement the different tables of the MPEG-2 standard. In the MPEG-2 standard (ISO/IEC 13818 (1995)), the codes used are defined in Tables B-1 through B-15, which are the known tables provided by the MPEG-2 standard. In different embodiments of the variable length decoding unit 530c, one or more of the tables are implemented in dedicated hardware, e.g., synthesized as logic gates. Depending on the implementation (e.g., HDTV, HD DVD, etc.) or the desired hardware arrangement, some tables need not be implemented in hardware and can instead be implemented using other instructions (e.g., the EXP_GOLOMB_UD instruction described below, or the READ instruction).

For example, although the logic gate count for Tables B-2, B-3, and B-11 is not large, the addition used may require an extra multiplexer stage, which has implications for speed and latency. In some embodiments, Tables B-5 through B-8 are not supported in hardware because the corresponding profiles need not be supported. However, some embodiments can provide such support through different instructions (e.g., the INPSTR, EXP_GOLOMB_UD, and READ instructions) with minimal impact on performance.

Continuing with the known MPEG tables, Table B-1 (Macroblock_address_increment), Table B-10 (motion_code), and Table B-9 (coded_block_pattern) have similar structures. Because of this partial similarity, the three tables can be implemented using the MatchVLC function executed by the MPEG module 578 and described below. For Table B-9 and Table B-10, an exemplary table structure is as follows:

    struct Table {
        unsigned head;          // number of bits in the table address
        struct table {
            unsigned val:6;     // 5 bits in Table B-10
            unsigned shv:2;     // actual number of bits
        } table[];
    } Table[];

For Table B-1, an exemplary table structure is as follows:

    struct Table {
        unsigned head;          // number of bits in the table address
        struct table {
            unsigned val:5;
            unsigned shv:3;     // actual number of bits
        } table[];
    } Table[];

In the functions below, only the SHL operation removes data from the SREG register 562a. Unlike the shader's READ instruction, the READ function used in the MatchVLC functions reads bits from the SREG register 562a without removing any bits from the SREG register 562b. The MatchVLC functions used to implement the MPEG-2 tables as Huffman decoding are described below.

    FUNCTION MatchVLC_1 {
        T = READ(2);                // read 2 bits
        SHL(2);
        CASE (T) {
            00: OUTPUT(1);
            01: OUTPUT(2);
            10: {
                Q = READ(1);
                SHL(1);
                CASE (Q) {
                    0: OUTPUT(0);
                    1: OUTPUT(3);
                }
            }
            11: {
                Idx = CLO(sREG);    // count leading ones
                Idx = min(Idx, 7);
                shv = (Idx != 7) ? Idx+1 : Idx;
                SHL(shv);
                OUTPUT(4+Idx);
            }
        }
    }

    FUNCTION MatchVLC_2 {
        T = READ(2);                // read 2 bits
        SHL(2);
        CASE (T) {
            00: OUTPUT(0);
            01: OUTPUT(1);
            10: OUTPUT(2);
            11: {
                Idx = CLO(sREG);    // count leading ones
                Idx = min(Idx, 8);
                shv = (Idx != 8) ? Idx+1 : Idx;
                SHL(shv);
                OUTPUT(3+Idx);
            }
        }
    }
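To make the control flow of MatchVLC_1 and MatchVLC_2 easier to follow, the following Python sketch mirrors the two functions over a bitstream given as a string of '0' and '1' characters. The string representation, the function names, and the convention of returning the number of consumed bits (instead of shifting a register with SHL) are assumptions made for this sketch. The input is assumed to be long enough for the code being decoded.

```python
def count_leading_ones(bits: str) -> int:
    """CLO: number of consecutive '1' characters at the start of bits."""
    return len(bits) - len(bits.lstrip("1"))


def match_vlc_1(bits: str) -> tuple[int, int]:
    """Return (decoded value, bits consumed) following MatchVLC_1."""
    t = bits[:2]
    if t == "00":
        return 1, 2
    if t == "01":
        return 2, 2
    if t == "10":
        # One more bit selects between the outputs 0 and 3.
        return (0, 3) if bits[2] == "0" else (3, 3)
    # t == "11": count further leading ones, capped at 7.
    idx = min(count_leading_ones(bits[2:]), 7)
    shv = idx + 1 if idx != 7 else idx  # the terminating zero is absent at the cap
    return 4 + idx, 2 + shv


def match_vlc_2(bits: str) -> tuple[int, int]:
    """Return (decoded value, bits consumed) following MatchVLC_2."""
    t = bits[:2]
    if t != "11":
        return int(t, 2), 2  # 00 -> 0, 01 -> 1, 10 -> 2
    idx = min(count_leading_ones(bits[2:]), 8)
    shv = idx + 1 if idx != 8 else idx
    return 3 + idx, 2 + shv


if __name__ == "__main__":
    print(match_vlc_1("1110"))        # value 5, 4 bits consumed
    print(match_vlc_2("1111111111"))  # value 11, 10 bits consumed
```

Returning the consumed bit count keeps the sketch free of shared state; a caller would advance its own position in the bitstream by that amount, which plays the role of the SHL operations in the pseudocode.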

    FUNCTION MatchVLC_3 {
        INIT_MB DC = TRUE;
        T = CLZ(sREG);
        SHL(T+1);
        CASE (T) {
            0: IF (DC) {
                   DC = FALSE;
                   Q = READ(1);
                   SHL(1);
                   OUTPUT({0, SGN(Q)*1});
               }
               ELSE {
                   Q = READ(1);
                   IF (!Q) { OUTPUT({63, 0}); shv = 1; }    // EOB
                   ELSE { R = READ(1); OUTPUT({0, SGN(R)*1}); shv = 2; }
                   SHL(shv);
               }
            1: {
                Q = READ(3);
                CASE (Q) {
                    1XX: OUTPUT({1, SGN(Q[1])*1}); shv = 2;
                    01X: OUTPUT({2, SGN(Q[0])*1}); shv = 3;
                    00X: OUTPUT({0, SGN(Q[0])*2}); shv = 3;
                }
                SHL(shv);
            }
            2: {
                Q = READ(2);
                SHL(2);
                CASE (Q) {
                    00: {
                        R = READ(4);
                        CASE (R) {
                            000X: OUTPUT({16, SGN(R[0])*1});
                            001X: OUTPUT({5, SGN(R[0])*2});
                            010X: OUTPUT({0, SGN(R[0])*7});
                            011X: OUTPUT({2, SGN(R[0])*3});
                            100X: OUTPUT({1, SGN(R[0])*4});
                            101X: OUTPUT({15, SGN(R[0])*1});
                            110X: OUTPUT({14, SGN(R[0])*1});
                            111X: OUTPUT({4, SGN(R[0])*2});
                        }
                        shv = 4;
                    }
                    01X: SGN = READ(1); OUTPUT({0, SGN*3}); shv = 1;
                    10X: SGN = READ(1); OUTPUT({4, SGN*1}); shv = 1;
                    11X: SGN = READ(1); OUTPUT({3, SGN*1}); shv = 1;
                }
                SHL(shv);
            }
            3: {
                Q = READ(3);
                CASE (Q) {
                    00X: OUTPUT({7, SGN(Q[0])*1});
                    01X: OUTPUT({6, SGN(Q[0])*1});
                    10X: OUTPUT({1, SGN(Q[0])*2});
                    11X: OUTPUT({5, SGN(Q[0])*1});
                }
                SHL(3);
            }
            4: {
                Q = READ(3);
                CASE (Q) {
                    00X: OUTPUT({2, SGN(Q[0])*2});
                    01X: OUTPUT({9, SGN(Q[0])*1});
                    10X: OUTPUT({0, SGN(Q[0])*4});
                    11X: OUTPUT({8, SGN(Q[0])*1});
                }
                SHL(3);
            }
            5: Q = READ(19); OUTPUT({Q[18:13], Q[12:0]});
            6: {
                Q = READ(4);
                CASE (Q) {
                    000X: OUTPUT({16, SGN(Q[0])*1});
                    001X: OUTPUT({5, SGN(Q[0])*2});
                    010X: OUTPUT({0, SGN(Q[0])*7});
                    011X: OUTPUT({2, SGN(Q[0])*3});
                    100X: OUTPUT({1, SGN(Q[0])*4});
                    101X: OUTPUT({15, SGN(Q[0])*1});
                    110X: OUTPUT({14, SGN(Q[0])*1});
                    111X: OUTPUT({4, SGN(Q[0])*2});
                }
                SHL(4);
            }
            7,8,9,10,11: JVLC(TableC[T]);
        }
    }

    FUNCTION MatchVLC_4 {
        T = CLZ(sREG);
        SHL(T+1);
        CASE (T) {
            0: {
                Q = CLO(sREG);
                R = min(Q, 7);
                shv = (R != 7) ? R+1 : R;
                SHL(shv);
                CASE (R) {
                    0: S = READ(1); OUTPUT({0, SGN(S)*1}); shv = 1;
                    1: S = READ(1); OUTPUT({0, SGN(S)*2}); shv = 1;
                    2: {
                        R = READ(2);
                        SHL(2);
                        CASE (R) {
                            0X: OUTPUT({0, SGN(R[0])*4});
                            1X: OUTPUT({0, SGN(R[0])*5});
                        }
                    }
                    3: {
                        R = READ(3);
                        SHL(3);
                        CASE (R) {
                            00X: OUTPUT({9, SGN(R[0])*1});
                            01X: OUTPUT({1, SGN(R[0])*3});
                            10X: OUTPUT({10, SGN(R[0])*1});
                            11X: OUTPUT({0, SGN(R[0])*8});
                        }
                    }
                    4: {
                        R = READ(3);
                        CASE (R) {
                            0XX: OUTPUT({0, SGN(R[0])*9}); shv = 2;
                            10X: OUTPUT({0, SGN(R[0])*12}); shv = 3;
                            11X: OUTPUT({0, SGN(R[0])*13}); shv = 3;
                        }
                        SHL(shv);
                    }
                    5: {
                        R = READ(2);
                        SHL(2);
                        CASE (R) {
                            0X: OUTPUT({2, SGN(R[0])*3});
                            1X: OUTPUT({4, SGN(R[0])*2});
                        }
                    }
                    6: S = READ(1); OUTPUT({0, SGN(S)*14}); shv = 1;
                    7: S = READ(1); OUTPUT({0, SGN(S)*15}); shv = 1;
                }
                SHL(shv);
            }
            1: {
                Q = READ(2);
                SHL(2);
                CASE (Q) {
                    0X: OUTPUT({1, SGN(Q[0])*1});
                    10: OUTPUT({63, 0});    // <EOB>
                    11: R = READ(1); SHL(1); OUTPUT({0, SGN(R)*3});
                }
            }
            2: {
                Q = READ(2);
                SHL(2);
                CASE (Q) {
                    00: {
                        R = READ(4);
                        shv = 4;
                        CASE (R) {
                            000X: OUTPUT({1, SGN(R[0])*5});
                            001X: OUTPUT({11, SGN(R[0])*1});
                            010X: OUTPUT({0, SGN(R[0])*11});
                            011X: OUTPUT({0, SGN(R[0])*10});
                            100X: OUTPUT({13, SGN(R[0])*1});
                            101X: OUTPUT({12, SGN(R[0])*1});
                            110X: OUTPUT({3, SGN(R[0])*2});
                            111X: OUTPUT({1, SGN(R[0])*4});
                        }
                    }
                    01: R = READ(1); OUTPUT({2, SGN(R)*1}); shv = 1;
                    10: R = READ(1); OUTPUT({1, SGN(R)*2}); shv = 1;
                    11: R = READ(1); OUTPUT({3, SGN(R)*1}); shv = 1;
                }
                SHL(shv);
            }
            3: {
                Q = READ(3);
                SHL(3);
                CASE (Q) {
                    00X: OUTPUT({0, SGN(Q[0])*7});
                    01X: OUTPUT({0, SGN(Q[0])*6});
                    10X: OUTPUT({4, SGN(Q[0])*1});
                    11X: OUTPUT({5, SGN(Q[0])*1});
                }
            }
            4: {
                Q = READ(3);
                SHL(3);
                CASE (Q) {
                    00X: OUTPUT({7, SGN(Q[0])*1});
                    01X: OUTPUT({8, SGN(Q[0])*1});
                    10X: OUTPUT({6, SGN(Q[0])*1});
                    11X: OUTPUT({2, SGN(Q[0])*2});
                }
            }
            5: Q = READ(19); OUTPUT({Q[18:13], Q[12:0]});
            6: {
                Q = READ(2);
                SHL(2);
                CASE (Q) {
                    00: R = READ(1); OUTPUT({5, SGN(R)*2}); shv = 1;
                    01: R = READ(1); OUTPUT({14, SGN(R)*1}); shv = 1;
                    10: {
                        R = READ(2);
                        shv = 2;
                        CASE (R) {
                            0X: OUTPUT({2, SGN(R[0])*4});
                            1X: OUTPUT({16, SGN(R[0])*1});
                        }
                    }
                    11: R = READ(1); OUTPUT({15, SGN(R)*1}); shv = 1;
                }
                SHL(shv);
            }
            7,8,9,10,11: JVLC(TableC[T]);
        }
    }

As the MatchVLC functions above show, the least significant bit of the decoded bits usually determines the sign of the value, which can be checked using the SGN function, described as follows:

    FUNCTION SGN(R) {
        RETURN (R == 1) ? -1 : 1;
    }

Note further that for MatchVLC_3 and MatchVLC_4 the tables are common (or at least one is a superset of the other), so the tables can be accessed using the following function.

    FUNCTION JVLC(Table) {
        Q = READ(5);
        SHL(5);
        {R, L} = Table[Q];
        RETURN {R, L};
    }

The interface to the MatchVLC, or rather the MatchVLC_X (where

X equals 1, 2, etc.), functions is the following instruction:

    VLC_MPEG2 DST, #Imm16

where the #Imm16 value is used to select the appropriate table and thus to decode a particular syntax element. #Imm16 is used as the table index (e.g., 0, 1, 2, 3) so that the table is accessed from the instruction. The relationship among the #Imm16 value, the corresponding method, the syntax element, and the MPEG-2 table is described in Table 5 below.

    #Imm16  Method             MPEG-2 VLC                     Table
    0       MatchVLC(B-1,7)    Macroblock_address_increment   B-1
    1       MatchVLC(B-9,8)    Coded_block_pattern            B-9
    2       MatchVLC(B-10,6)   Motion_code                    B-10
    3       MatchVLC_1         Dct_dc_size_luminance          B-12
    4       MatchVLC_2         Dct_dc_size_chrominance        B-13
    5       MatchVLC_3         DCT coefficients (Table 0)     B-14
    6       MatchVLC_4         DCT coefficients (Table 1)     B-15

    Table 5

EXP-GOLOMB Decoding

Having described the decoding system 200 for CABAC decoding (variable length decoding unit 530a, via the CABAC module 580), CAVLC decoding (variable length decoding unit 530b, via the CAVLC module 582), and MPEG decoding (variable length decoding unit 530c, via the MPEG module 578), an EXP-Golomb embodiment of the decoding system 200 is described next, referred to herein as variable length decoding unit 530d. The variable length decoding unit 530d operates according to the operations of the EXP-Golomb module 584 (shown in FIG. 5C). The variable length decoding unit 530d uses the same hardware and the same bitstream buffer arrangement as the CABAC and CAVLC embodiments. Accordingly, features shared with the CABAC and CAVLC embodiments are omitted except where noted below. Before describing the variable length decoding unit 530d, a brief description of EXP-Golomb coding is given.

In EXP-Golomb coding, the data has a prefix and suffix format, as shown below:

    Codeword                      codeNum range
    1                             0
    0 1 x0                        1-2
    0 0 1 x1 x0                   3-6
    0 0 0 1 x2 x1 x0              7-14
    0 0 0 0 1 x3 x2 x1 x0         15-30
    0 0 0 0 0 1 x4 x3 x2 x1 x0    31-62
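The prefix and suffix layout can be turned into a decoder along the following lines. This Python sketch operates on a string of '0' and '1' characters; the function names and the string representation are assumptions made for illustration, and the input is assumed to contain a complete, well-formed codeword. The signed mapping follows the usual H.264 convention in which odd code numbers map to positive values and even code numbers to negative values.

```python
def decode_ue(bits: str) -> tuple[int, int]:
    """Unsigned Exp-Golomb: return (codeNum, bits consumed)."""
    t = len(bits) - len(bits.lstrip("0"))  # count leading zeros (prefix length)
    suffix = bits[t + 1 : 2 * t + 1]         # t bits following the marker '1'
    val = int(suffix, 2) if t else 0
    return (1 << t) - 1 + val, 2 * t + 1


def decode_se(bits: str) -> tuple[int, int]:
    """Signed Exp-Golomb: (-1)^(k+1) * ceil(k/2) applied to the unsigned value k."""
    k, used = decode_ue(bits)
    magnitude = (k + 1) // 2                 # ceil(k / 2) for non-negative k
    return (magnitude if k % 2 else -magnitude), used


if __name__ == "__main__":
    print(decode_ue("0001010"))  # three leading zeros, suffix 010
    print(decode_se("0001010"))
```

For the input 0001010 the prefix has three zeros and the suffix is 010, so the unsigned value is (1 << 3) - 1 + 2 = 9 and the signed value is +5.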

Because most codewords are short, compression is obtained. Moreover, the codewords are unique and easy to decode. In H.264, four EXP-Golomb coding methods are used: unsigned (unary), signed, mapped (the codeword is mapped through a table; this method is used to encode the coded macroblock pattern), and truncated. In the variable length decoding unit 530d, a single instruction is provided to decode the different types of EXP-Golomb codes shown in Table 6 below. Truncated EXP-Golomb decoding is described afterwards.

    codeNum = EXP_GOLOMB_UD
        t = CLZ
        SHL(t+1)
        val = READ(t)               // val is unsigned
        codeNum = 2^t - 1 + val

    codeNum = EXP_GOLOMB_CD(kOrder)
        lz := CountLeadingZero(sREG);
        sREG := {(sREG << (lz+1)), bitStreamBuffer[0:lz]};
        J := lz + kOrder - 1;
        val := (J >= 0) ? ZeroExtend(sREG[0:J]) : 0;
        sREG := {(sREG << (lz+1)), bitStreamBuffer[0:lz]};
        codeNum := (1 << (lz + kOrder)) + (0xFFFFFFF << kOrder) + val;

    Seval = EXP_GOLOMB_SD
        k = EXP_GOLOMB_UD
        Seval = (-1)^(k+1) * Ceil(k/2)

    cbp = EXP_GOLOMB_MD(Type)
        k = EXP_GOLOMB_UD
        cbp = TableCBP[Type][k]

    Table 6

To explain these instructions further, the EXP_GOLOMB_UD instruction decodes unary-coded symbols. The EXP_GOLOMB_SD instruction decodes signed unary-coded symbols. As Table 6 shows, for the EXP_GOLOMB_SD instruction, when k = 0 there is no difference between positive and negative zero, so the returned value is 0. The EXP_GOLOMB_MD (SRC1) instruction decodes mapped coded symbols, where SRC1 = Type, which relates to the macroblock parameters and the coded_block_pattern. The value of Type results in the following coded_block_parameter:

    Type = 0 -> Intra 4x4
    Type = 1 -> Inter

A table (e.g., in on-chip memory or in remote memory) can be used to assign a value to coded_block_parameter according to the macroblock prediction mode (e.g., code number, k).

The EXP-Golomb instruction that decodes truncated Exp-Golomb symbols is described further as follows:

    EXP_GOLOMB_TD DST, SRC1

where SRC1 is the range. In at least one embodiment, the range must be known before truncated Exp-Golomb decoding is performed. Truncated Exp-Golomb decoding can then be derived as follows:

    codeNum = EXP_GOLOMB_TD(range) {
        else if (range == 1) return READ(1) ^ 1;
        else return EXP_GOLOMB_UE;
    }

Thus, the EXP_GOLOMB_D instruction is provided.

It is useful to explain the difference between the opcodes and the software instructions issued by the driver. Generally, when an ISA is designed, at least two influences are at work: (1) keeping the instruction decoder simple so that decoding completes in a single pipeline stage (i.e., fast), and (2) keeping the mnemonics simple for the programmer. Considering the five EXP-Golomb-based operations, they are distinct from the user's point of view. Moreover, there are two different formats: all of the EXP-Golomb-based operations output the same kind of value, but only some of them take an input (besides the bitstream, which is implicit in the operation), and this provides at least one basic distinction. Traditionally, CPU instructions do not have implicit inputs; their inputs are supplied through operands. Here, however, the bitstream is not exposed through the operands but is managed automatically internally and initialized using the INIT instruction.

From a hardware point of view, all of the other EXP-Golomb operations can be performed using the same core hardware as EXP_GOLOMB_UD (or at least most of it) plus small additions around the core hardware (e.g., similar to a CASE/SWITCH in software). The compiler/translator can therefore map all of the operations to a single instruction. Moreover, these operations are fixed (i.e., the operation does not change dynamically). Referring to the pseudonym column of Table 7 below, note that for the EXP_GOLOMB_UD and EXP_GOLOMB_SD operations, SRC1 can be added (and ignored by the core), with a mechanism for distinguishing the operations. Likewise, note that no single-source instruction group exists, but the operations can be mapped to the register-immediate group. By using an explicit immediate number for the different instructions, as shown in Table 7, the instructions can be distinguished, so that only one major/minor opcode is used instead of five, which is a significant saving. That is, only one minor opcode is used, because the immediate-format instruction can be used, and the different EXP-Golomb instructions are distinguished by encoding the immediate data field with the appropriate data and assigning a pseudonym.

    EXP_GOLOMB_D Dst, #Type, Src1.lane

where #Type is determined from Table 7 below:

    #Type  Pseudonym                  EGOLD form
    0x0    EXP_GOLOMB_UD Dst          EGOLD Dst, 0x0, Src1
    0x1    EXP_GOLOMB_SD Dst          EGOLD Dst, 0x1, Src1
    0x2    EXP_GOLOMB_TD Dst, Src1    EGOLD Dst, 0x2, Src1
    0x3    EXP_GOLOMB_MD Dst, Src1    EGOLD Dst, 0x3, Src1
    0x4    EXP_GOLOMB_CD Dst, Src1    EGOLD Dst, 0x4, Src1

    Table 7

To explain Table 7 further, for #Type = 0x0 or #Type = 0x1 no Src1 field is required, and there is no need to assign these instructions to another major or minor opcode group, because a dummy Src can be specified or Src and Dst can be designated as the same register.

EXP-Golomb coded symbols are encoded as shown below (i.e., zero or more leading zeros, followed by a 1, followed by a number of bits corresponding to the number of leading zeros):

Ciienfs Docket No.: S3U06-0013-TW TT’s Docket No:0608-A41246twf.doc/NikeyChen 97 200809689 , codeNum 範圍 1 0 0 1 X〇 1-2 0 0 1 X1 X〇 3-6 0 0 0 1 X2 X1 x〇 7-14 0 0 Ϊ 0 0 1 X3 X2 Xi x〇 15-30 0 0 0 0 1 X4 Xs X2 Xi x〇 31-62 這些位元如何被解釋是根據特定Golomb型式而定(這裡 是根據H.264的三種型式以及AVS的第四型式)。使用 UD以及SD (不具正負號以及正負號)計算邏輯單元來計 算值。例如,當位元流為0001010時,則UD的值為 (1<<3)-1+2 = 9,而 SD 的值為(-l)A10*ceil(9/2) = +5。CD 也發生相似的程序。然而,對MD而言,表單查找被執行 (例如當UD編碼時,對值作解碼,接著使用此值做為索 引進入表格,傳回6位元的值(在表格中储存成6位元的 值,但是傳回值是從0延伸至暫存器的寬度))。在一實 施例中有兩表格,——表格為Intra編碼而另一表格為Inter 編瑪。 上述指令轉換如何被使用在EXP-Golomb解碼之内容 中的例子,可藉由H· 264片段標頭部分解碼之示範偽碼顯 示如下。 sliceHeaderDecode: EXP_G〇L〇MBJJD firstMBSIice EXP_G〇L〇MB__UD siiceType EXP_G〇L〇MB_UD picParameterSetlD READ frameNum, NvaiCiienfs Docket No.: S3U06-0013-TW TT's Docket No:0608-A41246twf.doc/NikeyChen 97 200809689 , codeNum Range 1 0 0 1 X〇1-2 0 0 1 X1 X〇3-6 0 0 0 1 X2 X1 X〇7-14 0 0 Ϊ 0 0 1 X3 X2 Xi x〇15-30 0 0 0 0 1 X4 Xs X2 Xi x〇31-62 How these bits are interpreted depends on the specific Golomb type (here is based on Three types of H.264 and the fourth type of AVS). Use UD and SD (without sign and sign) to calculate the logical unit to calculate the value. For example, when the bit stream is 0001010, the value of UD is (1<<3)-1+2 = 9, and the value of SD is (-l)A10*ceil(9/2) = +5. A similar procedure occurs on the CD. However, for MD, the form lookup is performed (for example, when UD encoding, the value is decoded, then this value is used as an index into the table, and the value of 6 bits is returned (stored as 6 bits in the table) Value, but the return value is from 0 to the width of the scratchpad)). In one embodiment there are two tables, the table is Intra coded and the other table is Inter code. An example of how the above instruction conversion is used in the content of the EXP-Golomb decoding can be shown as an exemplary pseudo code decoded by the H.264 fragment header portion as follows. 
sliceHeaderDecode:
    EXP_GOLOMB_UD  firstMBSlice
    EXP_GOLOMB_UD  sliceType
    EXP_GOLOMB_UD  picParameterSetID
    READ           frameNum, Nval
    IB_GT          frameMbsOnlyFlag, ZERO, $Label1
    READ           fieldPicFlag, ONE
    IB_EQ          fieldPicFlag, ZERO, $Label1
    READ           bottomFieldFlag, ONE
Label1:
    ISUBI          t1, #5, nalUnitType
    IB_NEQ         ZERO, t1, $Label2
    EXP_GOLOMB_UD  idrPicID
Label2:
    IB_NEQ         ZERO, picOrderCntType, $Label3
    READ           picOrderCntLSB, Nvalt
Label3:
    ICMPI_EQ       p1, ONE, fieldPicFlag
    [p1]MOV        nfieldPicFlag, ZERO
    [!p1]MOV       nfieldPicFlag, ONE
    AND            t1, picOrderPresentFlag, nfieldPicFlag
    B_NEQ          ONE, t1, $Label4
    EXP_GOLOMB_SD  deltaPicOrderCntBottom
Label4:

translates to:

sliceHeaderDecode:
    EGOLD          firstMBSlice, #0, ZERO
    EGOLD          sliceType, #0, ZERO
    EGOLD          picParameterSetID, #0, ZERO
    READ           frameNum, Nval
    IB_GT          frameMbsOnlyFlag, ZERO, $Label1
    READ           fieldPicFlag, ONE
    IB_EQ          fieldPicFlag, ZERO, $Label1
    READ           bottomFieldFlag, ONE
Label1:
    ISUBI          t1, #5, nalUnitType
    IB_NEQ         ZERO, t1, $Label2
    EGOLD          idrPicID, #0, ZERO
Label2:
    IB_NEQ         ZERO, picOrderCntType, $Label3
    READ           picOrderCntLSB, Nvalt
Label3:
    ICMPI_EQ       p1, ONE, fieldPicFlag
    [p1]MOV        nfieldPicFlag, ZERO
    [!p1]MOV       nfieldPicFlag, ONE
    AND            t1, picOrderPresentFlag, nfieldPicFlag
    B_NEQ          ONE, t1, $Label4
    EGOLD          deltaPicOrderCntBottom, #1, ZERO
Label4:
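
The UD and SD calculations described for EXP-Golomb codes can be sketched in software as follows. This is only an illustrative Python sketch under stated assumptions, not the hardware logic of the EXP-Golomb module: the names `BitReader`, `decode_ud`, `decode_sd`, and `egold` are hypothetical, and only the #Type values 0x0 and 0x1 are modeled, because the text does not give enough detail for TD, MD, or CD.

```python
class BitReader:
    """Hypothetical helper that serves bits from a string such as '0001010'."""

    def __init__(self, bits: str):
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        """Read n bits as an unsigned integer (n == 0 returns 0)."""
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) < n:
            raise ValueError("bitstream underflow")
        self.pos += n
        return int(chunk, 2) if n else 0


def decode_ud(reader: BitReader) -> int:
    """Unsigned code: count leading zeros, consume the 1, read the suffix."""
    leading_zeros = 0
    while reader.read(1) == 0:
        leading_zeros += 1
    return (1 << leading_zeros) - 1 + reader.read(leading_zeros)


def decode_sd(reader: BitReader) -> int:
    """Signed code: map codeNum k to (-1)^(k+1) * ceil(k/2)."""
    k = decode_ud(reader)
    magnitude = (k + 1) // 2          # integer form of ceil(k/2)
    return magnitude if k % 2 else -magnitude


def egold(reader: BitReader, type_field: int) -> int:
    """Select the operation from an immediate #Type value (assumed dispatch)."""
    if type_field == 0x0:
        return decode_ud(reader)
    if type_field == 0x1:
        return decode_sd(reader)
    raise NotImplementedError("TD, MD and CD are not modeled in this sketch")


print(egold(BitReader("0001010"), 0x0))  # 9, the UD example from the text
print(egold(BitReader("0001010"), 0x1))  # 5, the SD example from the text
```

In terms of the listing above, `EGOLD firstMBSlice, #0, ZERO` would correspond to a call with `type_field` 0x0, and `EGOLD deltaPicOrderCntBottom, #1, ZERO` to a call with `type_field` 0x1.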

VC-1 Decoding

Having described the decoding system 200 for CABAC decoding (variable length decoding unit 530a, via the CABAC module 580), CAVLC decoding (variable length decoding unit 530b, via the CAVLC module 582), MPEG decoding (variable length decoding unit 530c, via the MPEG module 578), and EXP-Golomb decoding (variable length decoding unit 530d, via the EXP-Golomb module 584), a VC-1 embodiment of the decoding system 200, referred to herein as variable length decoding unit 530e, is described next. The variable length decoding unit 530e operates based on the operations of the count leading one module 574 and the count leading zero module 576. VC-1 uses Huffman codes and has a larger number of tables. Rather than building and testing these tables, since the bit rate requirement is lower but the verification cost is higher, the necessary tables are loaded into the neighboring context memory 564. The table format is the same as that used for MPEG-2, and the READ, VLC_CLZ, VLC_CLO, and INPSTR instructions are used to decode the bitstream. For example, a particular table can be implemented using the following pseudo code:

TABLE - I Picture CBPCY VLC TABLE

VLC_CLZ DST0, #8
CASE DST0
    0: VALUE = 0; BREAK;    // USE MOVL
    1: VLC_CLZ DST1, #5
       CASE DST1
           1: T = READ(2);
              CASE T
                  0: VALUE = 48; BREAK;
                  1: VALUE = 56; BREAK;
                  2: GO20; BREAK;
                  3: VALUE = 1; BREAK;
              CASE_END
           2: VALUE = 2; BREAK;
           3: VLC_CLO DST2, #5
              CASE DST2
                  0: VALUE = 28; BREAK;
                  1: VALUE = 22; BREAK;
                  2: VALUE = 43; BREAK;
                  3: VALUE = 30; BREAK;
                  4: VALUE = 41; BREAK;
                  5: VALUE = 49; BREAK;
              CASE_END
           4: T = READ(1); VALUE = (T) ? (READ(1) ? 31 : 54) : 27; BREAK;
           5: VALUE = 6; BREAK;
       CASE_END
    2: VLC_CLZ DST1, #4
       CASE DST1
           1: VALUE = 3; BREAK;
           2: T = READ(1); VALUE = (T) ? 19 : 36; BREAK;
           3: T = READ(2);
              CASE T
                  0: VALUE = 38; BREAK;
                  1: VALUE = 47; BREAK;
                  2: VALUE = 59; BREAK;
                  3: VALUE = 5; BREAK;
              CASE_END
           4: VALUE = 7; BREAK;
       CASE_END
    3: T = READ(1); VALUE = (T) ? 16 : 8; BREAK;
    4: T = READ(1); VALUE = (T) ? GO10 : 12; BREAK;
    5: VALUE = 20; BREAK;
    6: VALUE = 44; BREAK;
    7: T = READ(1); VALUE = (T) ? 33 : 58; BREAK;    // USE SEL??
    8: VALUE = 15; BREAK;
CASE_END
GO10:
    INPSTR S1, #3
    READ_NCM S2, #0, off+S1>>2
    VALUE = S2 & 0x63;
    Q = (S2 >> 6) & 0x3;
    READ S0, Q
    RETURN;
GO20:

    INPSTR S1, #4
    READ_NCM S2, #0, off+S1>>2
    VALUE = S2 & 0x63;
    Q = (S2 >> 6) & 0x3;
    READ S0, Q
    RETURN;

In some embodiments, branch instructions can be used in place of the CASE statements. Thus VC-1, like MPEG-2, has an easily defined grammar. Each symbol in the grammar has a specific method (table), which can be implemented as a shader, as shown by the code above.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the scope of the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of a graphics processor system in which various embodiments of decoding systems (and methods) can be implemented;
FIG. 2 is a block diagram of an exemplary processing environment in which various embodiments of the decoding system can be implemented;
FIG. 3 is a block diagram of select components of the exemplary processing environment shown in FIG. 2;
FIG. 4 is a block diagram of a computational core of the exemplary processing environment shown in FIGS. 2 and 3, in which various embodiments of the decoding system can be implemented;
FIG. 5A is a block diagram of select components of an execution unit of the computational core of FIG. 4, in which various embodiments of the decoding system can be implemented;
FIG. 5B is a block diagram of an execution unit data path in which various embodiments of the decoding system can be implemented;
FIG. 5C is a block diagram of an embodiment of the decoding system of FIG. 5B that is suitable for a plurality of coding standards, and further shows an embodiment of a corresponding bitstream buffer;
FIG. 6A is a block diagram of an embodiment of the decoding system of FIG. 5C, configured for CABAC decoding;
FIG. 6B is a block diagram of an embodiment of the decoding system of FIG. 6A;
FIG. 6C is a block diagram of an embodiment of the context memory structure and associated registers of the decoding system of FIG. 6A;
FIG. 6D shows a macroblock partitioning scheme used by the decoding system of FIG. 6A;
FIG. 6E is a block diagram of an exemplary macroblock decoding mechanism implemented by the decoding system of FIG. 6A;
FIG. 7A is a block diagram of an embodiment of the decoding system of FIG. 5C, configured for CAVLC decoding; and
FIG. 7B is a block diagram of an embodiment of a table structure used by the decoding system of FIG. 7A.

[Description of Main Component Symbols]

100 ~ graphics processor system
102 ~ display device
104 ~ display interface unit
106 ~ local memory
110 ~ memory interface unit
114 ~ graphics processing unit
118 ~ PCI-E bus interface unit
122 ~ chipset
124 ~ system memory
126 ~ central processing unit
128 ~ driver software
200 ~ decoding system

202 ~ graphics processor
204 ~ computational core
206 ~ execution unit pool control and vertex/stream cache unit
208 ~ graphics pipeline
302 ~ texture filtering unit
304 ~ pixel packer
306 ~ command stream processor
308 ~ write-back unit
310 ~ texture address generator
402 ~ execution unit input
404a ~ execution unit even output
404b ~ execution unit odd output
406 ~ memory access unit
408 ~ L2 cache
410 ~ memory interface arbiter
412 ~ execution unit pool
504 ~ instruction cache controller
506 ~ thread controller
508 ~ buffer
510 ~ common register file
512 ~ execution unit data path
514 ~ execution unit data path FIFO
516 ~ predicate register file
518 ~ scalar register file
520 ~ data output controller
524 ~ thread task interface
526 ~ register file
530 ~ variable length decoding unit
532 ~ vector floating point unit
534 ~ vector integer arithmetic logic unit
536 ~ special purpose unit
540 ~ register file
562 ~ SREG stream buffer/DMA engine
562a ~ SREG register
562b ~ bitstream buffer
564 ~ neighboring context memory
568 ~ read neighboring context memory module
570 ~ inspect string module
572 ~ read module
574 ~ count leading one module
576 ~ count leading zero module
578 ~ MPEG module
580 ~ CABAC module
582 ~ CAVLC module
584 ~ EXP-Golomb module
602 ~ state index
604 ~ most probable symbol value
606 ~ code length range
608 ~ code length offset
612 ~ local registers
614 ~ global registers
616 ~ binary string register
620 ~ binarization module
622 ~ get context module
624 ~ binary arithmetic decoding engine
628 ~ destination
630 ~ SRC2
632 ~ SRC1
634 ~ common and thread information
636 ~ stall/reset
638 ~ address
640 ~ data
650 ~ memory module
654 ~ bin index
710 ~ coefficient token module
712 ~ level code module
714 ~ level module
716 ~ level 0 module
718 ~ zero level module
720 ~ run module
722 ~ level array
724 ~ run array


Claims (1)

X. Claims:

1. A decoding system, comprising:
a software programmable core processing unit having a variable length decoding unit configured to execute a shader, the shader configured to selectively implement decoding of a video stream to provide a decoded data output, wherein the video stream is coded based on a plurality of coding methods, and wherein the decoding is implemented using a combination of software and hardware.

2. The decoding system of claim 1, wherein the decoding is accomplished within the context of graphics processing unit programming, with hardware implemented in the graphics processing unit data path and additional hardware for automatic management in a bitstream buffer, and wherein the plurality of coding methods includes at least two of context-adaptive binary arithmetic coding (CABAC), context-adaptive variable length coding (CAVLC), EXP-Golomb, Moving Picture Experts Group (MPEG-2), and VC-1.

3. The decoding system of claim 2, wherein the variable length decoding unit for context-adaptive binary arithmetic decoding further comprises:
a binarization (BIND) module configured to receive first information including a context block category and, responsive to a first instruction of the shader executed by the binarization module, provide, based on the first information for the context model, second information corresponding to one or more macroblock parameters;
a get context (GCTX) module configured to receive the second information and, responsive to a second instruction of the shader executed by the get context module, provide a bin and context identification information for bin decoding, wherein the context identification information corresponds to a most probable symbol value or a least probable symbol probability; and
a binary arithmetic decoding (BARD) module configured to receive the bin, the context identification information, an offset, and a range and, responsive to a third instruction of the shader executed by the binary arithmetic decoding module, decode a binary symbol.

4. The decoding system of claim 3, further comprising a context memory array, wherein the context memory array comprises [...] a current macroblock and a neighboring macroblock unit, and wherein, responsive to a fifth instruction of the shader executed by the get context module, the get context module is configured to write to the context memory array based on Boolean logic operations that include value transfers from the registers to the context memory array.

5. The decoding system of claim 1, wherein the variable length decoding unit further comprises a binary string register configured to receive a decoded binary symbol and provide updated context information.

6. The decoding system of claim 5, wherein the binary string register is configured to receive a plurality of binary symbols representing a decoded syntax element.

7. The decoding system of claim 1, wherein the variable length decoding unit for context-adaptive variable length decoding further comprises:
a coefficient token (coeff_token) module configured to receive macroblock information and, responsive to a sixth instruction of the shader (CAVLC_TOTC), provide trailing ones (TrailingOnes) information and non-zero coefficient (TotalCoeff) information;
a level (CAVLC_Level) module configured to receive the trailing ones information and level code information and, responsive to a seventh instruction of the shader (CAVLC_LVL), provide suffix length information and level index (Level[Idx]) information, wherein the level index (Level[Idx]) information is incremented;
a level code (CAVLC_LevelCode) module configured to receive the suffix length information and, responsive to an eighth instruction of the shader (CAVLC_LC), provide the level code information to the level module;
a level 0 (CAVLC_L0) module configured to receive the trailing ones information and, responsive to a ninth instruction of the shader (CAVLC_LVL0), provide second level index information (Level[Idx]) to a level array, wherein the second level index information is incremented;
a zero level (CAVLC_ZL) module configured to receive the total coefficient information and a maximum value of the coefficient information and, responsive to a tenth instruction of the shader (CAVLC_ZL), provide zeros left information and a reset value to a first multiplexer and a second multiplexer; and
a run (CAVLC_Run) module configured to receive the zeros left information and the second level index information from the first and second multiplexers, respectively, and, responsive to an eleventh instruction of the shader (CAVLC_RUN), provide a run index (Run[Idx]) to a run array.

8. The decoding system of claim 7, wherein the level array and the run array are configured to provide a decoded level value and a decoded run value responsive to a twelfth instruction of the shader (READ_LRUN).

9. The decoding system of claim 7, wherein the level array and the run array are cleared responsive to a thirteenth instruction of the shader (CLR_LRUN).

10. The decoding system of claim 1, wherein the variable length decoding unit is further configured to use a number of bits in an instruction to determine whether a result of a previous operation stored in an internal register, or data in a source operand, should be used for a current operation in one or more modules.

11. The decoding system of claim 1, wherein the variable length decoding unit further comprises a direct memory access engine module comprising a bitstream buffer and a direct memory access engine, the direct memory access engine configured to, responsive to execution of an instruction of the shader per slice, repeatedly and automatically buffer a predefined number of bits in the bitstream buffer as the predefined number of bits is consumed, the bits corresponding to the video stream.

12. The decoding system of claim 11, wherein the variable length decoding unit is further configured to stall the direct memory access engine module responsive to an anticipated underflow in the bitstream buffer.

13. The decoding system of claim 11, wherein the direct memory access engine is further configured to track the number of bits consumed in the bitstream buffer and, responsive to detecting that the number of bits is greater than a predefined number, halt the bitstream buffer operation and transfer control to a host processor.

14. The decoding system of claim 1, wherein the variable length decoding unit for MPEG decoding further comprises:
an MPEG module configured to implement MPEG standard tables using one or more MatchVLC functions, each of the one or more MatchVLC functions corresponding to a respective syntax element, the table selection being based on an instruction of the shader.

15. The decoding system of claim 14, wherein the MatchVLC functions are implemented at least partially in hardware.

16. The decoding system of claim 1, wherein the variable length decoding unit for EXP-Golomb decoding further comprises:
an EXP-Golomb module configured to implement a plurality of EXP-Golomb operations using a single opcode, each of the plurality of EXP-Golomb operations being distinguished by a respective value of an immediate data field in a shader instruction.

17. The decoding system of claim 1, wherein the variable length decoding unit for VC-1 decoding is configured to selectively load VC-1 tables into a context memory array, wherein the decoding is based on the selectively loaded tables.

18. A graphics processing unit coupled to a host processor and memory, the graphics processing unit comprising:
a graphics processor having a software programmable core processing unit, the software programmable core processing unit comprising one or more execution units, the one or more execution units comprising execution unit data path hardware, the execution unit data path hardware comprising a variable length decoding unit configured to execute a shader, the shader configured to selectively implement decoding of a video stream to provide a decoded data output, wherein the video stream is coded based on a plurality of coding methods.

19. The graphics processing unit of claim 18, wherein the decoding is accomplished within the context of graphics processing unit programming, with hardware implemented in the graphics processing unit data path and additional hardware for automatic management in a bitstream buffer, and wherein the plurality of coding methods includes at least two of context-adaptive binary arithmetic coding (CABAC), context-adaptive variable length coding (CAVLC), EXP-Golomb, Moving Picture Experts Group (MPEG-2), and VC-1.

20. The graphics processing unit of claim 18, further comprising one or more additional variable length decoding units having a structure similar to that of the variable length decoding unit, wherein the variable length decoding unit and the one or more additional variable length decoding units are configured to decode multiple video streams concurrently.
TW96120728A 2006-06-08 2007-06-08 Decoding system unit TWI354239B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US81182106P 2006-06-08 2006-06-08

Publications (2)

Publication Number Publication Date
TW200809689A true TW200809689A (en) 2008-02-16
TWI354239B TWI354239B (en) 2011-12-11

Family

ID=38899303

Family Applications (4)

Application Number Title Priority Date Filing Date
TW096120896A TWI348653B (en) 2006-06-08 2007-06-08 Decoding of context adaptive binary arithmetic codes in computational core of programmable graphics processing unit
TW96120728A TWI354239B (en) 2006-06-08 2007-06-08 Decoding system unit
TW96120726A TWI428850B (en) 2006-06-08 2007-06-08 Decoding method
TW96120899A TWI344795B (en) 2006-06-08 2007-06-08 Decoding of context adaptive variable length codes in computational core of programmable graphics processing unit

Family Applications Before (1)

Application Number Title Priority Date Filing Date
TW096120896A TWI348653B (en) 2006-06-08 2007-06-08 Decoding of context adaptive binary arithmetic codes in computational core of programmable graphics processing unit

Family Applications After (2)

Application Number Title Priority Date Filing Date
TW96120726A TWI428850B (en) 2006-06-08 2007-06-08 Decoding method
TW96120899A TWI344795B (en) 2006-06-08 2007-06-08 Decoding of context adaptive variable length codes in computational core of programmable graphics processing unit

Country Status (2)

Country Link
CN (4) CN101072350B (en)
TW (4) TWI348653B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8686921B2 (en) 2008-12-31 2014-04-01 Intel Corporation Dynamic geometry management of virtual frame buffer for appendable logical displays
TWI587694B (en) * 2012-03-30 2017-06-11 英特爾股份有限公司 Preempting fixed function media devices
TWI723075B (en) * 2015-12-20 2021-04-01 美商英特爾股份有限公司 Method and processor for vector permute and vectort permute unit
TWI805080B (en) * 2020-12-11 2023-06-11 大陸商武漢新芯集成電路製造有限公司 Monotonic counter and counting method thereof

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8156410B2 (en) * 2008-03-05 2012-04-10 Himax Technologies Limited Fast debugging tool for CRC insertion in MPEG-2 video decoder
CN101577629B (en) * 2009-05-14 2011-05-25 北京邮电大学 Dynamic allocation method of coding vector based on graph coloring in multicast network
CN101908200B (en) * 2009-06-05 2012-08-08 财团法人资讯工业策进会 Graphics processing system with power gating function and method
US8681162B2 (en) * 2010-10-15 2014-03-25 Via Technologies, Inc. Systems and methods for video processing
GB2488159B (en) * 2011-02-18 2017-08-16 Advanced Risc Mach Ltd Parallel video decoding
US9378560B2 (en) * 2011-06-17 2016-06-28 Advanced Micro Devices, Inc. Real time on-chip texture decompression using shader processors
US9231616B2 (en) * 2011-08-05 2016-01-05 Broadcom Corporation Unified binarization for CABAC/CAVLC entropy coding
CN103037213B (en) * 2011-09-28 2016-02-17 晨星软件研发(深圳)有限公司 The cloth woods entropy decoding method of cloth woods entropy decoder and image playing system
US9888261B2 (en) 2011-11-08 2018-02-06 Samsung Electronics Co., Ltd. Method and device for arithmetic coding of video, and method and device for arithmetic decoding of video
US9451258B2 (en) * 2012-04-03 2016-09-20 Qualcomm Incorporated Chroma slice-level QP offset and deblocking
CA2863549C (en) * 2012-05-29 2018-12-04 Mediatek Inc. Method and apparatus for coding of sample adaptive offset information
US9196014B2 (en) * 2012-10-22 2015-11-24 Industrial Technology Research Institute Buffer clearing apparatus and method for computer graphics
CN103813177A (en) * 2012-11-07 2014-05-21 辉达公司 System and method for video decoding
US9947084B2 (en) 2013-03-08 2018-04-17 Nvidia Corporation Multiresolution consistent rasterization
JP6379107B2 (en) * 2013-05-21 2018-08-22 株式会社スクウェア・エニックス・ホールディングス Information processing apparatus, control method therefor, and program
CN107037984B (en) * 2013-12-27 2019-10-18 威盛电子股份有限公司 Data memory device and its method for writing data
US9455743B2 (en) * 2014-05-27 2016-09-27 Qualcomm Incorporated Dedicated arithmetic encoding instruction
TW201626218A (en) * 2014-09-16 2016-07-16 輝達公司 Techniques for passing dependencies in an API
US10205957B2 (en) 2015-01-30 2019-02-12 Mediatek Inc. Multi-standard video decoder with novel bin decoding
US10250912B2 (en) 2015-02-17 2019-04-02 Mediatek Inc. Method and apparatus for entropy decoding with arithmetic decoding decoupled from variable-length decoding
CN104869398B (en) * 2015-05-21 2017-08-22 大连理工大学 Parallel method for CABAC in HEVC implemented on a CPU+GPU heterogeneous platform
GB2542162B (en) 2015-09-10 2019-07-17 Imagination Tech Ltd Trailing or leading digit anticipator
US9537504B1 (en) * 2015-09-25 2017-01-03 Intel Corporation Heterogeneous compression architecture for optimized compression ratio
US10375395B2 (en) * 2016-02-24 2019-08-06 Mediatek Inc. Video processing apparatus for generating count table in external storage device of hardware entropy engine and associated video processing method
CN106921859A (en) * 2017-05-05 2017-07-04 郑州云海信息技术有限公司 FPGA-based CABAC entropy coding method and device
CN107277505B (en) * 2017-05-19 2020-06-16 北京大学 AVS-2 video decoder device based on software and hardware partition
CN107242882A (en) * 2017-06-05 2017-10-13 上海瓴舸网络科技有限公司 B-mode ultrasound display auxiliary device and control method thereof
CN110710219B (en) * 2017-12-08 2022-02-11 谷歌有限责任公司 Method and apparatus for context derivation for coefficient coding
TWI674558B (en) 2018-06-12 2019-10-11 財團法人工業技術研究院 Device and method for processing numerical array data, and color table generation method thereof
CN109818855B (en) * 2019-01-14 2020-12-25 东南大学 Method for obtaining content by supporting pipeline mode in NDN (named data networking)
CN110458120B (en) * 2019-08-15 2022-01-04 中国水利水电科学研究院 Method and system for identifying different vehicle types in complex environment
CN111028135B (en) * 2019-12-10 2023-06-02 国网重庆市电力公司电力科学研究院 Image file repairing method
US11748011B2 (en) 2021-03-31 2023-09-05 Silicon Motion, Inc. Control method of flash memory controller and associated flash memory controller and storage device
US11733895B2 (en) * 2021-03-31 2023-08-22 Silicon Motion, Inc. Control method of flash memory controller and associated flash memory controller and storage device
CN114816434B (en) * 2022-06-28 2022-10-04 之江实验室 Programmable switching-oriented hardware parser and parser implementation method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1599049A3 (en) * 2004-05-21 2008-04-02 Broadcom Advanced Compression Group, LLC Multistandard video decoder
US7742544B2 (en) * 2004-05-21 2010-06-22 Broadcom Corporation System and method for efficient CABAC clock
KR100612015B1 (en) * 2004-07-22 2006-08-11 삼성전자주식회사 Method and apparatus for Context Adaptive Binary Arithmetic coding
US7800620B2 (en) * 2004-11-05 2010-09-21 Microsoft Corporation Optimizing automated shader program construction

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8686921B2 (en) 2008-12-31 2014-04-01 Intel Corporation Dynamic geometry management of virtual frame buffer for appendable logical displays
TWI447705B (en) * 2008-12-31 2014-08-01 Intel Corp Dynamic geometry management of virtual frame buffer for appendable logical displays
TWI587694B (en) * 2012-03-30 2017-06-11 英特爾股份有限公司 Preempting fixed function media devices
TWI723075B (en) * 2015-12-20 2021-04-01 美商英特爾股份有限公司 Method and processor for vector permute and vector permute unit
TWI805080B (en) * 2020-12-11 2023-06-11 大陸商武漢新芯集成電路製造有限公司 Monotonic counter and counting method thereof

Also Published As

Publication number Publication date
CN101072353B (en) 2013-02-20
CN101072353A (en) 2007-11-14
TWI348653B (en) 2011-09-11
CN101072349B (en) 2012-10-10
CN101072349A (en) 2007-11-14
TWI344795B (en) 2011-07-01
TWI354239B (en) 2011-12-11
TW200821982A (en) 2008-05-16
CN101072350A (en) 2007-11-14
CN101072350B (en) 2012-12-12
TW200813884A (en) 2008-03-16
TW200803526A (en) 2008-01-01
TWI428850B (en) 2014-03-01
CN101087411A (en) 2007-12-12

Similar Documents

Publication Publication Date Title
TW200809689A (en) Decoding system and graphics processing unit
US7626518B2 (en) Decoding systems and methods in computational core of programmable graphics processing unit
US7626521B2 (en) Decoding control of computational core of programmable graphics processing unit
US7656326B2 (en) Decoding of context adaptive binary arithmetic codes in computational core of programmable graphics processing unit
JP6808341B2 (en) Coding concepts that allow parallel processing, transport demultiplexers and video bitstreams
JP2023058526A (en) Method and apparatus for unified significance map encoding
USRE44923E1 (en) Entropy decoding methods and apparatus using most probable and least probable signal cases
CN107071446B (en) Image encoding and decoding methods, encoding and decoding apparatuses, and computer-readable storage medium
US9392292B2 (en) Parallel encoding of bypass binary symbols in CABAC encoder
ES2705760T3 (en) Procedure for palette mode coding
US20120057637A1 (en) Arithmetic Decoding Acceleration
TW200952498A (en) CABAC decoding unit and decoding method
CN107409229A (en) Indicate the syntactic structure of the end of coding region
WO2003034746A2 (en) Improved variable length decoder
US9001882B2 (en) System for entropy decoding of H.264 video for real time HDTV applications
CN103918273A (en) Method of determining binary codewords for transform coefficients
US20170359591A1 (en) Method and device for entropy encoding or entropy decoding video signal for high-capacity parallel processing
Juurlink et al. Scalable parallel programming applied to H.264/AVC decoding
TW200826516A (en) Entropy processor for decoding
JP2010118939A (en) Image decoding apparatus
TWI606719B (en) Video encoding device, video encoding method, and video encoding program
Cho et al. Parallelizing the H.264 decoder on the cell BE architecture
JPH09307899A (en) Variable length decoder
Lin Implementation of H.264 Decoder in Bluespec SystemVerilog
Wu et al. Hardware-assisted syntax decoding model for software AVC/H.264 decoders