TWI482117B

TWI482117B - Filtering for vpu

Info

Publication number: TWI482117B
Application number: TW096121890A
Authority: TW
Inventors: Hussain Zahid
Original assignee: Via Tech Inc
Priority date: 2006-06-16
Filing date: 2007-06-15
Publication date: 2015-04-21
Also published as: TWI350109B; CN101072351A; CN101072351B; CN101083764B; CN101083763B; TW200816820A; TW200803528A; CN101083763A; TW200803525A; TWI348654B; TWI383683B; CN101068364B; CN101068365B; CN101083764A; CN101068353A; TW200821986A; TWI444047B; CN101068353B; CN101068364A; TW200816082A

Description

Programmable video processing unit system and processing method

The present invention relates to processing video and graphics data, and more particularly to providing a video processing unit having a programmable core.

As computer technology continues to develop, the demands placed on computing devices have grown with it. More specifically, many computer applications and/or data streams require the processing of video data, and as video data becomes more complex, the processing requirements for that data increase as well.

Currently, many computing architectures provide a central processing unit (CPU) for processing data that includes video and graphics data. Although a CPU can provide adequate processing power for some video and graphics, the CPU must also process other data. The demands that complex video and graphics place on the CPU may therefore adversely affect the performance of the overall system.

In addition, many computing architectures include one or more execution units (EUs) for processing data. More specifically, in at least one architecture the EUs may be used to process several different types of data. As with the CPU, the demands that complex video and graphics data place on the EUs may adversely affect the performance of the entire computing system. Processing complex video and graphics data on the EUs may also increase power consumption beyond an acceptable threshold. Furthermore, differing protocols or specifications for the data may limit the ability of the EUs to process video and graphics data. Many current computing architectures also provide 32-bit commands, which may reduce efficiency and thus degrade processing speed. Finally, there is also a need to utilize multiple operations in a single component.

Therefore, a heretofore unaddressed need exists in the industry to address the aforementioned deficiencies and inadequacies.

The present invention includes a programmable video processing unit for processing video data according to an instruction, comprising: receive logic for receiving video data in a format selected from a plurality of formats; filter logic for filtering the video data according to the instruction; and transform logic for transforming the filtered data according to the instruction. The instruction includes a mode indication field that directs the filter logic and the transform logic to operate according to the format of the video data.

Another embodiment of the present invention includes a programmable video processing unit for processing video data in at least two formats, comprising: identification logic for identifying the format of the video data; motion compensation logic for performing a motion compensation operation; inverse discrete cosine transform logic for performing an inverse discrete cosine transform operation; and integer transform logic for performing an integer transform operation. The inverse discrete cosine transform logic and the integer transform logic are each disabled according to the identification result of the identification logic.

The invention also includes embodiments of a method for processing video data. At least one embodiment includes receiving an instruction; receiving video data in a format selected from at least two formats; filtering the video data according to the instruction; and transforming the video data according to the instruction. The instruction includes a mode identification field that directs the filtering and transforming steps to operate according to the format of the video data.

Other systems, methods, features, and advantages of this disclosure will be or become apparent to one skilled in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description and within the scope of the present disclosure.

FIG. 1 is an embodiment of a computing architecture for processing video data. As shown in FIG. 1, the computing device may include a pool 146 of execution units (EUs). The pool of execution units 146 (referred to herein as "EUP 146") may include one or more execution units for processing data in the computing architecture of FIG. 1. EUP 146 may be coupled to stream cache 116 and may receive data from stream cache 116. EUP 146 may also be coupled to input port 142 and output port 144. Input port 142 may receive data from EUP controller 118, which has a cache subsystem. Input port 142 may also receive data from L2 cache 114 and post-packer 160. EUP 146 may process the received data and output the processed data to output port 144.

Additionally, EUP controller 118 with its cache subsystem may send data to memory access unit (MXU) A 164a and to triangle and attribute setup unit 134. L2 cache 114 may also send data to and receive data from MXU A 164a. Vertex cache 112 and stream cache 110 may also communicate with MXU A 164a, as does memory access port 108. Memory access port 108 may exchange data with bus interface unit (BIU) 90, memory interface unit (MIU) A 106a, MIU B 106b, MIU C 106c, and MIU D 106d. Memory access port 108 may also be coupled to MXU B 164b.

MXU A 164a is also coupled to command stream processor (CSP) front end 120 and CSP back end 128. CSP front end 120 is coupled to 3D and state component 122, which is coupled to EUP controller 118 with its cache subsystem. CSP front end 120 is also coupled to 2D pre component 124, which is coupled to 2D first-in, first-out (FIFO) component 126. CSP front end 120 also exchanges data with clear and type texture processor 130 and advanced encryption system (AES) encryption/decryption component 132. CSP back end 128 is coupled to span-tile generator 136.

Triangle and attribute setup unit 134 is coupled to 3D and state component 122, to EUP controller 118 with its cache subsystem, and to span-tile generator 136. Span-tile generator 136 may send data to ZL1 cache 128 and may also be coupled to ZL1 138, which may send data to ZL1 cache 128. ZL2 140 may be coupled to Z (e.g., depth buffer cache) and stencil (ST) cache 148. Z and ST cache 148 may send and receive data through write-back unit 162 and may be coupled to bandwidth (BW) compressor 146. BW compressor 146 may also be coupled to MXU B 164b, which may be coupled to texture cache and controller 166. Texture cache and controller 166 may be coupled to texture filter unit (TFU) 168, which may send data to post-packer 160. Post-packer 160 may be coupled to interpolator 158. Pre-packer 156 may be coupled to interpolator 158 and to texture address generator 150. Write-back unit 162 may be coupled to 2D processing component (pro component) 154, D cache 152, Z and ST cache 148, input port 142, and CSP back end 128.

The embodiment of FIG. 1 processes video data by means of EUP 146. More specifically, in at least one embodiment, one or more of the execution units may be used to process video data. While this architecture may be suitable for some applications, it may consume excessive power; in addition, it may have considerable difficulty processing H.264 data.

FIG. 2 is an embodiment of a computing architecture similar to that of FIG. 1 with a video processing unit (VPU) introduced. More specifically, in the embodiment of FIG. 2, a VPU 199 having a programmable core may be provided in the computing architecture of FIG. 1. VPU 199 may be coupled to CSP front end 120 and TFU 168. VPU 199 may serve as a dedicated processor for video data. In addition, VPU 199 may be used to process video data encoded with the Moving Picture Experts Group (MPEG), VC-1, and H.264 protocols.

More specifically, in at least one embodiment, shader code may be executed on one or more of the execution units (EU) 146. Instructions may be decoded and their operands fetched from registers; the major and minor opcodes may be used to determine the EU to which the operands are routed and the function that operates on them. If the operation is of the SAMPLE type (for example, all VPU instructions are of the SAMPLE type), the instruction may be dispatched from EUP 146. Although VPU 199 may be used to reduce use of the TFU filter hardware, VPU 199 may also reside alongside TFU 168.

For a SAMPLE operation, EUP 146 builds a 580-bit data structure (see Table 1). EUP 146 fetches the source registers indicated by the SAMPLE instruction, and this data is placed in the least significant 512 bits of the EUP-TAG interface structure. The other relevant data that EUP 146 inserts into this structure is:
REG_TYPE: this should be 0
ThreadID: used to route the result back to the correct shader program
ShaderResID
ShaderType = PS
CRFIndex: destination register
SAMPLE_MODE: the VPU filter operation to be performed
ExeMode = Vertical

This data structure may then be sent to texture address generator (TAG) 150. TAG 150 may examine the SAMPLE_MODE bits to determine whether the data field contains texture sample information or actual data. If it contains actual data, TAG 150 forwards the data directly to VPU 199; otherwise TAG 150 may initiate a texture fetch.

If SAMPLE_MODE is one of MCF, SAD, IDF_VC-1, IDF_H264_0, or IDF_H264_1, texture data must be fetched; otherwise the data is in the Data field.
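The routing decision that TAG 150 makes can be sketched as follows. This is a minimal illustration, not the hardware logic: representing SAMPLE_MODE as a string and the names `TEXTURE_FETCH_MODES` and `route_sample` are assumptions made for readability, since the actual bit encodings are in tables not reproduced here.

```python
# Hypothetical sketch: SAMPLE_MODE is modeled as a string; the real field is a bit pattern.
# Modes that, according to the text, require a texture fetch.
TEXTURE_FETCH_MODES = {"MCF", "SAD", "IDF_VC-1", "IDF_H264_0", "IDF_H264_1"}


def route_sample(sample_mode):
    """Return 'texture_fetch' if the TAG must start a texture fetch,
    or 'forward_to_vpu' if the Data field already holds the actual data."""
    if sample_mode in TEXTURE_FETCH_MODES:
        return "texture_fetch"
    return "forward_to_vpu"
```

For example, `route_sample("SAD")` selects the texture fetch path, while a mode outside the listed set, such as IDF_H264_2, is forwarded directly.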

The information that TAG 150 needs to generate addresses, and that is passed to texture cache controller (TCC) 166, can be found in the least significant 128 bits of the Data field:
Bits [31:0]: U, V coordinates; these form the address of the texture block (4x4x8 bits)
Bits [102:96]: T#
Bits [106:103]: S#

T#, S#, U, and V are sufficient information to fetch a texture from a particular surface. U, V, T#, and S# may be extracted from the SRC1 field of the INSTRUCTION during decode and used to fill the fields above. U, V, T#, and S# can therefore be modified dynamically during execution.
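As an illustration of the field layout just described, the sketch below extracts U, V, T#, and S# from the Data field modeled as a Python integer. The function name `decode_tag_fields` is hypothetical. Placing U in the low 16 bits follows the later statement about SRC1; placing V in the upper 16 bits of the same word, and treating both as unsigned, are assumptions.

```python
def decode_tag_fields(data):
    """Extract addressing fields from the least significant 128 bits of the Data field."""
    low128 = data & ((1 << 128) - 1)
    uv = low128 & 0xFFFFFFFF              # bits [31:0]: U, V coordinates
    return {
        "u": uv & 0xFFFF,                 # low 16 bits are U
        "v": (uv >> 16) & 0xFFFF,         # assumption: V occupies the upper 16 bits
        "t": (low128 >> 96) & 0x7F,       # bits [102:96]: T# (7 bits)
        "s": (low128 >> 103) & 0xF,       # bits [106:103]: S# (4 bits)
    }
```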

SAMPLE_MODE and the least significant 128 bits of the data containing this information may then be placed in the command FIFO (COMMAND FIFO) of VPU 199. The corresponding data FIFO (DATA FIFO) may be filled with the data forwarded from the texture cache (bits [383:128]), or up to 256 bits. This data is operated on in VPU 199, with the operation determined by the information in the COMMAND FIFO. The result (at most 256 bits) may be returned to EUP 146 and the EU registers, using ThreadID and CRFIndex as the return address.

Additionally, the present invention includes an instruction set provided by EUP 146 for use by VPU 199. The instructions may be formatted as 64 bits, although this is not required. More specifically, in at least one embodiment, the VPU instruction set may include one or more motion compensation filter (MCF) instructions. In this embodiment, one or more of the following MCF instructions may be present:
SAMPLE_MCF_BLR DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_H264 DST, S#, T#, SRC2, SRC1

The first 32 bits of SRC1 contain the U, V coordinates, with U in the least significant 16 bits. Because SRC2 may be unused or ignored, SRC2 may hold any value, for example a 32-bit value containing a 4-element filter kernel in which each element is a signed 8-bit value, as disclosed below.
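A 32-bit SRC2 value holding four signed 8-bit kernel elements could be unpacked as in the sketch below. The helper name `unpack_kernel` is hypothetical, and the element order (element 0 in the least significant byte) is an assumption, because the layout table is not reproduced in this text.

```python
def unpack_kernel(src2):
    """Split a 32-bit SRC2 value into four signed 8-bit filter taps.

    Assumption: tap 0 sits in the least significant byte.
    """
    taps = []
    for i in range(4):
        byte = (src2 >> (8 * i)) & 0xFF
        # Reinterpret the byte as two's-complement signed.
        taps.append(byte - 256 if byte & 0x80 else byte)
    return taps
```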

In addition, the instruction set of VPU 199 includes instructions for in-loop deblocking filtering (IDF), such as one or more of the following:
SAMPLE_IDF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_0 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_1 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_2 DST, S#, T#, SRC2, SRC1

For VC-1 IDF operations, TFU 168 may supply 8x4x8-bit (or 4x8x8-bit) data to the filter buffer. For H.264, however, the amount of data delivered by TFU 168 may be controlled according to the type of H.264 IDF operation.

For the SAMPLE_IDF_H264_0 instruction, the TFU supplies an 8x4x8-bit (or 4x8x8-bit) block of data. For the SAMPLE_IDF_H264_1 instruction, TFU 168 supplies one 4x4x8-bit block of data, and the other 4x4x8-bit block is supplied by the shader (EU) 146 (FIG. 2). With SAMPLE_IDF_H264_2, both 4x4x8-bit blocks may be supplied by the shader (in the EU) 146 rather than by TFU 168.
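The split of data supply between the TFU and the shader for the three H.264 IDF variants can be summarized in a small lookup table. This is only a restatement of the paragraph above; the names `IDF_BLOCK_SOURCES` and `blocks_from` are hypothetical, and treating the 8x4x8-bit block of SAMPLE_IDF_H264_0 as two 4x4x8-bit halves is an interpretation made for the sake of a uniform table.

```python
# Supplier of each of the two 4x4x8-bit halves, per instruction.
# Assumption: the single 8x4x8-bit TFU block of H264_0 is counted as two halves.
IDF_BLOCK_SOURCES = {
    "SAMPLE_IDF_H264_0": ("TFU", "TFU"),
    "SAMPLE_IDF_H264_1": ("TFU", "EU"),
    "SAMPLE_IDF_H264_2": ("EU", "EU"),
}

BLOCK_BITS = 4 * 4 * 8  # one 4x4 block of 8-bit samples


def blocks_from(instruction, supplier):
    """Count how many 4x4x8-bit halves the given supplier provides."""
    return IDF_BLOCK_SOURCES[instruction].count(supplier)


def bits_from(instruction, supplier):
    """Number of data bits the given supplier provides for the instruction."""
    return blocks_from(instruction, supplier) * BLOCK_BITS
```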

In addition, the instruction set of VPU 199 includes motion estimation (ME) instructions, which may include an instruction such as the following:
SAMPLE_SAD DST, S#, T#, SRC2, SRC1

The above instructions may be mapped to the following major and minor opcodes and take the format described above. Details of the SRC and DST formats are discussed below in the relevant instruction sections.

SAMPLE instructions follow the execution path shown in FIG. 3. The EUP-TAG interface is as given in Table 6 below; the other interfaces are described in more detail later.

Note that texture sample filter operations may also be mapped to the Sample Mode field, in which case the value is 00XXX. The value 11XXX is currently reserved for future use. In addition, in at least one embodiment disclosed herein, some video functions may be inserted into the texture pipeline to reuse the L2 cache logic and some of the L2-to-filter data-loading MUXes, for example ME (motion estimation), MC (motion compensation), TC (transform coding), and ID (in-loop deblocking).

The following table summarizes the data loading guidelines from TCC 166 and/or TFU 168 for the different sample instructions. Note that, depending on the particular architecture, Sample_MC_H264 may be used only for the Y plane and is not required for the CrCb plane.

In at least one embodiment disclosed herein, the Y plane may use the HSF_Y0Y1Y2Y3_32BPE_VIDEO2 tiling format. The CrCb plane contains interleaved CrCb channels and is treated as the HSF_CrCb_16BPE_VIDEO tiling format. If an interleaved CbCr plane is not required, the same format as the Y plane may be used for either Cb or Cr.

In addition, the following instructions have been added to the shader instruction set architecture (ISA).

SAMPLE_MCF_BLR DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_H264 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_0 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_1 DST, S#, T#, SRC2, SRC1
SAMPLE_SAD DST, S#, T#, SRC2, SRC1
SAMPLE_TCF_MPEG2 DST, #ctrl, SRC2, SRC1
SAMPLE_TCF_I4x4 DST, #ctrl, SRC2, SRC1
SAMPLE_TCF_M4x4 DST, #ctrl, SRC2, SRC1
SAMPLE_MADD DST, #ctrl, SRC2, SRC1
SAMPLE_IDF_H264_2 DST, #ctrl, SRC2, SRC1
The #ctrl value for SAMPLE_IDF_H264_2 should be zero.

SRC1, SRC2, and #ctrl (when available) may be used to form the 512-bit data field in the EU/TAG/TCC interface, as shown in Table 8 below.

Referring to Table 8: Tr = transpose; FD = filter direction (vertical = 1); bS = boundary strength; bR = bR control; YC = the YC bit (YC = 1 for the CbCr plane, YC = 0 for the Y plane); and CEF = chroma edge flag. In addition, when 32 bits or fewer are used for SRC1 or SRC2 (with the remainder undefined), a lane selection may be specified to reduce register usage.

While the instruction formats are described above, Table 10 below gives an overview of the instruction operations.

For SAMPLE_MADD, #ctrl may be an 11-bit immediate value, and the instruction performs the addition of two 4x4 matrices (SRC1 and SRC2). One or more elements of either matrix may be 16-bit signed integers, and the result (DST) is a 4x4 matrix of 16-bit values. The matrices may be laid out in the source/destination registers as shown in Table 11 below; this may be a separate unit within the VPU. The SRC1 and #ctrl data are available in cycle 1 and SRC2 in the following cycle, so one operation can be issued every two cycles.

#ctrl[0] indicates whether to perform a saturation (SAT) operation.

#ctrl[1] indicates whether to perform a rounding (R) operation.

#ctrl[2] indicates whether to perform a 1-bit right shift (S) operation.

#ctrl[10:3] is ignored.

In addition, the logic associated with this instruction may include the following:
#Lanes := 16; #Lanewidth := 16;
If (#ctrl[1]) R = 1; ELSE R = 0;
If (#ctrl[2]) S = 1; ELSE S = 0;
If (#ctrl[0]) SAT = 1; ELSE SAT = 0;
For (I := 0; I < #Lanes; I += 1) {
    Base := I * #Lanewidth;
    Top := Base + #Lanewidth - 1;
    Source1[I] := SRC1[Top..Base];
    Source2[I] := SRC2[Top..Base];
    Destination[I] := (Source1[I] + Source2[I] + R) >> S;
    IF (SAT) Destination[I] = MIN(MAX(Destination[I], 0), 255);
    DST[Top..Base] = Destination[I];
}
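The SAMPLE_MADD logic above can be modeled in software as in the following sketch, which treats SRC1, SRC2, and DST as 256-bit Python integers holding sixteen 16-bit lanes. The helper names (`pack_lanes`, `unpack_lanes`, `sample_madd`) are hypothetical. Two points are assumptions not stated in the text: the right shift is arithmetic (sign-preserving), and an unsaturated result that overflows 16 bits simply wraps.

```python
LANES = 16
LANE_WIDTH = 16
LANE_MASK = (1 << LANE_WIDTH) - 1


def to_signed16(value):
    """Reinterpret a 16-bit pattern as a two's-complement signed integer."""
    return value - (1 << LANE_WIDTH) if value & 0x8000 else value


def pack_lanes(values):
    """Pack sixteen integers into one 256-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_WIDTH)
    return word


def unpack_lanes(word):
    """Unpack a 256-bit word into sixteen signed 16-bit lane values."""
    return [to_signed16((word >> (i * LANE_WIDTH)) & LANE_MASK) for i in range(LANES)]


def sample_madd(src1, src2, ctrl):
    """Lane-wise add with optional rounding, 1-bit right shift, and saturation."""
    sat = ctrl & 1            # #ctrl[0]
    rnd = (ctrl >> 1) & 1     # #ctrl[1]
    shift = (ctrl >> 2) & 1   # #ctrl[2]; #ctrl[10:3] is ignored
    out = []
    for a, b in zip(unpack_lanes(src1), unpack_lanes(src2)):
        d = (a + b + rnd) >> shift        # assumption: arithmetic shift
        if sat:
            d = min(max(d, 0), 255)       # clamp to the 8-bit pixel range
        out.append(d)
    return pack_lanes(out)
```

With saturation enabled, the result is clamped to the 8-bit range 0 to 255 even though each lane is 16 bits wide, which suits adding a signed residual to a pixel prediction.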

Referring again to FIG. 9, this instruction performs a scalar-matrix multiplication. #ctrl is an 11-bit immediate value, which may be 0 (that is, the #ctrl signal is ignored). This instruction is in the same group as SAMPLE_TCF and SAMPLE_IDF_H264_2. The logic associated with this instruction may include the following:
#Lanes := 16; #Lanewidth := 16;
MMODE = Control_4[17:16];
SM = Control_4[7:0];
SP = Control_4[15:8]; // only the least significant 5 bits are used
For (I := 0; I < #Lanes; I += 1) {
    Base := I * #Lanewidth;
    Top := Base + #Lanewidth - 1;
    Source2[I] := SRC2[Top..Base];
    Destination[I] := (SM * Source2[I]) >> SP;
    DST[Top..Base] = Destination[I];
}

This is implemented using the FIR_FILTER_BLOCK unit in the VPU that is used to perform MCF/TCF. SM is the weight applied to all lanes (for example, W[0] = W[1] = W[2] = W[3] = SM), and Pshift is SP. When this operation is performed, the summation adder in FIR_FILTER_BLOCK is bypassed, the four results from the 16x8-bit multiplications may be shifted, and the least significant 16 bits of each result are gathered into sixteen 16-bit results that are passed back to the EU.
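A software model of the scalar-matrix multiplication described above might look like the sketch below. The function name `sample_smmul` and the lane helpers are hypothetical. The text does not state whether SM and the SRC2 lanes are signed; this sketch assumes SM is a signed 8-bit weight and each lane is a signed 16-bit value, and it assumes an arithmetic right shift. MMODE is extracted in the pseudocode but not used there, so it is not modeled.

```python
LANES = 16
LANE_WIDTH = 16
LANE_MASK = (1 << LANE_WIDTH) - 1


def to_signed(value, bits):
    """Reinterpret a bit pattern of the given width as two's-complement signed."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack_lanes(values):
    """Pack sixteen integers into one 256-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_WIDTH)
    return word


def unpack_lanes(word):
    """Unpack a 256-bit word into sixteen signed 16-bit lane values."""
    return [to_signed((word >> (i * LANE_WIDTH)) & LANE_MASK, LANE_WIDTH)
            for i in range(LANES)]


def sample_smmul(src2, control_4):
    """Multiply every lane of SRC2 by the scalar SM, then shift right by SP."""
    sm = to_signed(control_4 & 0xFF, 8)   # Control_4[7:0]; signedness is an assumption
    sp = (control_4 >> 8) & 0x1F          # Control_4[15:8], low 5 bits only
    # Keep the least significant 16 bits of each shifted product.
    return pack_lanes([(sm * lane) >> sp for lane in unpack_lanes(src2)])
```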

FIG. 3 is an embodiment of a flowchart illustrating a process for processing video data in a computing architecture such as that of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 3, the command stream processor may send data and instructions to EUP 146. EUP 146 may in turn read the instructions and process the received data. EUP 146 may then send the instructions, the processed data, and data from EUP texture address generator (TAG) interface 242 to texture address generator (TAG) 150. TAG 150 may generate addresses for the processed data. TAG 150 may then send the data and instructions to texture cache controller (TCC) 166. TCC 166 may cache data for texture filter unit (TFU) 168. TFU 168 may filter the received data according to the received instructions and send the filtered data to VPU 199. VPU 199 may process the received data according to the received instructions and send the processed data to post-packer (PSP) 160. PSP 160 may collect pixel packets from various components, such as TFU 168. If a tile is only partially complete, PSP 160 may pack multiple tiles and send them back to EUP 146 using a specific identifier that was sent down the pipeline.

FIG. 4A is an embodiment of a functional flow diagram illustrating data flow in a computing device, such as one having the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 4A, an encrypted data stream may be sent to decryption component 236 on CSP 120, 128. In at least one embodiment, the encrypted bitstream may be decrypted and written back to video memory. The decrypted video may then be decoded using variable length decoder (VLD) hardware. Decryption component 236 may decrypt the received bitstream to form coded bitstream 238. Coded bitstream 238 may be sent to a VLD, a Huffman decoder, a context-adaptive variable length decoder (CAVLC), and/or a context-based binary arithmetic coder (CABAC) 240 (referred to herein as the "decoder"). Decoder 240 decodes the received bitstream and sends the decoded bitstream to DirectX Video Acceleration (DXVA) data structure 242. In addition, the data received at DXVA data structure 242 is external MPEG-2 VLD inverse scan, inverse quantization, and inverse DC prediction, as well as external VC-1 VLD inverse scan, inverse quantization, and inverse DC/AC prediction. This data may then be captured in DXVA data structure 242 via picture header 244, memory buffer 0 (MB0) 246a, MB1 246b, MB2 246c, ..., MBN 246n, and so on. The data may then enter jump blocks 250, 252, and 254 to continue in FIG. 4B and FIG. 4C.

FIG. 4B is a continuation of the functional flow diagram of FIG. 4A. As shown, data from jump blocks 250, 252, and 254 of FIG. 4A is received at inverse scan/inverse Q component 264 and inverse DC/AC prediction component 262. This data is processed and sent to switch 265. Switch 265 uses its Intra/Inter input to determine which data to pass and sends the selected data to jump block 270. In addition, data from jump block 260 is sent to coded pattern block reconstruction component 266.

FIG. 4C is a continuation of the functional flow diagrams of FIG. 4A and FIG. 4B. As shown, data from jump blocks 272 and 274 (FIG. 4A) is received at filter component 280. This data is filtered by MC filter 282 according to any of several protocols. More specifically, if the data is received in MPEG-2 format, the data is constructed with a pixel offset, and a two-pass filter may be used to perform vertical and horizontal filtering together. If the data is received in VC-1 format, a 4-tap filter is used; it operates in bilinear mode when the data is at 1/2 precision and in bicubic mode when the data is at 1/4 precision. If, on the other hand, the data is received in H.264 format, a 6-tap filter may be used; luma interpolation is used when the data is sampled at quarter-pixel resolution, and chroma interpolation is used when the data is sampled at eighth-pixel resolution. The filtered data is then sent to reconstructed reference component 284, and data associated with filter component 280 is sent to switch component 288. Switch component 288 also receives a zero input. Based on the received Intra/Inter data, the switch component may determine which data is sent to adder 298.
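The per-format choice of motion compensation filter described above can be restated as a small selection function. This is a descriptive sketch only: the function name `select_mc_filter`, the string labels, and the use of strings such as "1/2" for precision are assumptions, and no filter coefficients are implied.

```python
def select_mc_filter(fmt, precision=None):
    """Return (filter_kind, mode) for a video format and sample precision."""
    if fmt == "MPEG-2":
        # Two-pass filter covering vertical and horizontal filtering.
        return ("two-pass", "vertical+horizontal")
    if fmt == "VC-1":
        modes = {"1/2": "bilinear", "1/4": "bicubic"}
        return ("4-tap", modes[precision])
    if fmt == "H.264":
        modes = {"1/4": "luma interpolation", "1/8": "chroma interpolation"}
        return ("6-tap", modes[precision])
    raise ValueError("unsupported format: %r" % (fmt,))
```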

In addition, inverse transform component 296 receives data from coded pattern block reconstruction component 286 and, via jump block 276, from switch 265 (FIG. 4B). Inverse transform component 296 performs an 8x8 inverse discrete cosine transform (IDCT) for MPEG-2 data; 8x8, 8x4, 4x8, and/or 4x4 integer transforms for VC-1 data; and a 4x4 integer transform for H.264 data. The data, transformed as required, is sent to adder 298.

Adder 298 sums the data from inverse transform component 296 and switch 288 and sends the sum to in-loop filter 296. In-loop filter 296 filters the received data and sends the filtered data to reconstructed frame component 290. Reconstructed frame component 290 sends data to reconstructed reference component 284. Reconstructed frame component 290 can also send data to deblocking and deringing filter 292, which can send the filtered data to de-interlacing component 294 for de-interlacing; the data is then available for display.

FIG. 5A is a functional block diagram illustrating an embodiment of components that may be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations in a VPU, such as in the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 5A, bus A can be used to send 16-bit data to input port b of PE 3 314d. Bus A also sends data to Z^-1 delay component 300, which sends 16-bit data to the second input of PE 2 314c. Bus A also sends this data to Z^-1 delay component 302, which sends 16-bit data to PE 1 314b; the data is also sent to Z^-1 delay component 304 and then proceeds to PE 0 314a and Z^-1 delay component 306. After passing through Z^-1 delay component 306, the lower 8 bits of the bus A data are sent to PE 0 314a; this data is delayed by Z^-1 306 and sent to PE 1 314b and Z^-1 delay component 310. After reaching Z^-1 delay component 310, the lower 8 bits of this data are sent to PE 2 314c and Z^-1 delay component 312; after reaching Z^-1 delay component 312, the lower 8 bits of this data are sent to PE 3 314d. In addition, bus B sends 64-bit data to each of PE 3 314d, PE 2 314c, PE 1 314b, and PE 0 314a.

Processing element 0 (PE 0) 314a can facilitate filtering of the received data. More specifically, a PE can be one element of an FIR filter. When PE 0 314a, PE 1 314b, PE 2 314c, and PE 3 314d are combined with adder 330, they can form a 4-tap/8-tap FIR filter. A portion of the data is first sent to Z^-3 delay component 316. Multiplexer 318 selects the data to output according to a Field Input Response (FIR) input at its select port, and the data is sent from multiplexer 318 to adder 330.

Likewise, data from PE 1 314b is sent to multiplexer 322, with some of the data first received at Z^-2 delay component 320. Multiplexer 322 selects from the received data according to the received FIR input and sends the selected data to adder 330. Data from PE 2 314c is sent to multiplexer 326, with some of the data first sent to Z^-1 delay component 324. The FIR input selects the data to be sent to adder 330. Data from PE 3 314d is sent to adder 330.

Also input to adder 330 is a feedback loop from N shifter 332. This data is received at multiplexer 328 via Z^-1 delay component 326. Rounding data is also received at multiplexer 328. Multiplexer 328 selects from the received data according to a wider input at its select port and sends the selected data to adder 330. Adder 330 sums the received data and sends the sum to N shifter 332, and the 16-bit shifted data is sent to the output.
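The way the four processing elements, adder 330, and N shifter 332 combine into one filter output can be sketched in software. This is a minimal sketch, assuming each PE contributes one sample-times-coefficient product; the function name, the `extra` parameter (standing in for whichever of the rounding or feedback inputs multiplexer 328 selects), and the example coefficients are illustrative assumptions, not values given in the description.

```python
def fir_tap_sum(samples, coeffs, extra=0, shift=0):
    """Model one FIR output: per-PE products, summed, then right-shifted by N.

    samples, coeffs: one entry per processing element (4 for a 4-tap filter).
    extra: the value selected by the rounding/feedback multiplexer (assumed).
    shift: N, the shift applied by the N shifter.
    """
    if len(samples) != len(coeffs):
        raise ValueError("one coefficient is needed per sample")
    acc = extra
    for sample, coeff in zip(samples, coeffs):
        acc += sample * coeff      # contribution of one PE
    return acc >> shift            # N shifter


# Illustrative 4-tap call; coefficients and rounding constant are assumptions.
result = fir_tap_sum([10, 20, 30, 40], [-1, 9, 9, -1], extra=8, shift=4)
```

An 8-tap result could be modeled by feeding the first pass's unshifted sum back in through `extra`, which mirrors the feedback path into adder 330.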

FIG. 5B is a continuation of the diagram of FIG. 5A. More specifically, as illustrated in the embodiment of FIG. 5B, data from memory buffers 340a, 340b, 340c, and 340d is sent to multiplexer 342a. Multiplexer 342a sends 16-bit data to jump blocks 344a and 346a. Likewise, multiplexer 342b receives data from memory buffers 340b, 340c, 340d, and 340e and sends data to jump blocks 344b and 346b; multiplexer 342c receives data from 340c, 340d, 340e, and 340f and sends data to 344c and 346c; multiplexer 342d receives data from 340d, 340e, 340f, and 340g and sends data to jump blocks 344d and 346d; multiplexer 342e receives data from 340e, 340f, 340g, and 340h and sends data to 344e and 346e; multiplexer 342f receives data from 340f, 340g, 340h, and 340i and sends data to 344f and 346f; multiplexer 342g receives data from 340g, 340h, 340i, and 340j and sends data to jump blocks 344g and 346g; multiplexer 342h receives data from 340h, 340i, 340j, and 340k and sends data to 344h and 346h; multiplexer 342i receives data from 340i, 340j, 340k, and 340l and sends data to jump blocks 344i and 346i.

FIG. 5C is a continuation of the diagrams of FIGS. 5A and 5B. More specifically, data from multiplexer 342a (via jump block 348a) is sent to memory buffer B, slot 350a; data from multiplexer 342b (via jump block 348b) is sent to memory buffer B, slot 350b; data from multiplexer 342c (via jump block 348c) is sent to memory buffer B, slot 350c; data from multiplexer 342d (via jump block 348d) is sent to memory buffer B, slot 350d; data from multiplexer 342e (via jump block 348e) is sent to memory buffer B, slot 350e; data from multiplexer 342f (via jump block 348f) is sent to memory buffer B, slot 350f; data from multiplexer 342g (via jump block 348g) is sent to memory buffer B, slot 350g; data from multiplexer 342h (via jump block 348h) is sent to memory buffer B, slot 350h; and data from multiplexer 342i (via jump block 348i) is sent to memory buffer B, slot 350i.

Likewise, data from jump blocks 362j-362r (from FIG. 5D, discussed below) is sent to transpose network 360. Transpose network 360 transposes the received data and sends it to memory buffer B, which sends the data to jump blocks 366j-366r.

FIG. 5D is a continuation of the diagrams of FIGS. 5A-5C. More specifically, data is received at multiplexer 369a from jump block 368a (FIG. 5B, via multiplexer 342a) and jump block 368j (FIG. 5C, via memory buffer B). This data is selected by the vert signal and sent via bus A (see FIG. 5A) to FIR filter block 0 370a. Likewise, multiplexers 369b-369i receive data from jump blocks 368b-368i and 368k-368r; this data is sent to FIR filter blocks 370b-370i and processed as described with respect to FIG. 5A. Data output from FIR filter block 0 370a is sent to jump blocks 372b and 372j; FIR filter block 370b outputs to jump blocks 372c and 372k; FIR filter block 370c outputs to jump blocks 372d and 372l; FIR filter block 370d outputs to jump blocks 372e and 372m; FIR filter block 370e outputs to jump blocks 372f and 372n; FIR filter block 370f outputs to jump blocks 372g and 372o; FIR filter block 370g outputs to jump blocks 372h and 372p; FIR filter block 370h outputs to jump blocks 372i and 372q; and FIR filter block 370i outputs to jump blocks 372j and 372r. As discussed above, data from jump blocks 372j-372r is received by transpose network 360 of FIG. 5C. Jump blocks 372b-372j continue in FIG. 5E.

FIG. 5E is a continuation of the diagrams of FIGS. 5A-5D. More specifically, as illustrated in the embodiment of FIG. 5E, data from jump block 376b (via FIR filter block 370a of FIG. 5D) is sent to memory buffer C, slot 380b. Likewise, data from jump block 376c (via FIR filter block 370b of FIG. 5D) is sent to memory buffer C, slot 380c; data from jump block 376d (via FIR filter block 370c of FIG. 5D) is sent to memory buffer C, slot 380d; data from jump block 376e (via FIR filter block 370d of FIG. 5D) is sent to memory buffer C, slot 380e; data from jump block 376f (via FIR filter block 370e of FIG. 5D) is sent to memory buffer C, slot 380f; data from jump block 376g (via FIR filter block 370f of FIG. 5D) is sent to memory buffer C, slot 380g; data from jump block 376h (via FIR filter block 370g of FIG. 5D) is sent to memory buffer C, slot 380h; data from jump block 376i (via FIR filter block 370h of FIG. 5D) is sent to memory buffer C, slot 380i; and data from jump block 376j (via FIR filter block 370i of FIG. 5D) is sent to memory buffer C, slot 380j.

Multiplexer 382a receives data from memory buffer C, slots 380b, 380c, and 380d; multiplexer 382b receives data from memory buffer C, slots 380d, 380e, and 380f; multiplexer 382c receives data from memory buffer C, slots 380f, 380g, and 380h; and multiplexer 382d receives data from memory buffer C, slots 380h, 380i, and 380j. Once the data is received, multiplexers 382a-382d send it to ALUs 384a-384d. ALUs 384a-384d receive this data along with the value "1", process the received data, and send the processed data to shifters 386a-386d, respectively. Shifters 386a-386d shift the received data and send the shifted data to Z blocks 388a-388d, and the data is then sent from Z blocks 388a-388d to multiplexers 390a-390d, respectively.

In addition, Z block 388a receives data from jump block 376b and sends it to multiplexer 390a; Z block 388b receives data from jump block 376c and sends it to multiplexer 390b; Z block 388c receives data from jump block 376d and sends it to multiplexer 390c; and Z block 388d receives data from jump block 376e and sends it to multiplexer 390d. Multiplexers 390a-390d also receive a select input and send the selected data to the output.

FIG. 5F is an embodiment of an overall view of the components of FIGS. 5A-5E. More specifically, as illustrated in the embodiment of FIG. 5F, data is received at memory buffer A 340. This data is multiplexed at multiplexer 342 with other data in memory buffer A 340. Multiplexer 342 selects data and sends the selected data to memory buffer B 350. Memory buffer B 350 also receives data from transpose network 360. Memory buffer B 350 sends data to multiplexer 369, which also receives data from multiplexer 342. Multiplexer 369 selects data and sends the selected data to FIR filter 370. The FIR filter filters the received data and sends the filtered data to memory buffer C 380, Z component 388, and transpose network 360. Memory buffer C 380 sends data to multiplexer 382, which selects from the data received from memory buffer C 380. The selected data is sent to ALU 384, which computes a result from the received data and sends the result to shifter 386. The shifted data is then sent to multiplexer 390, which also receives data from Z component 388. Multiplexer 390 selects a result and sends it to the output.
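The loop from the FIR filter through the transpose network and back through memory buffer B suggests a two-pass separable filter: one pass along rows, a transpose, and a second pass with the same filter. The sketch below assumes that reading; the function names, the rounding rule, and the coefficients used in the call are illustrative and are not taken from the description.

```python
def filter_rows(block, coeffs, shift):
    """Apply a 1-D FIR filter along each row, with rounding and right shift."""
    taps = len(coeffs)
    rounding = (1 << (shift - 1)) if shift else 0
    return [
        [
            (sum(c * row[i + k] for k, c in enumerate(coeffs)) + rounding) >> shift
            for i in range(len(row) - taps + 1)
        ]
        for row in block
    ]


def transpose(block):
    """Swap rows and columns, as the transpose network is described as doing."""
    return [list(col) for col in zip(*block)]


def two_pass_filter(block, coeffs, shift):
    """Horizontal pass, transpose, second pass, then transpose back."""
    horizontal = filter_rows(block, coeffs, shift)
    vertical = filter_rows(transpose(horizontal), coeffs, shift)
    return transpose(vertical)


# Averaging a 2x2 block with an illustrative 2-tap filter.
smoothed = two_pass_filter([[0, 2], [4, 6]], [1, 1], 1)
```

Reusing the same row filter for both directions is the design point: only the data layout changes between passes, which is what the transpose step provides.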

The components shown in FIGS. 5A-5F can be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations. More specifically, depending on the particular embodiment and/or data format, data may pass through the components of FIGS. 5A-5F multiple times in a recursive operation to achieve the desired result. In addition, depending on the particular operation and the particular data format, data may be received from EU 146 and/or TFU 168.

As a non-limiting example, in operation, the components of FIGS. 5A-5F can be used to receive an indication of the operation to be performed (e.g., motion compensation, discrete cosine transform, etc.). An indication of the data format (e.g., H.264, VC-1, MPEG-2, etc.) may also be received. As an example, for the H.264 format, motion compensation (MC) data may pass through FIR filter 370 over multiple cycles and then enter memory buffer C 380 to be converted to a pixel format. As discussed in more detail below, other operations in the H.264 format, or other data, may use the components of FIGS. 5A-5F in the same or a different way. In addition, the multiplier array can be used as an array of multipliers to perform sixteen 16-bit multiplications and/or as a vector or matrix multiplier. One example of this is the SMMUL instruction.

FIG. 6 is a functional block diagram of a pixel processing engine that can be used in a computing architecture, such as the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 6, bus A (before the shift register) and bus B (see FIG. 5A) send 16-bit data to multiplexer 400. Multiplexer 400 receives a negation signal from FIR filter 370 at its select port, selects one 16-bit data item, and sends it to multiplexer 406. In addition, multiplexer 402 can receive bus A data (after the shift register) and zero data. Multiplexer 402 can select the desired result according to the 6-tap data at its select port, and the 16-bit result can be sent to 16-bit unsigned adder 404. The 16-bit unsigned adder 404 can also receive data from bus A (before the shift register).

The 16-bit unsigned adder 404 can sum the received data and send the result to multiplexer 406. Multiplexer 406 can select from the received data according to the inverted 6-tap data received at its select port, and the selected data can be sent to 16x8 multiplier 410, which can also receive mode data. The 24-bit result can then be sent to shifter 412 to provide a 32-bit result.

FIG. 7A is a functional block diagram of components that can be used in a VC-1 in-loop filter, such as in the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 7A, multiplexer 420 can receive a "1" value and a "0" value at its input ports, and can receive, as its select input, whether the absolute value of A0 is less than Pquant. Likewise, multiplexer 422 can receive a "1" value and a "0" value, together with whether A3 is less than the absolute value of A0 490c. Multiplexer 424 can receive a "1" value and a "0" value as inputs, and whether the clip value (from shifter 468 of FIG. 7C) is not equal to 0 as its select input. In addition, data output from multiplexer 420 can be sent to logical OR gate 426, which can send data to multiplexer 428. Multiplexer 428 can also receive filter_other_3 data as an input. More specifically, the filter_other_3 signal can be generated as shown in FIG. 7A. If this signal is nonzero, it indicates that the other three rows of pixels need to be filtered; otherwise, the 4x4 block may be left unfiltered (unmodified). Multiplexer 428 selects its output data according to the processing pixel 3 data received at its select input.

FIG. 7B is a continuation of the diagram of FIG. 7A. More specifically, as illustrated in the embodiment of FIG. 7B, absolute value component 430 receives the 9-bit input A1 490a (from FIG. 7D), and absolute value component 432 receives the 9-bit input A2 490b (from FIG. 7D). Once the absolute values of the received data have been computed, minimum component 434 determines the minimum of the received data and sends it, as output A3, to 2's complement component 436. The 2's complement component 436 computes the 2's complement of the received data and sends it to subtraction component 438. Subtraction component 438 subtracts this data from the input data A0 490c (from FIG. 7D); the result is sent to shifter 440, which shifts it left by two bits and sends it to adder 442. The output of subtraction component 438 is also input directly to adder 442, which allows the circuit to multiply by 5 without using a multiplier.

Adder 442 sums the received data and sends the result to shifter 444. Shifter 444 shifts the received data right by three bits and sends it to clamp component 446. Clamp component 446 also receives the clip value (from shifter 468, FIG. 7C) and sends its result to the output. It should be noted that the filter result can be negative or greater than 255, so clamp component 446 can be used to clamp the result to an unsigned 8-bit value. Thus, if the input d is negative, d is set to 0. If d is greater than clip, d can be set to clip.
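The multiplier-free multiply by 5 relies on the identity 5x = (x << 2) + x, followed by a right shift of three bits and the clamp. The following is a minimal sketch that follows the datapath as described for FIG. 7B, component by component; it is not presented as the normative VC-1 in-loop filter, and the function name and example values are assumptions.

```python
def fig7b_datapath(a0, a1, a2, clip):
    """Follow the described datapath: min of absolutes, negate, subtract,
    multiply by 5 via shift-and-add, shift right by 3, then clamp."""
    a3 = min(abs(a1), abs(a2))      # absolute value and minimum components
    neg_a3 = -a3                    # 2's complement component
    diff = a0 - neg_a3              # subtraction component
    times5 = (diff << 2) + diff     # shifter (x4) plus the unshifted value
    d = times5 >> 3                 # shifter: divide by 8
    return max(0, min(d, clip))     # clamp: negative -> 0, above clip -> clip


# Illustrative values only.
d = fig7b_datapath(a0=8, a1=-3, a2=5, clip=10)
```

In hardware the shift costs only wiring, so the multiply by 5 reduces to a single adder, which is the saving the description points out.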

FIG. 7C is a continuation of the diagrams of FIGS. 7A and 7B. As illustrated in the embodiment of FIG. 7C, P1 data 450a, P5 data 450e, and P3 data 450c are sent to multiplexer 452. Multiplexer 452 receives a select input and selects the data to send to subtraction component 460. The multiplexer also sends its output data to the select input of multiplexer 454.

Multiplexer 454 also receives input data from P4 450d, P8 450h, and P6 450f. Multiplexer 454 sends its output data to subtraction component 460. Subtraction component 460 subtracts the received data and sends the result to shifter 466. Shifter 466 shifts the received data left by one bit and sends the result to jump block 474.

Likewise, multiplexer 456 receives inputs P2 450b, P3 450c, and P4 450d. Multiplexer 456 receives its select input from multiplexer 454 and sends the selected data to subtraction component 464. Multiplexer 458 receives its select input from multiplexer 456 and receives input data from P3 450c, P7 450g, and P5 450e. The multiplexer sends its output data to subtraction component 464, which subtracts the received data and sends the result to shifter 470 and adder 472. Shifter 470 shifts the received data left by two bits and sends the shifted data to adder 472, which sums the received data and sends the result to jump block 480.

In addition, subtraction component 462 receives data from P4 450d and P5 450e, subtracts the received data, and sends the result to shifter 468. Shifter 468 shifts the received data right by one bit and outputs it as the clip value, which is input to clamp component 446 and multiplexer 424. In addition, P4 450d is sent to jump block 476, and P5 450e is sent to jump block 478.

FIG. 7D is a continuation of the diagrams of FIGS. 7A-7C. More specifically, as illustrated in the embodiment of FIG. 7D, subtraction component 486 receives data from jump block 482 and jump block 484. Subtraction component 486 subtracts the received data and sends the result to shifter 488. Shifter 488 shifts the received data right by three bits and sends the result to A1 490a, A2 490b, and A0 490c.

In addition, multiplexer 496 receives the input data "0" and "d". This operation may include: if (do_filter) { P4[I] = P4[I] - D[I]; P5[I] = P5[I] + D[I]; }

Multiplexer 496 selects the desired result according to the do_filter select input. The result is sent to subtraction component 500. Subtraction component 500 also receives data from jump block 492 (via jump block 476, FIG. 7C), subtracts the received data, and sends the result to P4 450d.

Multiplexer 498 also receives "0" and "d" as inputs and do_filter as its select input. Multiplexer 498 multiplexes this data and sends the result to adder 502. Adder 502 also receives data from jump block 494 (via jump block 478, FIG. 7C), adds the received inputs, and sends the result to P5 450e.
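The two multiplexers gated by do_filter amount to choosing between d and 0 before the subtract and add. A minimal sketch of that update for one pixel pair follows; the function name is hypothetical, and the code models only the selection and the arithmetic described here.

```python
def update_edge_pixels(p4, p5, d, do_filter):
    """Return the updated (P4, P5) pair for one pixel position.

    When do_filter is false, both multiplexers select 0 and the
    pixels pass through unchanged.
    """
    delta = d if do_filter else 0   # multiplexers 496 and 498
    return p4 - delta, p5 + delta   # subtraction component and adder


# Illustrative values only.
new_p4, new_p5 = update_edge_pixels(100, 90, 3, True)
```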

FIG. 8 is a block diagram of logic blocks that can be used to perform a sum of absolute differences (SAD) calculation in a computing architecture, such as the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 8, component 504 receives a portion of the 32-bit data A[31:0] and a portion of the 32-bit data B. Component 504 provides an output to adder 512 by computing {C,S} ← A - B and, if (C), S = Not(S) + 1. Likewise, component 506 receives A data and B data and sends an output to adder 512 based on a computation similar to that of component 504, except that component 506 receives bits [23:16] of the A and B data, whereas component 504 receives bits [31:24]. Likewise, component 508 receives bits [15:8], performs a computation similar to that of components 504 and 506, and sends the result to adder 512. Component 510 receives bits [7:0], performs a computation similar to that of components 504, 506, and 508, and sends the result to adder 512.

In addition, components 514, 516, 518, and 520 receive the 32-bit portion of the data corresponding to bits [63:32] (as opposed to bits [31:0], which are received at components 504-510). More specifically, component 514 receives bits [31:24] of this portion of data A and data B. Component 514 performs a computation similar to that discussed above and sends the 8-bit result to adder 522. Likewise, component 516 receives bits [23:16], performs a similar computation, and sends the result to adder 522. Component 518 receives bits [15:8] of data A and data B, processes the received data as described above, and sends the result to adder 522. Component 520 receives bits [7:0] of data A and data B, processes the received data as discussed above, and sends the result to adder 522.

Components 524-530 receive the 32 bits of the A data and B data corresponding to bits [95:64]. More specifically, component 524 receives bits [31:24], component 526 receives bits [23:16], component 528 receives bits [15:8], and component 530 receives bits [7:0]. Once this data is received, components 524-530 can process it as described above, and the processed data can then be sent to adder 532. Likewise, components 534-540 receive the 32 bits of the A data and B data corresponding to bits [127:96]. More specifically, component 534 receives bits [31:24] of the A and B data, component 536 receives bits [23:16], component 538 receives bits [15:8], and component 540 receives bits [7:0]. The received data is processed as discussed above and sent to adder 542. In addition, adders 512, 522, 532, and 542 sum the received data and send 10-bit results to adder 544. Adder 544 sums the received data and sends 12-bit data to the output.
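The byte-wise structure of FIG. 8 can be modeled directly: sixteen absolute differences, four per 32-bit lane, summed per lane and then across lanes. The sketch below assumes the per-byte rule is a subtraction whose borrow triggers a negation (S = Not(S) + 1); the function names and the packing of A and B into Python integers are illustrative choices.

```python
def abs_diff_byte(a, b):
    """Absolute difference of two bytes via subtract, then negate on borrow."""
    raw = a - b
    borrow = raw < 0                 # C in {C,S} <- A - B
    s = raw & 0xFF                   # S, the 8-bit difference
    if borrow:
        s = ((~s) + 1) & 0xFF        # S = Not(S) + 1
    return s


def sad128(a, b):
    """Sum of absolute differences over the 16 bytes of two 128-bit values."""
    lane_sums = []
    for lane in range(4):            # four 32-bit lanes, one adder each
        total = 0
        for byte in range(4):        # four byte components per lane
            shift = 32 * lane + 8 * byte
            total += abs_diff_byte((a >> shift) & 0xFF, (b >> shift) & 0xFF)
        lane_sums.append(total)      # at most 4 * 255 = 1020, fits in 10 bits
    return sum(lane_sums)            # at most 4080, fits in 12 bits


a = int.from_bytes(bytes(range(16)), "little")
b = int.from_bytes(bytes([5] * 16), "little")
total = sad128(a, b)
```

The bounds in the comments are consistent with the 10-bit lane sums and the 12-bit final output stated in the description.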

FIG. 9 is a flowchart of another embodiment of a process, similar to that shown in FIG. 8, that can be used to perform a sum of absolute differences (SAD) calculation. More specifically, as illustrated in the embodiment of FIG. 9, i is set to the block size BlkSize and suma is initialized to "0" (block 550). It is first determined whether i is greater than "0" (block 552). If i is greater than "0", then vecx[i] = Tabelx[i], vecy[i] = Tabely[i], vectx = mv_x + vecx[i], and vecty = mv_y + vecy[i] (block 554). An address can then be calculated from vectx and vecty, and 4x4 memory data (byte aligned) can be fetched from PredImage (block 556). The 128-bit prediction data can be sent to SAD 44 (see FIG. 8), as shown in block 558. In addition, block 560 can receive block data and calculate an address. At block 560, 4x4 memory data can also be fetched from RefImage and byte aligned. The 128-bit Ref[i] data can then be sent to SAD 44 (block 558). The sum can be sent from SAD 44 to block 562, where suma is incremented by "1" and i is decremented by "1". It can then be determined whether suma is greater than a threshold (block 564). If so, the process can stop; if suma is not greater than the threshold, the process can return to block 552 to determine whether i is greater than 0. If i is not greater than 0, the process can end.
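The control flow of FIG. 9 is a countdown loop with an early exit once the running total passes a threshold. The sketch below makes two assumptions that the description does not confirm: that block 562 folds each 4x4 SAD result into suma, and that the address calculation and fetch from PredImage and RefImage can be replaced by pre-fetched lists of 16 pixel values. All function and parameter names are hypothetical.

```python
def sad_4x4(pred, ref):
    """SAD of two 4x4 blocks, each given as a flat list of 16 pixel values."""
    return sum(abs(p - r) for p, r in zip(pred, ref))


def accumulate_sad(pred_blocks, ref_blocks, threshold):
    """Count i down from the block count, stopping early past the threshold.

    Returns the running total and the remaining count i.
    """
    i = len(pred_blocks)             # i = BlkSize
    suma = 0
    while i > 0:                     # block 552
        suma += sad_4x4(pred_blocks[i - 1], ref_blocks[i - 1])  # blocks 556-562
        i -= 1
        if suma > threshold:         # block 564: early termination
            break
    return suma, i


pred = [[0] * 16 for _ in range(4)]
ref = [[1] * 16 for _ in range(4)]
early = accumulate_sad(pred, ref, threshold=40)
```

The early exit is useful in a motion search because a candidate whose partial SAD already exceeds the best total so far cannot win, so the remaining blocks need not be fetched.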

FIG. 10A is a block diagram of a plurality of components that may be used in a deblocking operation, such as may be performed in the computer architecture of FIG. 2. In the embodiment of FIG. 10A, ALU 580 receives input data p2 and p0 and sends data to absolute value component 586. Absolute value component 586 calculates the absolute value of the received data and outputs data a_p. Decision component 590 determines whether a_p is less than β and sends data to jump block 596. ALU 580 also sends data to jump block 594. Similarly, ALU 582 receives data from q0 and q2. After calculating a result, ALU 582 sends data to absolute value component 588, which determines the absolute value of the received data and sends a_q to decision component 592. Decision component 592 determines whether a_q is less than β and sends data to jump block 598.

ALU 600 receives data from q0 and p0, calculates a result, and sends the result to absolute value component 606. Absolute value component 606 determines the absolute value of the received data and sends it to decision component 612. Decision component 612 determines whether the received value is less than α and sends the result to AND gate 620.

ALU 602 receives data from p0 and p1, calculates a result, and sends the result to absolute value component 608. Absolute value component 608 determines the absolute value of the received data and sends this value to decision component 614. Decision component 614 determines whether the received data is less than β and sends the result to AND gate 620. ALU 604 receives data from q0 and q1, calculates a result, and sends the result to absolute value component 610. Absolute value component 610 determines the absolute value of the received data and sends the result to decision component 616. Decision component 616 determines whether the received data is less than β and sends the result to AND gate 620. Additionally, AND gate 620 receives data from decision component 618, which receives bS data and determines whether that data is not equal to zero.
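The four conditions gathered at AND gate 620 can be written as a single predicate. This is a minimal sketch that assumes ALUs 600, 602, and 604 each compute the difference of their two inputs (the text says only that they calculate a result); under that assumption the predicate has the same form as the edge-filtering decision of the H.264 deblocking filter. The function name is hypothetical.

```python
def filter_samples_flag(p1: int, p0: int, q0: int, q1: int,
                        alpha: int, beta: int, bs: int) -> bool:
    """True when all four inputs of the AND gate are true."""
    return (
        bs != 0                      # decision component 618
        and abs(p0 - q0) < alpha     # ALU 600 -> abs 606 -> decision 612
        and abs(p1 - p0) < beta      # ALU 602 -> abs 608 -> decision 614
        and abs(q1 - q0) < beta      # ALU 604 -> abs 610 -> decision 616
    )
```

For example, filter_samples_flag(100, 102, 104, 105, alpha=10, beta=4, bs=2) is true, while the same samples with bs=0 leave the edge unfiltered.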

FIG. 10B is a continuation of the diagram of FIG. 10A. More specifically, ALU 622 receives data from p1 and q1, calculates a result, and sends the data to ALU 624. ALU 624 also receives data from jump block 646 (via ALU 580 of FIG. 10A) as well as 4-bit data at its carry input. ALU 624 then calculates a result and sends it to shifter 626, which shifts the received data three places to the right. Shifter 626 then sends the data to clip3 component 628, which also receives data from jump block 630 (via ALU 744 of FIG. 10D, described in more detail below). Clip3 component 628 sends data to multiplexer 634 and to NOT gate 632. NOT gate 632 inverts the received data and sends the inverted data to multiplexer 634. Multiplexer 634 also receives t_c0 data at its select input and sends the selected data to ALU 636. ALU 636 also receives data from multiplexer 640. Multiplexer 640 receives data from q0 and p0 and receives !left_top as its select input. The carry input of ALU 636 receives data from multiplexer 642. Multiplexer 642 receives "1" and "0" as well as !left_top data. ALU 636 sends its result to SAT(0,255) 638, which sends data to jump block 644 (continued at multiplexer 790, FIG. 10E).

Additionally, ALU 648 receives data from q0 and p0 and receives one bit of data at its select input. ALU 648 calculates a result and sends this data to shifter 650. Shifter 650 shifts the received data one place to the right and sends the shifted data to ALU 652. Similarly, multiplexer 656 receives data from p1 and q1 and receives !left_top as its select input. Multiplexer 656 selects a result and sends it to shifter 658. Shifter 658 shifts the received data one place to the left and sends the shifted data to ALU 652. ALU 652 calculates a result and sends the data to ALU 662. ALU 662 also receives data from multiplexer 660, which receives q2 and p2 as well as data from jump block 680 (via NOT component 802 of FIG. 10E).

ALU 662 calculates a result and sends this data to shifter 664, which shifts the received data one place to the right and sends the shifted data to clip3 component 668. Clip3 component 668 also receives [...] and sends data to ALU 670. ALU 670 also receives data from multiplexer 656, calculates a result, and sends this data to multiplexer 672. Multiplexer 672 also receives data from multiplexer 656 and from jump block 678 (via multiplexer 754 of FIG. 10E) and sends data to jump block 674.
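The chain from ALU 648 to ALU 670 matches the form of the H.264 update for the second sample from the edge (p1 or q1) when the boundary strength is below 4. The sketch below writes out that standard formula as one possible reading. It assumes the one-bit input of ALU 648 is a rounding carry of 1, that clip3 component 668 is bounded by t_c0 (its second input is elided in the text), and that multiplexers 656 and 660 pick the p side or the q side. The names are hypothetical.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def filter_inner(x2: int, x1: int, p0: int, q0: int, tc0: int) -> int:
    """Return the filtered x1, where (x2, x1) is (p2, p1) or (q2, q1)."""
    # ALU 648 / shifter 650: rounded average of the two edge samples.
    avg = (p0 + q0 + 1) >> 1
    # Shifter 658 / ALU 652 / ALU 662 / shifter 664: (x2 + avg - 2*x1) >> 1.
    raw = (x2 + avg - (x1 << 1)) >> 1
    # clip3 component 668 (bound assumed to be tc0), then ALU 670 adds x1 back.
    return x1 + clip3(-tc0, tc0, raw)
```

Calling filter_inner(p2, p1, p0, q0, tc0) gives the p-side result and filter_inner(q2, q1, p0, q0, tc0) the q-side result, mirroring how a single data path can serve both sides through its multiplexers.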

FIG. 10C is a continuation of the diagrams of FIGS. 10A and 10B. In the embodiment of FIG. 10C, multiplexer 682 receives data from p2, p1, and !left_top and sends the selected data to adder 706. Multiplexer 684 receives p1, p0, and !left_top and sends its result to shifter 700. Shifter 700 shifts the received data one place to the left and sends it to adder 706. Multiplexer 686 receives data from p0, q1, and !left_top. Multiplexer 686 sends data to shifter 702, which shifts the received data one place to the left and sends the shifted data to adder 706. Multiplexer 688 receives data from q0, q1, and !left_top and sends the selected data to shifter 704, which shifts the received data one place to the left and sends it to adder 706. Multiplexer 690 receives data from q1, q2, and !left_top and sends data to adder 706. Adder 706 also receives 4 bits at its carry input and sends its output to jump block 708.

Similarly, multiplexer 691 receives q2, p0, and !left_top, selects a result, and sends it to adder 698. Multiplexer 692 receives p1, p0, and !left_top and sends the selected result to adder 698. Multiplexer 694 receives data from q0, q1, and !left_top, selects a result, and sends it to adder 698. Multiplexer 696 receives q0, q2, and !left_top, selects the desired result, and sends this data to adder 698. Adder 698 also receives 2 bits at its carry input and sends its output to jump block 710.

Multiplexer 712 receives p3, q3, and !left_top and sends its result to shifter 722. Shifter 722 shifts the received data one place to the left and sends it to adder 726. Multiplexer 714 receives p2, q2, and !left_top and sends the selected result to both shifter 724 and adder 726. Shifter 724 shifts the received data one place to the left and sends the shifted result to adder 726. Multiplexer 716 receives p1, q1, and !left_top and sends the selected result to adder 726. Multiplexer 718 receives p0, q0, and !left_top and sends the selected result to adder 726. Multiplexer 720 receives p0, q0, and !left_top and sends the selected result to adder 726. Adder 726 receives four bits at its carry input, adds them to the received data, and sends the summed data to jump block 730.

FIG. 10D is a continuation of the diagrams of FIGS. 10A-10C. More specifically, in the embodiment of FIG. 10D, α table 750 receives IndexA and outputs α. β table 748 receives IndexB and outputs data to zero extend component 752, which outputs β.

Similarly, multiplexer 736 receives "1" and "0" as well as data from jump block 732 (via decision component 590 of FIG. 10A), selects a result, and sends it to ALU 740. Multiplexer 738 also receives "1" and "0" as well as data from jump block 734 (via decision component 592 of FIG. 10A) and sends the selected result to ALU 740. ALU 740 calculates a result and sends the data to multiplexer 742. Multiplexer 742 also receives "1" and chroma edge flag data, selects a result, and sends it to ALU 744. ALU 744 also receives t_c0, calculates the result t_c, and sends the result to jump block 746.
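The selection logic around multiplexers 736, 738, and 742 can be read as the usual derivation of the clipping bound t_c from t_c0. The sketch below is a reading of the figure, not a confirmed description: it assumes ALUs 740 and 744 are adders and that multiplexer 742 selects the constant "1" for chroma edges. The function name and argument names are hypothetical.

```python
def compute_tc(tc0: int, ap_lt_beta: bool, aq_lt_beta: bool,
               chroma_edge_flag: bool) -> int:
    """Derive the clipping bound tc from tc0 and the two side conditions."""
    # Multiplexers 736 and 738: turn each comparison result into "1" or "0".
    ap_term = 1 if ap_lt_beta else 0
    aq_term = 1 if aq_lt_beta else 0
    # ALU 740 (assumed adder): number of side conditions that hold.
    luma_increment = ap_term + aq_term
    # Multiplexer 742: chroma edges use the constant "1" instead.
    increment = 1 if chroma_edge_flag else luma_increment
    # ALU 744 (assumed adder): tc = tc0 + increment.
    return tc0 + increment
```

With tc0 = 2, a luma edge where both a_p < β and a_q < β yields 4, while a chroma edge yields 3 regardless of the two comparisons.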

FIG. 10E is a continuation of the diagrams of FIGS. 10A-10D. More specifically, in the embodiment of FIG. 10E, multiplexer 754 receives data associated with the expression "(ChromaEdgeFlag == 0) && (a_p < β)" and data associated with the expression "(ChromaEdgeFlag == 0) && (a_q < β)", receives data from NOT component 802, and sends the selected data to jump block 756 (to multiplexer 672 of FIG. 10B).

Additionally, multiplexer 780 receives data associated with the expression "(ChromaEdgeFlag == 0) && (a_p < β) && (abs(p0 - q0) < ((α >> 2) + 2))" and data associated with the expression "(ChromaEdgeFlag == 0) && (a_q < β) && (abs(p0 - q0) < ((α >> 2) + 2))". Multiplexer 780 also receives a select input from NOT component 802, selects the desired result accordingly, and sends it to multiplexers 782, 784, and 786.

Multiplexer 757 receives data from p1, q1, and NOT component 802 and sends the selected data to shifter 763, which shifts the received data one place to the left and sends it to adder 774. Multiplexer 759 receives p0, q0, and data from NOT component 802 and sends the selected data to adder 774. Multiplexer 761 receives data from q1, p1, and NOT component 802 and sends data to adder 774. Adder 774 also receives two bits of data at its carry input and sends its output to multiplexer 782.

Shifter 764 receives data from jump block 758 (via adder 706 of FIG. 10C), shifts the received data three places to the right, and sends the shifted data to multiplexer 782. Shifter 766 receives data from jump block 760 (via adder 698 of FIG. 10C), shifts the received data two places to the right, and sends the shifted data to multiplexer 784. Shifter 768 receives data from jump block 762 (from adder 726 of FIG. 10C), shifts the received data three places to the right, and sends the shifted data to multiplexer 786.
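Adders 706, 698, and 726 with their right shifts of three, two, and three places line up with the three strong-filter taps that H.264 applies when the boundary strength is 4, and adder 774 lines up with the weaker fallback tap. The sketch below writes out those standard formulas for the p side under that reading. It assumes the carry inputs act as the rounding constants 4, 2, 4, and 2, and that the output of adder 774 is shifted right by two places, which the text does not state. The function name is hypothetical.

```python
from typing import Tuple


def strong_filter_p(p3: int, p2: int, p1: int, p0: int, q0: int, q1: int,
                    alpha: int, beta: int,
                    chroma_edge_flag: bool = False) -> Tuple[int, int, int]:
    """Return the filtered (p0, p1, p2) for an edge with bS == 4."""
    a_p = abs(p2 - p0)
    # Condition supplied to multiplexer 780 for the p side.
    strong = (not chroma_edge_flag and a_p < beta
              and abs(p0 - q0) < ((alpha >> 2) + 2))
    if strong:
        # Adder 706, then shifter 764 (right by three).
        new_p0 = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
        # Adder 698, then shifter 766 (right by two).
        new_p1 = (p2 + p1 + p0 + q0 + 2) >> 2
        # Adder 726, then shifter 768 (right by three).
        new_p2 = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
        return new_p0, new_p1, new_p2
    # Adder 774 (final shift assumed); p1 and p2 pass through unchanged.
    return (2 * p1 + p0 + q1 + 2) >> 2, p1, p2
```

The q side follows by swapping the roles of the p and q samples, which is what the !left_top select inputs on the multiplexers of FIG. 10C appear to arrange.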

As discussed above, multiplexer 782 receives data from shifter 764, adder 774, and multiplexer 780, selects a result from this data, and sends it to multiplexer 790. Similarly, multiplexer 784 receives data from shifter 766, multiplexer 780, and multiplexer 776. Multiplexer 776 receives p1, q1, and data from NOT component 802, and the selected result is then sent to multiplexer 798. Multiplexer 786 receives data from shifter 768, multiplexer 780, and multiplexer 778. Multiplexer 778 receives p2, q2, and data from NOT component 802. Multiplexer 786 sends the selected data to multiplexer 800.

As discussed above, multiplexer 790 receives data from multiplexer 782. Additionally, multiplexer 790 receives data from jump block 772 (via SAT component 638 of FIG. 10B) and from multiplexer 794. Multiplexer 794 receives p0, q0, and data from NOT component 802. Multiplexer 790 also receives bSn & nfilterSampleFlag data as its select input and sends the selected data to buffers 808 and 810. Similarly, multiplexer 798 receives data from multiplexer 784, jump block 755 (via multiplexer 672 of FIG. 10B), and multiplexer 792, and receives bSn & nfilterSampleFlag data as its select input. Multiplexer 792 receives p1, q1, and data from NOT component 802. Multiplexer 798 sends data to buffers 806 and 812. Similarly, multiplexer 800 receives data from multiplexer 786 and receives bSn & nfilterSampleFlag data as its select input. Additionally, multiplexer 800 receives data from multiplexer 788. Multiplexer 788 receives p2, q2, and data from NOT component 802. Multiplexer 800 selects the desired data and sends it to buffers 804 and 814. Buffers 804-814 also receive data from NOT component 802 and send data to p2, p1, p0, q0, q1, and q2, respectively.

FIG. 11 is a flowchart illustrating an embodiment of a process that may be used to process data in a computing architecture, such as the computing architecture of FIG. 2. In the embodiment of FIG. 11, odd block 880 and even block 882 of the texture address generator (TAG) (see also 150 in FIG. 2) receive data from output port 144 (FIG. 2). Addresses are then generated for the received data, and the process proceeds to the texture cache and controller (TCC) 884, 886 (see also 166 in FIG. 2).

The data can then be sent to cache 890 and to texture filter first-in first-out (TFF) components 888, 892, which can act as delay queues/buffers. The data is then sent to texture filter units (TFU) 894, 896 (see also 168 in FIG. 2). Once the data has been filtered, TFUs 894, 896 send the data to VPUs 898, 900 (see also 199 in FIG. 2). Depending on whether the instruction calls for motion compensation filtering, texture cache filtering, inter-deblocking filtering, and/or a sum of absolute differences, the data can be sent to different VPUs and/or to different portions of the same VPU. After processing the received data, VPUs 898, 900 can send the data to the output for input ports 902, 904 (see also 142 in FIG. 2).

Embodiments disclosed herein can be implemented in hardware, software, firmware, or a combination thereof. At least one embodiment disclosed herein is implemented in software and/or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in an alternative embodiment, the embodiments disclosed herein can be implemented with any one or a combination of the following technologies: discrete logic circuits having logic gates for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.

It should be noted that the flowcharts included herein show the architecture, functionality, and operation of possible implementations of software and/or hardware. In this regard, each block can be interpreted as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the specified logical function or functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of order and/or not at all. For example, depending on the functionality involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

It should be noted that any of the programs listed herein, which can include an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. In the context of this document, a "computer-readable medium" can be any means that can contain, store, communicate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include an electrical connection having one or more wires (electronic), a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). In addition, the scope of certain embodiments of this disclosure can include embodying the described functionality in logic embodied in hardware-configured or software-configured media.

It should also be noted that conditional language, such as, among others, "can", "could", "might", or "may", unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more particular embodiments, or that one or more particular embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment.

It should be emphasized that the above-described embodiments are merely possible examples of implementations, set forth only for a clear understanding of the principles of this disclosure. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and scope of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure.

88, 102 ... Internal logic analyzer

90, 104 ... Bus interface unit (BIU)

106a, 106b, 106c, 106d ... Memory interface unit (MIU)

108 ... Memory access port

110, 116 ... Data stream cache

112 ... Vertex cache

114 ... L2 cache

118 ... EUP controller with cache subsystem

120 ... Command stream processor (CSP) front end

122 ... 3D and state component

124 ... 2D pre component

126 ... 2D first-in first-out (FIFO) component

128 ... CSP back end / ZL1 cache

130 ... Definition and model texture processor

132 ... Advanced Encryption System (AES) encryption/decryption component

134 ... Triangle and attribute setup unit

136 ... Span tile generator

138 ... ZL1

140 ... ZL2

142, 902, 904 ... Input port

144 ... Output port

146 ... Execution unit pool (EUP) / BW compressor

148 ... Z and ST cache

150 ... Texture address generator (TAG)

152 ... D cache

154 ... 2D processing component

156 ... Pre-packer

158 ... Interpolator

160 ... Post-packer

162 ... Write back unit

164a, 164b ... Memory access unit (MXU)

166, 884, 886 ... Texture cache and controller (TCC)

168, 894, 896 ... Texture filter unit (TFU)

199, 898, 900 ... Video processing unit (VPU)

234 ... Encrypted bit stream

236 ... Decryption component

238 ... Coded bit stream

240 ... VLD, Huffman decoder, CAVLC, CABAC

242 ... EUP TAG interface

244 ... Picture header

246a, 246b, 246c, 246n ... Memory buffer (MB)

250, 252, 254, 256, 258, 260, 270, 272, 274, 276, 344a~i, 346a~i, 348a~i, 362j~r, 366j~r, 368a~r, 372b~r, 376b~j, 474, 476, 478, 480, 482, 484, 492, 494, 594, 596, 598, 630, 644, 646, 674, 678, 680, 708, 710, 730, 732, 734, 746, 755, 756, 758, 760, 762, 770, 772 ... Jump block

262 ... Inverse DC/AC prediction component

264 ... Inverse scan, inverse Q component

265 ... Switch

266 ... Coded pattern block reconstruction component

280 ... Filter component

282 ... MC filter

284 ... Reconstructed reference component

286 ... Coded pattern block reconstruction

288 ... Switch component

290 ... Reconstructed frame component

292 ... Deblocking and deringing filter

294 ... Deinterlacing component

296 ... Inverse transform component / in-loop filter

298, 330, 442, 472, 502, 512, 522, 532, 542, 544, 698, 706, 726, 774 ... Adder

300, 302, 304, 306, 308, 310, 312, 324 ... Z^-1 delay component

314a, 314b, 314c, 314d ... PE

316 ... Z^-3 delay component

320 ... Z^-2 delay component

318, 322, 326, 328, 342, 342a~i, 369, 369a~i, 382, 382a~d, 390, 390a~d, 400, 402, 404, 406, 408, 420, 422, 424, 428, 452, 454, 456, 458, 496, 498, 634, 640, 642, 656, 660, 672, 682, 684, 686, 690, 691, 692, 694, 696, 712, 714, 716, 718, 720, 736, 738, 742, 754, 757, 759, 761, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800 ... Multiplexer

332 ... N shifter

340, 340a~l ... Memory buffer

350, 350a~i ... Memory B, slot

360 ... Transpose network

370, 370a~i ... FIR filter block

380, 380b~j ... Memory buffer C, slot

384, 384a~d, 580, 582, 600, 602, 604, 622, 624, 636, 648, 652, 662, 670, 740, 744 ... ALU

386, 386a~d, 412, 440, 444, 466, 468, 470, 488, 626, 650, 658, 664, 700, 702, 704, 722, 724, 763, 764, 766, 768 ... Shifter

388, 388a~d ... Z block

410 ... Multiplier

426 ... Logic OR gate

430, 432, 586, 606, 608, 610 ... Absolute value component

434 ... Minimum component

436 ... 2's complement component

438, 460, 462, 464, 486, 500 ... Subtraction component

446 ... Clamp component

450a~h ... P1~8 data

490a ... A1

490b ... A2

490c ... A0

504, 506, 508, 510, 514, 516, 518, 520, 524, 526, 528, 530, 534, 536, 538, 540 ... Component

590, 592, 612, 614, 616, 618 ... Decision component

620 ... AND gate

628, 668 ... clip3 component

632 ... NOT gate

638 ... SAT component

748 ... β table

750 ... α table

752 ... Zero extend component

802 ... NOT component

804, 806, 808, 810, 812, 814 ... Buffer

880, 882 ... Texture address generator (TAG) block

888, 891 ... Texture filter first-in first-out (TFF) component

890 ... Cache

圖1為用於處理視訊資料之計算架構的實施例。1 is an embodiment of a computing architecture for processing video data.

圖2為類似於圖1之架構之引入了視訊處理單元(VPU)之計算架構的實施例。2 is an embodiment of a computing architecture incorporating a video processing unit (VPU) similar to the architecture of FIG.

圖3為諸如在圖2之計算架構中用於處理視訊以及圖形資料之過程之流程圖實施例。3 is a flow diagram embodiment of a process for processing video and graphics data, such as in the computing architecture of FIG. 2.

圖4A為在計算裝置(諸如具有圖2之計算架構的計算裝置)中之資料流之功能流程圖實施例。4A is a functional flow diagram embodiment of a data flow in a computing device, such as a computing device having the computing architecture of FIG. 2.

圖4B為圖4A之功能流程圖的延續。Figure 4B is a continuation of the functional flow diagram of Figure 4A.

圖4C為圖4A以及圖4B之功能流程圖的延續。4C is a continuation of the functional flow diagram of FIGS. 4A and 4B.

圖5A為諸如在圖2之計算架構中可用於提供動態壓縮(MC)及/或離散餘弦轉換(DCT)操作之組件實施例的功能方塊圖。5A is a functional block diagram of an embodiment of a component that can be used to provide dynamic compression (MC) and/or discrete cosine transform (DCT) operations, such as in the computing architecture of FIG. 2.

圖5B為圖5A之圖的延續。Figure 5B is a continuation of the diagram of Figure 5A.

圖5C為圖5A以及圖5B之圖的延續。Figure 5C is a continuation of the Figures 5A and 5B.

圖5D為圖5A－圖5C之圖的延續。Figure 5D is a continuation of the Figures 5A-5C.

圖5E為圖5A－圖5D之圖的延續。Figure 5E is a continuation of the Figures 5A-5D.

圖5F為圖5A－圖5E之組件之總圖的實施例。Figure 5F is an embodiment of a general view of the components of Figures 5A-5E.

圖6為可用於計算架構(諸如圖2之計算架構)之像素處理引擎的功能方塊圖。6 is a functional block diagram of a pixel processing engine that can be used in a computing architecture, such as the computing architecture of FIG. 2.

圖7A為說明可用於VC－1迴路內濾波器(諸如在圖2之計算架構中)之組件的功能方塊圖。7A is a functional block diagram illustrating components that may be used in a VC-1 in-loop filter, such as in the computing architecture of FIG. 2.

圖7B為圖7A之圖的延續。Figure 7B is a continuation of the Figure 7A.

圖7C為圖7A以及圖7B之圖的延續。Figure 7C is a continuation of the Figures 7A and 7B.

圖7D為圖7A－圖7C之圖的延續。Figure 7D is a continuation of the Figures 7A-7C.

圖8為可用於在計算架構(諸如圖2之計算架構)中執行絕對差和計算之組件的方塊圖。8 is a block diagram of components that can be used to perform absolute differences and calculations in a computing architecture, such as the computing architecture of FIG. 2.

圖9為類似於圖8可用於執行絕對差和計算之過程之實施例的流程圖。9 is a flow chart similar to the embodiment of FIG. 8 that may be used to perform the process of absolute difference sum calculation.

FIG. 10A is a block diagram illustrating multiple components that can be used in a deblocking operation, such as one performed in the computer architecture of FIG. 2.

FIG. 10B is a continuation of the diagram of FIG. 10A.

FIG. 10C is a continuation of the diagrams of FIGS. 10A and 10B.

FIG. 10D is a continuation of the diagrams of FIGS. 10A-10C.

FIG. 10E is a continuation of the diagrams of FIGS. 10A-10D.

FIG. 11 is a flowchart of an embodiment of a process that can be used to execute data in a computing architecture, such as the computing architecture of FIG. 2.
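The sum of absolute differences named in the descriptions of FIGS. 8 and 9 is a standard block-matching metric. The following is a minimal software sketch of that metric only; it does not represent the hardware components shown in the figures, and the function name `sad` and the sample blocks are illustrative assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        # Accumulate |a - b| for every co-located pair of samples.
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


# Illustrative 2x2 blocks (made-up sample values).
current = [[10, 12], [8, 9]]
reference = [[11, 12], [5, 9]]
print(sad(current, reference))
```

A lower result indicates a closer match between the two blocks, which is why the metric is commonly used when searching for a reference block.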

102 ... Internal logic analyzer

104 ... Bus interface unit (BIU)

106a ... Memory interface unit MIU A

106b ... MIU B

106c ... MIU C

106d ... MIU D

108 ... Memory access crossbar

110 ... Data stream cache

112 ... Vertex cache memory

114 ... L2 cache memory

116 ... Data stream cache

118 ... EU pool controller with cache memory subsystem

120 ... Command stream processor (CSP) front end

122 ... 3D and state component

124 ... 2D preparation component

128 ... CSP back end/ZL1 cache memory

130 ... Sharpness and model texture processor

134 ... Triangle and attribute configuration

136 ... Span tile generator

138 ... ZL1

140 ... ZL2

142 ... Input port

144 ... Output port

148 ... Z and ST cache memory

150 ... Texture address generator

152 ... D cache memory

154 ... 2D processing component

156 ... Pre-packer

158 ... Interpolator

160 ... Post-packer

162 ... Write-back unit

164a ... Memory access unit MXU A

164b ... MXU B

166 ... Texture cache memory and controller

168 ... Texture filtering unit

199 ... Video processing unit (VPU)

Claims

A programmable video processing unit for processing video data according to an instruction, comprising: a receiving logic circuit for receiving video data in one format selected from a plurality of formats; a filtering logic circuit for filtering the video data according to the instruction; and a transform logic circuit for transforming the filtered data according to the instruction; wherein the instruction includes a mode indication field for directing the filtering logic circuit and the transform logic circuit to operate according to the format of the video data.

The programmable video processing unit of claim 1, wherein the filtering logic circuit performs a motion compensation filtering operation.

The programmable video processing unit of claim 2, wherein, when the mode indication field indicates the MPEG-2 format, the filtering logic circuit operates in a two-pass mode comprising vertical filtering and horizontal filtering.

The programmable video processing unit of claim 2, wherein, when the mode indication field indicates the VC-1 format with 1/2 precision, the filtering logic circuit operates in a bilinear mode; and when the mode indication field indicates the VC-1 format with 1/4 precision, the filtering logic circuit operates in a bicubic mode.

The programmable video processing unit of claim 2, wherein, when the mode indication field indicates the H.264 format with quarter-pixel precision, the filtering logic circuit operates in a luma mode; and when the mode indication field indicates the H.264 format with eighth-pixel precision, the filtering logic circuit operates in a chroma mode.

The programmable video processing unit of claim 1, wherein, when the mode indication field indicates the MPEG-2 format, the transform logic circuit performs an inverse discrete cosine transform operation.

The programmable video processing unit of claim 1, wherein, when the mode indication field indicates one of the VC-1 and H.264 formats, the transform logic circuit performs an integer transform operation.

The programmable video processing unit of claim 1, further comprising a deblocking logic circuit for performing in-loop filtering.
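The format-dependent behavior recited in the claims above can be summarized as a lookup from the mode indication field to a filter mode and a transform. The sketch below is illustrative only: the encoding of the field as a `(format, sub_mode)` pair, the string labels, and the names `FILTER_MODES`, `TRANSFORMS`, and `decode_mode` are assumptions, since the claims do not specify how the field is encoded.

```python
# Hypothetical encoding of the mode indication field as (format, sub_mode).
# The pairs restate the claims; all names and labels are illustrative only.
FILTER_MODES = {
    ("MPEG-2", None): "two-pass",        # vertical and horizontal filtering
    ("VC-1", "1/2"): "bilinear",
    ("VC-1", "1/4"): "bicubic",
    ("H.264", "quarter-pixel"): "luma",
    ("H.264", "eighth-pixel"): "chroma",
}

TRANSFORMS = {
    "MPEG-2": "inverse DCT",
    "VC-1": "integer transform",
    "H.264": "integer transform",
}


def decode_mode(video_format, sub_mode=None):
    """Return (filter_mode, transform) for a mode indication field value."""
    key = (video_format, sub_mode)
    if key not in FILTER_MODES:
        raise ValueError(f"unsupported mode field: {key!r}")
    return FILTER_MODES[key], TRANSFORMS[video_format]


print(decode_mode("VC-1", "1/4"))
print(decode_mode("MPEG-2"))
```

Keeping the mapping in tables rather than branches mirrors the idea of a single instruction whose field selects the behavior of both the filtering and the transform stages.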

A programmable video processing unit, comprising: an identification logic circuit for identifying the format of video data; a motion compensation logic circuit for performing a motion compensation operation; an inverse discrete cosine transform logic circuit for performing an inverse discrete cosine transform operation; and an integer transform logic circuit for performing an integer transform operation; wherein the inverse discrete cosine transform logic circuit and the integer transform logic circuit are each turned off according to the identification result of the identification logic circuit.

The programmable video processing unit of claim 9, wherein, when the identification result is either the VC-1 format or the H.264 format, the inverse discrete cosine transform logic circuit is turned off.

The programmable video processing unit of claim 9, wherein, when the identification result is the MPEG-2 format, the integer transform logic circuit is turned off.

The programmable video processing unit of claim 9, further comprising a deblocking logic circuit for performing an in-loop filtering operation when the identification result is either the VC-1 format or the H.264 format.

The programmable video processing unit of claim 9, wherein, when the identification result is the MPEG-2 format, the motion compensation logic circuit operates in a two-pass mode.

The programmable video processing unit of claim 9, wherein, when the identification result is the VC-1 format, the motion compensation logic circuit operates in one of the following modes: a bilinear mode and a bicubic mode.

The programmable video processing unit of claim 9, wherein, when the identification result is the H.264 format, the motion compensation logic circuit operates in one of the following modes: a luma mode and a chroma mode.
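The selective shutdown of the transform circuits described in this group of claims can be pictured as a set of enable flags derived from the identified format. This is a hedged sketch, not the claimed circuit: `configure_blocks` and the flag names are hypothetical, and treating the deblocking circuit as inactive for MPEG-2 is an assumption, because the claims only state that it performs in-loop filtering for VC-1 and H.264.

```python
SUPPORTED_FORMATS = ("MPEG-2", "VC-1", "H.264")


def configure_blocks(identified_format):
    """Return hypothetical enable flags for each logic block."""
    if identified_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {identified_format!r}")
    is_mpeg2 = identified_format == "MPEG-2"
    return {
        "motion_compensation": True,        # used for every format
        "inverse_dct": is_mpeg2,            # turned off for VC-1 and H.264
        "integer_transform": not is_mpeg2,  # turned off for MPEG-2
        "deblocking": not is_mpeg2,         # assumption: idle for MPEG-2
    }


for fmt in SUPPORTED_FORMATS:
    print(fmt, configure_blocks(fmt))
```

Exactly one of the two transform blocks is enabled for any supported format, which reflects the claim language that the two circuits are each turned off according to the identification result.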

A video data processing method, comprising: receiving an instruction; receiving video data in one format selected from at least two formats; filtering the video data according to the instruction; and transforming the video data according to the instruction; wherein the instruction includes a mode identification field for directing the filtering and transforming steps to operate according to the format of the video data.

The video data processing method of claim 16, wherein the step of filtering the video data comprises performing motion compensation filtering.

The video data processing method of claim 17, wherein, when the mode identification field indicates the MPEG-2 format, the motion compensation filtering operates in a two-pass mode.

The video data processing method of claim 17, wherein, when the mode indication field indicates the VC-1 format with 1/2 precision, the motion compensation filtering operates in a bilinear mode; and when the mode indication field indicates the VC-1 format with 1/4 precision, the motion compensation filtering operates in a bicubic mode.

The video data processing method of claim 17, wherein, when the mode indication field indicates the H.264 format with quarter-pixel precision, the motion compensation filtering operates in a luma mode; and when the mode indication field indicates the H.264 format with eighth-pixel precision, the motion compensation filtering operates in a chroma mode.

The video data processing method of claim 16, wherein, when the mode identification field indicates the MPEG-2 format, the transforming step comprises performing an inverse discrete cosine transform.

The video data processing method of claim 16, wherein, when the mode identification field indicates one of the VC-1 and H.264 formats, the transforming step comprises performing an integer transform.

The video data processing method of claim 16, further comprising performing in-loop deblocking filtering.