TW516320B - Implementation of quantization for SIMD architecture - Google Patents

Info

Publication number: TW516320B
Authority: TW (Taiwan)
Prior art keywords: shift, quantization, multiplication, group, module
Application number: TW90103748A
Other languages: Chinese (zh)
Inventors: Chung-Tao Chu, Wei Ding
Original assignee: Intervideo Inc
Application filed by: Intervideo Inc

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/126: Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

A system for improving the speed of the video encoding process by decreasing the number of cycles needed to perform quantization. The disclosed system achieves the improvement by using a parallel processor, such as one having a single instruction, multiple data (SIMD) architecture. Several elements are processed concurrently during one instruction cycle, so that fewer instruction cycles are used overall. In a preferred embodiment of the invention, an MMX instruction set is used to execute four quantizations in parallel. The disclosed system also achieves higher precision of quantization during the encoding of video signals with the SIMD architecture by using a larger multiplier and a larger shift factor.

Description

Cross-Reference to Related Applications

This application claims the benefit of U.S. Provisional Application No. 60/184,066, filed February 22, 2000.

Copyright Notice

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

Field of the Invention

The present invention relates generally to the encoding of image, motion picture, and video signals, and more particularly to the quantization of a sequence of coefficients.

Background of the Invention

The international standards for viewing motion pictures and video signals in the MPEG format define the overall requirements for encoding and decoding a video signal, which may take the form of a sequence of individual pictures. Each picture can be regarded as a two-dimensional array of picture elements, known as pixels. As is conventionally known, and as shown as prior art in FIG. 1, a sequence of operations 1 must be performed before a motion picture can be viewed. An unencoded video source 2 may exist in any of various formats, such as the CCIR 601 format. As is conventionally known from the international standards, the input video signal is digitized and represented by a luminance signal and two color-difference signals (Y, Cr, Cb). Some form of preprocessing 3 is performed on the source 2 to convert the video data to a resolution suitable for the subsequent encoding 4. For example, the color-difference signals (Cr, Cb) are subsampled relative to the luminance by a ratio of 2:1 in both the vertical and horizontal directions. The signal is then reformatted as a non-interlaced signal if necessary. During encoding 4, a picture type must be determined for each picture in the sequence. The encoder can then estimate motion vectors for each 16x16 macroblock in the picture.
Depending on the picture type used, one or more vectors are required, and the picture sequence must be reordered before encoding. After the encoding process, the bitstream can be transferred to a storage medium 5. To view the motion picture, a decoder 6 must be used to decode the video bitstream. After decoding, post-processing 7 of the video signal is performed in order to display 8 the motion picture.

Although the international standards require the encoder to know the capacity of the decoder buffer, and the decoder's need to match the rate of the medium to the fill rate of the buffer used to sustain continuous pictures, the standards do not specify the encoding process. Instead, they specify only the syntax and semantics of the bitstream and the signal processing performed in the decoder. Consequently, many options are available for encoding a video signal.

In general, FIG. 2 shows a functional block diagram of an encoder. As shown in FIG. 2, the relevant modules include a discrete cosine transform (DCT) 10, an inverse discrete cosine transform (DCT^-1) 12, quantization (Q) 14, inverse quantization (Q^-1) 16, and variable length coding (VLC) 18.

In a digital system, quantization, which can be expressed as a matrix table Z[i], means dividing a range of values down to a single integer, code, or class, as shown in equation (1):

$$Z[i] = \frac{d[i]}{qp[i]}, \quad i = 0..63 \tag{1}$$

where d[i] denotes a block of data in matrix format and qp[i] denotes a quantization matrix, taken either from a default table of standard values or constructed as needed by one of ordinary skill in the art. The variable i indexes the data items within each matrix and, for purposes of illustration, ranges from 0 to 63.
At the register level, this division is implemented as a multiplication followed by a shift, as shown in equation (2):

$$Z[i] = d[i] \times \frac{1}{qp[i]}, \quad i = 0..63 \tag{2}$$

where 1/qp[i] is the multiplier, that is, a number less than one expressed as a fixed-point number. The fixed-point number is obtained by taking the reciprocal of qp[i] and shifting it upward (<<) by a number of bits. The reciprocal operation can therefore be written as:

$$\text{multiplier}[i] = \frac{1}{qp[i]} \ll \text{shift} \tag{3}$$

where the upward shift by a number of bits is performed so that the result is an integer value. Equation (1) can therefore be implemented as:

$$Z[i] = \left(d[i] \times \text{multiplier}[i]\right) \gg \text{shift} \tag{4}$$

where a downward shift (>>) by the same number of bits undoes the upward shift applied in equation (3).

The inventors have found that conventional encoders and quantization systems lack speed and require more instruction cycles than are actually necessary. It would therefore be desirable to improve the speed of encoding by using a faster processor to compute the quantization. Indeed, with a faster processor such as a parallel processor, the quantization of equation (4) can be performed in only a few instruction cycles. What is needed is a way to carry out the quantization within the practical limits of a parallel processing architecture.

A further problem encountered when using a parallel processor, however, lies in performing the quantization so as to obtain highly accurate coefficients.
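As a minimal sketch of equations (2) to (4), assuming non-negative integer data and a round-to-nearest reciprocal, the division can be replaced by one multiply and one shift. The function names, the rounding rule, and the sample values are illustrative assumptions and are not taken from the patent.

```python
def make_multiplier(qp: int, shift: int) -> int:
    """Fixed-point form of 1/qp: (1/qp) << shift, rounded to the nearest integer.

    Rounding to nearest is an assumption; the text only says the reciprocal
    is shifted upward so that it can be held as an integer.
    """
    if qp <= 0:
        raise ValueError("quantizer value must be positive")
    return ((1 << shift) + qp // 2) // qp


def quantize(d: int, qp: int, shift: int) -> int:
    """Approximate d / qp as (d * multiplier) >> shift, for d >= 0."""
    return (d * make_multiplier(qp, shift)) >> shift


# The same coefficient quantized with a small and a large shift.
coarse = quantize(200, 3, 5)   # multiplier 11, few fractional bits
fine = quantize(200, 3, 15)    # multiplier 10923, many fractional bits
exact = 200 // 3
```

With these sample values the 5-bit shift yields 68 while exact integer division and the 15-bit shift both yield 66, which illustrates why a larger multiplier and a larger shift give a more accurate result.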
With conventional parallel processors, this accuracy problem stems from practical processor limitations: the multiplier, the shift operation, and the registers cannot exceed a certain number of bits. What is needed, therefore, is a way of obtaining the most accurate quantizer on a parallel processor by using the largest multiplier, that is, the one with the greatest number of bits, followed by the largest feasible bit shift.

Summary of the Invention

It is an object of the present invention to improve the speed of the video encoding process by reducing the number of cycles needed to perform quantization. The invention achieves this by using a parallel processor, such as one having a single instruction, multiple data (SIMD) architecture. In a system that encodes video images using a processor having a plurality of elements processed in parallel, a module constructs an initial multiplication table and an initial shift table. The maximum shift value for each of the elements of the SIMD architecture is determined for the execution of a single instruction. The sub-registers are processed concurrently during one instruction cycle, so that fewer instruction cycles are used overall. In a preferred embodiment of the invention, an MMX instruction set is used to perform four quantizations in parallel, that is, four multiplications performed in parallel followed by four shifts performed in parallel.

A further object of the present invention is to achieve highly accurate quantization with a SIMD architecture during the encoding of video signals by using a larger multiplier and a larger shift factor.
With a module that dynamically constructs a group multiplication table and a group shift table by adjusting the initial multiplication and shift tables according to the smallest quantization table value and the corresponding largest shift value, the invention can select the largest multiplier and shift value for parallel processing. This dynamic adjustment conforms to the bit-length limits permitted by the SIMD architecture. A module is provided for multiplying the video image data by the group multiplication table for each of the elements processed in parallel during an instruction. Shifting each element within a predetermined group by a constant amount, according to the group shift table, yields a more accurate quantization during the encoding process.

These and other objects of the invention will become more apparent to those skilled in the art from the following detailed description of the invention, the embodiments, the drawings, the specification, and the claims.

Brief Description of the Drawings

FIG. 1 is a functional block diagram of the general prior-art process for encoding and decoding MPEG video signals;
FIG. 2 is a prior-art block diagram of the encoding of FIG. 1, as indicated by the international standards;
FIG. 3 is a flowchart of part of the process steps for quantization according to a preferred embodiment of the invention;
FIG. 4 is a system block diagram of the modules used for quantization according to a preferred embodiment of the invention;
FIG. 5 is a flowchart of the process steps for constructing the initial multiplication and shift tables of FIG. 4;
FIG. 6 is a flowchart of the process steps for constructing the group multiplication table and group shift table of FIG. 4;
FIG. 7 is a flowchart of the process steps for performing the quantizer multiplication and shift according to FIG. 4;
FIG. 8 is a schematic view of a sequence of individual video frames compressed by dividing each frame into data blocks; and
FIG. 9 is a block diagram showing how the invention can be applied where each data block from FIG. 8 is compressed by an intra-block method or an inter-block method.

Detailed Description of the Invention

Referring now to the drawings in detail, the present invention is directed to a software system for quantizing coefficients during the encoding of video image data using a processor having a plurality of picture data elements processed in parallel.
In particular, parallel processors with a SIMD architecture, such as Intel MMX technology, UltraSparc, PowerPC, and other microprocessors, lend themselves to faster multimedia and communication software because the data elements are processed in parallel. By performing the multiplication and shift involved in coefficient quantization during encoding with a SIMD architecture, a single instruction cycle can operate on a plurality of elements held in the data sub-registers.

For example, in the preferred embodiment, if the MMX register holds 64 bits, partitioning it into four elements (sub-registers of 16 bits each) allows four data elements to be operated on in parallel with one instruction. To this end, the multiplication is performed on the four data elements in parallel, and the shift instruction is then performed on the four data elements in parallel. Because this parallel quantization requires fewer cycles, the overall speed of encoding is improved.

In the preferred embodiment, typical entries of the quantization matrix table qp[i] range from 8 to 64, and even up to 128, with no particular ordering of the values within the table. The reciprocal of a particular set of values in the quantization matrix is taken in carrying out equation (3). In general, the smaller an entry in the quantization matrix table, the larger its reciprocal. In turn, the larger the reciprocal, the smaller the shift that is feasible within the data sub-register.
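The four-lane arrangement described above can be emulated in plain Python as a rough sketch. This is not real MMX code: the helper names, the choice to keep only the low 16 bits of each product, and the sample data are assumptions made for illustration. The point is that one multiply step and one shift step, each with a single shared shift count, cover four 16-bit elements of a 64-bit word.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four unsigned 16-bit values into one 64-bit integer (lane 0 lowest)."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lane values")
    word = 0
    for lane, value in enumerate(values):
        if not 0 <= value <= LANE_MASK:
            raise ValueError("lane value does not fit in 16 bits")
        word |= value << (lane * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> (lane * LANE_BITS)) & LANE_MASK for lane in range(LANES)]


def lanes_multiply(a, b):
    """Multiply lane by lane, keeping the low 16 bits of each product."""
    return pack([(x * y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


def lanes_shift_right(a, count):
    """Shift every lane right by the same count, as one instruction would."""
    return pack([x >> count for x in unpack(a)])


# Four coefficients quantized by 1, 2, 3 and 4 using 5-bit multipliers.
data = pack([64, 128, 200, 255])
multipliers = pack([32, 16, 11, 8])
result = unpack(lanes_shift_right(lanes_multiply(data, multipliers), 5))
```

Because the shift count is shared by all four lanes, every entry processed together has to use the same shift, which is the constraint the rest of the description works around.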
A larger entry in the quantization matrix has a smaller reciprocal and therefore allows a larger shift within the data sub-register, up to the maximum shift the sub-register permits. The inventors have realized that, to fit the practical limits of the sub-registers in a SIMD architecture and to maximize the accuracy of the quantization, the entries of the quantization matrix must be preprocessed in predetermined groups, and the data block d[64] must likewise be processed in parallel according to those groups. As described in more detail below, the quantization matrix table is examined in advance to locate the smallest entry within each predetermined group; its reciprocal is the largest number in the group and therefore limits the shift for the group. This allows the greatest overall accuracy of quantization for a group to be obtained, by compensating for the amount by which the individual accuracies must be reduced because of the limits of the actual architecture. When the entries within a predetermined group tend to be close in value, the shift determined according to the invention also tends toward a higher overall accuracy. Once the entry of the quantization matrix that dictates the minimum shift has been determined, every entry within the predetermined group must be shifted by that same amount in its sub-register.

Referring to FIGS. 3 and 4, an initial multiplication table M[i] containing multiplication factors and an initial shift table S[i] containing shift factors are constructed by a first module 40. This is represented by method step 32. The input to module 40 is the quantization matrix qp[i], where, for example, i = 0 to 63. The outputs of module 40 are M[i] and S[i], where i = 0 to 63. As will be described with reference to FIG. 5, the first module 40 determines the initial multiplication to be performed on the four 16-bit elements of a register totaling 64 bits. A second module 50 dynamically constructs a group multiplication table M'[i] and a group shift table S'[i] using, for example, the M[i] and S[i] produced by module 40 as inputs. Module 50 constructs the group (or dynamic) multiplication and shift tables by adjusting the initial multiplication table M[i] and the initial shift table S[i] according to the quantizer matrix entries of each predetermined group. For example, as will be described in more detail with reference to FIG. 6, the maximum shift value for a predetermined group is obtained by identifying and selecting the entry of qp[i] that has the minimum value for the group, that is, the entry corresponding to S_min. As shown in FIG. 3, module 50 operates iteratively until the group tables (Md[] = M'[i]; Sd[] = S'[i]) have been determined. For example, step 34 is a conditional statement that checks whether the quantizer matrix table qp[i] is a first or new input. For each entry in the quantizer matrix, step 36 constructs the group (dynamic) multiplication and shift values, and step 38 writes or stores them in the group multiplication and shift tables. A third module 60 receives as input the group multiplication and shift tables M'[i] and S'[i] output by module 50. As will be described in further detail with reference to FIG. 7, module 60 multiplies the video image data d[64] using the group multiplication and shift tables in order to produce the quantized matrix Z[64].

Referring to FIG. 5, the initial multiplication and shift tables are constructed with a loop that repeats until all values of each table have been obtained. At step 42 a counter is initialized to zero; at step 44 it is incremented. Several operations take place at step 46. For each entry in the quantization table, the process determines the power-of-two range in which the entry lies, expressed by the shift amount s:

$$2^{s} \le q < 2^{s+1} \tag{5}$$

Because the number q is known from the quantization matrix table, the shift value s is determined by shifting the entry from the quantization table downward as follows:

$$q' = q \gg s \tag{6}$$

$$q'' = q \gg (s + 1) \tag{7}$$

If q' > 0 and q'' = 0, then s is the shift value.
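The test of equations (5) to (7) can be sketched as a small loop that raises s until the entry shifted down by s + 1 becomes zero. The function name is hypothetical and the loop is only one straightforward way to apply the stated condition; the patent does not prescribe how the search is carried out.

```python
def shift_for(q: int) -> int:
    """Return s such that (q >> s) > 0 and (q >> (s + 1)) == 0."""
    if q <= 0:
        raise ValueError("quantizer entry must be positive")
    s = 0
    # Keep increasing s while one more downward shift still leaves a nonzero value.
    while (q >> (s + 1)) != 0:
        s += 1
    return s


# Shift values for a few sample quantizer entries.
shifts = {q: shift_for(q) for q in (1, 8, 9, 16, 64, 128)}
```

For positive integers this gives the same value as Python's built-in `q.bit_length() - 1`, so the loop mainly serves to mirror the two conditions in the text.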
If q is expressed in binary form, the shift value s is the number of bits minus one. For example, if q = 9, four bits are needed to represent 9 in binary, so s = 3; applying equation (5), 2^3 < 9 < 2^(3+1).

Returning to step 46, the variable b represents the number of bits in the operand of each element held in an individual sub-register. Those of ordinary skill in the art will appreciate that b may be 8, 16, or 32 bits, or larger. At step 46, the multiplication factor computed for each entry is placed in the multiplication table M[i] and the shift factor in the shift table S[i], both of which can be stored in RAM.

For example, to speed up the encoding process during quantization, MMX can perform four quantizations in parallel, that is, four parallel multiplications followed by four parallel shifts. Because the MMX instructions require a fixed amount or length for the shift operation, static (initial) multiplication and shift tables with a fixed shift are first constructed, as described above, for all quantizer values in the feasible range. For example, if the quantizer ranges from 1 to 8 for a 5-bit MMX operation, the entire range is Qf = {1, 2, 3, 4, 5, 6, 7, 8}, the multiplication table of multipliers is M = {32, 16, 11, 8, 6, 5, 5, 4}, and the shift table is S = {5, 5, 5, 5, 5, 5, 5, 5}. To fit this SIMD architecture, the shift value must remain the same for every quantizer value. However, because fewer digits are used, these static tables do not by themselves compensate for the loss of accuracy. As described below, accuracy can be increased by using the greatest number of digits when carrying out the multiplication by the reciprocal and the shift.

Another way to describe the method steps of FIG. 6 is as writing the dynamic multiplication and shift tables for module 50. In the MMX implementation example, because the architecture used represents four data sub-registers totaling 64 bits, each element takes the form of a 16-bit data type. At step 51, the entries of the input multiplication and shift tables M[64] and S[64] are read in, respectively.
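The static tables in the 5-bit example can be reproduced with the short sketch below. It assumes each multiplier is 2^5 / q rounded to the nearest integer; that rounding rule is inferred from the listed values rather than stated in the text, and the function name is hypothetical.

```python
def build_static_tables(quantizers, shift):
    """Return (multipliers, shifts) using one fixed shift for every entry."""
    multipliers = []
    for q in quantizers:
        if q <= 0:
            raise ValueError("quantizer value must be positive")
        # Round-to-nearest fixed-point reciprocal (assumed rounding rule).
        multipliers.append(((1 << shift) + q // 2) // q)
    shifts = [shift] * len(multipliers)
    return multipliers, shifts


Qf = [1, 2, 3, 4, 5, 6, 7, 8]
M, S = build_static_tables(Qf, 5)
```

Under this assumption M and S match the tables quoted above. Several neighbouring quantizers (6 and 7) end up with the same multiplier, which shows the loss of accuracy that a single small fixed shift causes.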
Alternatively, as shown in step 54, the values of the tables may be loaded from RAM or some other storage medium. Regardless of how the tables M[64] and S[64] are read in, step 53 indicates that a look-up instruction is performed on the initial tables. More specifically, in step 53 the entries within each multiplication and shift table are selected in groups of a predetermined size (for example, four) corresponding to the number of data registers of the SIMD architecture (for example, the MMX processor has four sub-registers, each 16 bits long, for a total of 64 bits). Thus, for the preferred embodiment of the invention utilizing MMX technology, each multiplication and shift table has 16 groups of four entries each. Step 53 can essentially be described as loading the next four entries of each multiplication and shift table for each of the groups. The designations Mp and Sp simply indicate that a predetermined number (for example, four) of entries within the respective tables are fetched or examined simultaneously. As will be understood by those of ordinary skill in the art, the predetermined number corresponds to the number of sub-registers of the particular SIMD architecture used. The counter in step 52 is initialized to zero and incremented in step 57 in accordance with conditional step 58. For brevity, the initialization of the counter, group = 0, between steps 54 and 53 is not shown.

To obtain the highest precision using the SIMD architecture, an intermediate multiplication table and an intermediate shift table are used. These tables dynamically adjust the multiplication and shift operations according to the quantizer input, as long as there is a constant shift factor for every four quantizations in the example above.
For example, for any given quantizer input (for example, an 8×8 quantization table), the intermediate multiplication table M'[i] and the intermediate shift table S'[i] are constructed by simultaneously processing every predetermined number of entries (for example, four for MMX technology) in each matrix. These intermediate tables need to be updated whenever a new quantizer table is input, which is usually done frame by frame. However, as will be understood by those of ordinary skill in the art, this input may be performed macroblock by macroblock if desired. In addition, the input quantizer table is stored in such a way that SIMD instructions can directly fetch and use the data.

Still referring to FIG. 6, to accomplish the above result, step 55 determines the maximum shift amount for each predetermined group or set of entries. That is, the shift amount is obtained by examining each element within the group. The entry having the smallest value requires the smallest shift amount when reciprocated or inverted using processor logic. This smallest shift amount is the number of bits by which the corresponding data sub-register can be shifted upward without exceeding the physical limit of the sub-register. The shift amount is therefore determined from the smallest value within the group of entries, as designated by Smin = min(S1, S2, S3, S4). In step 56, the group multiplication and shift tables are built to produce M'[i] and S'[i], as shown in the following equation:

M'[i] = M[i] >> (Si - Smin)
The group multiplication matrix table M'[i] is constructed from the original multiplication value M[i], shifted by the original shift amount Si less the minimum shift amount Smin. The predetermined entries of the first group {M'[1], M'[2], M'[3], M'[4]} are thus each processed in parallel in the four data sub-registers or elements. The dynamic shift table is constructed by making all entries equal to Smin, so that S'[i] = Smin. Thus, for the four entries of the first group, S'[1] = S'[2] = S'[3] = S'[4] = Smin.
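By way of illustration only, steps 55 and 56 may be sketched in Python as follows. The function name build_group_tables and the constant GROUP_SIZE are hypothetical names introduced for this sketch and are not part of the disclosure; plain Python lists stand in for the packed 16-bit sub-registers, and the sample group is the one used in the 5-bit example given later in this description.

```python
from typing import List, Tuple

GROUP_SIZE = 4  # assumed lane count: four 16-bit elements in one 64-bit register


def build_group_tables(m: List[int], s: List[int],
                       group_size: int = GROUP_SIZE) -> Tuple[List[int], List[int]]:
    """Return (M', S') in which every group shares its smallest shift amount."""
    if len(m) != len(s) or len(m) % group_size != 0:
        raise ValueError("tables must have equal length, a multiple of the group size")
    m_out: List[int] = []
    s_out: List[int] = []
    for start in range(0, len(m), group_size):
        mp = m[start:start + group_size]  # Mp: entries of M examined together
        sp = s[start:start + group_size]  # Sp: entries of S examined together
        s_min = min(sp)                   # step 55: Smin = min(S1, S2, S3, S4)
        # step 56: M'[i] = M[i] >> (Si - Smin) and S'[i] = Smin
        m_out.extend(mi >> (si - s_min) for mi, si in zip(mp, sp))
        s_out.extend([s_min] * group_size)
    return m_out, s_out


# One group of four entries (quantizer values 2, 3, 5 and 7).
m_group, s_group = build_group_tables([32, 22, 26, 18], [6, 6, 7, 7])
print(m_group, s_group)  # [32, 22, 13, 9] [6, 6, 6, 6]
```

In this sketch the right shift discards the low-order bits of the entries whose original shift exceeded Smin, which is the approximation accepted in exchange for a constant shift per group.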

Because each of the shift amounts used for a predetermined group equals Smin, every element in the group is shifted by a constant amount. Each data sub-register is therefore shifted by a constant amount, which is required for consistency with SIMD operation, in which the shift of four elements can be performed in one cycle but the shift amount must be the same for each element. Under the present invention, although this is a rough approximation, the maximum precision is obtained while meeting the constant-shift requirement of the SIMD architecture.

Referring to FIG. 7, the multiplication performed by module 60 using the constructed tables is typically carried out on blocks of data in matrix format. Step 62 initializes the variable used by the method, which repeats until all groups have been processed. The variable is incremented in step 68, and whether all groups (for example, 16) have been processed is checked in step 70. In step 64, the data block d[64], the group multiplication matrix table M'[i] and the group shift matrix table S'[i] are loaded. For each of these tables, the entries of a predetermined group (for example, four) are loaded into the elements or sub-registers. Thus d{d1, d2, d3, d4} represents the first four elements of the matrix table d[64]. Likewise, M'{M'1, M'2, M'3, M'4} represents the first four entries of the group multiplication matrix table, and S'{S'1, S'2, S'3, S'4} represents the first four entries of the group shift matrix table. In step 66, the quantization process is performed according to the multiplication and downward shift shown in equation (4).
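Purely as an illustration, the loop of steps 62 to 70 may be sketched in Python as follows. The sketch assumes that the multiplication and downward shift of equation (4) take the form z[i] = (d[i] × M'[i]) >> S'[i]; the function name quantize_block and the sample values are hypothetical. Rounding, the handling of negative coefficients and the 16-bit width of each sub-register are not modeled.

```python
from typing import List

GROUP_SIZE = 4  # assumed lane count: four 16-bit elements in one 64-bit register


def quantize_block(d: List[int], m_group: List[int], s_group: List[int],
                   group_size: int = GROUP_SIZE) -> List[int]:
    """Quantize d group by group as z[i] = (d[i] * M'[i]) >> S'[i] (assumed form)."""
    if not (len(d) == len(m_group) == len(s_group)) or len(d) % group_size != 0:
        raise ValueError("inputs must have equal length, a multiple of the group size")
    z: List[int] = []
    for start in range(0, len(d), group_size):  # steps 62, 68 and 70: walk the groups
        lanes = range(start, start + group_size)
        shift = s_group[start]
        # A SIMD shift applies one count to every lane, so the group must agree.
        if any(s_group[i] != shift for i in lanes):
            raise ValueError("shift amounts must be constant within a group")
        # Step 66: on SIMD hardware these lanes are processed in parallel;
        # here they are simply evaluated one after another.
        z.extend((d[i] * m_group[i]) >> shift for i in lanes)
    return z


# Hypothetical non-negative coefficients, with one group of tables for
# quantizer values 2, 3, 5 and 7.
print(quantize_block([100, 100, 100, 100], [32, 22, 13, 9], [6, 6, 6, 6]))
# [50, 34, 20, 14]
```

For comparison, exact division of 100 by 2, 3, 5 and 7 gives 50, 33.3, 20 and 14.3, so the multiply-and-shift result differs from the exact quotient by less than one in each lane of this example.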
For each group, using the respective sub-registers, the individual values from the multiplication table M'[i] are multiplied in parallel with the group of entries from the data table d and shifted down according to the contents of the group shift table, so that the video image data can be quantized during the encoding process.

With the preferred embodiment described above, an example for a 5-bit MMX operation can be illustrated. The entire range is Qf = {1, 2, 3, 4, 5, 6, 7, 8}. The initial multiplication matrix is M = {32, 32, 22, 32, 26, 21, 18, 32} and the initial shift matrix is S = {5, 6, 6, 7, 7, 7, 7, 8}. Groups of entries taken from the quantizer table use a maximum of 5 bits. Then, if qp = {2, 3, 5, 7} for four given quantizer inputs, Mp1 = M'[1] = {32, 22, 26, 18} for the first group and Sp1 = S'[1] = {6, 6, 7, 7}. Upon dynamically making the shift table uniform, Mp2 = {32, 22, 13, 9} and Sp2 = {6, 6, 6, 6}.

In another embodiment, the dynamic tables are constructed for the case where a scaled discrete cosine transform (DCT) is used ahead of quantization. The multipliers and shifts are then adjusted by the weighting coefficients inherited from the last stage of the scaled DCT.

As will be understood by those skilled in the art, the present invention can be implemented in various ways and is applicable to video standards following ISO MPEG, MPEG-1, MPEG-2, MPEG-4, H.263 and H.263+, as well as ITU-Telecom H.262.
For example, as is well known, a sequence of individual video frames, compressed by dividing each frame into blocks of data (for example, 16×16 or 8×8 macroblocks), can be read from a DVD-ROM, a hard disk, any storage medium or a capture device, digitized in real time and transferred to RAM. This is shown in FIG. 8. The present invention can be applied to video data produced by any of these methods. As shown in FIG. 9, the present invention can also be applied where each block of data is compressed by an intra-block method, by performing the DCT that produces the block of data d[64]. The block of data d[64] forms the first input to the quantizer and the second input comes from the quantization matrix qp[64], so that the output of the quantizer is Z[64]. The present invention can also operate with the inter-block method shown in FIG. 9, in which a block of data referred to as a prediction block (for example, a subset) is subtracted from the block of data before being input to the DCT function and subsequently entering the quantization process.

In addition, still images such as those used in GIF and JPEG files can also be quantized with the present invention.

Although the present invention has been described with particular reference to the illustrated embodiments, it will be understood that various changes, modifications and adaptations can be made based on the present disclosure and are intended to fall within the scope of the present invention.
Although the present invention has been described in connection with what are presently considered to be the most practical and preferred embodiments, it will be understood that the invention is not limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.

Element symbols

1 Operation sequence
2 Source
3 Preprocessing
4 Encoding
5 Storage medium
6 Decoder
7 Post-processing
8 Display
10 Discrete cosine transform
12 Inverse discrete cosine transform
14 Quantization (Q)
16 Inverse quantization (Q^-1)
18 Variable length coding (VLC)
40 First module
50 Second module
60 Third module


Claims (1)

1. A system for quantizing coefficients during encoding of video image data using a processor that processes a plurality of elements in parallel, comprising:
a first module for constructing an initial multiplication table and an initial shift table;
a second module for dynamically constructing a group multiplication table and a group shift table by adjusting the initial multiplication table and the initial shift table according to a maximum shift value, wherein the maximum shift value is associated with a smallest quantization value; and
a third module for multiplying, in parallel, the video image data with the group multiplication table for each of the elements, and for shifting each of the elements in parallel according to the group shift table, to produce a quantization of the data during the encoding.
TW90103748A 2000-02-22 2001-02-19 Implementation of quantization for SIMD architecture TW516320B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US18406600P 2000-02-22 2000-02-22

Publications (1)

Publication Number Publication Date
TW516320B true TW516320B (en) 2003-01-01

Family

ID=22675423

Family Applications (1)

Application Number Title Priority Date Filing Date
TW90103748A TW516320B (en) 2000-02-22 2001-02-19 Implementation of quantization for SIMD architecture

Country Status (3)

Country Link
AU (1) AU2001249992A1 (en)
TW (1) TW516320B (en)
WO (1) WO2001063923A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI384400B (en) * 2004-03-31 2013-02-01 Icera Inc Computer processor, method of operating the same and related computer program product for asymmetric dual path processing
US8484442B2 (en) 2004-03-31 2013-07-09 Icera Inc. Apparatus and method for control processing in dual path processor
US8484441B2 (en) 2004-03-31 2013-07-09 Icera Inc. Apparatus and method for separate asymmetric control processing and data path processing in a configurable dual path processor that supports instructions having different bit widths

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5353026A (en) * 1992-12-15 1994-10-04 Analog Devices, Inc. Fir filter with quantized coefficients and coefficient quantization method
FR2746564B1 (en) * 1996-03-22 1998-06-05 Matra Communication METHOD FOR CORRECTING NON-LINEARITIES OF AN AMPLIFIER, AND RADIO TRANSMITTER IMPLEMENTING SUCH A METHOD

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI384400B (en) * 2004-03-31 2013-02-01 Icera Inc Computer processor, method of operating the same and related computer program product for asymmetric dual path processing
US8484442B2 (en) 2004-03-31 2013-07-09 Icera Inc. Apparatus and method for control processing in dual path processor
US8484441B2 (en) 2004-03-31 2013-07-09 Icera Inc. Apparatus and method for separate asymmetric control processing and data path processing in a configurable dual path processor that supports instructions having different bit widths
US9477475B2 (en) 2004-03-31 2016-10-25 Nvidia Technology Uk Limited Apparatus and method for asymmetric dual path processing

Also Published As

Publication number Publication date
WO2001063923A1 (en) 2001-08-30
AU2001249992A1 (en) 2001-09-03

Similar Documents

Publication Publication Date Title
CA2653693C (en) Reduction of errors during computation of inverse discrete cosine transform
JP5623565B2 (en) Apparatus and method for encoding and calculating a discrete cosine transform using a butterfly processor
JP4924904B2 (en) Efficient encoding / decoding of sequences of data frames
US5825680A (en) Method and apparatus for performing fast division
US20100034286A1 (en) Low complexity and unified transforms for video coding
US7738697B2 (en) Color transformation method and apparatus with minimized transformation errors
JP2010501128A (en) Efficient fixed-point approximation of forward and inverse discrete cosine transforms
EP0790579A2 (en) High-speed digital video decompression
US20230276023A1 (en) Image processing method and device using a line-wise operation
RU2439682C2 (en) Reduction of errors during calculation of reverse discrete cosine conversion
TW516320B (en) Implementation of quantization for SIMD architecture
US6850566B2 (en) Implementation of quantization for SIMD architecture
TW401705B (en) Method and apparatus for selecting a quantization table for encoding a digital image
JP5129248B2 (en) Efficient fixed-point approximation of forward and inverse discrete cosine transforms
Van Waveren et al. Real-time YCoCg-DXT compression
CN100396101C (en) Image compression apparatus generating and using assistant images and a method thereof
JPH07152779A (en) Processing method for detecting moving picture index and moving picture processor having moving picture index detection processing function
RU2496139C2 (en) Efficient fixed-point approximation for direct and inverse discrete cosine transforms
Paim et al. Exploring approximations in 4-and 8-point DTT hardware architectures for low-power image compression
Hashim et al. Correlated Block Quad-Tree Segmented and DCT based Scheme for Color Image Compression
TWI227629B (en) Method and related processing circuits for reducing memory accessing while performing de/compressing of multimedia files
Milleson Partial Image Decoding On The GPU For Mobile Web Browsers
Varma et al. ROI based image compression in baseline JPEG
JPH0595483A (en) Device and method for picture compression

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees