TW539989B - Multiplier architecture in a general purpose processor optimized for efficient multi-input addition - Google Patents
- Publication number: TW539989B
- Application number: TW90107607A
- Authority: TW (Taiwan)
- Prior art keywords: bit, patent application, operands, item, scope
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/57—Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/509—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3812—Devices capable of handling different types of numbers
- G06F2207/382—Reconfigurable for different fixed word lengths
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3828—Multigauge devices, i.e. capable of handling packed numbers without unpacking them
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Background of the invention
Digital signal processors (DSPs) are commonly used in a variety of multimedia applications such as digital video, imaging, and audio. DSPs can process digital signals to create and open multimedia files.
MPEG-1, MPEG-2, MPEG-4, and H.263 are digital video compression standards and file formats. These standards achieve high compression rates for digital video signals by storing mainly the changes from one video frame to the next instead of storing each entire frame. The video data can then be compressed further using a number of different techniques.
A DSP is used to perform various operations on the video information during compression. These operations can include motion search and spatial interpolation algorithms, whose main purpose is to measure the distortion between blocks in adjacent frames. The operations are computationally intensive and require high data throughput.
The MPEG family of standards continues to evolve to keep pace with the growing bandwidth requirements of multimedia applications and files. Each new version of the standard introduces more sophisticated algorithms that place greater processing demands on the DSPs used in MPEG-compliant video processing equipment.
Manufacturers of video processing equipment often rely on application-specific integrated circuits (ASICs) that are customized for video encoding under the MPEG and H.263 standards. However, ASICs are complex to design, costly to produce, and less flexible in their application than general-purpose DSPs.
Brief description of the drawings
FIG. 1 is a schematic diagram of a video processor that includes a dual-mode general-purpose digital signal processor (DSP) according to an embodiment of the invention.
FIG. 2 is a schematic diagram of the execution unit shown in FIG. 1.
FIG. 3 is a flowchart of the general steps by which the sequencer implements the adder tree mode in the execution unit.
FIG. 4 is a flowchart of the general steps by which the sequencer implements the multiplier mode in the execution unit.
FIG. 5 is a schematic diagram of another adder tree configuration in the execution unit of FIG. 1.
FIG. 6 is a schematic diagram of an adder tree configuration in an execution unit according to another embodiment of the invention.
Like reference symbols in the various drawings indicate like elements.
Detailed description of the invention
According to an embodiment, an execution unit 10 for a general-purpose digital signal processor (DSP) 12 can operate in at least two modes. In the normal multiplier mode, the execution unit operates as a 16x16 multiplier. In the adder tree mode, the execution unit operates as an 8-bit adder that sums multiple operands.
The adder tree mode is suited to computationally intensive operations on relatively narrow data (for example, 8-bit values), such as the processing of video data under the H.263 and MPEG families of video compression formats. According to this embodiment, the general-purpose DSP 12 is part of a video processor 5 that includes a central processing unit (CPU) 7 and a memory for storing video processing algorithms and video data.
According to this embodiment, the adder tree architecture used in the adder tree mode is built by reusing the arithmetic logic units (ALUs) that already exist in the 16x16 multiplier architecture common to conventional general-purpose DSPs. This gives the execution unit a dual mode of operation with little additional hardware.
FIG. 2 shows the dual-mode execution unit 10 in more detail. The execution unit 10 includes two primary input multiplexers (MUXs) 14, two secondary input MUXs 16, a partial product generator 18, a multiplier array 20, two intermediate registers 22, two intermediate MUXs 24, a vector merging ALU 26, an accumulator 28, an accumulator MUX 30, an output register 32, and an output MUX 34 that outputs the sum produced by the execution unit to a result bus 36. The scope of the invention is not, however, limited to embodiments having all of the elements shown in FIG. 2.
The interaction between the elements of the execution unit 10 is described below in terms of the operation of the execution unit in its various modes.
Logic control of the execution unit 10 can be implemented by a sequencer 100 (FIG. 1). The sequencer 100 fetches instructions from an instruction cache 102 and sends control signals that enable the elements needed to operate in the respective mode. The instructions in the instruction cache 102 can be programmed or hardwired.
In both the multiplier mode and the adder tree mode, the execution unit 10 receives two pairs of 16-bit words from an operand bus 104 under the control of a bus controller 106. Each pair of 16-bit words is input to one of the primary MUXs 14. The primary MUXs 14 are controlled by the sequencer 100 so that each selects and outputs a single 16-bit word.
FIG. 3 is a flowchart of the general steps by which the sequencer 100 implements the adder tree mode. In block 200, the sequencer 100 controls a switch 50 to output the 16-bit words from the primary MUXs to the secondary MUXs 16.
Each secondary MUX is controlled to select two 8-bit words, operands A and B in block 202 and operands C and D in block 204, which are then added in the multiplier array 20.
The multiplier array 20 contains a number of rows of ALUs 60 interconnected in a number of levels. In a typical 16x16 multiplier, the ALUs of the multiplier array are arranged in 7 levels.
In binary addition, adding two n-bit numbers produces a sum of at most (n + 1) bits. The extra bit is a new binary place, the (n + 1)-th, with weight $2^{n}$. For example, adding two 8-bit numbers can produce a 9-bit sum:
$11111111_2 + 11111111_2 = 111111110_2$, that is, $255_{10} + 255_{10} = 510_{10}$.
In block 206, the 8-bit operands A and B are input directly to ALU 62, and in block 208 they are added to produce a 9-bit partial sum E. Likewise, in block 210 operands C and D are input directly to ALU 64, and in block 212 they are added to produce a 9-bit partial sum F.
In blocks 214 and 216, the 9-bit partial sums E and F are output to ALU 66, and in block 218 they are added to produce a 10-bit sum G. In block 220, the sum G is output by the multiplier array 20.
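The adder tree steps of FIG. 3 can be sketched behaviorally as follows. This is a minimal software model, not the hardware itself. The function names are hypothetical, and taking the high byte of each 16-bit word as the first operand is an assumption, since the description does not say which byte becomes A or C.

```python
def split_word(word16):
    """Split a 16-bit word into two 8-bit operands.

    Assumption: the high byte is the first operand; the source does not specify.
    """
    if not 0 <= word16 <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    return (word16 >> 8) & 0xFF, word16 & 0xFF


def adder_tree(word_ab, word_cd):
    """Sum four 8-bit operands in two levels, following blocks 202 to 220."""
    a, b = split_word(word_ab)   # block 202: secondary MUX selects A and B
    c, d = split_word(word_cd)   # block 204: secondary MUX selects C and D
    e = a + b                    # block 208: first-level ALU, 9-bit partial sum E
    f = c + d                    # block 212: first-level ALU, 9-bit partial sum F
    g = e + f                    # block 218: second-level ALU, 10-bit sum G
    # Each level of addition can grow the result by at most one bit.
    assert e.bit_length() <= 9 and f.bit_length() <= 9 and g.bit_length() <= 10
    return e, f, g


# Worst case: all four operands are 255.
print(adder_tree(0xFFFF, 0xFFFF))  # (510, 510, 1020)
```

The worst case confirms the stated widths: 510 needs 9 bits and 1020 needs 10 bits.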
According to an embodiment, the sequencer 100 controls the execution unit 10 to bypass the multiplier architecture downstream of the multiplier array 20 (for example, the ALU and the accumulator 28) and to output the sum G directly to the output MUX 34. This embodiment is useful for video encoding operations such as spatial interpolation algorithms that use simple sums.
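As an illustration of a spatial interpolation that needs only a simple sum, the sketch below averages four neighboring 8-bit pixels. The rounding formula `(G + 2) >> 2` is an assumption chosen for the example and is not taken from the description. Only the four-operand sum corresponds to the adder tree output G; the rounding and shift are assumed to happen elsewhere.

```python
def interpolate_2x2(a, b, c, d):
    """Rounded average of four 8-bit pixels built on one four-operand sum."""
    for p in (a, b, c, d):
        if not 0 <= p <= 255:
            raise ValueError("pixels must be 8-bit values")
    g = (a + b) + (c + d)   # the 10-bit sum G, as the adder tree would produce it
    return (g + 2) >> 2     # assumed rounding: add 2, then divide by 4 with a shift


print(interpolate_2x2(10, 20, 30, 40))      # 25
print(interpolate_2x2(255, 255, 255, 255))  # 255
```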
According to another embodiment, the sum G is stored in the accumulator 28 by way of an intermediate register 22, an intermediate MUX 24, and the vector merging ALU 26. The sum G is then fed back through the accumulator MUX 30 and input to the vector merging ALU 26 together with a subsequent 10-bit sum G'. The sum of G and G' can be output and stored in the accumulator. Each following 10-bit sum is added to the value in the accumulator, and this cycle can repeat many times. This embodiment is useful for video encoding operations such as motion search algorithms that use accumulate operations.
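The accumulate cycle can be modeled in a few lines. This is a sketch under stated assumptions: the `Accumulator` class is hypothetical, the 40-bit width follows the width the description gives as preferable for the vector merging ALU, and wrapping around on overflow is an assumption because the description does not say how overflow is handled.

```python
ACC_BITS = 40                     # preferred width of the vector merging ALU
ACC_MASK = (1 << ACC_BITS) - 1


class Accumulator:
    """Running total of successive 10-bit adder tree sums."""

    def __init__(self):
        self.value = 0

    def add(self, g):
        if not 0 <= g < (1 << 10):
            raise ValueError("expected a 10-bit sum")
        # Assumption: the result wraps at 40 bits if it ever overflows.
        self.value = (self.value + g) & ACC_MASK
        return self.value


acc = Accumulator()
for g in (1020, 1020, 3):         # G, G', and one further sum
    acc.add(g)
print(acc.value)                  # 2043
```

With 30 bits of headroom above a 10-bit input, a very large number of sums can be accumulated before overflow becomes possible.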
FIG. 4 is a flowchart of the general steps by which the sequencer 100 implements the multiplier mode. The sequencer 100 controls the switch 50 to bypass the secondary MUXs and, in block 300, to pass the 16-bit words to the partial product generator 18 as multiplicands I and J. The partial product generator 18 contains a matrix of logical AND gates, each of which operates on one bit of the 16-bit multiplicand I and one bit of the 16-bit multiplicand J. The partial products, of which there are 16 for 16-bit multiplicands, are sent to the multiplier array 20 in block 302. In block 304, the partial products are reduced to two 32-bit words, a sum X and a carry Y, whose sum equals the product of I and J. X and Y are sent to the intermediate registers 22 and, by way of the intermediate MUXs 24, are added in the vector merging ALU 26 to produce the product. The product is output through the output register 32 and the output MUX 34 to the result bus 36 or, as in the adder tree mode, is passed through the accumulator 28 for related accumulate operations. Preferably, the vector merging ALU 26 is a 40-bit ALU so that it can accommodate overflow during accumulate operations.
Other adder tree architectures can be implemented with the ALUs available in the multiplier array. According to another embodiment, shown in FIG. 5, an execution unit having the same architecture can be implemented as dual two-operand adder trees.
According to this particular embodiment, the execution unit 10 is controlled to bypass ALU 66. Each of the 9-bit sums E and F is output directly from the multiplier array to the output MUX 34, or to the intermediate registers 22 for the accumulate operations described above.
Apart from outputting the sums E and F after steps 208 and 212, the sequencer 100 follows the same general steps as shown in FIG. 3.
FIG. 6 depicts another embodiment, in which a more complex adder tree architecture is implemented in the multiplier array. A secondary MUX 16 is provided for each 16-bit input.
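The multiplier mode of FIG. 4 can be illustrated with a small behavioral model: AND-gate partial products, a reduction to a sum word X and a carry word Y, and a final addition. This is a sketch only. It assumes unsigned operands, the function names are hypothetical, and the order in which rows are compressed is arbitrary here, whereas the real array uses a fixed arrangement of ALU levels.

```python
WIDTH = 16
MASK32 = (1 << 32) - 1


def partial_products(i, j):
    """Block 300/302: one row per bit of J, each row being I ANDed with that bit."""
    rows = []
    for k in range(WIDTH):
        bit = (j >> k) & 1
        row = i if bit else 0        # models 16 AND gates: each bit of I AND bit k of J
        rows.append(row << k)        # place the row at its binary weight
    return rows


def carry_save_add(x, y, z):
    """3:2 compression: three words in, a sum word and a carry word out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_to_sum_and_carry(rows):
    """Block 304: compress the rows until only sum X and carry Y remain."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return rows[0] & MASK32, rows[1] & MASK32


def multiply(i, j):
    x, y = reduce_to_sum_and_carry(partial_products(i, j))
    return (x + y) & MASK32          # final addition, as in the vector merging ALU


print(multiply(300, 7))              # 2100
```

The key property is that each compression step preserves the total of the rows, so X + Y always equals the product of I and J.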
Adder tree architecture 1, on the left side of FIG. 6, is described here. Adder tree architecture 2, on the right side, has the same construction and operation. The sequencer 100 handles each adder tree architecture according to the same general steps as shown in FIG. 3.
Each secondary MUX 16 selects two 8-bit operands from a 16-bit word input from the operand bus 104 (FIG. 1), producing operands A1, B1, C1, and D1. A1 and B1 are added in an ALU to produce a 9-bit partial sum E1. Likewise, C1 and D1 are added in an ALU to produce a 9-bit partial sum F1. E1 and F1 are added in an ALU to produce a 10-bit sum G1.
The sums G1 and G2 (the latter from the right-side adder tree architecture) are either sent directly to the output MUX 34 for output to the result bus 36 or sent to the vector merging ALU 26 for a further addition and/or accumulator operation.
A general-purpose DSP that includes an execution unit according to the various embodiments is advantageous. Such a DSP can provide performance comparable to that of an ASIC intended to support motion search algorithms, with the added advantage of the flexibility of a general-purpose DSP.
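The FIG. 6 arrangement, with two parallel four-operand adder trees and an optional further addition, can be sketched as follows. The function names are hypothetical, and both the byte order within each word and the assignment of the first two words to tree 1 are assumptions made for the example.

```python
def split_word(word16):
    """Split a 16-bit word into two 8-bit operands (high byte first, by assumption)."""
    if not 0 <= word16 <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    return (word16 >> 8) & 0xFF, word16 & 0xFF


def four_operand_tree(word_ab, word_cd):
    """One adder tree: E = A + B, F = C + D, G = E + F."""
    a, b = split_word(word_ab)
    c, d = split_word(word_cd)
    return (a + b) + (c + d)


def parallel_adder_trees(words):
    """Run adder trees 1 and 2 on four 16-bit inputs.

    Returns G1, G2, and their total; the total models the optional further
    addition in the vector merging ALU.
    """
    w0, w1, w2, w3 = words
    g1 = four_operand_tree(w0, w1)   # adder tree architecture 1 (left side)
    g2 = four_operand_tree(w2, w3)   # adder tree architecture 2 (right side)
    return g1, g2, g1 + g2


print(parallel_adder_trees((0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF)))  # (1020, 1020, 2040)
```

In the worst case the combined total needs 11 bits, which is why the further addition is left to the wider vector merging ALU.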
Customers gain increased flexibility to implement a variety of motion search and 8-bit image processing algorithms on a general-purpose 16-bit machine.
In such a general-purpose DSP, the adder tree mode can readily be disabled for the many DSP applications that do not involve video processing. The motion search and spatial filtering algorithms performed in video processing are computationally intensive, but they consist of relatively simple operations on values of low bit width. Embodiments of the dual-mode execution unit are advantageously suited to both kinds of operation while using the same element, the multiplier array. Compared with an ASIC, the system has lower complexity and uses less silicon area.
The embodiments illustrated in FIG. 5 and FIG. 6, which have parallel adder tree architectures, are particularly suited to operations that require bilinear interpolation of two-dimensional image data. Such bilinear operations are used in, for example, spatial filtering, color interpolation, and fractional-pixel motion search.
The adder tree of the invention is also suited to low-power applications such as sum of absolute differences (SAD) calculations. SAD is a distortion measure used when comparing two adjacent frames of a video sequence to find the closest match. SAD calculations are commonly used in motion search operations.
As described above, a dual-mode DSP that includes an execution unit according to the various embodiments is particularly suited to video processing equipment that uses MPEG-1, MPEG-2, MPEG-4, H.263, and standards that are not yet known but are expected to be used for video compression in the future.
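A sum of absolute differences maps naturally onto a four-operand adder tree followed by accumulation. The sketch below is illustrative only. The description does not say how the absolute differences themselves are formed, so they are computed in plain Python here, and the function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length blocks of 8-bit pixels."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    total = 0
    for start in range(0, len(block_a), 4):
        diffs = [abs(p - q) for p, q in
                 zip(block_a[start:start + 4], block_b[start:start + 4])]
        diffs += [0] * (4 - len(diffs))   # pad the last group to four operands
        a, b, c, d = diffs
        total += (a + b) + (c + d)        # one adder tree pass, then accumulate
    return total


def best_match(current, candidates):
    """Motion search step: pick the candidate block with the lowest SAD."""
    return min(candidates, key=lambda cand: sad(current, cand))


current = [10, 20, 30, 40, 50]
print(sad(current, [12, 18, 30, 45, 40]))  # 19
```

A motion search would call `best_match` once per block, with the candidates drawn from the search window in the adjacent frame.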
Such a general-purpose DSP is contemplated for use in video camcorders, teleconferencing, personal computer video cards, and high-definition television. The general-purpose DSP is also contemplated for use in connection with other technologies that rely on digital signal processing, such as voice processing for mobile telephony, speech recognition, and other applications.
A number of embodiments of the invention have been described. Nevertheless, it should be understood that various modifications can be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
For example, although this application describes splitting operands in two, it should be understood that division by any factor "x" is possible. That is, only the use of 16-bit and 8-bit values has been described. The scope of the invention is not limited to any particular number of bits in the operands.
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US53992200A | 2000-03-31 | 2000-03-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
TW539989B (en) | 2003-07-01 |
Family
ID=24153209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW90107607A TW539989B (en) | 2000-03-31 | 2001-03-30 | Multiplier architecture in a general purpose processor optimized for efficient multi-input addition |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1269308A2 (en) |
CN (1) | CN1422402A (en) |
AU (1) | AU2001249767A1 (en) |
TW (1) | TW539989B (en) |
WO (1) | WO2001075587A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8713286B2 (en) | 2005-04-26 | 2014-04-29 | Qualcomm Incorporated | Register files for a digital signal processor operating in an interleaved multi-threaded environment |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1853994B1 (en) | 2004-12-17 | 2009-09-23 | Nxp B.V. | Arithmetic or logical operation tree computation. |
US7546331B2 (en) * | 2005-03-17 | 2009-06-09 | Qualcomm Incorporated | Low power array multiplier |
CN101320321B (en) * | 2008-06-27 | 2010-06-02 | 北京大学深圳研究生院 | Array arithmetics logic cell structure |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4839845A (en) * | 1986-03-31 | 1989-06-13 | Unisys Corporation | Method and apparatus for performing a vector reduction |
JPH06242928A (en) * | 1993-02-22 | 1994-09-02 | Nec Corp | Adder and multiplying circuit using the same |
US5974435A (en) * | 1997-08-28 | 1999-10-26 | Malleable Technologies, Inc. | Reconfigurable arithmetic datapath |
-
2001
- 2001-03-30 TW TW90107607A patent/TW539989B/en not_active IP Right Cessation
- 2001-04-02 EP EP01923028A patent/EP1269308A2/en not_active Withdrawn
- 2001-04-02 CN CN 01807694 patent/CN1422402A/en active Pending
- 2001-04-02 AU AU2001249767A patent/AU2001249767A1/en not_active Abandoned
- 2001-04-02 WO PCT/US2001/010603 patent/WO2001075587A2/en not_active Application Discontinuation
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8713286B2 (en) | 2005-04-26 | 2014-04-29 | Qualcomm Incorporated | Register files for a digital signal processor operating in an interleaved multi-threaded environment |
US9235418B2 (en) | 2005-04-26 | 2016-01-12 | Qualcomm Incorporated | Register files for a digital signal processor operating in an interleaved multi-threaded environment |
Also Published As
Publication number | Publication date |
---|---|
WO2001075587A3 (en) | 2002-01-24 |
WO2001075587A2 (en) | 2001-10-11 |
AU2001249767A1 (en) | 2001-10-15 |
CN1422402A (en) | 2003-06-04 |
EP1269308A2 (en) | 2003-01-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1446728B1 (en) | Multiply-accumulate (mac) unit for single-instruction/multiple-data (simd) instructions | |
US6546480B1 (en) | Instructions for arithmetic operations on vectored data | |
US6473529B1 (en) | Sum-of-absolute-difference calculator for motion estimation using inversion and carry compensation with full and half-adders | |
TW310406B (en) | ||
KR100291383B1 (en) | Module calculation device and method supporting command for processing digital signal | |
JPH06292178A (en) | Adaptive video signal arithmetic processor | |
US6820102B2 (en) | DSP unit for multi-level global accumulation | |
US6629115B1 (en) | Method and apparatus for manipulating vectored data | |
TW200411540A (en) | Method and system for performing calculation operations and a device | |
TW200414023A (en) | Method and system for performing a calculation operation and a device | |
JPH03180965A (en) | Integrated circuit apparatus adapted to repeat dct/idct computation using single multiplier/accumulator and single random access memory | |
US6574651B1 (en) | Method and apparatus for arithmetic operation on vectored data | |
US9372665B2 (en) | Method and apparatus for multiplying binary operands | |
US6324638B1 (en) | Processor having vector processing capability and method for executing a vector instruction in a processor | |
US20080243976A1 (en) | Multiply and multiply and accumulate unit | |
US10853037B1 (en) | Digital circuit with compressed carry | |
TW539989B (en) | Multiplier architecture in a general purpose processor optimized for efficient multi-input addition | |
JP2725544B2 (en) | DCT and inverse DCT operation device and operation method thereof | |
Shahbahrami et al. | Matrix register file and extended subwords: two techniques for embedded media processors | |
KR100481586B1 (en) | Apparatus for modular multiplication | |
Belyaev et al. | A high-perfomance multi-format simd multiplier for digital signal processors | |
Raghunath et al. | A compact carry-save multiplier architecture and its applications | |
KR100434391B1 (en) | The architecture and the method to process image data in real-time for DSP and Microprocessor | |
TW202345092A (en) | Super resolution device and method | |
CN117372495A (en) | Calculation method for accelerating dot products with different bit widths in digital image processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GD4A | Issue of patent certificate for granted invention patent | ||
MM4A | Annulment or lapse of patent due to non-payment of fees |