TW539989B - Multiplier architecture in a general purpose processor optimized for efficient multi-input addition

Info

Publication number
TW539989B
Authority
TW
Taiwan
Prior art keywords
bit
patent application
operands
item
scope
Prior art date
Application number
TW90107607A
Other languages
Chinese (zh)
Inventor
Ravi Kolagotla
William C Anderson
Bradley C Aldrich
Original Assignee
Intel Corp
Analog Devices Inc
Priority date
Filing date
Publication date
Application filed by Intel Corp and Analog Devices Inc
Application granted
Publication of TW539989B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/505 Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
    • G06F7/509 Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/3804 Details
    • G06F2207/3808 Details concerning the type of numbers or the way they are handled
    • G06F2207/3812 Devices capable of handling different types of numbers
    • G06F2207/382 Reconfigurable for different fixed word lengths
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/3804 Details
    • G06F2207/3808 Details concerning the type of numbers or the way they are handled
    • G06F2207/3828 Multigauge devices, i.e. capable of handling packed numbers without unpacking them
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

In one embodiment, a dual mode execution unit is described for use in a general purpose digital signal processor (DSP). The execution unit can operate as a 16X16 multiplier in one mode and an 8-bit adder tree in another mode. The adder tree structure is constructed by re-utilizing pre-existing arithmetic logic units (ALUs) in the multiplier array of the multiplier architecture. The 8-bit adder tree mode is particularly useful for performing various computation intensive algorithms used in digital video processing, such as motion search and spatial interpolation algorithms.

Description

Background of the invention

Digital signal processors (DSPs) are commonly used in multimedia applications such as digital video, imaging, and audio. DSPs can process digital signals to create and open such multimedia files.

MPEG-1, MPEG-2, MPEG-4, and H.263 are digital video compression standards and file formats. These standards achieve a high compression ratio for digital video signals by storing mainly the changes from one video frame to the next rather than storing each entire frame. The video data may then be compressed further using a number of different techniques.

The DSP is used to perform various operations on the video information during compression. These operations may include motion search and spatial interpolation algorithms. Their main purpose is to measure the distortion between blocks in adjacent frames. The operations are computationally intensive and require high data throughput.

The MPEG family of standards continues to evolve to keep pace with the growing bandwidth requirements of multimedia applications and files. Each new version of the standard introduces more sophisticated algorithms, which place greater processing demands on the DSPs used in MPEG-compliant video processing equipment.

Manufacturers of video processing equipment often rely on application-specific integrated circuits (ASICs) that are customized for video encoding under the MPEG and H.263 standards. However, ASICs are complex to design, costly to produce, and less flexible in their application than general-purpose DSPs.

Brief description of the drawings

FIG. 1 is a schematic diagram of a video processor that includes a dual-mode general-purpose digital signal processor (DSP) according to an embodiment of the invention.

FIG. 2 is a schematic diagram of the execution unit shown in FIG. 1.

FIG. 3 is a flowchart of the general steps by which the sequencer of FIG. 1 implements the adder tree mode in the execution unit.

FIG. 4 is a flowchart of the general steps by which the sequencer of FIG. 1 implements the multiplier mode in the execution unit.

FIG. 5 is a schematic diagram of another adder tree configuration in the execution unit of FIG. 1.

FIG. 6 is a schematic diagram of an adder tree configuration in an execution unit according to another embodiment of the invention.

Like reference symbols in the various drawings indicate like elements.

Detailed description of the invention

According to an embodiment, an execution unit 10 for a general-purpose digital signal processor (DSP) 12 can operate in at least two modes. In a normal multiplier mode, the execution unit operates as a 16x16 multiplier. In an adder tree mode, the execution unit operates as an 8-bit adder for adding multiple operands.

The adder tree mode is suited to computationally intensive operations on data streams of relatively low bit width (for example, 8 bits), such as those used to process video data under the H.263 and MPEG family of video compression formats. In this embodiment, the general-purpose DSP 12 is part of a video processor 5 that includes a central processing unit (CPU) 7 and a memory for storing video processing algorithms and video data.

According to this embodiment, the adder tree structure used in the adder tree mode is built by reusing arithmetic logic units (ALUs) that already exist in the 16x16 multiplier architecture common to conventional general-purpose DSPs. This gives the execution unit two operating modes with little additional hardware.

FIG. 2 shows the dual-mode execution unit 10 in more detail. The execution unit 10 includes two primary input multiplexers (MUXs) 14, two secondary input MUXs 16, a partial product generator 18, a multiplier array 20, two intermediate registers 22, two intermediate MUXs 24, a vector merging ALU 26, an accumulator 28, an accumulator MUX 30, an output register 32, and an output MUX 34 that passes the sum produced by the execution unit to a result bus 36. The scope of the invention is not, however, limited to embodiments having all of the elements shown in FIG. 2.

The interaction between the elements of the execution unit 10 is described below in terms of how the execution unit operates in each mode.

Logic control of the execution unit 10 may be implemented through a sequencer 100 (FIG. 1). The sequencer 100 fetches instructions from an instruction cache 102 and issues control signals that enable the elements needed for operation in each mode. The instructions in the instruction cache 102 may be programmed or hardwired.

In both the multiplier mode and the adder tree mode, the execution unit 10 receives two pairs of 16-bit words from an operand bus 104 under the control of a bus controller 106. Each pair of 16-bit words is input to one of the primary MUXs 14. The primary MUXs 14 are controlled by the sequencer 100 to select and output a single 16-bit word.

FIG. 3 is a flowchart of the general steps by which the sequencer 100 implements the adder tree mode.

In block 200, the sequencer 100 controls switches 50 to pass the 16-bit words from the primary MUXs to the secondary MUXs 16. Each secondary MUX is controlled to select two 8-bit words, operands A and B in block 202 and operands C and D in block 204, to be added in the multiplier array 20.

The multiplier array 20 contains a number of ALUs 60 arranged in rows and interconnected across several levels. In a typical 16x16 multiplier, the ALUs of the multiplier array occupy seven levels.

In binary addition, adding two n-bit numbers produces a result of at most (n+1) bits; the extra bit represents a new binary place with weight 2^n. For example, adding two 8-bit numbers produces a 9-bit sum:

      11111111_2 = 255_10
    + 11111111_2 = 255_10
    = 111111110_2 = 510_10

In block 206, the 8-bit operands A and B may be input directly to ALU 62, where they are added in block 208 to produce a 9-bit partial sum E. Likewise, in block 210 the operands C and D may be input directly to ALU 64, where they are added in block 212 to produce a 9-bit partial sum F.

In blocks 214 and 216, the 9-bit partial sums E and F may be passed to ALU 66, where they are added in block 218 to produce a 10-bit sum G. In block 220, the sum G is output from the multiplier array 20.

According to one embodiment, the sequencer 100 controls the execution unit 10 to bypass the multiplier structures downstream of the multiplier array 20 (for example, the vector merging ALU and the accumulator 28) and to output the sum G directly to the output MUX 34. This embodiment is useful for video encoding operations such as spatial interpolation algorithms that use simple sums.
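The adder tree flow of FIG. 3 can be sketched in software to make the bit widths concrete. This is only an illustrative model, not the patented circuit: the function names are hypothetical, and the choice of the high byte as operand A (or C) and the low byte as operand B (or D) is an assumption, since the text does not say which half of the 16-bit word maps to which operand.

```python
MASK8 = 0xFF  # an 8-bit operand


def split_word(word16):
    """Model a secondary MUX: pick two 8-bit operands out of one 16-bit word.

    Assumption: the high byte is the first operand, the low byte the second.
    """
    if not 0 <= word16 <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    return (word16 >> 8) & MASK8, word16 & MASK8


def adder_tree(word_ab, word_cd):
    """Add four 8-bit operands the way blocks 200-220 describe.

    Returns (E, F, G): two 9-bit partial sums and the 10-bit total.
    """
    a, b = split_word(word_ab)  # block 202: operands A and B
    c, d = split_word(word_cd)  # block 204: operands C and D
    e = a + b                   # first-level ALU: at most 9 bits
    f = c + d                   # first-level ALU: at most 9 bits
    g = e + f                   # second-level ALU: at most 10 bits
    # The widths quoted in the text are worst-case bounds; check them.
    assert e < (1 << 9) and f < (1 << 9) and g < (1 << 10)
    return e, f, g


if __name__ == "__main__":
    # Worst case from the text: 255 + 255 = 510 needs 9 bits.
    print(adder_tree(0xFFFF, 0xFFFF))
```

The worst-case input shows why each level of the tree needs one more bit than the level before it: two 8-bit operands can reach 510, and two such partial sums can reach 1020.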

According to another embodiment, the sum G is passed through the intermediate register 22, the intermediate MUX 24, and the vector merging ALU 26, and is stored in the accumulator 28. The sum G is then fed back through the accumulator MUX 30 and input to the vector merging ALU 26 together with another 10-bit sum, G'. The sum of G and G' may be output and stored in the accumulator. Subsequent 10-bit sums are added to the value in the accumulator, and this cycle may repeat many times. This embodiment is useful for video encoding operations such as motion search algorithms that use accumulate operations.

FIG. 4 is a flowchart of the general steps by which the sequencer 100 implements the multiplier mode. In block 300, the sequencer 100 controls the switches 50 to bypass the secondary MUXs and pass the 16-bit words to the partial product generator as multiplicands I and J. The partial product generator 18 includes a matrix of logical AND gates, each of which operates on one bit of the 16-bit multiplicand I and one bit of the 16-bit multiplicand J. The partial products, sixteen of them for 16-bit multiplicands, are passed to the multiplier array 20 in block 302. In block 304, the partial products are reduced to two 32-bit words, a sum X and a carry Y, whose total equals the product of I and J. X and Y are passed to the intermediate registers 22 and, through the intermediate MUXs 24, are added in the vector merging ALU 26 to produce the product. The product may be output to the result bus 36 through the output register 32 and the output MUX 34, or routed through the accumulator 28 for accumulate operations like those described for the adder tree mode. Preferably, the vector merging ALU 26 is a 40-bit ALU so that it can accommodate overflow during accumulate operations.

Other adder tree structures can be implemented with the ALUs already present in the multiplier array. According to another embodiment, an execution unit with the same structure, shown in FIG. 5, can be implemented as a dual two-operand adder tree.

In this particular embodiment, the execution unit 10 is controlled to bypass ALU 66. Each of the 9-bit sums E and F is output from the multiplier array directly to the output MUX 34, or to the intermediate registers 22 for the accumulate operations described above.

Apart from outputting the sums E and F after steps 208 and 212, the sequencer 100 follows the same general steps shown in FIG. 3.

FIG. 6 shows another embodiment, in which a more complex adder tree structure is implemented in the multiplier array. One secondary MUX 16 is provided for each 16-bit input.

The adder tree structure 1 on the left side of FIG. 6 is described here; the adder tree structure 2 on the right side has the same construction and operation. The sequencer 100 handles each adder tree structure using the same general steps shown in FIG. 3.

Each secondary MUX 16 selects two 8-bit operands from a 16-bit word input from the operand bus 104 (FIG. 1), yielding operands A1, B1, C1, and D1. A1 and B1 are added in an ALU to produce a 9-bit partial sum E1. Likewise, C1 and D1 are added in an ALU to produce a 9-bit partial sum F1. E1 and F1 are added in an ALU to produce a 10-bit sum G1.

The sums G1 and G2 (the latter from the right-hand adder tree structure) may be passed directly to the output MUX 34 for output to the result bus 36 or, alternatively, to the vector merging ALU 26 for a further addition and/or an accumulate operation.

A general-purpose DSP that includes an execution unit according to the various embodiments is advantageous. Such a DSP can provide performance comparable to that of an ASIC designed to support motion search algorithms.
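As a rough illustration of the multiplier mode of FIG. 4, the following sketch models the three stages named in the description: AND-gate partial products, reduction to a sum word X and a carry word Y, and a final addition of X and Y. It is a software model under stated assumptions rather than the patent's circuit: the function names are hypothetical, the operands are treated as unsigned, and the reduction is written as a simple chain of 3:2 carry-save steps, whereas the text only says that the array's ALUs are arranged in levels.

```python
def partial_products(i, j, width=16):
    """AND every bit of multiplicand i with one bit of j, one row per bit of j.

    Each row is shifted to the binary weight of the bit of j that produced it.
    """
    limit = 1 << width
    if not (0 <= i < limit and 0 <= j < limit):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    mask = limit - 1
    rows = []
    for k in range(width):
        bit = (j >> k) & 1
        rows.append((i & (mask if bit else 0)) << k)
    return rows


def carry_save_add(x, y, z):
    """3:2 compression: three addends in, a sum word and a carry word out.

    Invariant: x + y + z == sum_word + carry_word.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def reduce_to_sum_and_carry(rows):
    """Reduce all partial products to two words, X (sum) and Y (carry)."""
    x, y = 0, 0
    for row in rows:
        x, y = carry_save_add(x, y, row)
    return x, y


def multiply(i, j, width=16):
    """Final step: add X and Y (the role the text gives the vector merging ALU)."""
    x, y = reduce_to_sum_and_carry(partial_products(i, j, width))
    return x + y


if __name__ == "__main__":
    x, y = reduce_to_sum_and_carry(partial_products(1234, 5678))
    print(x + y == 1234 * 5678)
```

Keeping the result as separate sum and carry words until the last step means only one full carry-propagating addition is needed, which is why the description has the array emit X and Y rather than the finished product.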

An added advantage is the flexibility that a general-purpose DSP offers. This gives customers increased flexibility to implement a variety of motion search and 8-bit image processing algorithms on a general-purpose 16-bit machine.

In such a general-purpose DSP, the adder tree mode can easily be disabled for operations the DSP performs that are not intended for video processing.

The motion search and spatial filtering algorithms executed in video processing are computationally intensive, but they involve relatively simple arithmetic on numbers of low bit width. Embodiments of the dual-mode execution unit are advantageously suited to both kinds of operation using the same element, the multiplier array. Compared with an ASIC, the system uses lower complexity and less silicon area.

The embodiments illustrated in FIG. 5 and FIG. 6, which have parallel adder tree structures, are particularly suited to operations that require bilinear interpolation of two-dimensional image data. Such bilinear operations are used, for example, in spatial low-pass filtering, color interpolation, and fractional-pixel motion search.

The adder tree of the invention is also suited to low-power applications such as sum of absolute differences (SAD) calculations. SAD is a distortion measure used to test two adjacent frames in a video sequence in order to determine the best match. SAD calculations are commonly used in motion search operations.

As described above, a dual-mode DSP including an execution unit according to the various embodiments is particularly suited to video processing equipment that uses MPEG-1, MPEG-2, MPEG-4, H.263, and standards that are not yet known but are expected to be used for video compression in the future.

Such a general-purpose DSP is expected to be used in video camcorders, teleconferencing, personal computer video cards, and high-definition television. In addition, the general-purpose DSP is also expected to be used in connection with other technologies that rely on digital signal processing, such as voice processing for mobile telephones, speech recognition, and other applications.

A number of embodiments of the invention have been described. Nevertheless, it should be understood that various modifications can be made without departing from the spirit and scope of the invention. Accordingly, other embodiments fall within the scope of the claims below.

For example, although this application describes dividing a word in two, it should be understood that any kind of division by "X" is contemplated. That is, only the use of 16 and 8 bits is described. The scope of the invention is not limited to any particular number of bits in an operand.
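To show how the SAD calculation mentioned above could map onto the adder tree and accumulator, here is a minimal sketch. Several points are assumptions made for illustration only: the function names are hypothetical, the absolute differences are computed in ordinary software because the text does not describe how the hardware would form them, four values are summed per pass to mirror the four-operand tree of FIG. 3, and the accumulator is modeled as 40 bits wide to match the preferred width of the vector merging ALU.

```python
ACC_MASK = (1 << 40) - 1  # assumed accumulator width, matching the 40-bit ALU


def adder_tree4(a, b, c, d):
    """Sum four 8-bit values in two levels, as in the adder tree of FIG. 3."""
    for value in (a, b, c, d):
        if not 0 <= value <= 0xFF:
            raise ValueError("operands must be 8-bit unsigned values")
    return (a + b) + (c + d)  # 10-bit result at most


def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length pixel blocks.

    Each pass feeds four absolute differences into the adder tree and adds
    the resulting 10-bit sum to the running accumulator value.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    if len(block_a) % 4 != 0:
        raise ValueError("this sketch expects a multiple of four pixels")
    diffs = [abs(p - q) for p, q in zip(block_a, block_b)]
    acc = 0
    for k in range(0, len(diffs), 4):
        acc = (acc + adder_tree4(*diffs[k:k + 4])) & ACC_MASK
    return acc


if __name__ == "__main__":
    current = [10, 20, 30, 40]
    reference = [12, 18, 30, 45]
    print(sad(current, reference))
```

In a motion search, a routine like this would be evaluated for each candidate block position and the position with the smallest SAD taken as the best match, which is the use the description attributes to SAD.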


Claims (1)

1. A digital signal processor, comprising:
a decoder that decodes an instruction, the instruction specifying an adder tree operation; and
a circuit coupled to the decoder and comprising:
a multiplier array including a plurality of arithmetic logic units (ALUs); and
a selection circuit that selects first and second operands from a 2n-bit word, wherein the first and second operands have fewer than 2n bits, and the selection circuit is adapted to add the first and second operands in a selected one of the ALUs in the multiplier array to produce a first sum.

2. The digital signal processor of claim 1, wherein the first and second operands are n-bit words.

3. The digital signal processor of claim 1, wherein n equals 8.

4. The digital signal processor of claim 1, wherein the selection circuit is adapted to operate in response to the instruction to:
select a third operand and a fourth operand from a second 2n-bit word, wherein the first and second operands have fewer than 2n bits; and
add the third and fourth n-bit operands in a second selected one of the ALUs to produce a second sum.

5. The digital signal processor of claim 4, wherein the third and fourth operands are n-bit words.

6. The digital signal processor of claim 4, wherein the selection circuit is adapted to operate in response to the instruction to add the first sum and the second sum in a third selected one of the ALUs to produce a third sum.

7. The digital signal processor of claim 1, wherein the selection circuit is adapted to operate in response to an instruction specifying a multiplier mode to:
select first and second 2n-bit multiplicands; and
multiply the first and second 2n-bit multiplicands to produce a 4n-bit product.

8. The digital signal processor of claim 1, wherein the operand selector comprises a multiplier.

9. A video processor, comprising:
a central processing unit (CPU);
a memory element that stores instructions for performing video encoding operations; and
a digital signal processor coupled to the CPU and the memory element and including an execution unit, the execution unit comprising:
a decoder that decodes an instruction specifying an adder tree operation; and
a first circuit coupled to the decoder and comprising a multiplier array including a plurality of arithmetic logic units (ALUs),
the first circuit including an element that, in response to the instructions, selects first and second n-bit operands from a 2n-bit word and adds the first and second n-bit operands in a selected one of the ALUs in the multiplier array to produce a first sum.

10. The video processor of claim 9, wherein n equals 8.

11. The video processor of claim 9, wherein the first circuit operates in response to an instruction specifying a multiplier mode to multiply two 2n-bit multiplicands to produce a product.

12. The video processor of claim 9, further comprising:
a first multiplicand selector that selects the first multiplicand from at least one first 2n-bit word;
a second multiplicand selector that selects the second multiplicand from at least one second 2n-bit word;
a partial product generator that generates a plurality of partial sums from the first and second 2n-bit multiplicands; and
an adder that adds the two 4n-bit words produced in the multiplier array by reducing the partial products.

13. The video processor of claim 9, wherein the instructions for performing video encoding operations comply with an MPEG standard.

14. The video processor of claim 9, wherein the instructions for performing video encoding operations include instructions for performing a motion search algorithm.

15. The video processor of claim 9, wherein the instructions for performing video encoding operations include instructions for performing a spatial interpolation algorithm.

16. A method for multi-input addition, comprising:
adding a plurality of operands in a multiplier array that includes a plurality of ALUs;
selecting an adder tree mode;
selecting first and second operands from a 2n-bit word;
adding the first and second operands in a first one of the ALUs in the multiplier array; and
outputting a sum from the multiplier array.

17. The method of claim 16, wherein the first and second operands have fewer than 2n bits.

18. The method of claim 16, further comprising:
selecting third and fourth operands from a second 2n-bit word, the third and fourth operands having fewer than 2n bits;
adding the third and fourth operands in a second one of the ALUs in the multiplier array; and
adding, in a third ALU in the multiplier array, a first sum from the first ALU and a second sum from the second ALU.

19. The method of claim 18, wherein the third and fourth operands are n-bit words.

20. The method of claim 16, further comprising:
selecting a multiplier mode;
selecting a first 2n-bit multiplicand from at least one first 2n-bit word;
selecting a second 2n-bit multiplicand from at least one second 2n-bit word; and
multiplying the first and second multiplicands to produce a product.

21. The method of claim 16, wherein n equals 8.

22. The method of claim 16, further comprising performing a motion search algorithm in the adder tree mode.

23. The method of claim 16, further comprising performing a spatial interpolation algorithm in the adder tree mode.

24. The method of claim 16, further comprising performing a spatial low-pass filtering algorithm in the adder tree mode.

25. A machine-readable program storage element containing instructions that cause a machine to:
add a plurality of operands in a multiplier array that includes a plurality of ALUs;
select an adder tree mode;
select first and second operands from a 2n-bit word;
add the first and second operands in a first one of the ALUs in the multiplier array; and
output a sum from the multiplier array.

26. The program storage element of claim 25, wherein the first and second operands are n-bit words.
Let add 2 5. A program storage element that can be read by a machine, which contains phases. These instructions enable the machine to: select a number of operand phases in an multiplier array containing many ALU s-an adder tree spectrum Mode; selecting first and second operands from a 2n-bit character; adding the first and second operands in the first of the ALU s in the multiplier array; and a self-multiplier array Output a sum. 26. The program storage element according to item 25 of the patent application scope, wherein the first and second operands are n-bit characters. O:\70\70388-920408.ptc 第18頁O: \ 70 \ 70388-920408.ptc Page 18
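The two modes recited in the claims above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the function names (`split_word`, `adder_tree`, `multiply`) are hypothetical, operands are treated as unsigned, n is fixed at 8 as in the dependent claims, and the hardware reduction of partial products is modeled with a plain integer sum.

```python
# Illustrative model only. All names here are assumptions made for this sketch;
# the claims describe hardware, not this code.
N = 8                            # operand width n (the dependent claims use n = 8)
MASK_N = (1 << N) - 1            # mask for one n-bit operand
MASK_2N = (1 << (2 * N)) - 1     # mask for one 2n-bit word


def split_word(word):
    """Select the high and low n-bit operands from one 2n-bit word."""
    if not 0 <= word <= MASK_2N:
        raise ValueError("word must fit in 2n bits")
    return (word >> N) & MASK_N, word & MASK_N


def adder_tree(words):
    """Adder-tree mode: sum every unsigned n-bit operand packed in the words."""
    # First level: one ALU per 2n-bit word adds that word's two operands.
    level = [sum(split_word(w)) for w in words]
    # Later levels: further ALUs add pairs of sums until a single sum remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)      # pad an odd-sized level with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def multiply(a, b):
    """Multiplier mode: form shifted partial products of a and b, then reduce."""
    if not (0 <= a <= MASK_2N and 0 <= b <= MASK_2N):
        raise ValueError("multiplicands must fit in 2n bits")
    # One partial product per set bit of b (simple shift-and-add scheme).
    partial_products = [a << i for i in range(2 * N) if (b >> i) & 1]
    return sum(partial_products)  # stands in for the reduction and final add


if __name__ == "__main__":
    # 0x0102 packs operands 1 and 2; 0x0304 packs operands 3 and 4.
    print(adder_tree([0x0102, 0x0304]))
    print(multiply(300, 200))
```

In this model the same function family serves both modes, which mirrors the idea in the claims that one multiplier array can either multiply two 2n-bit multiplicands or add many narrower operands.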
TW90107607A 2000-03-31 2001-03-30 Multiplier architecture in a general purpose processor optimized for efficient multi-input addition TW539989B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US53992200A 2000-03-31 2000-03-31

Publications (1)

Publication Number Publication Date
TW539989B (en) 2003-07-01

Family

ID=24153209

Family Applications (1)

Application Number Title Priority Date Filing Date
TW90107607A TW539989B (en) 2000-03-31 2001-03-30 Multiplier architecture in a general purpose processor optimized for efficient multi-input addition

Country Status (5)

Country Link
EP (1) EP1269308A2 (en)
CN (1) CN1422402A (en)
AU (1) AU2001249767A1 (en)
TW (1) TW539989B (en)
WO (1) WO2001075587A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1853994B1 (en) 2004-12-17 2009-09-23 Nxp B.V. Arithmetic or logical operation tree computation.
US7546331B2 (en) * 2005-03-17 2009-06-09 Qualcomm Incorporated Low power array multiplier
CN101320321B (en) * 2008-06-27 2010-06-02 北京大学深圳研究生院 Array arithmetics logic cell structure

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4839845A (en) * 1986-03-31 1989-06-13 Unisys Corporation Method and apparatus for performing a vector reduction
JPH06242928A (en) * 1993-02-22 1994-09-02 Nec Corp Adder and multiplying circuit using the same
US5974435A (en) * 1997-08-28 1999-10-26 Malleable Technologies, Inc. Reconfigurable arithmetic datapath

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment
US9235418B2 (en) 2005-04-26 2016-01-12 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment

Also Published As

Publication number Publication date
WO2001075587A3 (en) 2002-01-24
WO2001075587A2 (en) 2001-10-11
AU2001249767A1 (en) 2001-10-15
CN1422402A (en) 2003-06-04
EP1269308A2 (en) 2003-01-02

Similar Documents

Publication Publication Date Title
EP1446728B1 (en) Multiply-accumulate (mac) unit for single-instruction/multiple-data (simd) instructions
US6546480B1 (en) Instructions for arithmetic operations on vectored data
US6473529B1 (en) Sum-of-absolute-difference calculator for motion estimation using inversion and carry compensation with full and half-adders
TW310406B (en)
KR100291383B1 (en) Module calculation device and method supporting command for processing digital signal
JPH06292178A (en) Adaptive video signal arithmetic processor
US6820102B2 (en) DSP unit for multi-level global accumulation
US6629115B1 (en) Method and apparatus for manipulating vectored data
TW200411540A (en) Method and system for performing calculation operations and a device
TW200414023A (en) Method and system for performing a calculation operation and a device
JPH03180965A (en) Integrated circuit apparatus adapted to repeat dct/idct computation using single multiplier/accumulator and single random access memory
US6574651B1 (en) Method and apparatus for arithmetic operation on vectored data
US9372665B2 (en) Method and apparatus for multiplying binary operands
US6324638B1 (en) Processor having vector processing capability and method for executing a vector instruction in a processor
US20080243976A1 (en) Multiply and multiply and accumulate unit
US10853037B1 (en) Digital circuit with compressed carry
TW539989B (en) Multiplier architecture in a general purpose processor optimized for efficient multi-input addition
JP2725544B2 (en) DCT and inverse DCT operation device and operation method thereof
Shahbahrami et al. Matrix register file and extended subwords: two techniques for embedded media processors
KR100481586B1 (en) Apparatus for modular multiplication
Belyaev et al. A high-perfomance multi-format simd multiplier for digital signal processors
Raghunath et al. A compact carry-save multiplier architecture and its applications
KR100434391B1 (en) The architecture and the method to process image data in real-time for DSP and Microprocessor
TW202345092A (en) Super resolution device and method
CN117372495A (en) Calculation method for accelerating dot products with different bit widths in digital image processing

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees