TW410308B

TW410308B - Multiplication and multiplication accumulation processor under the structure of PA-RISC

Info

Publication number: TW410308B
Application number: TW88101781A
Authority: TW
Inventors: Jing-Je Liang; Ching-Jia Chen
Original assignee: Winbond Electronics Corp
Priority date: 1999-02-05
Filing date: 1999-02-05
Publication date: 2000-11-01

Abstract

A multiplication and multiply-accumulate processor under the PA-RISC architecture comprises multiplexers, adders, accumulators, and selectors, and handles multiplication and multiply-accumulate operations on two operands (a first operand and a second operand). Within the processor, two multiplexers respectively select the first (upper) half of the bits of the second operand and the remaining (lower) half of the bits of the second operand. Two adders, working on partial products, respectively multiply the upper half of the first operand by the bits selected by one multiplexer and the lower half of the first operand by the bits selected by the other multiplexer, each yielding a product with carry. Two accumulators respectively hold the products of the two adders and selectively feed them back to the adders for accumulation. Two selectors respectively choose the upper or lower half of one accumulator and the upper or lower half of the other accumulator, and the chosen halves are combined into the desired product or multiply-accumulate value.

Description

The present invention provides a 16-bit multiplication and multiply-accumulate processor built on the 32-bit PA-RISC architecture, which increases the efficiency of digital signal processing and reduces overall hardware cost.

With the rapid growth of multimedia, digital signal processing (DSP) has become increasingly complex and its computational load increasingly large. Studies show, however, that in most programs about 80% of the work consists of executing simple instructions, and complex instructions are seldom used. Complex instructions not only increase hardware complexity and cost but also lengthen the hardware cycle time and reduce hardware performance. For this reason, much recent hardware is built on a reduced instruction set computer (RISC) architecture, which provides only instructions with simple functions and leaves more complex functions to system software. Digital signal processing for multimedia applications, however, often requires a large number of multiplication and multiply-accumulate operations, each of which takes many simple instructions to complete, so a RISC architecture performs poorly when applied to digital signal processing.

In view of this, the main object of the present invention is to provide a multiplication and multiply-accumulate processor under the PA-RISC architecture that offers a new multiplication and multiply-accumulate instruction set and, without affecting the performance of the original RISC architecture, is better suited to today's rapidly growing multimedia applications.

Another object of the present invention is to provide a 16-bit multiplication and multiply-accumulate instruction set under the 32-bit PA-RISC architecture, so that the newly defined instruction set is better suited to multimedia, signal processing, and other applications that require automatic handling of overflow.

To achieve the above and other objects, the present invention provides a multiplication and multiply-accumulate processor under the PA-RISC architecture, composed mainly of multiplexers, adders, accumulators, and selectors, for handling multiplication and multiply-accumulate operations on two operands (a first operand and a second operand). In this processor, two multiplexers respectively select the upper half and the lower half of the bits of the second operand. Two adders, using partial products, respectively handle the multiplication of the upper half of the first operand by the bits selected by one multiplexer and the multiplication of the lower half of the first operand by the bits selected by the other multiplexer, each yielding a product with carry. Two accumulators respectively store the products of the two adders and selectively send them back to the adders for accumulation. Two selectors respectively select the upper or lower half of one accumulator and the upper or lower half of the other accumulator, and the selected halves are combined into the desired product or multiply-accumulate value.

In this multiplication and multiply-accumulate processor under the 32-bit PA-RISC architecture, a 32-bit multiplication is carried out as two sets of 16-bit multiplications, so the digital signal processing rate can be raised. In addition, because a 16-bit multiplier is far less complex than a 32-bit multiplier, the overall hardware cost can be greatly reduced.

The multiplication and multiply-accumulate processor of the present invention may further have an input buffer for temporarily storing the data of the first operand and the second operand. Each of the two adders may have a Booth encoder and a CSA array. The Booth encoder breaks the multiplication involving the upper half of the first operand into several partial products according to the bits selected by a multiplexer; the CSA array adds these partial products to obtain a partial sum and a partial carry, which then pass through a 36-bit adder to give the actual product.

Furthermore, each of the two accumulators is slightly longer than the corresponding adder, so that it can hold the intermediate values of a multiply-accumulate sequence (for example, in a filter or a Fourier transform).

In addition, the processor may have an output controller that outputs the product or multiply-accumulate value assembled by the two selectors and, when an accumulator overflows positively or negatively, outputs a maximum positive value or a minimum negative value.

The present invention also provides a multiplication processor under the PA-RISC architecture for multiplying a first operand and a second operand. This multiplication processor consists of a dividing device, a multiplying device, and a selecting device. The dividing device divides each of the first operand and the second operand into an upper half and a lower half. The multiplying device multiplies the upper half of the first operand by one of the upper half and the lower half of the second operand, and multiplies the lower half of the first operand by the other, thereby obtaining two products. The selecting device selects the upper or lower half of each of the two products, and the selected halves are combined into the desired product. In this example, both the dividing device and the selecting device can be implemented with multiplexers. Moreover, because both operands to be multiplied are halved in length, circuit complexity and hardware cost are reduced and processing speed is increased.

To make the above and other objects, features, and advantages of the present invention more readily understood, a preferred embodiment is described in detail below with reference to the accompanying drawings.

Brief description of the drawings

FIG. 1 is a circuit block diagram of the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture;
FIG. 2 is a circuit block diagram of the CSA array in the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture; and
FIG. 3 is a circuit schematic of the CLA adder in the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture.

Embodiment

The conventional PA-RISC architecture already provides a multiplication and multiply-accumulate instruction set, but only for 32-bit signed or unsigned multiplication and multiply-accumulate operations. The upper (MSB) 32 bits or the lower (LSB) 32 bits of the multiplication result are stored back to a general register; if the product overflows positively (overflow) or negatively (underflow), the upper 32 bits or lower 32 bits can be stored back to the general register as the maximum positive value (16'h7fff) or the minimum negative value (16'h8000). A multiply-accumulate result is stored directly back to a 64-bit accumulator; if it overflows positively or negatively, it can likewise be stored back to the accumulator as the maximum positive value (32'h7fffffff) or the minimum negative value (32'h80000000). Data can be moved between the accumulator and the general registers by move instructions, either from two general registers to the accumulator or from the accumulator to a general register (only the upper 32 bits or the lower 32 bits at a time).

Applied to digital signal processing, however, this architecture has the following shortcomings:

(1) Although 32-bit multiplication and multiply-accumulate under the PA-RISC architecture increase the precision of digital signal processing, typical algorithms do not need such high precision. Implementing digital signal processing on 32-bit hardware therefore wastes hardware, and it also lengthens the hardware cycle time or takes more cycles to complete one multiplication or multiply-accumulate operation, lowering the performance of the overall architecture.

(2) Typical digital signal processing, such as the discrete Fourier transform, the FIR (finite impulse response) filter, the IIR (infinite impulse response) filter, and the discrete cosine transform, is usually the result of a long series of multiply-accumulate operations, and whether the intermediate accumulated results overflow does not affect the final result. Implementing digital signal processing on the 32-bit PA-RISC architecture therefore requires more complex software to deal with the overflow that arises in intermediate multiply-accumulate results.

Therefore, under the 32-bit PA-RISC architecture, the present invention defines a new 16-bit multiplication and multiply-accumulate instruction set that is suitable for multimedia, digital signal processing, and other applications requiring automatic handling of overflow, and that has the following features:

(1) In the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture, the upper 16 bits (B_B[0:15], B_X[0:15]) and the lower 16 bits (B_B[16:31], B_X[16:31]) of the two 32-bit operands (B_B[0:31], B_X[0:31]) sent from the general registers serve as the operands of two 16-bit signed or unsigned multiplications (B_B[0:15]*B_X[0:15], B_B[16:31]*B_X[16:31]).

After overflow has been handled automatically and the lower 16 bits (RES1[16:31], RES2[16:31]) of the two multiplication results (RES1[0:31], RES2[0:31]) have been discarded, the upper (MSB) 16 bits of the two results (RES1[0:15], RES2[0:15]) are stored back to a 32-bit general register.

(2) In the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture, either the upper 16 bits or the lower 16 bits of each of the two 32-bit operands (B_B[0:31], B_X[0:31]) sent from the general registers can be freely chosen as the operands of a signed or unsigned multiplication (B_B[0:15]*B_X[0:15], B_B[0:15]*B_X[16:31], B_B[16:31]*B_X[0:15], B_B[16:31]*B_X[16:31]), and the multiplication result is stored directly back to a 32-bit general register.

(3) In the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture, two 40-bit accumulation registers (ACC[0:39]) can be added to hold the intermediate values of two multiply-accumulate results. In digital signal processing, a series of multiply-accumulate operations may overflow at an intermediate value yet not overflow in the final result. The added 40-bit accumulation registers (ACC[0:39]) can therefore hold such intermediate values. If the final result does overflow, the contents of the accumulation register are saturated directly by a single instruction (Saturate), without adding hardware or status registers.

(4) In the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture, the data of any register (GR[0:31]) can be moved to the lower 32 bits of a 40-bit accumulation register (ACC[8:39]), or the lower 8 bits of a register (GR[24:31]) can be moved to the upper 8 bits of the accumulation register (ACC[0:7]). Conversely, data can be moved from the lower 32 bits of the accumulation register (ACC[8:39]) to the 32 bits of any register (GR[0:31]), or from the upper 8 bits of the accumulation register (ACC[0:7]) to the lower 8 bits of any register (GR[24:31]).

(5) In the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture, two 16-bit multiply-accumulate operations can be executed simultaneously. The upper 16 bits (B_B[0:15], B_X[0:15]) and the lower 16 bits (B_B[16:31], B_X[16:31]) of the two 32-bit operands (B_B[0:31], B_X[0:31]) sent from the general registers serve as two sets of 16-bit operands and, together with two 40-bit accumulators, allow two 16-bit signed or unsigned multiply-accumulate operations (B_B[0:15]*B_X[0:15]+ACC1[0:39], B_B[16:31]*B_X[16:31]+ACC2[0:39]) to be executed simultaneously, with the resulting values stored back to the two 40-bit accumulators.

FIG. 1 is the circuit block diagram of the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture. In FIG. 1, the processor consists mainly of an input buffer B1, two multiplexers M1 and M2, two adders AD1 and AD2, two accumulators AC1 and AC2, two selectors S1 and S2, and an output controller C1.

In this embodiment, the input buffer B1 (for example, a 64-bit register) temporarily stores the two 32-bit operands (B_B[0:31], B_X[0:31]) and arranges their upper 16 bits (B_B[0:15], B_X[0:15]) and lower 16 bits (B_B[16:31], B_X[16:31]) in alternating order (B_B[0:15], B_X[16:31], B_X[0:15], B_B[16:31]).
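Features (3) and (5) above can be illustrated with a small behavioral sketch. It is not the patented circuit: it models two signed 16x16 multiply-accumulates into 40-bit accumulators in software. The helper names (`to_signed`, `split16`, `dual_mac`, `saturate32`) are hypothetical, and clamping to the signed 32-bit range is an assumption, since the text does not state the width targeted by the Saturate instruction.

```python
ACC_BITS = 40  # accumulator width given in the text


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split16(word32):
    """Return (upper, lower) 16-bit halves; bits 0:15 in the text are the upper half."""
    return (word32 >> 16) & 0xFFFF, word32 & 0xFFFF


def dual_mac(b_b, b_x, acc1=0, acc2=0):
    """Two signed 16x16 multiply-accumulates on the packed halves of b_b and b_x."""
    bb_hi, bb_lo = split16(b_b)
    bx_hi, bx_lo = split16(b_x)
    acc1 = to_signed(acc1 + to_signed(bb_hi, 16) * to_signed(bx_hi, 16), ACC_BITS)
    acc2 = to_signed(acc2 + to_signed(bb_lo, 16) * to_signed(bx_lo, 16), ACC_BITS)
    return acc1, acc2


def saturate32(acc):
    """Clamp to the signed 32-bit range (the target width is an assumption)."""
    return max(-(1 << 31), min((1 << 31) - 1, acc))


# Intermediate overflow past 32 bits that disappears in the final result.
acc1 = acc2 = 0
for _ in range(4):  # four products of (-32768) * (-32768)
    acc1, acc2 = dual_mac(0x00008000, 0x00008000, acc1, acc2)
peak = acc2  # exceeds the signed 32-bit range but fits in 40 bits
for _ in range(4):  # four products of (-32768) * 32767
    acc1, acc2 = dual_mac(0x00008000, 0x00007FFF, acc1, acc2)
```

In this run the lower-half accumulator temporarily holds a value that a 32-bit register could not represent, yet the final value fits again, which is the situation the guard bits of the 40-bit accumulator are meant to cover.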

The multiplexers M1 and M2 respectively select the upper 16 bits (B_X[0:15]) and the lower 16 bits (B_X[16:31]) of the 32-bit operand (B_X[0:31]) as operands for the two multiplications or multiply-accumulate operations. When the multiplexer M1 selects the upper 16 bits (B_X[0:15]) of the 32-bit operand (B_X[0:31]), the multiplexer M2 selects the lower 16 bits (B_X[16:31]). Conversely, when the multiplexer M1 selects the lower 16 bits (B_X[16:31]), the multiplexer M2 selects the upper 16 bits (B_X[0:15]).

The adder AD1 handles, by means of partial products, the multiplication of the upper 16 bits (B_B[0:15]) of the 32-bit operand (B_B[0:31]) by the 16 bits selected by the multiplexer M1. The adder AD2 likewise handles, by means of partial products, the multiplication of the lower 16 bits (B_B[16:31]) of the 32-bit operand (B_B[0:31]) by the 16 bits selected by the multiplexer M2.

When M1 selects B_X[0:15] and M2 selects B_X[16:31], the adder AD1 computes B_B[0:15]*B_X[0:15] and the adder AD2 computes B_B[16:31]*B_X[16:31]. Conversely, when M1 selects B_X[16:31] and M2 selects B_X[0:15], the adder AD1 computes B_B[0:15]*B_X[16:31] and the adder AD2 computes B_B[16:31]*B_X[0:15].

As can be seen from the above, the adders AD1 and AD2 can selectively carry out the four multiplications between the two 32-bit operands (B_B[0:31], B_X[0:31]): B_B[0:15]*B_X[0:15], B_B[0:15]*B_X[16:31], B_B[16:31]*B_X[0:15], and B_B[16:31]*B_X[16:31].

In this embodiment, the adders AD1 and AD2 may each consist of a Booth encoder (E1, E2), a CSA array (CSA1, CSA2), and a CLA adder (CLA1, CLA2).
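The multiplexer selection described above can be sketched in a few lines. This is only a behavioral model with hypothetical names (`split16`, `mux_products`, `m1_selects_lower`), and it treats the halves as unsigned values for simplicity; the text allows signed or unsigned operation.

```python
def split16(word32):
    """Return (upper, lower) 16-bit halves; bits 0:15 in the text are the upper half."""
    return (word32 >> 16) & 0xFFFF, word32 & 0xFFFF


def mux_products(b_b, b_x, m1_selects_lower=False):
    """Return the (AD1, AD2) products for the chosen multiplexer setting.

    AD1 always takes the upper half of b_b and AD2 the lower half; M1 and M2
    always pick opposite halves of b_x.
    """
    bb_hi, bb_lo = split16(b_b)
    bx_hi, bx_lo = split16(b_x)
    m1, m2 = (bx_lo, bx_hi) if m1_selects_lower else (bx_hi, bx_lo)
    return bb_hi * m1, bb_lo * m2


# b_b packs (3, 5) and b_x packs (7, 11) as (upper, lower) halves.
straight = mux_products(0x00030005, 0x0007000B)
crossed = mux_products(0x00030005, 0x0007000B, m1_selects_lower=True)
```

Between them, the two settings produce all four half-by-half products listed in the text.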

The Booth encoder E1 processes the 16 bits selected by the multiplexer M1 according to the bit pattern of the upper 16 bits (B_B[0:15]) of the 32-bit operand (B_B[0:31]), thereby obtaining a plurality of partial products. The Booth encoder E2 processes the 16 bits selected by the multiplexer M2 according to the bit pattern of the lower 16 bits (B_B[16:31]) of the 32-bit operand (B_B[0:31]), thereby obtaining a plurality of partial products. Specifically, the Booth encoder E1 takes successive groups of 3 bits, starting from the rear (least significant end) of B_B[0:15], and each group determines whether the 16 bits selected by the multiplexer M1 are multiplied by ±0, ±1, or ±2 to give a partial product. The Booth encoder E2 likewise takes successive groups of 3 bits, starting from the rear of B_B[16:31], and each group determines whether the 16 bits selected by the multiplexer M2 are multiplied by ±0, ±1, or ±2 to give a partial product. These partial products are then added in the subsequent CSA arrays CSA1 and CSA2 (carry-save adders) and CLA adders CLA1 and CLA2 (carry look-ahead adders) to obtain the product of B_B[0:15] and the 16 bits selected by the multiplexer M1, or the product of B_B[16:31] and the 16 bits selected by the multiplexer M2.
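The 3-bit grouping used by the Booth encoders can be made concrete with a short software sketch of standard radix-4 Booth recoding. The function names are hypothetical, and extending the 16-bit multiplier by two bits (sign bits for signed, zeros for unsigned) is an assumption chosen so that both cases give nine digits; the text states nine partial products but does not spell out the extension.

```python
# Digit selected by each (y(i+1), y(i), y(i-1)) group, as in radix-4 Booth recoding.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y, signed=True, width=16):
    """Radix-4 Booth digits of a `width`-bit multiplier, least significant first."""
    y &= (1 << width) - 1
    # Assumed extension: two sign bits (signed) or two zero bits (unsigned).
    if signed and (y >> (width - 1)) & 1:
        y |= 0b11 << width
    bits = [0] + [(y >> i) & 1 for i in range(width + 2)]  # bits[0] is y(-1) = 0
    return [BOOTH_TABLE[(bits[i + 2], bits[i + 1], bits[i])]
            for i in range(0, width + 2, 2)]


def booth_multiply(x, y, signed=True, width=16):
    """Multiply two `width`-bit patterns by summing Booth partial products."""
    x &= (1 << width) - 1
    if signed and (x >> (width - 1)) & 1:
        x -= 1 << width  # interpret the multiplicand as two's complement
    partial_products = [d * x * 4 ** k
                        for k, d in enumerate(booth_digits(y, signed, width))]
    return sum(partial_products)
```

Each digit is one of 0, ±1, ±2, so every partial product is a shifted copy of the multiplicand, its double, or the negation of either, which is what keeps the encoder hardware simple.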

The processing performed by the Booth encoders E1 and E2 is shown in Table 1.

Table 1

  y(i+1)   y(i)   y(i-1)   Operation
    0       0       0        +0
    0       0       1        +X
    0       1       0        +X
    0       1       1        +2X
    1       0       0        -2X
    1       0       1        -X
    1       1       0        -X
    1       1       1        -0

In addition, the CSA arrays CSA1 and CSA2 and the CLA adders CLA1 and CLA2 add the partial products obtained by the Booth encoders E1 and E2 and, optionally, a multiply-accumulate value, thereby completing the multiplication or multiply-accumulate operation.

FIG. 2 is a circuit block diagram of the CSA array in the multiplication and multiply-accumulate processor of the present invention under the PA-RISC architecture; FIG. 3 is a schematic of the CLA adder in that processor.

As shown in Table 1, in this embodiment, because the two operands (B_B[0:31], B_X[0:31]) are each represented in 32 bits, the Booth encoders E1 and E2 each produce 9 partial products. Accordingly, as shown in FIG. 2, each of the CSA array circuits CSA1 and CSA2 first uses two three-input CSA adders CS1 and CS3 and one four-input CSA adder CS2 to add the 9 partial products (P0 to P8) and one multiply-accumulate value (ACC), giving six intermediate values (P'0 to P'5). Two three-input CSA adders CS4 and CS5 then add these six intermediate values (P'0 to P'5), giving four further intermediate values (P''0 to P''3). Finally, a four-input CSA adder CS6 adds these four intermediate values (P''0 to P''3), giving the sum and the carry of the 9 partial products (P0 to P8) and the multiply-accumulate value (ACC). The CLA adders CLA1 and CLA2, shown in FIG. 3, add the sum and carry produced by the CSA arrays CSA1 and CSA2 to obtain the complete product or multiply-accumulate value.

In FIG. 2, when the three inputs of a three-input CSA adder (CS1, CS3, CS4, CS5) are a, b, and c, its sum is a^b^c and its carry is a&b | b&c | c&a. When the four inputs of a four-input CSA adder (CS2, CS6) are a, b, c, and d, three parameters p, q, and h are obtained as a^b^c^d, (a|b)&(c|d), and (a&b)|(c&d), respectively.
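The carry-save idea behind the CSA array can be sketched as follows. This is a simplified model, not the circuit of FIG. 2: it uses only the three-input CSA whose sum and carry formulas are given in the text, it does not model the four-input units, and it reduces the addends in whatever order the loop takes them. The names `csa3` and `csa_reduce` are hypothetical. The carry word is shifted left by one because each carry bit has twice the weight of the sum bit in the same column.

```python
def csa3(a, b, c):
    """Three-input carry-save adder on whole words.

    Returns (sum_word, carry_word) with a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (c & a)) << 1  # carries have weight 2
    return sum_word, carry_word


def csa_reduce(terms):
    """Reduce any number of addends to a (sum, carry) pair with three-input CSAs."""
    terms = list(terms)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.extend(csa3(a, b, c))
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


# Ten addends, standing in for nine partial products plus one accumulator value.
addends = [3, 14, 15, 92, 65, 35, 89, 79, 32, 38]
sum_word, carry_word = csa_reduce(addends)
total = sum_word + carry_word  # the final carry-propagating add (the CLA's job)
```

No carry ripples through any stage of the reduction; only the last addition of the two remaining words needs a carry-propagating adder.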

The sum and carry of the four inputs a, b, c, and d are then ~((p&q) | (~p&~q)) and ~((p&~q) | (~p&h)), respectively.

In FIG. 3, the CLA adders CLA1 and CLA2 obtain the final product or multiply-accumulate result from the parameters p1 to p36 and g1 to g36, which are derived from the sum and carry output by the CSA arrays CSA1 and CSA2. Here p(n) = sum(n) | carry(n) and g(n) = sum(n) & carry(n) for n = 1 to 36, where sum(n) is the n-th bit of the sum and carry(n) is the n-th bit of the carry. The CLA adders CLA1 and CLA2 contain three different operations, drawn as white, gray, and black circles. The white circle takes (g(n), p(n)) as input and outputs (gout, pout) = (g(n), p(n)). The gray circle takes (g(n), p(n)) and (g(n-1), p(n-1)) as inputs and outputs (gout, pout) = (g(n) | (g(n-1) & p(n)), p(n-1) & p(n)). The black circle takes (g(n), p(n)), (g(n-1), p(n-1)), and (g(n-2), p(n-2)) as inputs and outputs (gout, pout) = (g(n) | (g(n-1) & p(n)) | (g(n-2) & p(n) & p(n-1)), p(n) & p(n-1) & p(n-2)).
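The gray and black circle operations are the usual generate/propagate combining operators of a carry look-ahead adder, and a short sketch can show how they yield the carries. The code below is an illustration under assumptions, not the tree of FIG. 3: it applies the gray operator sequentially from bit 0 upward instead of in a parallel tree, it numbers bits from 0 at the least significant end, it takes generate as the AND and propagate as the OR of the two input bits, and all function names are hypothetical.

```python
WIDTH = 36  # adder width given in the text


def gray(hi, lo):
    """Combine two (generate, propagate) pairs; `hi` is the more significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (g_lo & p_hi), p_lo & p_hi


def black(hi, mid, lo):
    """Combine three (generate, propagate) pairs in one step."""
    g_hi, p_hi = hi
    g_mid, p_mid = mid
    g_lo, p_lo = lo
    return (g_hi | (g_mid & p_hi) | (g_lo & p_hi & p_mid),
            p_hi & p_mid & p_lo)


def cla_add(x, y, width=WIDTH):
    """Add two words modulo 2**width, deriving every carry from (g, p) pairs."""
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    gp = [(a & b, a | b) for a, b in zip(xb, yb)]  # per-bit generate, propagate
    result, carry_in, prefix = 0, 0, None
    for i in range(width):
        result |= (xb[i] ^ yb[i] ^ carry_in) << i
        # Group (g, p) for bits i..0; its generate is the carry into bit i + 1.
        prefix = gp[i] if prefix is None else gray(gp[i], prefix)
        carry_in = prefix[0]
    return result
```

Because the black operator gives the same result as two nested gray operators, a hardware tree can mix the two freely to cover 36 bits in few levels; the sequential loop here only demonstrates that the operators produce correct carries.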
In addition, the accumulators AC1 and AC2 temporarily store the multiplication or multiply-accumulate results obtained by the CLA adders CLA1 and CLA2 (that is, the multiplication or multiply-accumulate result of B_B[0:15] and the 16 bits selected by the multiplexer M1, or of B_B[16:31] and the 16 bits selected by the multiplexer M2) and, when a multiply-accumulate is being performed, selectively send the results back to the CSA arrays CSA1 and CSA2 for accumulation. In this example, so that multiply-accumulate results that overflow can be stored, the accumulators AC1 and AC2 are slightly longer than the adders AD1 and AD2; in this embodiment they are 40 bits long.

In addition, the selectors S1 and S2 select from the outputs of the adders AD1 and AD2, respectively.

The halves selected by S1 and S2 are combined into the multiplication or multiply-accumulate result. The output controller C1 outputs the maximum positive value or the minimum negative value when the contents of the accumulators AC1 and AC2 show a positive overflow (overflow) or a negative overflow (underflow).

With this architecture, under the standard parameters of a 0.35 process, a temperature of 70 °C, and a voltage of 3 V, the designed multiplication and multiply-accumulate processor has a post-synthesis execution cycle time of about 21.5 ns.
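The behavior of the output controller can be illustrated with a small sketch. The names `wrap_signed`, `accumulate`, and `saturate` are hypothetical, and the 32-bit output range is an assumption made for illustration; the text states only that the accumulators are 40 bits long and that the maximum positive or minimum negative value is output on overflow.

```python
ACC_BITS = 40  # accumulator length given for this embodiment

def wrap_signed(value, bits):
    """Interpret value as a two's-complement number of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def accumulate(acc, product):
    """Add a product into the accumulator, which is wider than the product."""
    return wrap_signed(acc + product, ACC_BITS)

def saturate(acc, out_bits=32):
    """Clamp the accumulator to the output range (out_bits is an assumption)."""
    max_pos = (1 << (out_bits - 1)) - 1
    min_neg = -(1 << (out_bits - 1))
    return max(min_neg, min(max_pos, acc))
```

The extra accumulator bits let several products be summed before clamping, so an intermediate excursion beyond the output range is not lost to wrap-around.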
In addition, the multiplication and multiply-accumulate processor under the PA-RISC architecture is designed for 16-bit digital signal processing and therefore has the following advantages:

(1) A conventional RISC architecture demands high speed and a short execution cycle time, so it provides only commonly used simple instructions and is poorly suited to multimedia and digital signal processing applications that require a large amount of computation. The multiplication and multiply-accumulate processor that the present invention provides under the PA-RISC architecture is therefore better suited to digital signal processing for imaging and multimedia applications.

(2) Each multiplication and multiply-accumulate operation of the processor completes in only two execution cycles. Apart from the hardware and control signals added to decode the new instructions, the execution cycle time and performance of the original RISC design are not adversely affected.

(3) General digital signal processing applications need only 16-bit precision. Performing multiplication and multiply-accumulate in 32 bits under the conventional PA-RISC architecture therefore wastes hardware, lengthens software execution time, and lowers hardware performance. The processor of the present invention instead takes the first 16 bits and the last 16 bits of two 32-bit operands as the operands of two multiplications or multiply-accumulates and performs the two 16-bit operations simultaneously. This avoids wasting hardware, shortens the execution cycle time, and improves hardware utilization.

In summary, the multiplication and multiply-accumulate processor that the present invention provides under the PA-RISC architecture offers a new multiplication and multiply-accumulate instruction set and, without affecting the performance of the original RISC architecture, is better suited to today's rapidly growing multimedia applications. In addition, it defines a 16-bit multiplication and multiply-accumulate instruction set under the 32-bit PA-RISC architecture, making the new instruction set better suited to applications such as multimedia and signal processing that need overflow to be handled automatically.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.
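The paired 16-bit operation summarized above can be sketched as follows. This is a minimal illustration, not the patented circuit: the function and parameter names are hypothetical, operands are assumed to be signed two's-complement values, and the "first half" is taken to be the upper 16 bits, following the PA-RISC convention of numbering bit 0 as the most significant bit.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed number (assumption)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def halves(word):
    """Split a 32-bit word into (first half, second half)."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF

def dual_mul16(a, b, swap=False, take_first=(False, False)):
    """Two 16-bit multiplications on the halves of two 32-bit operands.

    swap models the two multiplexers choosing which half of b meets each
    half of a; take_first models the two selectors choosing the first or
    the second 16 bits of each 32-bit product.
    """
    a_first, a_second = halves(a)
    b_first, b_second = halves(b)
    m1, m2 = (b_second, b_first) if swap else (b_first, b_second)
    products = (to_signed16(a_first) * to_signed16(m1),
                to_signed16(a_second) * to_signed16(m2))

    def pick(product, first):
        product &= 0xFFFFFFFF
        return (product >> 16) & 0xFFFF if first else product & 0xFFFF

    return (pick(products[0], take_first[0]) << 16) | pick(products[1], take_first[1])
```

For example, `dual_mul16(0x00030004, 0x00050006)` computes 3 × 5 and 4 × 6 side by side and packs the two low halves into `0x000F0018`.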


Claims


1. A multiplication and multiply-accumulate processor under a PA-RISC architecture, for processing multiplication and multiply-accumulate operations of a first operand and a second operand, comprising:
a first multiplexer and a second multiplexer, respectively selecting the first-half bits of the second operand and the second-half bits of the second operand;
a first adder and a second adder, which use partial products to respectively process the multiplication of the first-half bits of the first operand by the bits selected by the first multiplexer and the multiplication of the second-half bits of the first operand by the bits selected by the second multiplexer, thereby each obtaining a product with a carry;
a first accumulator and a second accumulator, respectively storing the products of the first adder and the second adder and selectively sending them back to the first adder and the second adder for accumulation; and
a first selector and a second selector, respectively selecting the first-half bits or the second-half bits of the first accumulator and the first-half bits or the second-half bits of the second accumulator, which are combined into the desired product or multiply-accumulate value.

2. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, further comprising an input buffer for temporarily storing the first operand and the second operand.

3. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, wherein the first adder has:
a Booth encoder, processing the bits selected by the first multiplexer according to the first-half bits of the first operand to obtain a plurality of partial products;
a CSA array, adding the partial products to obtain a sum and a carry; and
a CLA adder, adding the sum and the carry to obtain the desired product or multiply-accumulate value.

4. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, wherein the second adder has:
a Booth encoder, processing the bits selected by the second multiplexer according to the second-half bits of the first operand to obtain a plurality of partial products;
a CSA array, adding the partial products to obtain a sum and a carry; and
a CLA adder, adding the sum and the carry to obtain the desired product or multiply-accumulate value.

5. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, wherein the lengths of the first accumulator and the second accumulator are slightly greater than the lengths of the first adder and the second adder.

6. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, further comprising an output controller for outputting the product or multiply-accumulate value combined by the first selector and the second selector, and for outputting a maximum positive value or a minimum negative value when a positive overflow or a negative overflow occurs in the first accumulator or the second accumulator.

7. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, wherein the first operand and the second operand are represented in 32 bits.

8. The multiplication and multiply-accumulate processor under the PA-RISC architecture according to claim 1, wherein a CLA adder in the first adder or the second adder has the same operation architecture as the first CLA adder.

9. A multiplication processor under a PA-RISC architecture, for processing a multiplication of a first operand and a second operand, comprising:
a dividing device, dividing each of the first operand and the second operand into first-half bits and second-half bits;
two operation devices, respectively multiplying the first-half bits of the first operand by one of the first-half bits and the second-half bits of the second operand, and multiplying the second-half bits of the first operand by the other of the first-half bits and the second-half bits of the second operand, thereby obtaining two products; and
two selecting devices, respectively selecting the first-half bits or the second-half bits of the two products, which are combined to obtain the product of the first operand and the second operand.

10. The multiplication processor according to claim 9, wherein the dividing device is implemented with multiplexers.

11. The multiplication processor according to claim 9, wherein the selecting devices are implemented with multiplexers.