TWI529614B

TWI529614B - Serial multiply accumulator for Galois field

Info

Publication number: TWI529614B
Application number: TW103111505A
Authority: TW
Inventors: 洪瑞徽; 顏池男
Original assignee: 衡宇科技股份有限公司
Priority date: 2014-03-27
Filing date: 2014-03-27
Publication date: 2016-04-11
Also published as: TW201537457A

Description

Serial multiply accumulator for Galois field

The present invention relates to a serial multiply accumulator for a Galois field, and in particular to a serial multiply accumulator that computes two multiplications and one addition in a Galois field.

Finite fields play an important role in digital communication systems, for example in encryption schemes and error-correcting codes. Finite fields have many properties that the natural numbers lack, so the two essential operations, finite field addition and finite field multiplication, are always implemented in separate hardware. Because finite field addition can be implemented directly with XOR gates at low hardware and time complexity, the bottleneck has always been the finite field multiplier.
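To illustrate why addition is cheap, here is a minimal Python sketch. It assumes, purely for illustration, that an element of GF(2^m) is packed into an integer whose bit i holds the coefficient of x^i; the function name is hypothetical.

```python
def gf2m_add(x: int, y: int) -> int:
    """Add two GF(2^m) elements stored as bit vectors.

    Assumption of this sketch: bit i of each integer is the coefficient
    of x^i. Coefficient-wise addition modulo 2 is then a bitwise XOR.
    """
    return x ^ y


# (x^2 + 1) + (x^2 + x): the x^2 terms cancel, leaving x + 1.
total = gf2m_add(0b101, 0b110)
```

In hardware this corresponds to one two-input XOR gate per coefficient, with no carry chain between bit positions.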

There are three types of hardware architecture for finite field multipliers: serial, fully parallel, and partially parallel. The serial architecture has the lowest hardware complexity but needs several clock cycles to complete a multiplication. However, because peripheral hardware has become much faster, and because not every multiplication requires a very large number of iterative steps, the serial architecture is still used in many applications.

In some applications, the main operation of a multiply accumulator combines several finite field additions and multiplications, such as E = A×B + C×D, where A, B, C, D, and E are sets of finite field elements. More specifically, A contains m elements: a₀, a₁, a₂, ..., and a_{m-1}. Similarly, B contains b₀, b₁, b₂, ..., and b_{m-1}; C contains c₀, c₁, c₂, ..., and c_{m-1}; D contains d₀, d₁, d₂, ..., and d_{m-1}; and E contains e₀, e₁, e₂, ..., and e_{m-1}. In this case, as shown in Fig. 1, two finite field multiplications and one finite field addition are conventionally required. The finite field multiplier on the left of the figure handles A×B, and the one on the right handles C×D. Each multiplier has m-1 A zones and one B zone. Both the A zone and the B zone contain an AND gate, an XOR gate, and a register. The only difference is that the B zone does not receive the data that it feeds back itself. The connections drawn as dashed arrows are determined by the primitive polynomial of GF(2^m). There are also m XOR gates, which form a finite field adder that completes the A×B + C×D operation.
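The conventional arrangement can be modeled behaviorally as two independent serial multipliers followed by an XOR. The Python sketch below is an illustration only, not a gate-level model of Fig. 1. It assumes an MSB-first serial multiplier, integers whose bit i is the coefficient of x^i, and a `poly` argument that includes the x^m term; all names are hypothetical.

```python
def serial_gf_mul(a: int, b: int, poly: int, m: int) -> int:
    """MSB-first serial multiplication in GF(2^m), one bit of a per clock."""
    acc = 0
    for i in reversed(range(m)):
        acc <<= 1                 # multiply the running result by x
        if acc >> m & 1:          # degree reached m: reduce by p(x)
            acc ^= poly
        if a >> i & 1:            # AND gates form a_i * b_j for every j
            acc ^= b              # XOR gates accumulate the partial product
    return acc


def conventional_mac(a: int, b: int, c: int, d: int,
                     poly: int = 0b1011, m: int = 3) -> int:
    """E = A*B + C*D using two multipliers and a final XOR adder."""
    return serial_gf_mul(a, b, poly, m) ^ serial_gf_mul(c, d, poly, m)


# In GF(2^3) with p(x) = x^3 + x + 1: x * x^2 = x + 1, and 1 * 1 = 1,
# so the sum is x.
e = conventional_mac(0b010, 0b100, 0b001, 0b001)
```

Each multiplier keeps its own m-bit state, which is where the doubled register count of the conventional design comes from.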

In this design, the area cost is two finite field multipliers and one finite field adder. Counted out, the multiply accumulator contains 2m AND gates, 3m XOR gates, and 2m registers. The critical path of this design is a multiplier followed by an XOR gate. U.S. Patent No. 7,082,452, "Galois field multiply/multiply-add/multiply accumulate", provides a parallel architecture that performs the same operation quickly. However, the hardware complexity of the '452 patent is too high for designs in which area is a concern.

The present invention therefore discloses a serial architecture for a multiply accumulator that has lower hardware complexity yet achieves performance similar to that of the conventional multiply accumulator shown in Fig. 1. In other words, compared with the conventional multiply accumulator, the same result is obtained with fewer components, such as XOR gates and registers. The present invention thus has the advantage of lower area cost.

As described above, for a conventional multiply accumulator that combines finite field multiplication and addition, there is still room to save area by using fewer components.

According to one aspect of the present invention, a serial multiply accumulator for performing two multiplications and one addition in a Galois field comprises: a first element feeding circuit for sequentially outputting, in each clock, a first element in the Galois field; a second element feeding circuit for sequentially outputting, in each clock, a second element in the Galois field; a plurality of first calculation circuits connected in sequence from upstream to downstream, each first calculation circuit, in each clock, receiving the first element, the second element, a third element, and a fourth element, receiving operation data from the first calculation circuit connected upstream of it, selectively receiving feedback data, multiplying the first element by the third element and the second element by the fourth element to produce two products, and outputting further operation data downstream, the output operation data being obtained by adding one product to the other, by adding the two products to the received operation data, by adding the two products to the feedback data, or by adding the two products to both the received operation data and the feedback data, wherein the most upstream first calculation circuit does not receive operation data from any other first calculation circuit; and a second calculation circuit, connected to the most downstream first calculation circuit, for receiving, in each clock, the first element, the second element, a third element, and a fourth element, receiving the output operation data from the connected first calculation circuit, multiplying the first element by the third element and the second element by the fourth element to produce two products, and outputting the feedback data, the feedback data being obtained by adding one product to the other or by adding the two products to the received operation data; wherein the first, second, third, and fourth elements are equal in number, the third element provided to any one of the first calculation circuits or to the second calculation circuit is different from the third element provided to the other first calculation circuits, and the fourth element provided to any one of the first calculation circuits or to the second calculation circuit is different from the fourth element provided to the other first calculation circuits.

In addition, the serial multiply accumulator may further comprise: a third element feeding circuit, connected to each first calculation circuit and to the second calculation circuit, for providing each of those circuits with its specific third element; and a fourth element feeding circuit, connected to each first calculation circuit and to the second calculation circuit, for providing each of those circuits with its specific fourth element.

According to the present concept, the coefficients of a polynomial correspond in order to the first calculation circuits, with the constant-term coefficient corresponding to the most upstream first calculation circuit. If the coefficient corresponding to a first calculation circuit is not zero, the feedback data is provided to that first calculation circuit. If the total number of first calculation circuits and the second calculation circuit is equal to or less than the degree of the polynomial, at least one higher-order coefficient of the polynomial does not correspond to any first calculation circuit.

The first calculation circuit may further comprise: a first AND gate for multiplying the first element by the third element; a second AND gate for multiplying the second element by the fourth element; a first XOR gate for adding one product to the other, adding the two products to the received operation data, adding the two products to the feedback data, or adding the two products to both the received operation data and the feedback data; and a first register for temporarily storing, for one clock, the operation data from the first XOR gate.

The second calculation circuit further comprises: a third AND gate for multiplying the first element by the third element; a fourth AND gate for multiplying the second element by the fourth element; a second XOR gate for adding one product to the other or adding the two products to the received operation data; and a second register for temporarily storing, for one clock, the operation data from the second XOR gate.

By rearranging the circuit design, many of the components used in a conventional multiply accumulator, such as XOR gates and registers, can be saved. The present invention therefore has the advantage of lower area cost.

10‧‧‧multiply accumulator
120‧‧‧second element feeding circuit
130‧‧‧upstream first calculation circuit
1301‧‧‧first AND gate
1302‧‧‧second AND gate
1303‧‧‧first XOR gate
1304‧‧‧first register
140‧‧‧downstream first calculation circuit
1401‧‧‧first AND gate
1402‧‧‧second AND gate
1403‧‧‧first XOR gate
1404‧‧‧first register
150‧‧‧second calculation circuit
1501‧‧‧third AND gate
1502‧‧‧fourth AND gate
1503‧‧‧second XOR gate
1504‧‧‧second register
20‧‧‧multiply accumulator
210‧‧‧first element feeding circuit
220‧‧‧second element feeding circuit
240‧‧‧first calculation circuit
250‧‧‧second calculation circuit

Fig. 1 shows a conventional multiply accumulator.

Fig. 2 shows a multiply accumulator according to the present invention for computing two finite field multiplications and one finite field addition.

Fig. 3 shows another multiply accumulator according to the present invention for computing two finite field multiplications and one finite field addition.

The present invention is described more specifically with reference to the following embodiments.

Please refer to Fig. 2, which illustrates one embodiment of the present invention. A multiply accumulator 10 performs two finite field multiplications and one finite field addition in the Galois field GF(2³). The multiply accumulator 10 includes a first element feeding circuit 110, a second element feeding circuit 120, an upstream first calculation circuit 130, a downstream first calculation circuit 140, and a second calculation circuit 150. The upstream first calculation circuit 130 and the downstream first calculation circuit 140 have the same structure and similar functions. For ease of understanding, a direction is defined here: the upstream side is on the left of the figure and the downstream side is on the right. This is why the first calculation circuit on the left is called the "upstream" first calculation circuit 130 and the other is called the "downstream" first calculation circuit 140.

In this embodiment, the multiply accumulator 10 computes E = A×B + C×D, where A, B, C, D, and E are sets of elements in GF(2³). A is the set of first elements and contains a₀, a₁, and a₂. B is the set of third elements and contains b₀, b₁, and b₂. C is the set of second elements and contains c₀, c₁, and c₂. D is the set of fourth elements and contains d₀, d₁, and d₂. E is the result of the operation and contains e₀, e₁, and e₂. Note that every set must contain the same number of elements (three here). The invention is not limited to three, however; any number, such as 64 or 128, is applicable.

The first element feeding circuit 110 sequentially outputs, one per clock, the first elements in GF(2³). The order is: a₂ in the first clock, a₁ in the second clock, and a₀ in the third (last) clock. Similarly, the second element feeding circuit 120 sequentially outputs, one per clock, the second elements in GF(2³): c₂ in the first clock, c₁ in the second clock, and c₀ in the third clock.

The upstream first calculation circuit 130 and the downstream first calculation circuit 140 are connected in sequence from upstream to downstream, and each receives, in each clock, the first element, the second element, a third element, and a fourth element. Each can also receive operation data from a first calculation circuit connected upstream of it. However, because the upstream first calculation circuit 130 is the most upstream, it receives no such operation data. In addition, the upstream first calculation circuit 130 and the downstream first calculation circuit 140 selectively receive feedback data. Whether a first calculation circuit receives the feedback data depends on an irreducible polynomial p(x). In this embodiment, p(x) = x³ + x + 1. The coefficients of p(x) correspond in order to the first calculation circuits: the constant-term coefficient, 1, corresponds to the upstream first calculation circuit 130, and the coefficient of x corresponds to the downstream first calculation circuit 140. If the coefficient corresponding to a first calculation circuit is not zero, the feedback data is provided to that first calculation circuit. Here the constant-term coefficient is 1, so the upstream first calculation circuit 130 receives the feedback data (dashed arrow in Fig. 2); the coefficient of the x term is 1, so the downstream first calculation circuit 140 also receives the feedback data (dashed arrow in Fig. 2). If the total number of first calculation circuits and the second calculation circuit is equal to or less than the degree of p(x), at least one higher-order coefficient of p(x) does not correspond to any first calculation circuit. The total number of first and second calculation circuits here is 3, and the degree of p(x) is also 3. Although the coefficient of x³ is 1, no first calculation circuit corresponds to it. In other embodiments, where A, B, C, or D contains many more elements, the number of first and second calculation circuits may exceed the degree of p(x).
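The mapping from polynomial coefficients to feedback connections can be sketched in a few lines of Python. This is an illustrative assumption about representation, not part of the disclosure: p(x) is packed into an integer whose bit j is the coefficient of x^j, and the function name is hypothetical.

```python
def feedback_taps(poly: int, m: int) -> list:
    """Return one flag per first calculation circuit, upstream first.

    taps[j] is True when the coefficient of x^j in p(x) is nonzero,
    meaning first calculation circuit j receives the feedback data.
    Only the m-1 first calculation circuits are covered; the higher
    coefficients are not mapped to any first calculation circuit.
    """
    return [bool(poly >> j & 1) for j in range(m - 1)]


# p(x) = x^3 + x + 1: both first calculation circuits are tapped.
taps_gf8 = feedback_taps(0b1011, 3)

# p(x) = x^4 + x + 1 (chosen here only as an example): the third
# first calculation circuit has a zero coefficient and no feedback.
taps_gf16 = feedback_taps(0b10011, 4)
```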

The upstream first calculation circuit 130 and the downstream first calculation circuit 140 each produce two products by multiplying the first element by the third element and the second element by the fourth element. After the multiplications, and depending on the situation, each circuit produces operation data by adding one product to the other, adding the two products to the received operation data, adding the two products to the feedback data, or adding the two products to both the received operation data and the feedback data. The details are described below. Note that the upstream first calculation circuit 130 does not receive operation data from any other first calculation circuit, whereas the downstream first calculation circuit 140 receives the operation data from the upstream first calculation circuit 130. In the next clock, the upstream first calculation circuit 130 and the downstream first calculation circuit 140 each output their operation data.

The second calculation circuit 150 is connected to the downstream first calculation circuit 140. In each clock it receives the first element, the second element, a third element, and a fourth element. It also receives the operation data from the downstream first calculation circuit 140. The second calculation circuit 150 produces two products by multiplying the first element by the third element and the second element by the fourth element. It then adds one product to the other, or adds the two products to the received operation data, to form the feedback data. In the next clock, the second calculation circuit 150 outputs the feedback data. It should be emphasized that the third element provided to any one of the upstream first calculation circuit 130, the downstream first calculation circuit 140, and the second calculation circuit 150 is different from the third element provided to the others, and likewise the fourth element provided to any one of them is different from the fourth element provided to the others. As shown in Fig. 2, in every clock b₀ and d₀ are input to the upstream first calculation circuit 130, b₁ and d₁ are input to the downstream first calculation circuit 140, and b₂ and d₂ are input to the second calculation circuit 150.

The upstream first calculation circuit 130 has a first AND gate 1301, a second AND gate 1302, a first XOR gate 1303, and a first register 1304. The first AND gate 1301 multiplies the first element by the third element. The second AND gate 1302 multiplies the second element by the fourth element. The first XOR gate 1303 adds one product to the other in the first clock, and adds the two products to the feedback data in the second and subsequent clocks. The first register 1304 temporarily stores, for one clock, the operation data from the first XOR gate 1303.

The downstream first calculation circuit 140 has a first AND gate 1401, a second AND gate 1402, a first XOR gate 1403, and a first register 1404. As in the upstream first calculation circuit 130, the first AND gate 1401 multiplies the first element by the third element, and the second AND gate 1402 multiplies the second element by the fourth element. The difference is that the first XOR gate 1403 adds one product to the other in the first clock, and adds the two products to both the received operation data and the feedback data in the second and subsequent clocks. The first register 1404 temporarily stores, for one clock, the operation data from the first XOR gate 1403.

The second calculation circuit 150 has a third AND gate 1501, a fourth AND gate 1502, a second XOR gate 1503, and a second register 1504. The third AND gate 1501 multiplies the first element by the third element. The fourth AND gate 1502 multiplies the second element by the fourth element. The second XOR gate 1503 adds one product to the other in the first clock, and adds the two products to the received operation data in the second and subsequent clocks. The second register 1504 temporarily stores, for one clock, the operation data from the second XOR gate 1503.
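The three circuits differ only in which extra inputs their XOR gate sums. A minimal behavioral sketch in Python follows; the function names are hypothetical, every signal is a single bit (0 or 1), and the sketch assumes that an absent input, or a register that has not yet been written in the first clock, reads as 0.

```python
def first_cell_next(a_i: int, c_i: int, b_j: int, d_j: int,
                    upstream: int = 0, feedback: int = 0) -> int:
    """Next register value of a first calculation circuit.

    Two AND gates form the products a_i*b_j and c_i*d_j; one XOR gate
    sums them with the operation data from upstream and the feedback
    data. Pass 0 for an input the circuit does not receive.
    """
    return (a_i & b_j) ^ (c_i & d_j) ^ upstream ^ feedback


def second_cell_next(a_i: int, c_i: int, b_j: int, d_j: int,
                     upstream: int = 0) -> int:
    """Next register value of the second calculation circuit.

    Same as a first calculation circuit but with no feedback input;
    the stored value becomes the feedback data in the next clock.
    """
    return (a_i & b_j) ^ (c_i & d_j) ^ upstream
```

With all extra inputs at 0, both functions reduce to adding one product to the other, which matches the first-clock behavior described for each circuit.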

Although not shown in Fig. 2, the multiply accumulator 10 may further include a third element feeding circuit and a fourth element feeding circuit. The third element feeding circuit is connected to the upstream first calculation circuit 130, the downstream first calculation circuit 140, and the second calculation circuit 150, and provides each with its specific third element. Similarly, the fourth element feeding circuit is connected to the same circuits and provides each with its specific fourth element. The third element feeding circuit and the fourth element feeding circuit may of course be combined into one.

The operation can be expressed in polynomial form, with each polynomial representing one set of elements. The result E(x) equals (A(x)B(x) + C(x)D(x)) modulo p(x), where p(x) is a degree-3 irreducible polynomial of GF(2³). With p(x) = x³ + x + 1, let A(x) = a₂x² + a₁x + a₀, B(x) = b₂x² + b₁x + b₀, C(x) = c₂x² + c₁x + c₀, D(x) = d₂x² + d₁x + d₀, and E(x) = e₂x² + e₁x + e₀. This gives:

\[
\begin{aligned}
E(x) &= \bigl(A(x)B(x) + C(x)D(x)\bigr) \bmod p(x) \\
&= \bigl( (a_2b_2 + c_2d_2)x^4 + (a_2b_1 + a_1b_2 + c_2d_1 + c_1d_2)x^3 \\
&\qquad + (a_2b_0 + a_1b_1 + a_0b_2 + c_2d_0 + c_1d_1 + c_0d_2)x^2 \\
&\qquad + (a_1b_0 + a_0b_1 + c_1d_0 + c_0d_1)x + (a_0b_0 + c_0d_0) \bigr) \bmod (x^3 + x + 1) \\
&= (a_2b_0 + a_1b_1 + a_0b_2 + a_2b_2 + c_2d_0 + c_1d_1 + c_0d_2 + c_2d_2)x^2 \\
&\qquad + (a_1b_0 + a_0b_1 + a_2b_1 + a_1b_2 + a_2b_2 + c_1d_0 + c_0d_1 + c_2d_1 + c_1d_2 + c_2d_2)x \\
&\qquad + (a_0b_0 + a_2b_1 + a_1b_2 + c_0d_0 + c_2d_1 + c_1d_2)
\end{aligned}
\]
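The closed-form coefficients can be checked numerically. The Python sketch below is an independent verification aid, not part of the disclosed circuit: it compares the three coefficient expressions against a generic carry-less multiply followed by reduction modulo x³ + x + 1, over every combination of inputs. The bit packing (bit i holds the coefficient of x^i) and all function names are assumptions made for the example.

```python
from itertools import product


def mac_reference(a, b, c, d, poly=0b1011, m=3):
    """(A*B + C*D) mod p(x): carry-less multiply, then reduce."""
    def clmul(x, y):
        r = 0
        for i in range(m):
            if y >> i & 1:
                r ^= x << i
        return r

    t = clmul(a, b) ^ clmul(c, d)
    for deg in range(2 * m - 2, m - 1, -1):   # clear x^4, then x^3
        if t >> deg & 1:
            t ^= poly << (deg - m)
    return t


def mac_closed_form(a, b, c, d):
    """e2, e1, e0 written out exactly as in the expansion for GF(2^3)."""
    def bits(v):
        return v & 1, v >> 1 & 1, v >> 2 & 1

    a0, a1, a2 = bits(a)
    b0, b1, b2 = bits(b)
    c0, c1, c2 = bits(c)
    d0, d1, d2 = bits(d)
    e2 = ((a2 & b0) ^ (a1 & b1) ^ (a0 & b2) ^ (a2 & b2)
          ^ (c2 & d0) ^ (c1 & d1) ^ (c0 & d2) ^ (c2 & d2))
    e1 = ((a1 & b0) ^ (a0 & b1) ^ (a2 & b1) ^ (a1 & b2) ^ (a2 & b2)
          ^ (c1 & d0) ^ (c0 & d1) ^ (c2 & d1) ^ (c1 & d2) ^ (c2 & d2))
    e0 = ((a0 & b0) ^ (a2 & b1) ^ (a1 & b2)
          ^ (c0 & d0) ^ (c2 & d1) ^ (c1 & d2))
    return e2 << 2 | e1 << 1 | e0


# Exhaustive comparison over all 8^4 input combinations.
formulas_agree = all(mac_reference(*v) == mac_closed_form(*v)
                     for v in product(range(8), repeat=4))
```

The reduction step uses x³ ≡ x + 1 and x⁴ ≡ x² + x, which is why the x⁴ coefficient appears in both e₂ and e₁ and the x³ coefficient appears in both e₁ and e₀.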

The operation is described step by step below.

In the first clock, a₂ and c₂ are provided to the upstream first calculation circuit 130, the downstream first calculation circuit 140, and the second calculation circuit 150. The upstream first calculation circuit 130 computes the operation data a₂b₀ + c₂d₀. The downstream first calculation circuit 140 computes the operation data a₂b₁ + c₂d₁. The second calculation circuit 150 computes the feedback data a₂b₂ + c₂d₂, which is fed to the upstream first calculation circuit 130 and the downstream first calculation circuit 140 in the second clock.

In the second clock, a₁ and c₁ are provided to the upstream first calculation circuit 130, the downstream first calculation circuit 140, and the second calculation circuit 150. The upstream first calculation circuit 130 adds the products from the first AND gate 1301 and the second AND gate 1302 to the feedback data, computing the updated operation data a₁b₀ + c₁d₀ + a₂b₂ + c₂d₂, while outputting the operation data computed in the first clock. The downstream first calculation circuit 140 adds the products from the first AND gate 1401 and the second AND gate 1402, the operation data from the upstream first calculation circuit 130, and the feedback data, computing the updated operation data a₁b₁ + c₁d₁ + a₂b₀ + c₂d₀ + a₂b₂ + c₂d₂, while outputting the operation data computed in the first clock. The second calculation circuit 150 adds the products from the third AND gate 1501 and the fourth AND gate 1502 to the operation data from the downstream first calculation circuit 140, computing the updated feedback data a₁b₂ + c₁d₂ + a₂b₁ + c₂d₁, while outputting the feedback data computed in the first clock.

In the third clock, a₀ and c₀ are provided to the upstream first calculation circuit 130, the downstream first calculation circuit 140, and the second calculation circuit 150. The upstream first calculation circuit 130 adds the products from the first AND gate 1301 and the second AND gate 1302 to the feedback data, computing the updated operation data a₀b₀ + c₀d₀ + a₁b₂ + c₁d₂ + a₂b₁ + c₂d₁, while outputting the operation data computed in the second clock. This value, a₀b₀ + c₀d₀ + a₁b₂ + c₁d₂ + a₂b₁ + c₂d₁, is e₀. The downstream first calculation circuit 140 adds the products from the first AND gate 1401 and the second AND gate 1402, the operation data from the upstream first calculation circuit 130, and the feedback data, computing the updated operation data a₀b₁ + c₀d₁ + a₁b₀ + c₁d₀ + a₂b₂ + c₂d₂ + a₁b₂ + c₁d₂ + a₂b₁ + c₂d₁, while outputting the operation data computed in the second clock. This value, a₀b₁ + c₀d₁ + a₁b₀ + c₁d₀ + a₂b₂ + c₂d₂ + a₁b₂ + c₁d₂ + a₂b₁ + c₂d₁, is e₁. The second calculation circuit 150 adds the products from the third AND gate 1501 and the fourth AND gate 1502 to the operation data from the downstream first calculation circuit 140, computing the updated data a₀b₂ + c₀d₂ + a₁b₁ + c₁d₁ + a₂b₀ + c₂d₀ + a₂b₂ + c₂d₂, while outputting the feedback data computed in the second clock. This value, a₀b₂ + c₀d₂ + a₁b₁ + c₁d₁ + a₂b₀ + c₂d₀ + a₂b₂ + c₂d₂, is e₂.
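The three clocks above can be replayed in software. The following Python sketch is a behavioral model of the Fig. 2 data path under stated assumptions: the three registers start at 0, bit i of each integer argument holds the element with index i, and the variable names (taken from the reference numerals) are chosen for this example only.

```python
def serial_mac_gf8(a: int, b: int, c: int, d: int) -> int:
    """Clock-by-clock model for GF(2^3) with p(x) = x^3 + x + 1."""
    b0, b1, b2 = b & 1, b >> 1 & 1, b >> 2 & 1
    d0, d1, d2 = d & 1, d >> 1 & 1, d >> 2 & 1
    r130 = r140 = r150 = 0           # registers 1304, 1404, 1504
    for i in (2, 1, 0):              # a2/c2 first, a0/c0 last
        a_i, c_i = a >> i & 1, c >> i & 1
        fb = r150                    # feedback data from the last clock
        n130 = (a_i & b0) ^ (c_i & d0) ^ fb
        n140 = (a_i & b1) ^ (c_i & d1) ^ r130 ^ fb
        n150 = (a_i & b2) ^ (c_i & d2) ^ r140
        r130, r140, r150 = n130, n140, n150   # all registers update together
    # After the third clock the registers hold e0, e1, e2.
    return r150 << 2 | r140 << 1 | r130


# x * x^2 + 1 * 1 = (x + 1) + 1 = x
e = serial_mac_gf8(0b010, 0b100, 0b001, 0b001)
```

Computing the next values into temporaries before assigning them mirrors the fact that every register samples its XOR gate on the same clock edge.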

As described above, the number of elements in a set is not limited to three. The operation on sets with any number of elements is illustrated in Fig. 3. A multiply accumulator 20 computes two finite field multiplications and one finite field addition in a Galois field. The multiply accumulator 20 includes a first element feeding circuit 210, a second element feeding circuit 220, m-1 first calculation circuits 240 connected in sequence from upstream to downstream, and a second calculation circuit 250. The functions and structures of the first element feeding circuit 210, the second element feeding circuit 220, the first calculation circuits 240, and the second calculation circuit 250 are the same as those of the first element feeding circuit 110, the second element feeding circuit 120, the upstream first calculation circuit 130 or downstream first calculation circuit 140, and the second calculation circuit 150, respectively, and are not repeated here. However, because the coefficient of the x term of p(x) in this embodiment is zero, the first calculation circuit 240 enclosed by the dashed line does not receive the feedback data from the second calculation circuit 250. Its addition consists only of adding one product to the other and adding the two products to the received operation data. This is not unique to that first calculation circuit 240: any other first calculation circuit 240 whose corresponding coefficient of p(x) is zero behaves the same way.
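The general arrangement with m-1 first calculation circuits and one second calculation circuit can be sketched as follows. This Python model makes several assumptions that are not stated in the text: registers start at 0, p(x) is passed as an integer whose bit j is the coefficient of x^j (including the x^m term), and the coefficient of x^(m-1) must be zero, because the second calculation circuit as described has no feedback input. The function name and the example polynomial x⁴ + x + 1 are illustrative only.

```python
def serial_mac(a: int, b: int, c: int, d: int, poly: int, m: int) -> int:
    """Behavioral model of an m-stage serial multiply accumulator."""
    if poly >> (m - 1) & 1:
        # Restriction of this sketch, see the assumptions above.
        raise ValueError("sketch assumes a zero x^(m-1) coefficient")
    regs = [0] * m                       # regs[m-1] is the second circuit
    for i in reversed(range(m)):         # highest-index element first
        a_i, c_i = a >> i & 1, c >> i & 1
        fb = regs[m - 1]                 # feedback data from the last clock
        nxt = []
        for j in range(m):
            v = (a_i & (b >> j & 1)) ^ (c_i & (d >> j & 1))
            if j > 0:
                v ^= regs[j - 1]         # operation data from upstream
            if j < m - 1 and poly >> j & 1:
                v ^= fb                  # tapped only for nonzero coefficient
            nxt.append(v)
        regs = nxt
    return sum(bit << j for j, bit in enumerate(regs))


# GF(2^4), p(x) = x^4 + x + 1: x^3 * x + 1 * 1 = (x + 1) + 1 = x
e = serial_mac(0b1000, 0b0010, 0b0001, 0b0001, poly=0b10011, m=4)
```

In this example the stage for x² has a zero coefficient, so it sums only its two products and the operation data from upstream, as described for the dashed first calculation circuit.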

Refer to Fig. 1 and Fig. 3 together; comparing them makes the advantage of the present invention clear. To compute AxB+CxD as above, where A, B, C, and D each have m elements, a conventional multiply accumulator needs 2m AND gates, 3m XOR gates, and 2m registers. The multiply accumulator design provided by the present invention needs only 2m AND gates, m XOR gates, and m registers. It saves 2m XOR gates and m registers while delivering comparable performance.
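The comparison can be tabulated for a concrete field size. This small sketch simply encodes the counts quoted in the paragraph above (each XOR is counted as one gate regardless of its number of inputs, as the text does); the function name `gate_counts` is hypothetical.

```python
def gate_counts(m):
    """Gate and register counts for E = A*B + C*D with m-element operands."""
    conventional = {"and": 2 * m, "xor": 3 * m, "reg": 2 * m}
    proposed = {"and": 2 * m, "xor": m, "reg": m}
    saved = {key: conventional[key] - proposed[key] for key in conventional}
    return conventional, proposed, saved


conventional, proposed, saved = gate_counts(8)  # e.g. GF(2^8)
print(saved)  # {'and': 0, 'xor': 16, 'reg': 8}
```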

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Any person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

10‧‧‧Multiply accumulator

120‧‧‧Second element feeding circuit

1301‧‧‧First AND gate

1302‧‧‧Second AND gate

1303‧‧‧First XOR gate

1304‧‧‧First register

1401‧‧‧First AND gate

1402‧‧‧Second AND gate

1403‧‧‧First XOR gate

1404‧‧‧First register

150‧‧‧Second calculation circuit

1501‧‧‧Third AND gate

1502‧‧‧Fourth AND gate

1503‧‧‧Second XOR gate

1504‧‧‧Second register

Claims

A serial multiply accumulator for a Galois field, capable of performing two multiplication operations and one addition operation, comprising: a first element feeding circuit for sequentially outputting, in each clock, a first element in the Galois field; a second element feeding circuit for sequentially outputting, in each clock, a second element in the Galois field; a plurality of first calculation circuits connected in sequence from upstream to downstream, each first calculation circuit, in each clock, receiving the first element, the second element, a third element, and a fourth element, receiving operational data from an upstream-connected first calculation circuit, selectively receiving feedback data, multiplying the first element by the third element and the second element by the fourth element to generate two products, and outputting further operational data downstream, the output operational data being obtained by adding one product to the other product, adding the two products to the received operational data, adding the two products to the feedback data, or adding the two products to the received operational data and the feedback data, wherein the most upstream first calculation circuit does not receive operational data from any other first calculation circuit; and a second calculation circuit, connected to the most downstream first calculation circuit, for receiving, in each clock, the first element, the second element, a third element, and a fourth element, receiving the output operational data from the connected first calculation circuit, multiplying the first element by the third element and the second element by the fourth element to generate two products, and outputting feedback data, the feedback data being obtained by adding one product to the other product or adding the two products to the received operational data; wherein the first elements, second elements, third elements, and fourth elements are equal in number, the third element provided to one of the first calculation circuits or to the second calculation circuit differs from the third element provided to the other first calculation circuits, and the fourth element provided to one of the first calculation circuits or to the second calculation circuit differs from the third element provided to the other first calculation circuits.

The serial multiply accumulator according to claim 1, further comprising: a third element feeding circuit, connected to each first calculation circuit and the second calculation circuit, for providing each of those circuits with a specific third element; and a fourth element feeding circuit, connected to each first calculation circuit and the second calculation circuit, for providing each of those circuits with a specific fourth element.

The serial multiply accumulator according to claim 1, wherein the coefficients of a polynomial correspond in order to the first calculation circuits, the constant-term coefficient corresponding to the most upstream first calculation circuit.

The serial multiply accumulator according to claim 3, wherein the feedback data is provided to a first calculation circuit if the coefficient corresponding to that first calculation circuit is not zero.

The serial multiply accumulator according to claim 3, wherein if the total number of the first calculation circuits and the second calculation circuit is equal to or less than the highest degree of the polynomial, at least one higher-order coefficient of the polynomial does not correspond to any first calculation circuit.

The serial multiply accumulator according to claim 1, wherein the first calculation circuit further comprises: a first AND gate for multiplying the first element by the third element; a second AND gate for multiplying the second element by the fourth element; a first XOR gate for adding one product to the other product, adding the two products to the received operational data, adding the two products to the feedback data, or adding the two products to the received operational data and the feedback data; and a first register for temporarily storing, during one clock, the operational data from the first XOR gate.

The serial multiply accumulator according to claim 1, wherein the second calculation circuit further comprises: a third AND gate for multiplying the first element by the third element; a fourth AND gate for multiplying the second element by the fourth element; a second XOR gate for adding one product to the other product or adding the two products to the received operational data; and a second register for temporarily storing, during one clock, the operational data from the first XOR gate.