CN118312133A - Karatsuba-based ultra-high order binary polynomial multiplier - Google Patents

Karatsuba-based ultra-high order binary polynomial multiplier

Info

Publication number
CN118312133A
CN118312133A CN202410394029.6A CN202410394029A CN118312133A CN 118312133 A CN118312133 A CN 118312133A CN 202410394029 A CN202410394029 A CN 202410394029A CN 118312133 A CN118312133 A CN 118312133A
Authority
CN
China
Prior art keywords
module
multiplier
oka
multiplication
polynomial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410394029.6A
Other languages
Chinese (zh)
Inventor
田静
张永真
杨柳
王中风
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202410394029.6A
Publication of CN118312133A
Legal status: Pending

Landscapes

  • Error Detection And Correction (AREA)

Abstract

The application provides a Karatsuba-based ultra-high order binary polynomial multiplier comprising a column-by-column calculation module, a reordering module and an OKA multiplier module. The column-by-column calculation module divides an input ultra-high order binary polynomial over a finite field into blocks, each block being an n-term polynomial; the reordering module orders the terms within each divided block by means of a depth-first recursive function of a binary tree model; the OKA multiplier module operates recursively on each ordered block. By adopting a column-by-column calculation strategy in which each column is computed block by block, the multiplier reduces area and implements ultra-high order binary polynomial multiplication efficiently; ordering the terms of the input polynomial in the reordering module reduces the complexity of the algorithm; and because the bit width of the recursive OKA multiplier is scalable, changing the conversion level of the multiplier balances delay against area and thereby yields a better area efficiency.

Description

Karatsuba-based ultra-high order binary polynomial multiplier
Technical Field
The application relates to the technical field of finite field multipliers in cryptography, in particular to a Karatsuba-based ultra-high order binary polynomial multiplier.
Background
Polynomial multiplication is one of the most widely used operations in the finite field GF(2^m), and its efficiency has a great impact on the overall performance and cost of a system. Many improved methods for polynomial multiplication have been studied; current optimization algorithms include the classical Karatsuba algorithm (KA), a Karatsuba algorithm based on 8-level layering, and the overlap-free Karatsuba algorithm (OKA).
Several schemes have been proposed to optimize high-order binary polynomial multiplication. There are two implementation strategies: a column-by-column strategy and a strategy that divides the vector into blocks. However, when the operands reach a higher bit level, existing algorithms cannot compute the product efficiently and quickly; moreover, existing column-by-column multiplication of ultra-high order binary polynomials is based on dot-multiplication operations and therefore consumes many resources.
In addition, prior-art hardware implementations of the Karatsuba algorithm are generally based on the divide-and-conquer concept: a multiplication is split into three partial-product operations, each computed recursively by the same algorithm until the operands are reduced to single-bit multiplications. This architecture must reorder the input terms at every split, which lengthens the placement-and-routing wiring and increases the delay.
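The divide-and-conquer scheme described above can be sketched in software as follows. This is a minimal illustration, not the patent's hardware: it assumes GF(2) polynomials are stored as Python integers (bit i holds the coefficient of x^i) and that the term count n is a power of two. The names `clmul` and `karatsuba_gf2` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2) polynomials stored as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, n: int) -> int:
    """Multiply two GF(2) polynomials of at most n terms (n a power of two).

    Assumption for this sketch: the recursion runs down to single-bit
    operands, as in the prior-art description.
    """
    if n == 1:
        return a & b  # a single-bit multiplication is an AND
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Three partial products, each computed by the same algorithm.
    low = karatsuba_gf2(a_lo, b_lo, half)
    high = karatsuba_gf2(a_hi, b_hi, half)
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half)
    # In GF(2) subtraction equals addition, so the middle term is a pure XOR.
    return low ^ ((mid ^ low ^ high) << half) ^ (high << n)
```

The result can be cross-checked against the schoolbook product, for example `karatsuba_gf2(0b1011, 0b0110, 4) == clmul(0b1011, 0b0110)`.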
Disclosure of Invention
The application provides a Karatsuba-based ultra-high order binary polynomial multiplier to solve the following problems: when the operands reach a higher bit level, the product cannot be computed efficiently and quickly and resource consumption is high; and the Karatsuba algorithm architecture must reorder the input terms at every split, which lengthens the layout wiring and increases the delay. The multiplier comprises: a column-by-column calculation module, a reordering module and an OKA multiplier module;
The output end of the column-by-column calculation module is connected with the input end of the reordering module, and the output end of the reordering module is connected with the input end of the OKA multiplier module;
The column-by-column calculation module is used for dividing an input ultrahigh-order binary polynomial into blocks, wherein the blocks are n-term polynomials, and inputting the divided blocks into the reordering module to order all the blocks;
The reordering module is used for ordering each item in the divided blocks through a depth-first recursion function of a binary tree model, and inputting the ordered items into the OKA multiplier module for multiplication operation;
The OKA multiplier module multiplies the ordered terms by recursively invoking OKA multipliers of successively lower levels, with a non-recursive conventional polynomial multiplier at the bottom level.
In one possible implementation, the OKA multiplier module includes a plurality of different stages of OKA multiplication architecture, the OKA multiplier module configured to:
Calculating the output of the OKA multiplication architecture of the current level by using three OKA multiplication architectures one level lower than the current level, wherein the three lower-level OKA multiplication architectures are an even core, a cross core and an odd core, respectively, and the output comprises a zeroth-term output, odd-term outputs and even-term outputs;
The zeroth-term output is c_0, which is the first term of the even core's output data;
The even-term outputs are c_{2i} (where 0 < i < N), each c_{2i} being the sum of the corresponding even-core and odd-core output data;
The odd-term outputs are c_{2i+1} (where 0 ≤ i < N), each c_{2i+1} being the cross-core output minus the corresponding even-core and odd-core outputs;
wherein c denotes an output coefficient, i.e. a coefficient of the polynomial multiplication result, and N is the number of bases to be processed during the recursion of the OKA multiplier;
The even core, the odd core and the cross core each in turn use three OKA multiplication architectures one further level lower, and the recursion continues until the conventional polynomial multiplication architecture at the bottom level is reached, whereupon all outputs are combined into the multiplication result.
In a possible implementation manner, the OKA multiplier module is further configured to operate on the input number in a recursive manner, and change the number of recursions according to the bit width of the non-recursive conventional polynomial multiplier;
the number of recursions is calculated as:
k = log_2 N;
where N is the number of bases and k is the number of recursions; the bit widths of the successive OKA multiplication levels are N, N/2, N/4, …, N/2^{k-1}, and the bottom level performs the dot-multiplication calculation using the conventional polynomial multiplication architecture with a bit width of N/2^k.
In one possible implementation, the reordering module is configured to:
Selecting the bit width of a base as w bits, wherein the total bit width of the block is n bits, and splitting the block into n/w bases;
splitting coefficients of the block according to the bit width w of the base, wherein each w continuous coefficients form one base;
Sorting and combining the split bases according to the coefficient indexes of the blocks, wherein the bases consisting of even-numbered coefficients are combined, and the bases consisting of odd-numbered coefficients are combined;
and taking the base after sequencing and combination as the input of the OKA multiplier module.
In one possible implementation, the column-by-column computation module is configured to:
dividing an operand with input bit width r into n-bit sub-blocks according to the bit width n of the OKA multiplier module;
Reading the first n-bit sub-block of operand M and the first n-bit sub-block of operand K from a block random access memory;
Taking the read sub-blocks as inputs and calling the OKA multiplier to perform the multiplication, obtaining a preliminary result;
After each multiplication finishes, shifting the address of operand K back by one position and reading the next sub-block, at which point the value of the previous sub-block is stored at the next address;
Sequentially reading the next sub-block of the operand K according to the sequence of the blocks, and calling the OKA multiplier module to carry out multiplication operation, wherein the low-order part of the result obtained by each calculation needs to be accumulated with the high-order part of the product result of the previous block to obtain the partial product of the blocks;
after a column is calculated, the sub-blocks of operand K need to be combined to prepare the multiplication of the next column;
And sequentially reading the operand M according to the sequence of the columns until all the sub-blocks of the last column are calculated, and obtaining a final multiplication result.
In one possible implementation, the device further comprises an input module;
The output end of the input module is connected with the input end of the column-by-column calculation module, and the input module is used for receiving data to be processed.
In one possible implementation, the device further comprises an output module;
the input end of the output module is connected with the output end of the OKA multiplier module;
The output module is used for receiving the polynomial product result of the block, which is recursively obtained by the OKA multiplier module, processing the polynomial product result of the block and outputting the processed polynomial product result.
In one possible implementation, the output module is configured to: split the polynomial product result of the block into a high-order part and a low-order part;
In the i-th period, adding the high-order part of the product result of the (j-1)-th block to the low-order part of the product result of the j-th block, and then adding the result of the corresponding part obtained in the (i-1)-th period, to obtain the j-th partial product accumulated over the first i columns, where i is the period number and j is the block index;
and sequentially calculating the partial products in all periods until the last period, to obtain and output the final polynomial product result.
In a possible implementation manner, the OKA multiplier module further includes an addition and subtraction circuit module, wherein an input of the addition and subtraction circuit module is connected with output ends of the even number core, the cross core and the odd number core, and an output end of the addition and subtraction circuit module is connected with an input end of the output module;
The addition and subtraction circuit module is used for combining the output results of the even number core, the cross core and the odd number core into a single polynomial product result of the block.
From the foregoing, the present application provides a Karatsuba-based ultra-high order binary polynomial multiplier comprising a column-by-column calculation module, a reordering module and an OKA multiplier module. The column-by-column calculation module divides an input ultra-high order binary polynomial into blocks, each block being an n-term polynomial, and inputs the divided blocks into the reordering module for reordering; the reordering module orders the terms within each divided block by means of a depth-first recursive function of a binary tree model and inputs the ordered terms into the OKA multiplier module for multiplication; the OKA multiplier module operates on the terms of the ordered blocks by recursing down to the conventional polynomial multiplier at the bottom level. Building on the recursive Karatsuba architecture of the conventional algorithm, the application adopts a column-by-column calculation strategy in which each column is computed block by block and multi-bit multiplications are processed in parallel; this reduces area, avoids the repeated reading of data from the block random access memory that conventional multiplication methods require, and implements ultra-high order binary polynomial multiplication efficiently. Before the Karatsuba multiplication, the terms of the input high-order binary polynomial are reordered by a depth-first recursive function of a binary tree model, which reduces the complexity of the algorithm. Because the bit width of the recursive OKA multiplier is scalable, changing the conversion level of the multiplier balances delay against area and thereby yields a better area efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the practice of the invention and together with the description, serve to explain the principles of the embodiments of the invention. It is evident that the drawings in the following description are only some embodiments of the implementation of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a Karatsuba-based ultra-high order binary polynomial multiplier according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a 2-point Karatsuba multiplier architecture according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of an n-bit conversion level based OKA multiplier architecture according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an OKA multiplier architecture at conversion level k according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of an 8-bit reordered Karatsuba polynomial multiplier architecture according to an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of the architecture of the first 4-bit reordered Karatsuba polynomial multiplier selected from FIG. 5 according to an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of the column-by-column calculation method for r = 10 bits and n = 3 bits according to an exemplary embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the implementations of embodiments of the invention.
Polynomial multiplication is critical in cryptography, and optimization algorithms such as Karatsuba, overlap-free Karatsuba (OKA) and Karatsuba based on 8-level layering each have their own characteristics. However, existing Karatsuba algorithms cannot compute efficiently and quickly when the operands reach a higher bit level; existing column-by-column multiplication of ultra-high order binary polynomials is based on dot-multiplication operations and therefore consumes many resources; and the Karatsuba algorithm architecture must reorder the input terms at every split, which lengthens the layout wiring and increases the delay.
To solve the above problems, an embodiment of the present application provides a Karatsuba-based ultra-high order binary polynomial multiplier, as shown in FIG. 1, including: a column-by-column calculation module, a reordering module and an OKA multiplier module; the output end of the column-by-column calculation module is connected with the input end of the reordering module, and the output end of the reordering module is connected with the input end of the OKA multiplier module.
The main function of the column-by-column calculation module is to divide the input higher order binary polynomial. This division is based on the bit width n of the OKA multiplier module, each block being an n-term polynomial. The purpose of this is to decompose the higher order binary polynomial into smaller, more tractable parts for subsequent multiplication operations. After the division is completed, the column-by-column computation module takes these blocks as inputs and passes them to the reordering module.
The reordering module is used for ordering each item in the divided blocks through a depth-first recursion function of the binary tree model, and inputting the ordered items into the OKA multiplier module for multiplication operation; where Depth First Search (DFS) of a binary tree model is a method of traversing a binary tree by recursively accessing nodes of the tree. In a depth-first search, the root node is accessed first, then the left subtree is accessed recursively, and finally the right subtree is accessed recursively. This approach may ensure that each node is accessed only once.
The OKA multiplier module operates on the individual items in the ordered blocks by continually recursing to the underlying conventional polynomial multiplier. The method can remarkably reduce the complexity of multiplication operation and improve the operation speed. And finally obtaining the result of polynomial multiplication through the operation of the OKA multiplier module.
In some embodiments of the present application, the OKA multiplier module is further configured to operate on the input recursively and to change the number of recursions according to the bit width of the non-recursive conventional polynomial multiplier. The number of recursions is k = log_2 N, where N is the number of bases and k is the number of recursions; the bit widths of the successive OKA multiplication levels are N, N/2, N/4, …, N/2^{k-1}, and the bottom level uses a conventional polynomial multiplication architecture with a bit width of N/2^k for the dot-multiplication calculation.
Specifically, the n-term input polynomial of the OKA multiplier is divided into N bases of bit width w according to the bit width w of the bottom-level conventional polynomial multiplier, where N = n/w. Adjusting w changes the number of recursion levels, so the architecture adapts to inputs of various sizes and the delay and area of the whole architecture can be balanced.
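The relationship between n, w, N and k can be checked with a few lines of arithmetic. The sketch below assumes N = n/w is a power of two and lists the number of bases handled at each level, from the top OKA level down to the single-base conventional multiplier; the function name `oka_levels` is hypothetical.

```python
def oka_levels(n: int, w: int) -> list[int]:
    """Return the number of bases at each level: N, N/2, ..., N/2^k.

    n is the input term count of the OKA multiplier and w the bit width of the
    bottom-level conventional multiplier (both as used in the description).
    Assumption for this sketch: N = n / w is a power of two.
    """
    if n <= 0 or w <= 0 or n % w != 0:
        raise ValueError("n must be a positive multiple of w")
    num_bases = n // w                      # N = n / w
    if num_bases & (num_bases - 1):
        raise ValueError("N = n / w must be a power of two")
    k = num_bases.bit_length() - 1          # k = log2(N)
    # Levels 0..k-1 are OKA levels; level k is the conventional multiplier.
    return [num_bases >> level for level in range(k + 1)]
```

For instance, `oka_levels(64, 8)` gives N = 8 and k = 3, while `oka_levels(64, 16)` gives N = 4 and k = 2: a wider base means fewer recursion levels, which is the knob the description uses to trade delay against area.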
In some embodiments of the application, the OKA multiplier module includes a plurality of different stages of OKA multiplication architecture, the OKA multiplier module configured to: and calculating the output of the OKA multiplication architecture of the current level by using three OKA multiplication architectures which are lower than the OKA multiplication architecture of the current level, wherein the three OKA multiplication architectures of the lower level are respectively an even number core, a cross core and an odd number core, and the output comprises a zeroth item output, an odd number item output and an even number item output.
The zeroth-term output is c_0, the first term of the even core's output data; the even-term outputs are c_{2i} (where 0 < i < N), each being the sum of the even-core and odd-core output data; the odd-term outputs are c_{2i+1} (where 0 ≤ i < N), each being the cross-core output minus the corresponding even-core and odd-core outputs. Here c denotes an output coefficient, i.e. a coefficient of the polynomial multiplication result, and N is the number of bases to be processed during the recursion of the OKA multiplier.
The even number core, the cross core and the odd number core continue to use three multipliers of lower level respectively, recursion is continued until the traditional polynomial multiplier of the bottommost layer, and the output data are combined into a multiplication result.
In the embodiment of the application, each recursion of the OKA multiplier module uses three lower-level OKA multiplication architectures, denoted the even core, the cross core and the odd core. The zeroth-term output c_0 equals the first term of the even core's output data, each even-term output c_{2i} (0 < i < N) is the sum of the even-core and odd-core output data, and each odd-term output c_{2i+1} (0 ≤ i < N) is the cross-core output minus the corresponding even-core and odd-core outputs; the recursion continues in this way down to the conventional polynomial multiplication architecture at the bottom level. Referring to FIG. 2, FIG. 2 shows a 2-point Karatsuba-based polynomial multiplier architecture.
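The even/cross/odd combination can be illustrated with a small software model. The sketch below is an assumption-laden simplification rather than the patented circuit: it works on lists of single GF(2) coefficients (i.e. a base width of one bit), requires the term count to be a power of two, and uses a one-bit AND as a stand-in for the bottom-level conventional multiplier. The function name `oka` is hypothetical.

```python
def oka(a: list[int], b: list[int]) -> list[int]:
    """Overlap-free Karatsuba product of two GF(2) coefficient lists.

    a[i] and b[i] are the coefficients of x^i; len(a) == len(b) is a power
    of two. Returns the 2 * len(a) - 1 coefficients of the product.
    """
    n_terms = len(a)
    if n_terms == 1:
        return [a[0] & b[0]]  # stand-in for the conventional multiplier
    a_even, a_odd = a[0::2], a[1::2]
    b_even, b_odd = b[0::2], b[1::2]
    even = oka(a_even, b_even)                                # even core
    odd = oka(a_odd, b_odd)                                   # odd core
    cross = oka([x ^ y for x, y in zip(a_even, a_odd)],
                [x ^ y for x, y in zip(b_even, b_odd)])       # cross core
    c = [0] * (2 * n_terms - 1)
    c[0] = even[0]                                            # c_0
    for i in range(1, n_terms):
        # c_{2i} = even_i + odd_{i-1}; even_i is zero past the core's last term.
        even_i = even[i] if i < len(even) else 0
        c[2 * i] = even_i ^ odd[i - 1]
    for i in range(n_terms - 1):
        # c_{2i+1} = cross_i - even_i - odd_i (subtraction is XOR in GF(2)).
        c[2 * i + 1] = cross[i] ^ even[i] ^ odd[i]
    return c
```

Because the even-core and odd-core outputs land only on even positions and the corrected cross-core output only on odd positions, the three results never overlap, which is the property the name "overlap-free" refers to.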
It can be understood that in the embodiment of the present application the N-term polynomial multiplication module is processed directly in parallel: N/2-point OKA multiplier architectures are used in parallel at the top level as the even core, cross core and odd core; N/4-point OKA multiplier architectures are then used in parallel inside each N/2-point architecture, and the recursion can continue, for example with N/8-point multiplier architectures used in parallel inside each N/4-point architecture. The architecture can thus be applied recursively and flexibly, suits inputs of various sizes, and meets different area and delay requirements. From the input bit width n of the multiplier and the bit width w of the conventional polynomial multiplication used for the base at the bottom level of the recursion, the number of input terms is N = n/w and the number of recursion levels is k = log_2 N; the OKA bit widths of the successive levels are N, N/2, N/4, …, N/2^{k-1}, and the bottom level uses a conventional polynomial multiplier with a bit width of N/2^k for the dot-multiplication calculation. For a given bit width n, changing the bit width of the base changes the number of recursions, so choosing a suitable conversion level effectively balances the delay and area of the whole multiplier. Referring to FIG. 3, the embodiment of the present application is a recursive implementation with three OKA sub-multipliers at the highest level; at conversion level k, the lowest level is a non-recursive conventional polynomial multiplication, as shown in FIG. 4.
In addition to the 2-point Karatsuba-based multiplier used in the embodiment of the present application, the kernel can be replaced by a 3-point Karatsuba-based multiplier so that multiplication of polynomials with 2^n × 3 terms is supported; the kernel can be replaced arbitrarily, giving various combinations. Furthermore, the bottom level of the recursive Karatsuba algorithm module uses conventional polynomial multiplication; the higher the conversion level, the larger the delay of the multiplier and the smaller its area. The multiplier architecture in the embodiment of the application suits various conversion levels, so area and delay can be balanced for each OKA multiplier bit width and the optimal conversion level determined accordingly.
In some embodiments of the application, the reordering module is configured to: select the bit width of a base as w bits, the total bit width of the block being n bits, so that the block is split into n/w bases; split the coefficients of the block according to the base bit width w, every w consecutive coefficients forming one base; sort and combine the split bases according to the coefficient indexes of the block, bases with even indexes being grouped together and bases with odd indexes being grouped together; and use the sorted and combined bases as the input to the OKA multiplier module.
The main task of the reordering module is to order the entries in the divided blocks (i.e., the n-term polynomials) according to a specific rule, so that the subsequent OKA multiplier module can perform multiplication operations more efficiently. The recursive processing mode reduces the complexity of calculation and improves the operation efficiency.
Specifically, let the number of entries be N and the bit width of each entry be w; the reordering algorithm proceeds as follows:
In the embodiment of the application, because the reordering module is used, the inputs need not be reordered again at each level of the Karatsuba recursion. Referring to FIG. 5 and FIG. 6, specific parameters are selected as an example: for the multiplication of two 8-term polynomials in the finite field GF(2^m), each polynomial is passed through the reordering module with a base bit width of w = 2 bits, and the first polynomial is obtained in sorted form.
Similarly, the coefficients of the second polynomial are split in the same way. As shown in FIG. 5, the inputs of the 8-bit polynomial multiplier are the two reordered polynomials, and each of the three 4-bit Karatsuba polynomial multipliers inside the recursion takes one pair of reordered sub-blocks as its inputs. As shown in FIG. 6, the first 4-bit polynomial multiplier likewise takes a pair of reordered sub-blocks as its inputs.
The reordering module provides a more optimized input data format for the OKA multiplier module by parity-based ordering and combining of the blocks so that the multiplication operations can be performed more efficiently. In the recursion process of the OKA multiplier module, the reordering module ensures that each layer of recursion can obtain a proper data format, thereby reducing the complexity of multiplication operation.
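One way to read the parity-based, depth-first reordering described above is as a recursive even/odd split of the base list. The sketch below rests on that reading and is an assumption, not the patent's algorithm listing (which is not reproduced in this text): bases are grouped from w consecutive coefficients, and the even-indexed half is visited before the odd-indexed half at every level of the binary tree. The names `split_into_bases` and `reorder_bases` are hypothetical.

```python
def split_into_bases(coeffs: list, w: int) -> list[tuple]:
    """Group every w consecutive coefficients into one base."""
    if len(coeffs) % w != 0:
        raise ValueError("block length must be a multiple of w")
    return [tuple(coeffs[i:i + w]) for i in range(0, len(coeffs), w)]


def reorder_bases(bases: list) -> list:
    """Depth-first even/odd reordering (list length assumed a power of two).

    Left subtree: even-indexed bases; right subtree: odd-indexed bases.
    """
    if len(bases) <= 1:
        return list(bases)
    return reorder_bases(bases[0::2]) + reorder_bases(bases[1::2])


# Example with the parameters of the 8-term, w = 2 case; the labels a0..a7
# are placeholders for coefficients, not symbols taken from the patent.
labels = [f"a{i}" for i in range(8)]
ordered = reorder_bases(split_into_bases(labels, 2))
# ordered == [("a0", "a1"), ("a4", "a5"), ("a2", "a3"), ("a6", "a7")]
```

Under this reading the even-indexed and odd-indexed halves needed by every lower recursion level are already contiguous, so no further shuffling is required inside the multiplier.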
In some embodiments of the application, the column-by-column computation module is configured to:
Dividing an operand with an input bit width r into n-bit sub-blocks according to the bit width n of the OKA multiplier module; this partitioning helps to break down the higher order binary polynomial into smaller, more manageable parts, making the subsequent multiplication operations more efficient.
Reading the first n-bit sub-block of operand M and the first n-bit sub-block of operand K from a block random access memory.
Taking the read sub-blocks as inputs and calling the OKA multiplier to perform the multiplication, obtaining a preliminary result. The column-by-column calculation module exploits the optimized structure of the OKA multiplier and provides efficient operation in ultra-high order binary polynomial multiplication.
After each multiplication finishes, shifting the address of operand K back by one position and reading the next sub-block, at which point the value of the previous sub-block is stored at the next address; this address management keeps data reads ordered and continuous and avoids repeated or omitted data.
Sequentially reading the next sub-block of the operand K according to the sequence of the blocks, and calling the OKA multiplier module to carry out multiplication operation, wherein the low-order part of the result obtained by each calculation needs to be accumulated with the high-order part of the product result of the previous block to obtain the partial product of the blocks; this accumulation operation ensures the continuity of the multiplication operation so that the final result correctly reflects the sum of the superposition of the corresponding partial products.
After a column is calculated, the sub-blocks of operand K need to be combined to prepare the multiplication of the next column; this combination provides the correct input for the subsequent multiplication operation.
And sequentially reading the operand M according to the sequence of the columns until all the sub-blocks of the last column are calculated, and obtaining a final multiplication result.
Specifically, the column-by-column calculation algorithm is as follows, where the result output by the OKA multiplication structure has a bit width of 2n-1 and is divided into an n-bit output and an (n-1)-bit output.
Parameters: operand bit width r; OKA multiplier bit width n; OVERHANG = r mod n.
Input: multiplicands M[r-1:0], K[r-1:0]
Output: R = M × K
In the algorithm, the input operands M and K both have a bit width of r. To avoid a large area, operands of bit width n are fed serially to the OKA multiplication module. The first n bits of M and of K are read from the block random access memory and multiplied by the OKA algorithm to obtain a result; the sub-block of M is then held unchanged while the next n bits of K are read and multiplied, and this continues until all sub-blocks of K have been read, after which the next n bits of M are read. Because the block random access memory operates in read-first mode, there is a delay of one clock cycle between storing and reading, so after each calculation the address of the sub-block of K is shifted back by one position; for example, the sub-block just read is stored at the next address. After a column has been calculated, sub-blocks of K are combined and the stored value is updated accordingly.
Referring to FIG. 7, specific parameters are selected as an example, with r = 10 bits and n = 3 bits. First, the first 3 bits of M and of K are read and input to the multiplication module; the low-order part of the product is stored, and the high-order part is stored separately. The next sub-block of K is then read (the value read is not the value that has just been stored; the same applies below) and multiplied; the low-order part of this product is added to the previously stored high-order part and the sum is stored, and the new high-order part is stored in turn. The following sub-block of K is processed in the same way, and the register REG is additionally used to store the most significant term (e.g., m_2·k_8 in FIG. 7). When the last sub-block of K is read, the lowest-order bit k_9 is extracted; after the multiplication, the two highest terms (e.g., m_1·k_9 and m_2·k_9 in FIG. 7) are added to the value stored in the register REG and accumulated with the stored partial result. Finally, the sub-blocks of K are combined to obtain k_7, k_8, k_9, which is stored back.
At this point, the multiplication of the first column is complete. When the second column is calculated, the next sub-block of M is read, the products are calculated in sequence and added to the previously stored results, and so on until the last column has been calculated, giving the final multiplication result.
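The column-by-column procedure described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware design: `clmul` is a plain bit-serial carry-less multiplier standing in for the OKA multiplier, the block RAM addressing, the one-cycle read delay and the register REG are not modelled, and the names `clmul`, `split_blocks` and `columnwise_multiply` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two binary polynomials held as integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split_blocks(value: int, r: int, n: int) -> list[int]:
    """Split an r-bit operand into ceil(r / n) sub-blocks of n bits, lowest block first."""
    mask = (1 << n) - 1
    count = -(-r // n)  # ceiling division
    return [(value >> (n * idx)) & mask for idx in range(count)]


def columnwise_multiply(m: int, k: int, r: int, n: int) -> int:
    """Multiply two r-bit binary polynomials using only n-bit block products."""
    m_blocks = split_blocks(m, r, n)
    k_blocks = split_blocks(k, r, n)
    mask = (1 << n) - 1
    acc = [0] * (len(m_blocks) + len(k_blocks))  # n-bit result words
    for i, m_blk in enumerate(m_blocks):         # one column per sub-block of M
        carry = 0                                # high part of the previous block product
        for j, k_blk in enumerate(k_blocks):
            z = clmul(m_blk, k_blk)              # (2n-1)-bit block product
            low, high = z & mask, z >> n         # n-bit low part, (n-1)-bit high part
            acc[i + j] ^= low ^ carry            # add low part and previous high part
            carry = high
        acc[i + len(k_blocks)] ^= carry          # last high part of this column
    out = 0
    for idx, word in enumerate(acc):
        out |= word << (n * idx)
    return out
```

With r = 10 and n = 3, each operand is split into four sub-blocks (the last one holding a single bit), which matches the block count of the example above; addition is XOR because the coefficients are binary.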
In summary, the column-by-column calculation method optimizes the processes of operand division, reading, multiplication operation, result accumulation and the like, and provides efficient and accurate data processing capability for the ultra-high order binary polynomial multiplier architecture. The design not only improves the efficiency of multiplication operation, but also reduces the resource consumption, so that the ultra-high order binary polynomial multiplication is more feasible and efficient in practical application.
In some embodiments of the present application, with continued reference to FIG. 1, the polynomial multiplication architecture further includes an input module; the output end of the input module is connected with the input end of the column-by-column calculation module, and the input module is used for receiving data to be processed.
In the present application, the data M, K may be directly input, and the bit width is r. Dividing M and K into r/n blocks according to the bit width n of the OKA multiplier, storing the r/n blocks into a block random access memory, directly taking out corresponding sub-blocks of the required M and K from the block random access memory according to the operation sequence, and performing the subsequent reordering and multiplication operation.
In some embodiments of the present application, with continued reference to FIG. 1, the polynomial multiplication architecture further includes an output module; the input end of the output module is connected with the OKA multiplier module; the output module is used for receiving the product result obtained by recursion of the OKA multiplier module, processing the polynomial result of the block and outputting the processed result.
Specifically, the output module can effectively process the final result of the multiplication operation. Depending on the needs of the application, the output module may need to perform format conversion or normalization processing on the received polynomial product result in order to be compatible with other systems or modules.
In some embodiments of the application, the output module is configured to: split the polynomial product result of a block into a high-order part and a low-order part; in the i-th period, add the high-order part of the product result of the (j-1)-th block to the low-order part of the product result of the j-th block, that is, superpose the high-order part of the previous block's partial product on the low-order part of the current block's partial product, which is required for a correct result because adjacent block products overlap; and then add the corresponding partial result obtained in the (i-1)-th period to obtain the j-th partial product accumulated over the first i columns, where i is the period index and j is the block index. Thus, at the end of each period, the j-th partial product over the first i columns is available. The partial products of all periods are superposed in turn until the last period, and the final ultra-high order binary polynomial multiplication result is obtained and output.
In some embodiments of the present application, with continued reference to fig. 1, the OKA multiplier module further includes an addition-subtraction circuit module, where an input of the addition-subtraction circuit module is connected to output ends of the even core, the cross core, and the odd core, and an output of the addition-subtraction circuit module is connected to an input end of the output module; the addition and subtraction circuit module is used for combining the output results of the even number core, the cross core and the odd number core into a polynomial product result.
According to the embodiment of the application, the addition and subtraction circuit module is used for completing the task that each layer of OKA multiplier needs to combine the results of the upper layer through addition and subtraction operation in the recursion process.
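As an illustration of how each OKA level can merge the three kernel outputs, the following is a minimal Python sketch rather than the claimed circuit. It assumes that operands are lists of w-bit bases held as Python integers, that the list length is a power of two, and that addition and subtraction both reduce to XOR because the coefficients are binary. The names `clmul`, `schoolbook`, `combine` and `oka` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two binary polynomials held as integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def schoolbook(a: list[int], b: list[int]) -> list[int]:
    """Non-recursive conventional multiplier on lists of bases."""
    out = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] ^= clmul(a_i, b_j)
    return out


def combine(z_even: list[int], z_odd: list[int], z_cross: list[int]) -> list[int]:
    """Merge the even, odd and cross kernel outputs of one level."""
    out = [0] * (2 * len(z_even) + 1)
    for i in range(len(z_even)):
        out[2 * i] ^= z_even[i]                               # even kernel feeds c_{2i}
        out[2 * i + 1] ^= z_cross[i] ^ z_even[i] ^ z_odd[i]   # c_{2i+1}
        out[2 * i + 2] ^= z_odd[i]                            # odd kernel feeds c_{2i+2}
    return out


def oka(a: list[int], b: list[int], bottom: int = 1) -> list[int]:
    """Recursive multiplier: split into even- and odd-indexed bases at each level."""
    if len(a) <= bottom:
        return schoolbook(a, b)
    a_even, a_odd = a[0::2], a[1::2]
    b_even, b_odd = b[0::2], b[1::2]
    z_even = oka(a_even, b_even, bottom)
    z_odd = oka(a_odd, b_odd, bottom)
    z_cross = oka([x ^ y for x, y in zip(a_even, a_odd)],
                  [x ^ y for x, y in zip(b_even, b_odd)], bottom)
    return combine(z_even, z_odd, z_cross)
```

The `bottom` parameter sets the size at which recursion stops and the conventional multiplier takes over, which is one way to picture the trade-off between recursion depth and the width of the bottom-level multiplier.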
The present application provides an embodiment to illustrate a Karatsuba-based ultra-high order binary polynomial multiplier:
Let the number of terms of each of the two input high-order polynomials be r = 12323, let the bit width of the input data of the OKA multiplier module be n = 128 bits with l = 96, and zero-pad data shorter than 128 bits. The input to the OKA multiplier module is divided into N = 8 terms, each with a bit width of w = 16 bits.
Assume that the partial product of the j-th block (0 ≤ j < l) in the i-th period (0 ≤ i < l) is being calculated; the two hexadecimal inputs of the OKA multiplier are, respectively:
These two inputs are fed to the reordering module to obtain the reordered sequences, as follows:
The reordered result is input to the OKA multiplier module, and the inputs of the cross kernel are obtained as follows:
Furthermore, the inputs of the even kernel are:
The inputs of the odd kernel are:
These three sets of inputs are each fed to the polynomial multiplier architecture. During recursion, each OKA multiplier combines the results of the previous level through addition and subtraction, and the resulting even kernel output is:
Zeven[0:6]=1a90d9a2,71cb088a,4b5c1fd5,78e8329b,726e2ed0,1e4c7e2e,38ea8a7c;
The resulting odd kernel output is:
Zodd[0:6]=077c12c0,3bac8f4f,24cc3ae4,46222f46,7f55db40,0b9e88f0,0079a370;
The resulting cross kernel output is:
Zcross[0:6]=7f63afa8,7b1c7c2a,7eb95e61,023e6435,1936e1f8,2e74c63a,1752504c;
After the multiplier architecture has been called three times, the outputs of the even kernel, the odd kernel and the cross kernel are processed by the addition and subtraction circuit, and the result of the sub-multiplier is obtained as follows:
Z[0:14]=1a90d9a2,628f64ca,76b71a4a,317bfbef,70f0909a,11297b50,5c24087f,3cf479e8,344c0196,140d1468,6119a56e,3ba630e4,3374028c,2fc17940,0079a370;
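The listed kernel outputs can be checked against Z with a short Python sketch. It assumes that addition and subtraction are both XOR (binary coefficients) and that the even kernel feeds terms 2i, the odd kernel feeds terms 2i+2 and the cross kernel, corrected by the other two, feeds terms 2i+1; the function name `combine` is hypothetical. Under these assumptions the last entry comes out as 0079a370, which equals Zodd[6].

```python
Z_EVEN = [0x1a90d9a2, 0x71cb088a, 0x4b5c1fd5, 0x78e8329b,
          0x726e2ed0, 0x1e4c7e2e, 0x38ea8a7c]
Z_ODD = [0x077c12c0, 0x3bac8f4f, 0x24cc3ae4, 0x46222f46,
         0x7f55db40, 0x0b9e88f0, 0x0079a370]
Z_CROSS = [0x7f63afa8, 0x7b1c7c2a, 0x7eb95e61, 0x023e6435,
           0x1936e1f8, 0x2e74c63a, 0x1752504c]


def combine(z_even: list[int], z_odd: list[int], z_cross: list[int]) -> list[int]:
    """Merge the three kernel outputs into the 15 terms of the sub-multiplier result."""
    out = [0] * (2 * len(z_even) + 1)
    for i in range(len(z_even)):
        out[2 * i] ^= z_even[i]
        out[2 * i + 1] ^= z_cross[i] ^ z_even[i] ^ z_odd[i]
        out[2 * i + 2] ^= z_odd[i]
    return out


Z = combine(Z_EVEN, Z_ODD, Z_CROSS)
print(",".join(f"{term:08x}" for term in Z))
```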
Z is processed by the output module to obtain the output result of the polynomial multiplication: Z_mediate[254:0] = 00798cb14a34392a51fdb16320243d6225cc19560ba0a1e18d5878c57e5ad9a2.
Z_mediate is then split to obtain the high-order part RESULT_REG and the low-order part RESULT_MID of the partial product, as follows:
RESULT_REG[126:0]=00798cb14a34392a51fdb16320243d62;
RESULT_MID[127:0]=25cc19560ba0a1e18d5878c57e5ad9a2;
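The step from Z to Z_mediate and its two halves can be reproduced with the sketch below. It assumes that term i of Z is a 31-bit value placed at bit offset 16·i and that overlapping bits are combined by XOR; the final entry of Z is taken as 0079a370, the value of Zodd[6]. The names `assemble` and `split_result` are hypothetical.

```python
W = 16        # bit width of one base
N_BITS = 128  # bit width of the OKA multiplier input

Z = [0x1a90d9a2, 0x628f64ca, 0x76b71a4a, 0x317bfbef, 0x70f0909a,
     0x11297b50, 0x5c24087f, 0x3cf479e8, 0x344c0196, 0x140d1468,
     0x6119a56e, 0x3ba630e4, 0x3374028c, 0x2fc17940, 0x0079a370]


def assemble(terms: list[int], w: int) -> int:
    """XOR each term into place at offset w * index."""
    value = 0
    for index, term in enumerate(terms):
        value ^= term << (w * index)
    return value


def split_result(value: int, n: int) -> tuple[int, int]:
    """Return (high-order part, low-order part) of a (2n-1)-bit product."""
    return value >> n, value & ((1 << n) - 1)


z_mediate = assemble(Z, W)
result_reg, result_mid = split_result(z_mediate, N_BITS)
print(f"{z_mediate:064x}")
print(f"{result_reg:032x}")
print(f"{result_mid:032x}")
```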
Similarly, assume the following values for the high-order part of the product of the (j-1)-th block in the i-th period, calculated by the OKA multiplier, and for the j-th partial product obtained in the (i-1)-th period:
Therefore, in the i-th period, the high-order part of the (j-1)-th block product is added to the low-order part of the j-th block product and to the j-th partial product from the (i-1)-th period, which gives the final result of the j-th partial product accumulated over the first i columns:
The corresponding partial products of the subsequent periods are then superposed in turn until the last, (l+1)-th, period, and the final multiplication result is obtained.
According to the above embodiments, the present application provides a Karatsuba-based ultra-high order binary polynomial multiplier architecture, comprising: a column-by-column calculation module, a reordering module and an OKA multiplier module. The column-by-column calculation module divides an input ultra-high order binary polynomial into blocks, each block being an n-term polynomial, and inputs the divided blocks to the reordering module for ordering. The reordering module orders the terms in each divided block through a depth-first recursive function of a binary tree model and inputs the ordered terms to the OKA multiplier module for multiplication. The OKA multiplier module recursively invokes lower-level OKA multipliers until the bottommost non-recursive conventional polynomial multiplier performs the multiplication. On the basis of the recursive Karatsuba architecture of the conventional algorithm, the method adopts a column-by-column calculation strategy in which columns are calculated block by block and multi-bit multiplications are processed in parallel; this reduces area, avoids the repeated reading of data from the block random access memory required by conventional multiplication methods, and allows ultra-high order binary polynomial multiplication to be implemented efficiently. Before the Karatsuba multiplication, the terms of the input ultra-high order binary polynomial are reordered by a depth-first recursive function of a binary tree model, which reduces the complexity of the algorithm. The bit width of the recursive OKA multiplier is scalable, and changing the number of recursion levels of the multiplier allows delay and area to be balanced, yielding a better area-efficiency ratio.
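One possible reading of the depth-first reordering summarized above is sketched below in Python. It is an assumption for illustration only: the terms are split into even-indexed and odd-indexed halves, and each half is reordered in the same way before the next half is visited, so that every recursion level of the multiplier finds its even and odd inputs in contiguous halves. The function name `reorder` and the `levels` parameter are hypothetical.

```python
def reorder(terms: list, levels: int) -> list:
    """Depth-first even/odd split applied for the given number of levels."""
    if levels == 0 or len(terms) <= 1:
        return list(terms)
    even_half = reorder(terms[0::2], levels - 1)  # visit the even branch first
    odd_half = reorder(terms[1::2], levels - 1)   # then the odd branch
    return even_half + odd_half


# Index order for N = 8 bases and k = 3 recursion levels.
order = reorder(list(range(8)), 3)
print(order)
```

For eight bases and three levels this yields the index order 0, 4, 2, 6, 1, 5, 3, 7; with a single level it yields 0, 2, 4, 6, 1, 3, 5, 7.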
From the foregoing, it will be appreciated that embodiments of the application are intended to cover a non-exclusive inclusion, such that a structure, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such structure, article, or apparatus. Without further limitation, the statement "comprises … …" does not exclude that an additional identical element is present in a structure, article, or apparatus that comprises the element.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (9)

1. A Karatsuba-based ultra-high order binary polynomial multiplier, comprising: a column-by-column calculation module, a reordering module and an OKA multiplier module;
The output end of the column-by-column calculation module is connected with the input end of the reordering module, and the output end of the reordering module is connected with the input end of the OKA multiplier module;
The column-by-column calculation module is used for dividing an input ultrahigh-order binary polynomial into blocks, wherein the blocks are n-term polynomials, and inputting the divided blocks into the reordering module to order all the blocks;
The reordering module is used for ordering each item in the divided blocks through a depth-first recursion function of a binary tree model, and inputting the ordered items into the OKA multiplier module for multiplication operation;
the OKA multiplier module performs the multiplication on each term in the ordered blocks by recursively invoking lower-level OKA multipliers and using a non-recursive conventional polynomial multiplier at the bottommost layer.
2. The Karatsuba-based ultra-high order binary polynomial multiplier of claim 1, wherein the OKA multiplier module comprises a plurality of OKA multiplication architectures of different levels, the OKA multiplier module being configured to:
Calculating the output of the OKA multiplication architecture of the current level by using three OKA multiplication architectures one level lower than the current level, wherein the three lower-level OKA multiplication architectures are respectively an even core, a cross core and an odd core, and the output comprises a zeroth-term output, odd-term outputs and even-term outputs;
The zeroth-term output is c_0, and c_0 is the first term of the even core output data;
The even-term output is c_{2i} (where 0 < i < N), and c_{2i} is the sum of the even core and odd core output data;
The odd-term output is c_{2i+1} (where 0 ≤ i < N), and c_{2i+1} is the output of the cross core minus the outputs of the corresponding even and odd cores;
wherein c is an output coefficient representing each coefficient in the polynomial multiplication result; N is the number of bases to be processed in the recursion of the OKA multiplier;
The even core, the odd core and the cross core each in turn use three OKA multiplication architectures one level lower, and so on until the conventional polynomial multiplication architecture at the bottom layer is reached, and all output data are combined into the multiplication result.
3. The Karatsuba-based ultra-high order binary polynomial multiplier of claim 2, wherein the OKA multiplier module is further configured to operate recursively on the input numbers and to alter the number of recursions based on the bit width of the non-recursive conventional polynomial multiplier;
the calculation formula of the number of recursions is as follows:
k = log2(N);
where N is the number of bases and k is the number of recursions; the bit widths of the OKA multiplication architectures of the successive levels are N, N/2, N/4, ..., N/2^(k-1), and the point multiplication at the bottommost layer is performed by the conventional polynomial multiplication architecture with a bit width of N/2^k.
4. The Karatsuba-based ultra-high order binary polynomial multiplier of claim 1, wherein the reordering module is configured to:
Selecting the bit width of a base as w bits, wherein the total bit width of the block is n bits, and splitting the block into n/w bases;
splitting coefficients of the block according to the bit width w of the base, wherein each w continuous coefficients form one base;
Sorting and combining the split bases according to the coefficient indexes of the blocks, wherein the bases consisting of even-numbered coefficients are combined, and the bases consisting of odd-numbered coefficients are combined;
and taking the base after sequencing and combination as the input of the OKA multiplier module.
5. The Karatsuba-based ultra-high order binary polynomial multiplier of claim 1, wherein the column-by-column calculation module is configured to:
dividing an operand with input bit width r into n-bit sub-blocks according to the bit width n of the OKA multiplier module;
Reading the first n-bit sub-blocks of operand M and operand K from a block random access memory;
Taking the read sub-blocks as input and calling the OKA multiplier to perform the multiplication to obtain a preliminary result;
After each multiplication operation is finished, shifting the address of operand K back by one position and reading the next sub-block, at which point the value of the previous sub-block is stored to the next address;
Sequentially reading the next sub-block of the operand K according to the sequence of the blocks, and calling the OKA multiplier module to carry out multiplication operation, wherein the low-order part of the result obtained by each calculation needs to be accumulated with the high-order part of the product result of the previous block to obtain the partial product of the blocks;
after a column is calculated, the sub-blocks of operand K need to be combined to prepare the multiplication of the next column;
And sequentially reading the operand M according to the sequence of the columns until all the sub-blocks of the last column are calculated, and obtaining a final multiplication result.
6. The Karatsuba-based ultra-high order binary polynomial multiplier according to claim 1, further comprising an input module;
The output end of the input module is connected with the input end of the column-by-column calculation module, and the input module is used for receiving data to be processed.
7. The Karatsuba-based ultra-high order binary polynomial multiplier according to claim 1, further comprising an output module;
the input end of the output module is connected with the output end of the OKA multiplier module;
The output module is used for receiving the polynomial product result of the block, which is recursively obtained by the OKA multiplier module, processing the polynomial product result of the block and outputting the processed polynomial product result.
8. The Karatsuba-based ultra-high order binary polynomial multiplier according to claim 7, wherein
The output module is configured to: splitting the polynomial product result of the block to obtain a high-order part and a low-order part of the polynomial product result of the block;
Adding, in the i-th period, the high-order part of the product result of the (j-1)-th block to the low-order part of the product result of the j-th block; adding the product result of the corresponding part obtained in the (i-1)-th period to obtain the j-th partial product over the first i columns, wherein i is the period index and j is the block index;
And sequentially calculating the partial products in all periods until the last period, and obtaining and outputting the final polynomial multiplication result.
9. The Karatsuba-based ultra-high order binary polynomial multiplier of claim 7, wherein the OKA multiplier module further comprises an addition-subtraction circuit module, an input of the addition-subtraction circuit module being connected to outputs of the even, cross, and odd cores, an output of the addition-subtraction circuit module being connected to an input of the output module;
the addition and subtraction circuit module is used for combining the output results of the even number core, the cross core and the odd number core into a polynomial product result of the block.
CN202410394029.6A 2024-04-02 2024-04-02 Karatuba-based ultra-high order binary polynomial multiplier Pending CN118312133A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410394029.6A CN118312133A (en) 2024-04-02 2024-04-02 Karatuba-based ultra-high order binary polynomial multiplier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410394029.6A CN118312133A (en) 2024-04-02 2024-04-02 Karatuba-based ultra-high order binary polynomial multiplier

Publications (1)

Publication Number Publication Date
CN118312133A true CN118312133A (en) 2024-07-09

Family

ID=91724917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410394029.6A Pending CN118312133A (en) 2024-04-02 2024-04-02 Karatuba-based ultra-high order binary polynomial multiplier

Country Status (1)

Country Link
CN (1) CN118312133A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination