CN117406957B - Modular multiplication method, modular multiplication assembly and semi-custom circuit - Google Patents
- Publication number
- CN117406957B (CN202311709371.2A)
- Authority
- CN
- China
- Prior art keywords
- value
- modular multiplication
- modular
- window width
- modulus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/722—Modular multiplication
Abstract
An embodiment of the invention provides a modular multiplication method, a modular multiplication component and a semi-custom circuit, relating to the field of privacy computing. The modular multiplication method comprises: receiving data to be converted, and determining the window width allocated to the data to be converted according to available hardware resources; and, according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, which is used to convert the data to be converted into converted data. After the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width of the window modular reduction. A large window width means that more resources are allocated and the calculation is fast; a small window width means that fewer resources are allocated and the calculation is slow. The adjustable window width thus balances resources against speed.
Description
Technical Field
The invention relates to the field of privacy computing, and in particular to a modular multiplication method, a modular multiplication component and a semi-custom circuit.
Background
Lattice-based cryptosystems operating on polynomial rings underlie various cryptographic constructions, such as hash computation, digital signatures and homomorphic encryption. These constructions involve a large number of modular multiplications, for example in polynomial multiplication and the Number-Theoretic Transform (NTT), so modular multiplication is a principal arithmetic component. Although a modular multiplier is not in itself a large arithmetic component, it is used so intensively in homomorphic encryption that its efficiency directly determines the efficiency of the homomorphic encryption algorithm as a whole.
In carrying out the present invention, the applicant has found that at least the following problems exist in the prior art:
In hardware implementations of homomorphic encryption, modular multiplication typically uses Barrett modular reduction or Montgomery modular reduction. Barrett reduction is an algorithm for computing the remainder of a large integer divided by a modulus; Montgomery modular multiplication is an algorithm for performing modular multiplication.
For computation-intensive and data-intensive applications such as homomorphic encryption, CPU speed can no longer meet the requirements of many applications and designs, so programmable logic devices, that is, semi-custom circuits such as FPGAs (Field-Programmable Gate Arrays), have become the first choice for acceleration. Modular multiplication is a key link in such accelerators, and the specific algorithm design and the corresponding FPGA hardware are key points of the design.
An FPGA design always has to balance resource consumption against speed. Homomorphic encryption, for example, places high demands on FPGA resources, and in general the faster the hardware, the higher the resource consumption and the higher the cost of the design. In addition, when a design occupies too many of the FPGA's resources, design verification and timing closure become very difficult, so a balance between resource consumption and speed is hard to achieve.
Disclosure of Invention
An embodiment of the invention provides a modular multiplication method, a modular multiplication component and a semi-custom circuit, which can solve the technical problem in the prior art that the resource consumption and the speed of modular multiplication are difficult to balance.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a modular multiplication method, including:
receiving data to be converted, and determining the window width allocated to the data to be converted according to available hardware resources;
and, according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, wherein the modular multiplication result is used to convert the data to be converted into converted data.
In a second aspect, an embodiment of the present invention provides a method for generating a fixed lookup table, including:
setting a table calculation modulus $q$, used to calculate the fixed lookup table T for modular multiplication, and setting a plurality of window widths w;
for each window width w, generating a corresponding fixed lookup table T for that window width according to the window width w and the table calculation modulus $q$, wherein the fixed lookup table T has $2^{w+1}$ reserve values, each of which is $T[i] = (i \cdot 2^{k}) \,\%\, q$, with $i = (i_{w} i_{w-1} \cdots i_{0})_2$;
Wherein $T[i]$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus $q$; w represents the window width; i represents the loop index, an integer from 0 to $2^{w+1} - 1$; k represents the number of bits of the modulus $q$; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $i$ is written as a binary value.
In a third aspect, an embodiment of the present invention provides a modular multiplication assembly, applied in a semi-custom circuit FPGA, the modular multiplication assembly being configured to:
receive data to be converted, and determine the window width allocated to the data to be converted according to available hardware resources; and, according to the allocated window width, perform modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, wherein the modular multiplication result is used to convert the data to be converted into converted data.
In a fourth aspect, an embodiment of the present invention provides a fixed lookup table generating component for:
generating a fixed lookup table T by pre-calculation before data to be converted is received for the first time, wherein generating the fixed lookup table T by pre-calculation specifically comprises:
setting a table calculation modulus $q$, used to calculate the fixed lookup table T for modular multiplication, and setting a plurality of window widths w;
for each window width w, generating a corresponding fixed lookup table T for that window width according to the window width w and the table calculation modulus $q$, wherein the fixed lookup table T has $2^{w+1}$ reserve values, each of which is $T[i] = (i \cdot 2^{k}) \,\%\, q$, with $i = (i_{w} i_{w-1} \cdots i_{0})_2$;
Wherein $T[i]$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus $q$; w represents the window width; i represents the loop index, an integer from 0 to $2^{w+1} - 1$; k represents the number of bits of the modulus $q$; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $i$ is written as a binary value.
In a fifth aspect, an embodiment of the present invention provides a semi-custom circuit, including a fixed lookup table generating component, a memory, and the foregoing modular multiplication component; wherein:
the fixed lookup table generating component is configured to generate, by pre-calculation, a fixed lookup table T before receiving data to be converted for the first time, where the generating, by pre-calculation, the fixed lookup table T specifically includes:
setting a table calculation modulus $q$, used to calculate the fixed lookup table T for modular multiplication, and setting a plurality of window widths w;
for each window width w, generating a corresponding fixed lookup table T for that window width according to the window width w and the table calculation modulus $q$, wherein the fixed lookup table T has $2^{w+1}$ reserve values, each of which is $T[i] = (i \cdot 2^{k}) \,\%\, q$, with $i = (i_{w} i_{w-1} \cdots i_{0})_2$;
Wherein $T[i]$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus $q$; w represents the window width; i represents the loop index, an integer from 0 to $2^{w+1} - 1$; k represents the number of bits of the modulus $q$; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $i$ is written as a binary value;
the memory is used for storing a fixed lookup table T.
The technical scheme has the following beneficial effects: after the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width of the window modular reduction. A large window width means that more resources are allocated and the calculation is fast; a small window width means that fewer resources are allocated and the calculation is slow. The adjustable window width thus balances resources against speed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a modular multiplication method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a modular multiplication assembly in accordance with an embodiment of the present invention;
FIG. 3 is a block diagram of a semi-custom circuit according to an embodiment of the present invention;
FIG. 4 is a circuit diagram of the window modular reduction design in an FPGA according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, in combination with an embodiment of the present invention, there is provided a modular multiplication method including:
S101: receiving data to be converted, and determining the window width allocated to the data to be converted according to available hardware resources;
S102: according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, wherein the modular multiplication result is used to convert the data to be converted into converted data.
The method adopts modular multiplication with window modular reduction; the modular multiplication module is used in a semi-custom circuit FPGA and implements the modular multiplication logic in the FPGA. After the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width of the window modular reduction. A large window width means that more resources are allocated and the calculation is fast; a small window width means that fewer resources are allocated and the calculation is slow. The adjustable window width thus balances resources against speed.
Preferably, the modular multiplication method may further include:
S103: generating a fixed lookup table T by pre-calculation before data to be converted is received for the first time;
in S103, generating the fixed lookup table T by pre-calculation specifically includes:
S103-1: setting a table calculation modulus $q$, used to calculate the fixed lookup table T for modular multiplication, and setting a plurality of window widths w;
S103-2: for each window width w, generating a corresponding fixed lookup table T for that window width according to the window width w and the table calculation modulus $q$, wherein the fixed lookup table T has $2^{w+1}$ reserve values, each of which is $T[i] = (i \cdot 2^{k}) \,\%\, q$, with $i = (i_{w} i_{w-1} \cdots i_{0})_2$;
Wherein $T[i]$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus $q$; w represents the window width; i represents the loop index, an integer from 0 to $2^{w+1} - 1$; k represents the number of bits of the modulus $q$; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $i$ is written as a binary value.
It can be seen that once the window width w and the bit length k of the table calculation modulus are determined, the value of each entry $T[i]$ in the fixed lookup table T is fixed. For example, if w = 5 and k = 32, the fixed lookup table T holds 64 entries, i takes the values 0 to 63, and the reserve values $T[i]$ in the fixed lookup table T are, in order, 0, 516095, 1032190, and so on. The fixed lookup table T is a list of precomputed constants and is typically stored in the BRAM (Block RAM) of the FPGA. In modular multiplication based on window modular reduction, each modular multiplication uses only one fixed lookup table T, which supports the shift and addition operations.
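The table generation of S103 can be sketched in a few lines of Python. This is a minimal sketch, not the patented implementation: the function name `build_fixed_lookup_table` is hypothetical, and the modulus `Q_EXAMPLE` is an assumption chosen only because it is a 32-bit value that reproduces the example entries 0, 516095, 1032190; the text itself does not state which modulus was used.

```python
def build_fixed_lookup_table(q: int, w: int) -> list:
    """Return the 2**(w+1) reserve values T[i] = (i * 2**k) % q.

    k is taken as the bit length of the table calculation modulus q.
    """
    k = q.bit_length()
    return [(i << k) % q for i in range(1 << (w + 1))]


# Assumed modulus (not given in the text): a 32-bit value for which
# 2**32 % q == 516095, matching the example entries.
Q_EXAMPLE = 2**32 - 2**19 + 2**13 + 1

table = build_fixed_lookup_table(Q_EXAMPLE, w=5)
print(len(table))   # prints 64
print(table[:3])    # prints [0, 516095, 1032190]
```

Because the table depends only on q and w, it can be computed once offline and loaded into block RAM as constants.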
Preferably, in S102, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result specifically includes:
the modular multiplication result is expressed as $r = (a \cdot b) \,\%\, q$, where $a$ and $b$ are the two multiplier variables of the required modular multiplication, referred to as multiplier one and multiplier two respectively; $a$ and $b$ are each k bits wide; the window width is $w$ bits; the window width $w$ is related to the width of the preset modulus $q$ of the data to be converted, and in general the window width $w$ does not exceed the width of the preset modulus $q$ of the data to be converted.
S102-1: assigning V to the modular multiplication result $r$ as the current $r$ value; wherein $V = (v_{n-1} v_{n-2} \cdots v_{0})_2$, V is the product of the two multipliers in the modular multiplication, the subscript 2 indicates that the digits $v_j$ are all binary, and n is the number of bits of V (the highest digit index is n-1); wherein V belongs to one of a plurality of data segments of the data to be converted; for example, n = 64 indicates that V has 64 binary bits.
S102-2: according to the allocated window width w, assigning the initial value of $w \cdot i$ and determining the preset modulus $q$; wherein the preset modulus $q$ and V belong to the same data segment, and the bit length of V is between one and two times the bit length of the preset modulus $q$;
S102-3: according to the allocated window width w, the current $r$ value and the preset modulus $q$, looking up the corresponding reserve value in the fixed lookup table T corresponding to the allocated window width w, and performing the modular reduction of the modular multiplication based on the reserve value; during the modular reduction, shift operations are used in place of multiplication operations, and a first round of main modular reduction is performed on the current $r$ value; wherein the process of reducing V modulo $q$ is called modular reduction;
S102-4: after the first round of main modular reduction is completed, with i as the index, advancing i by 1 relative to the previous round and performing the next round of main modular reduction; repeating in this way until $w \cdot i$ reaches k completes the multiple rounds of main modular reduction and yields the updated $r$ value;
S102-5: comparing the updated $r$ value with the preset modulus $q$; if the updated $r$ value is smaller than the preset modulus $q$, outputting the updated $r$ value as the modular multiplication result; if the updated $r$ value is greater than or equal to the preset modulus $q$, computing the difference between the updated $r$ value and the preset modulus $q$, i.e. $r = r - q$, as a supplementary modular reduction of the updated $r$ value, to obtain the next $r$ value; judging whether the next $r$ value is smaller than the preset modulus $q$; if it is, outputting the next $r$ value as the modular multiplication result; if the next $r$ value is still greater than or equal to the preset modulus $q$, repeating the step of subtracting the preset modulus $q$ until the next $r$ value is smaller than the preset modulus $q$. Apart from these subtractions, no further arithmetic is required in this process.
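Steps S102-1 to S102-5 can be sketched end to end as follows. This is an illustrative software model under stated assumptions, not the hardware design of the patent: V is treated as exactly n = 2k bits wide, the table entries are taken to be (i · 2^k) % q, the last window is narrowed when w does not divide k, and the names `window_modmul`, `build_fixed_lookup_table` and `Q_EXAMPLE` are hypothetical. The product a · b is formed here with an ordinary multiplication; only the reduction uses masks, shifts, lookups and additions.

```python
def build_fixed_lookup_table(q: int, w: int) -> list:
    """Reserve values T[i] = (i * 2**k) % q for i = 0 .. 2**(w+1) - 1."""
    k = q.bit_length()
    return [(i << k) % q for i in range(1 << (w + 1))]


def window_modmul(a: int, b: int, q: int, w: int, table: list) -> int:
    """Return (a * b) % q using window modular reduction (software sketch)."""
    if not (0 <= a < q and 0 <= b < q):
        raise ValueError("a and b must already be reduced modulo q")
    k = q.bit_length()
    n = 2 * k                      # assumed width of V (zero-padded if shorter)
    r = a * b                      # S102-1: V becomes the current r value
    pos = n                        # pos plays the role of n - w*i
    while pos > k:                 # S102-3 / S102-4: main modular reduction rounds
        pos -= min(w, pos - k)     # narrower last window if w does not divide k
        t_in = r >> pos            # second step: high bits address the table
        r &= (1 << pos) - 1        # first step: keep the low pos bits, r % 2**pos
        t_out = table[t_in]        # third step: look up the reserve value
        r += t_out << (pos - k)    # fourth step: shift-and-add, no multiplier
    while r >= q:                  # S102-5: supplementary reduction by subtraction
        r -= q
    return r


# Assumed 32-bit modulus, used for illustration only.
Q_EXAMPLE = 2**32 - 2**19 + 2**13 + 1
T4 = build_fixed_lookup_table(Q_EXAMPLE, 4)
result = window_modmul(123456789, 987654321, Q_EXAMPLE, 4, T4)
print(result == (123456789 * 987654321) % Q_EXAMPLE)   # prints True
```

After the last round r is below 2^(k+1), which is less than 4q, so the closing loop performs at most three subtractions.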
In fields that rely heavily on modular multiplication, such as homomorphic encryption, a large number of modular multiplication components are needed. The prior art uses modular multiplication with Barrett modular reduction or Montgomery modular reduction, and with either of these the multiplications consume a very large amount of the FPGA's DSP (Digital Signal Processing) resources, which may even prove insufficient. Modular multiplication based on window modular reduction needs no DSP resources in the FPGA: with the help of the fixed lookup table T, the modular multiplication calculation is implemented with shifts and additions and requires no multiplication operation, which saves the scarce DSP resources in the FPGA and directly solves the problem of insufficient DSPs.
Preferably, S102-3, that is, according to the allocated window width w, the current $r$ value and the preset modulus $q$, looking up the corresponding reserve value in the fixed lookup table T corresponding to the allocated window width w, performing the modular reduction of the modular multiplication based on the reserve value, using shift operations in place of multiplication operations during the modular reduction, and performing a first round of main modular reduction on the current $r$ value, specifically includes:
A first step: according to the allocated window width w, assigning $w \cdot i$, and taking the remainder of the current r value modulo $2^{n - w \cdot i}$ as the first intermediate r value (i.e. calculating $r = r \,\%\, 2^{n - w \cdot i}$). The specific operation of taking this remainder is: truncating the current r value to its low n - w·i bits and taking these retained bits as the first intermediate r value; no additional addition, subtraction, multiplication or division is needed;
A second step: shifting the current r value right by n - w·i bits, and taking the high-order bits that remain as the address of the reserve value T_out in the fixed lookup table T; that is, calculating the address value $T\_in = r \gg (n - w \cdot i)$, where "in" is shorthand for input;
A third step: according to the address value, looking up the reserve value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w; i.e. calculating the reserve value $T\_out = T[T\_in]$;
In the first and second steps the current r value is thus divided into two parts: the high-order part above bit position n - w·i serves as the address value T_in of the fixed lookup table T, and the remaining low n - w·i bits form the first intermediate r value, which enters the calculation of the second intermediate r value in the fourth step;
A fourth step: updating the first intermediate r value based on the reserve value T_out to obtain the second intermediate r value: $r = r + T\_out \cdot 2^{k - w \cdot i}$. The specific operation is: the reserve value T_out shifted left by k - w·i bits is taken as the product of T_out and $2^{k - w \cdot i}$, and the sum of this product and the first intermediate r value is taken as the second intermediate r value, where k is the number of bits of the table calculation modulus $q$. When window modular reduction is adopted, the multiplication $T\_out \cdot 2^{k - w \cdot i}$ needs no multiplier: shifting T_out left by k - w·i binary digits is equivalent to the multiplication, which saves DSP resources.
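The reason one round leaves the residue modulo $q$ unchanged can be written out as a short derivation. It is a sketch under two assumptions that the text does not state explicitly: V is treated as n = 2k bits wide, and the table holds $T[d] = (d \cdot 2^{k}) \,\%\, q$. The symbols $p$, $d$ and $r_{\mathrm{low}}$ are introduced here only for illustration.

```latex
\begin{align*}
p &= n - w \cdot i, \qquad d = \lfloor r / 2^{p} \rfloor, \qquad r_{\mathrm{low}} = r \,\%\, 2^{p}, \\
r &= d \cdot 2^{p} + r_{\mathrm{low}}, \\
d \cdot 2^{p} &= (d \cdot 2^{k}) \cdot 2^{p-k} \equiv T[d] \cdot 2^{p-k} \pmod{q}, \\
p - k &= k - w \cdot i \quad (\text{since } n = 2k), \\
r &\equiv r_{\mathrm{low}} + T[d] \cdot 2^{k - w \cdot i} \pmod{q}.
\end{align*}
```

Since $T[d] < q < 2^{k}$, the shifted term is below $2^{p}$, and so is $r_{\mathrm{low}}$; the new r is therefore below $2^{p+1}$. The next window then sees at most w + 1 bits, which is consistent with a table depth of $2^{w+1}$.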
As can be seen from the first to fourth steps, the wider the window width w, the deeper the fixed lookup table T (its depth is $2^{w+1}$) and the more BRAM is required; each round of main modular reduction resolves more bits of the result, so fewer shift-and-add operations of the second and fourth steps are needed, fewer LUTs are required, and the calculation is fast. LUT resources are the lookup tables in the FPGA; they are the basic units of the FPGA and are commonly used to implement the various logic of a circuit.
If the window width w is smaller, the fixed lookup table T is shallower and less BRAM is required; each round of main modular reduction resolves only a few bits of the result, so more shift-and-add operations of the second and fourth steps are needed, more LUTs are required, and the calculation is slow.
Preferably, S102-4, that is, after the first round of main modular reduction is completed, advancing the index i by 1 relative to the previous round, performing the next round of main modular reduction, and repeating in this way until $w \cdot i$ reaches k to complete the multiple rounds of main modular reduction and obtain the updated $r$ value, specifically includes:
cycling through the first step to the fourth step, wherein in the first step, with i as the index, i is advanced by 1 relative to the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;
repeating in this way until the value of $w \cdot i$ is k, that round being the last round;
and taking the second intermediate r value obtained in the fourth step of the last round as the updated r value.
If the window width w is wider, fewer rounds of main modular reduction are needed, more BRAM is required and fewer LUTs are required. If the window width w is smaller, more rounds of main modular reduction are needed, less BRAM is required and more LUTs are required.
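The trade-off can be made concrete with a small calculation. The sketch below rests on assumptions that go beyond the text: V is n = 2k bits wide, so roughly ceil(k / w) rounds are needed; each table entry is stored in k bits; and the selection rule in `pick_window_width` (largest width whose table fits a given block RAM budget) is a hypothetical policy, since the text does not specify how the window width is chosen from the resource analysis.

```python
from typing import List, Optional, Tuple


def window_tradeoff(k: int, widths: List[int]) -> List[Tuple[int, int, int, int]]:
    """For each window width w return (w, rounds, table depth, table bits)."""
    rows = []
    for w in widths:
        rounds = -(-k // w)        # ceil(k / w) main modular reduction rounds
        depth = 1 << (w + 1)       # the fixed lookup table has 2**(w+1) entries
        rows.append((w, rounds, depth, depth * k))
    return rows


def pick_window_width(k: int, widths: List[int], bram_bits_available: int) -> Optional[int]:
    """Hypothetical policy: the widest window whose table fits the BRAM budget."""
    fitting = [w for (w, _, _, bits) in window_tradeoff(k, widths)
               if bits <= bram_bits_available]
    return max(fitting) if fitting else None


for row in window_tradeoff(32, [4, 8]):
    print(row)
# prints (4, 8, 32, 1024)
# prints (8, 4, 512, 16384)
```

For k = 32, doubling the window width from 4 to 8 halves the number of rounds while the table grows sixteenfold, which is the resource-versus-speed balance described above.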
In combination with an embodiment of the present invention, there is provided a method for generating a fixed lookup table, including:
setting a table calculation modulus $q$, used to calculate the fixed lookup table T for modular multiplication, and setting a plurality of window widths w;
for each window width w, generating a corresponding fixed lookup table T for that window width according to the window width w and the table calculation modulus $q$, wherein the fixed lookup table T has $2^{w+1}$ reserve values, each of which is $T[i] = (i \cdot 2^{k}) \,\%\, q$, with $i = (i_{w} i_{w-1} \cdots i_{0})_2$;
Wherein $T[i]$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus $q$; w represents the window width; i represents the loop index, an integer from 0 to $2^{w+1} - 1$; k represents the number of bits of the modulus $q$; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $i$ is written as a binary value.
It can be seen that once the window width w and the bit length k of the table calculation modulus are determined, the value of each entry $T[i]$ in the fixed lookup table T is fixed. For example, if w = 5 and k = 32, the fixed lookup table T holds 64 entries, i takes the values 0 to 63, and the reserve values $T[i]$ in the fixed lookup table T are, in order, 0, 516095, 1032190, and so on. The fixed lookup table T is a list of precomputed constants and is typically stored in the BRAM (Block RAM) of the FPGA. In modular multiplication based on window modular reduction, each modular multiplication uses only one fixed lookup table T, which supports the shift and addition operations.
In connection with an embodiment of the present invention, a modular multiplication assembly is provided for use in a semi-custom circuit FPGA, the modular multiplication assembly being configured to:
receive data to be converted, and determine the window width allocated to the data to be converted according to available hardware resources; and, according to the allocated window width, perform modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, wherein the modular multiplication result is used to convert the data to be converted into converted data.
The assembly adopts modular multiplication with window modular reduction; the modular multiplication module is used in a semi-custom circuit FPGA and implements the modular multiplication logic in the FPGA. After the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width of the window modular reduction. A large window width means that more resources are allocated and the calculation is fast; a small window width means that fewer resources are allocated and the calculation is slow. The adjustable window width thus balances resources against speed.
Preferably, as shown in fig. 2, the modular multiplication assembly comprises:
a primary assignment module 21 for assigning V to r, the running value of the modular multiplication result, as the current r value, wherein $V = (v_{n-1} v_{n-2} \ldots v_1 v_0)_2$ is the product of the two multipliers in the modular multiplication, the subscript 2 indicates that $v_{n-1}, \ldots, v_0$ are all binary digits, and n-1 is the index of the highest bit of the n-bit value V; and for assigning the index i its initial value according to the allocated window width w and determining the preset modulus M. V belongs to one of a plurality of data segments of the data to be converted; the preset modulus M belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of M. The modular multiplication result is expressed as $Z = V \bmod M$, where V is the product of the two multiplier variables (multiplier one and multiplier two), each of which is k bits wide. The window width is w bits; it is related to the width of the preset modulus M of the data to be converted, and in general w does not exceed the width of M.
a first-round main modular reduction operation module 22 for looking up, according to the allocated window width w, the current r value and the preset modulus M, the corresponding standby value from the fixed lookup table T corresponding to the allocated window width w, and performing the modular reduction operation of the modular multiplication based on the standby value; during the modular reduction operation, a shift operation is used in place of the multiplication operation, and a first round of main modular reduction is performed on the current r value; here, the process of taking V modulo M is called modular reduction;
a main modular reduction loop module 23 for, after the first round of main modular reduction is completed, taking i as the index and decreasing its value by 1 relative to the previous round to carry out the next round of main modular reduction; cycling continues in this way up to k, completing the multiple rounds of main modular reduction and yielding the updated r value;
a modular multiplication result output module 24 for comparing the updated r value with the preset modulus M. If the updated r value is smaller than M, the updated r value is output as the modular multiplication result, i.e. $Z = r$. If the updated r value is greater than or equal to M, a supplementary modular reduction is performed: the difference between the updated r value and M is calculated, i.e. $r = r - M$, to obtain the next r value. It is then judged whether the next r value is smaller than M; if so, the next r value is output as the modular multiplication result; if not, the supplementary modular reduction is repeated until the next r value is smaller than M. No subtraction beyond this supplementary modular reduction is required in this process.
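The behaviour of the result output module can be sketched in Python as follows. This is a minimal illustration only: the function name `supplementary_reduce` is hypothetical, and Python integers stand in for the hardware registers, comparator and subtractor.

```python
def supplementary_reduce(r: int, M: int) -> int:
    """Compare r with the modulus M and subtract M until r < M.

    Mirrors the output step: if r < M it is returned unchanged,
    otherwise r - M is computed and the comparison is repeated.
    """
    if M <= 0 or r < 0:
        raise ValueError("expected r >= 0 and M > 0")
    while r >= M:   # comparator
        r -= M      # subtractor: the supplementary modular reduction
    return r


if __name__ == "__main__":
    print(supplementary_reduce(20, 7))
```

When the value entering this step is already below a small multiple of M, the loop runs only a few times, which is why a comparator and a subtractor are enough here.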
In general, fields that rely on modular multiplication, such as homomorphic encryption, need a large number of modular multiplication components. The prior art adopts modular multiplication with Barrett modular reduction or Montgomery modular reduction; with either of these, the digital signal processing (DSP) resources used to implement multiplication in the FPGA are consumed in huge quantities and may even be insufficient. Modular multiplication based on window modular reduction needs no DSP resources in the FPGA: with the assistance of the fixed lookup table T, the modular reduction is implemented using only shifts and additions, without multiplication operations. This saves the scarce DSP resources in the FPGA and directly solves the problem of insufficient DSPs.
Preferably, the first-round main modular reduction operation module 22 specifically includes:
a first operation sub-module for, with the index i assigned according to the window width w, taking the remainder of the current r value modulo $2^{n-w\cdot i}$ and using the remainder as the first intermediate r value (i.e. calculating $r = r \bmod 2^{n-w\cdot i}$). The specific operation is a truncation: the bits of the current r value above the low n-w·i bits are cut off, and the retained n-w·i bits are taken as the first intermediate r value; no additional addition, subtraction, multiplication or division is needed;
a second operation sub-module for shifting the current r value to the right by n-w·i bits and taking the bits that remain after the shift as the address value of the standby value T_out in the fixed lookup table, i.e. calculating the address value $T\_in = r \gg (n-w\cdot i)$, where "in" is shorthand for input; the result is used as the address value T_in of the fixed lookup table T;
a third operation sub-module for looking up, according to the address value T_in, the standby value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w, i.e. obtaining the standby value $T\_out = T[T\_in]$;
In the first and second operation sub-modules, the current r value is thus divided into two parts: the part that is shifted out and used as the address value T_in of the fixed lookup table T, and the remaining n-w·i bits, which form the first intermediate r value and enter the fourth operation sub-module to obtain the second intermediate r value;
a fourth operation sub-module for updating the first intermediate r value based on the standby value T_out to obtain the second intermediate r value: $r = r + T\_out \cdot 2^{k-w\cdot i}$. Specifically, the standby value T_out shifted to the left by k-w·i bits is used as the product of T_out and $2^{k-w\cdot i}$, and the sum of this product and the first intermediate r value is taken as the second intermediate r value, where k is the number of bits of the modulus M. With window modular reduction, the multiplication $T\_out \cdot 2^{k-w\cdot i}$ needs no multiplier: shifting the binary digits to the left by k-w·i bits is equivalent to the multiplication, which saves DSP resources.
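One round of the four sub-module sequence (truncate, form the address, look up, shift and add) can be sketched in Python as below. This is an illustration under stated assumptions, not the patented circuit: the helper names `build_table` and `reduce_round` are hypothetical, the table contents are assumed to be T[a] = (a · 2^k) mod M, and the split position is expressed as k + shift, an indexing chosen for this sketch that may differ from the indexing used in the text.

```python
def build_table(M: int, k: int, w: int) -> list[int]:
    # Assumed table contents: T[a] = (a * 2^k) mod M, with 2^(w+1) entries.
    return [(a << k) % M for a in range(1 << (w + 1))]


def reduce_round(r: int, T: list[int], k: int, shift: int) -> int:
    """One main modular reduction round using only shifts, masks and one add.

    split = k + shift is the bit position at which r is cut in two
    (an indexing assumption made for this sketch).
    """
    split = k + shift
    first = r & ((1 << split) - 1)   # step 1: truncation, keep the low bits
    t_in = r >> split                # step 2: high bits become the address
    t_out = T[t_in]                  # step 3: table lookup
    return first + (t_out << shift)  # step 4: left shift replaces the multiply


if __name__ == "__main__":
    M, k, w = 97, 7, 3               # toy 7-bit modulus, window width 3
    T = build_table(M, k, w)
    r = 12345
    new_r = reduce_round(r, T, k, shift=3)
    print(new_r, new_r % M == r % M)
```

The round preserves the residue because the removed high part t_in · 2^(k+shift) is congruent to T[t_in] · 2^shift modulo M, so replacing one by the other shortens r without changing r mod M.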
According to the first to fourth operation sub-modules, if the window width w is wider, the fixed lookup table T is deeper (the depth of the fixed lookup table T is $2^{w+1}$), each round of main modular reduction resolves more bits, and more BRAMs are required; the shift-and-add steps of the second and fourth operation sub-modules are fewer, fewer LUTs are required, and the calculation is fast. LUT resources are the lookup tables in the FPGA; they are the basic units of the FPGA and are commonly used to implement the various logic of a circuit.
If the window width w is smaller, the fixed lookup table T is shallower, each round of main modular reduction resolves fewer bits, and fewer BRAMs are required; the shift-and-add steps of the second and fourth operation sub-modules are more numerous, more LUTs are required, and the calculation is slow.
Preferably, the main modular reduction loop module 23 is specifically configured to:
cycle through the first to fourth operation sub-modules, wherein in the first operation sub-module, with i as the index, the value of i is decreased by 1 relative to the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;
repeat in this way until the cycled value reaches k, that round being taken as the last round;
and take the second intermediate r value obtained in the fourth operation sub-module of the last round as the updated r value.
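The complete reduction (repeated rounds followed by the final comparison and subtraction) can be sketched in Python as follows. The sketch is hedged: `window_mod_reduce` and `build_table` are hypothetical names, the table is assumed to hold T[a] = (a · 2^k) mod M, and the round count and shift amounts are one self-consistent indexing chosen here, not necessarily the indexing of the text.

```python
def build_table(M: int, k: int, w: int) -> list[int]:
    # Assumed table contents: T[a] = (a * 2^k) mod M, with 2^(w+1) entries.
    return [(a << k) % M for a in range(1 << (w + 1))]


def window_mod_reduce(V: int, M: int, w: int) -> int:
    """Compute V mod M with table lookups, shifts, adds and final subtractions."""
    k = M.bit_length()                 # bit length of the modulus
    T = build_table(M, k, w)           # in hardware this is precomputed once
    n = max(V.bit_length(), k)         # bit length of the value to reduce
    rounds = -(-(n - k) // w)          # ceil((n - k) / w), assumed round count
    r = V
    for i in range(rounds, 0, -1):     # index i decreases by 1 each round
        shift = w * (i - 1)
        split = k + shift
        first = r & ((1 << split) - 1)     # truncate: keep the low bits
        t_in = r >> split                  # address: at most w + 1 bits
        r = first + (T[t_in] << shift)     # shift-and-add update
    while r >= M:                      # supplementary modular reduction
        r -= M
    return r


if __name__ == "__main__":
    M = (1 << 32) - 516095             # assumed 32-bit modulus for the demo
    X, Y = 0xDEADBEEF, 0x12345678
    print(window_mod_reduce(X * Y, M, 8) == (X * Y) % M)
```

The table needs 2^(w+1) rather than 2^w entries in this sketch because the addition in each round can carry one bit above the split position, so the next address can be up to w + 1 bits wide.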
If the window width w is wider, the number of cyclic main mode reduction operations is smaller, more BRAMs are required, and less LUTs are required. If the window width w is smaller, the number of times of cyclic main mode reduction operation is more, the required BRAM is less, and the required LUT is more.
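The trade-off can be made concrete with a small Python calculation. It is a rough sketch: the function name `window_cost` is hypothetical, the storage figure assumes one table of $2^{w+1}$ entries of k bits each, and the round count ceil((n-k)/w) is an assumption of this sketch, not a clock-cycle figure from the document.

```python
def window_cost(n: int, k: int, w: int) -> tuple[int, int, int]:
    """Return (table depth, table storage in bits, reduction rounds)."""
    depth = 1 << (w + 1)           # number of standby values in the table
    storage_bits = depth * k       # each standby value fits in k bits
    rounds = -(-(n - k) // w)      # ceil((n - k) / w), assumed round count
    return depth, storage_bits, rounds


if __name__ == "__main__":
    for w in (5, 8):
        print(w, window_cost(64, 32, w))
    # 5 (64, 2048, 7)
    # 8 (512, 16384, 4)
```

Under these assumptions, widening the window from 5 to 8 multiplies the table storage by eight while removing three of the seven rounds.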
The embodiment of the invention also provides a fixed lookup table generating component which is used for:
Before the data to be converted is received for the first time, the fixed lookup table T is generated by pre-calculation, which specifically includes:
setting a table calculation modulus M, used for the modular multiplication, for the fixed lookup table T, and setting a plurality of window widths w;
for each window width w, generating the fixed lookup table T corresponding to that window width from the window width w and the table calculation modulus M, wherein the fixed lookup table T holds $2^{w+1}$ standby values, each of which is $T[i] = (i \cdot 2^k) \,\%\, M$;
Here, T[i] is the value obtained by taking the remainder of $i \cdot 2^k$ with respect to the table calculation modulus M; w represents the window width; i represents the table index, an integer from 0 to $2^{w+1}-1$; k represents the number of bits of the modulus M; % represents the remainder operation; and the subscript 2 in the formulas indicates a binary value.
It can be seen that once the window width w and the bit length k of the table calculation modulus are determined, the value of each entry T[i] in the fixed lookup table T is fixed. For example, if w=5 and k=32, the fixed lookup table T holds 64 entries, i takes values from 0 to 63, and the standby values T[i] are, in order, 0, 516095, 1032190, and so on. The fixed lookup table T is a constant list of pre-calculated values and is typically stored in the BRAM (Block RAM) of the FPGA. In modular multiplication based on window modular reduction, each modular multiplication uses only one fixed lookup table T, which supports the shift and addition operations.
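The pre-calculation of the table can be sketched in Python as follows. The modulus itself is not stated in the example above; this sketch assumes M = 2^32 - 516095 = 4294451201, a 32-bit value that is consistent with the listed standby values, and it assumes the table rule T[i] = (i · 2^k) % M. The function name `build_fixed_table` is hypothetical.

```python
def build_fixed_table(M: int, k: int, w: int) -> list[int]:
    """Precompute the 2^(w+1) standby values T[i] = (i * 2^k) % M."""
    return [(i << k) % M for i in range(1 << (w + 1))]


# Assumed modulus: chosen so that 2^32 % M == 516095, matching the example.
M = (1 << 32) - 516095
T = build_fixed_table(M, k=32, w=5)

if __name__ == "__main__":
    print(len(T), T[:3])   # 64 [0, 516095, 1032190]
```

Because the table depends only on M, k and w, it can be computed once offline and loaded into block RAM as constants.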
As shown in fig. 3, in connection with an embodiment of the present invention, a semi-custom circuit is provided that includes a fixed lookup table generation component 31, a memory 32, and any of the foregoing modular multiplication components 33; wherein:
a fixed lookup table generation component 31 for:
before the data to be converted is received for the first time, generating the fixed lookup table T by pre-calculation, which specifically includes:
setting a table calculation modulus M, used for the modular multiplication, for the fixed lookup table T, and setting a plurality of window widths w;
for each window width w, generating the fixed lookup table T corresponding to that window width from the window width w and the table calculation modulus M, wherein the fixed lookup table T holds $2^{w+1}$ standby values, each of which is $T[i] = (i \cdot 2^k) \,\%\, M$;
Here, T[i] is the value obtained by taking the remainder of $i \cdot 2^k$ with respect to the table calculation modulus M; w represents the window width; i represents the table index, an integer from 0 to $2^{w+1}-1$; k represents the number of bits of the modulus M; % represents the remainder operation; and the subscript 2 in the formulas indicates a binary value;
a memory 32 for storing a fixed lookup table T.
In the embodiment of the present invention, different settings of the window width w lead to circuit implementations of different complexity but similar structure. Fig. 4 is an example circuit diagram of an FPGA implementation of the modular reduction in modular multiplication based on window modular reduction, taking a window width w=8 as an example, where V is 64 bits and M and Z are 32 bits. The parameters and symbols appearing in Fig. 4 have the following meanings: ROM0-ROM3 store the fixed lookup table (Lookup Table) T used by the modular multiplication based on window modular reduction; MUX is a multiplexer; CLK is the clock signal; ADDR is the address signal; DOUT is the data output; a block with a plus sign inside is an adder (Adder), which adds two numbers; a block with a minus sign inside is a subtractor (Subtractor), which subtracts two numbers; a block with a greater-than sign inside is a comparator (Comparator), which compares the magnitudes of two numbers.
If BRAM resources in the FPGA are sufficient, the window width w can be enlarged appropriately to increase the calculation speed of the modular reduction in modular multiplication based on window modular reduction. For example, for Z = V mod M, where V is 64 bits and Z and M are 32 bits, a window width w of 8 requires 5 clock cycles to reduce the 64-bit data to 32-bit data; if the window width w is reduced to 5, 9 clock cycles are required, but the required BRAM is greatly reduced. Table 1 compares the FPGA resources required for different window widths w.
Table 1 hardware operations and resource comparisons for different window width settings corresponding to FPGAs
The embodiment of the invention has the following effects:
1. Modular multiplication is calculated using window modular reduction, and the window width w can be set according to the available resources. Adjusting the window width w adjusts the calculation speed of the modular multiplication, so different settings use different amounts of FPGA resources and run at different speeds. A wide window width w gives a high calculation speed but consumes more lookup-table storage; a narrow window width w gives a relatively slow calculation speed but consumes less. If the resources of the FPGA are sufficient, a larger window width w can be considered for a faster calculation. The flexibly set window width w makes modular multiplication based on window modular reduction suitable for different FPGAs: a window width w can be selected according to the current resource consumption of the FPGA and the required calculation speed, so that the user can reach the desired balance between resources and calculation speed. For example, when the window width w is greater than or equal to 5, the modular multiplication in the embodiment of the present invention is faster than the conventional, general-purpose Barrett and Montgomery methods and requires fewer clock cycles, so that the speed of the whole hardware is improved, as shown in Table 2.
Table 2: modular reduction and Barrett and Montgomery calculation speed comparison table based on modular multiplication of window modular reduction
2. Of course, the modular multiplication based on the window modular reduction can be matched with other modular multiplication modular reduction to balance the insufficient resources of the FPGA, solve the problem of insufficient resources and difficult time sequence convergence, and improve the conversion speed of the data to be converted, and naturally, the conversion speed is not faster than that of the traditional modular multiplication algorithm.
3. Fast modular reduction by window modular reduction enables hardware acceleration of modular multiplication algorithms. For example, in the homomorphic encryption field of privacy computing it can accelerate homomorphic encryption algorithms in hardware, and it can be used in polynomial multiplication and in algorithms such as NTT.
It should be understood that the specific order or hierarchy of steps in the processes disclosed are examples of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, as used in the specification or claims, the term "including" is intended to be inclusive in a manner similar to the term "comprising," as interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments of the invention may be implemented by electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation is not to be understood as beyond the scope of the embodiments of the present invention.
The various illustrative logical blocks or units described in the embodiments of the invention may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a user terminal. In the alternative, the processor and the storage medium may reside as distinct components in a user terminal.
In one or more exemplary designs, the above-described functions of embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium. Computer-readable media include both computer storage media and communication media that facilitate transfer of computer programs from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store program code in the form of instructions or data structures and that may be read by a general or special purpose computer, or a general or special purpose processor. Further, any connection is properly termed a computer-readable medium; for example, if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then these are also included in the definition of computer-readable medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included within the computer-readable media.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention and is not meant to limit the scope of the invention or to limit the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (8)
1. A modular multiplication method comprising:
receiving data to be converted, and determining window width allocated to the data to be converted according to available hardware resources;
obtaining a modular multiplication result by applying modular multiplication based on window modular reduction to the data to be converted according to the allocated window width, wherein the modular multiplication result is used for converting the data to be converted into converted data;
wherein obtaining the modular multiplication result by applying modular multiplication based on window modular reduction to the data to be converted specifically comprises:
assigning V to the modular multiplication result r as the current r value, wherein $V = (v_{n-1} v_{n-2} \ldots v_1 v_0)_2$ is the product of the two multipliers in the modular multiplication, and the subscript 2 indicates that $v_{n-1}, \ldots, v_0$ are all binary digits; wherein V belongs to one of a plurality of data segments of the data to be converted;
assigning the index i its initial value according to the allocated window width w, and determining a preset modulus M, wherein the preset modulus M belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of the preset modulus M;
looking up, according to the allocated window width w, the current r value and the preset modulus M, a corresponding standby value from a fixed lookup table T corresponding to the allocated window width w, and performing the modular reduction operation of the modular multiplication based on the standby value; during the modular reduction operation, a shift operation is used in place of the multiplication operation, and a first round of main modular reduction is performed on the current r value;
after the first round of main modular reduction is completed, taking i as the index and decreasing its value by 1 relative to the previous round to carry out the next round of main modular reduction; cycling in this way up to k to complete the multiple rounds of main modular reduction and obtain an updated r value;
comparing the updated r value with the preset modulus M;
if the updated r value is smaller than the preset modulus M, outputting the updated r value as the modular multiplication result;
if the updated r value is greater than or equal to the preset modulus M, performing the step of calculating the difference between the updated r value and the preset modulus M, $r = r - M$, as a supplementary modular reduction to obtain the next r value; judging whether the next r value is smaller than the preset modulus M; if the next r value is smaller than the preset modulus M, outputting the next r value as the modular multiplication result; and if the next r value is greater than or equal to the preset modulus M, repeating the supplementary modular reduction step until the next r value is smaller than the preset modulus M.
2. The modular multiplication method of claim 1, further comprising:
generating a fixed lookup table T through pre-calculation before receiving data to be converted for the first time;
the generating the fixed lookup table T by pre-calculation specifically includes:
setting a table calculation modulus M, used for the modular multiplication, for the fixed lookup table T, and setting a plurality of window widths w;
for each window width w, generating the fixed lookup table T corresponding to that window width from the window width w and the table calculation modulus M, wherein the fixed lookup table T holds $2^{w+1}$ standby values, each of which is $T[i] = (i \cdot 2^k) \,\%\, M$;
wherein T[i] is the value obtained by taking the remainder of $i \cdot 2^k$ with respect to the table calculation modulus M; w represents the window width; i represents the table index, an integer from 0 to $2^{w+1}-1$; k represents the number of bits of the modulus M; % represents the remainder operation; and the subscript 2 in the formulas indicates a binary value.
3. The modular multiplication method as claimed in claim 1, characterized in that assigning the index i according to the allocated window width w; looking up, according to the allocated window width w, the current r value and the preset modulus M, a corresponding standby value from the fixed lookup table T corresponding to the allocated window width w, and performing the modular reduction operation of the modular multiplication based on the standby value; and, during the modular reduction operation, using a shift operation in place of the multiplication operation and performing a first round of main modular reduction on the current r value, specifically comprises:
a first step of, with the index i assigned according to the window width w, taking the remainder of the current r value modulo $2^{n-w\cdot i}$ and using the remainder as a first intermediate r value; the specific operation of taking the remainder is a truncation: the bits of the current r value above the low n-w·i bits are cut off, and the retained n-w·i bits are taken as the first intermediate r value;
a second step of shifting the current r value to the right by n-w·i bits and taking the bits that remain after the shift as the address value of the standby value T_out in the fixed lookup table T;
a third step of looking up, according to the address value, the standby value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w;
a fourth step of updating the first intermediate r value based on the standby value T_out to obtain a second intermediate r value: $r = r + T\_out \cdot 2^{k-w\cdot i}$; the specific operation is as follows: the standby value T_out shifted to the left by k-w·i bits is used as the product of the standby value T_out and $2^{k-w\cdot i}$, and the sum of this product and the first intermediate r value is taken as the second intermediate r value.
4. The modular multiplication method as claimed in claim 3, wherein, after the first round of main modular reduction is completed, taking i as the index and decreasing its value by 1 relative to the previous round to carry out the next round of main modular reduction, and cycling in this way up to k to complete the multiple rounds of main modular reduction and obtain the updated r value, specifically comprises:
cycling through the first step to the fourth step, wherein in the first step, with i as the index, the value of i is decreased by 1 relative to the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;
repeating in this way until the cycled value reaches k, that round being taken as the last round;
and taking the second intermediate r value obtained in the fourth step of the last round as the updated r value.
5. A modular multiplication assembly for use in a semi-custom circuit FPGA, the modular multiplication assembly comprising:
the method comprises the steps of receiving data to be converted, and determining window width allocated to the data to be converted according to available hardware resources; according to the allocated window width, obtaining a modular multiplication result based on modular multiplication of window modular protocol and data to be converted, wherein the modular multiplication result is used for converting the data to be converted into converted data;
the modular multiplication assembly includes:
an initial assignment module, configured to assign V, for the modular multiplication result, as the current r value, wherein V is the product of the two multipliers in the modular multiplication and the subscript 2 denotes a binary value; and to make the initial assignment according to the allocated window width w and determine the preset modulus; wherein V belongs to one of a plurality of data segments of the data to be converted; the preset modulus belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of the preset modulus;
a first-round main modular reduction module, configured to look up, according to the allocated window width w, the current r value, and the preset modulus, a corresponding standby value from the fixed lookup table T corresponding to the allocated window width w, and to perform the modular reduction operation of the modular multiplication based on the standby value; wherein, during the modular reduction operation, a shift operation is used to implement the multiplication operation, and the first round of the main modular reduction operation is performed on the current r value;
a main modular reduction loop module, configured to, after the first round of the main modular reduction operation is completed, reduce the value assigned with i as the index by 1 relative to the previous round so as to perform the next round of the main modular reduction operation, the value being cycled in this way to k so as to complete multiple rounds of the main modular reduction operation and obtain the updated r value;
a modular multiplication result output module, configured to determine the relationship between the updated r value and the preset modulus; if the updated r value is less than the preset modulus, output the updated r value as the modular multiplication result; if the updated r value is greater than or equal to the preset modulus, perform the step of calculating the difference between the updated r value and the preset modulus, thereby performing a supplementary modular reduction operation on the r value and obtaining the next r value; determine whether the next r value is less than the preset modulus; if the next r value is less than the preset modulus, output the next r value as the modular multiplication result; and if the next r value is greater than or equal to the preset modulus, repeat the step of calculating the difference from the preset modulus to obtain the next r value, until the next r value is less than the preset modulus.
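The behaviour of the output module, comparing with the preset modulus and subtracting until the value falls below it, can be sketched in Python as follows. The function name `final_correction` is hypothetical, and the sketch models only the arithmetic, not the FPGA logic.

```python
def final_correction(r, modulus):
    """Supplementary reduction: subtract the modulus until r < modulus."""
    while r >= modulus:      # compare the updated r value with the preset modulus
        r -= modulus         # the difference becomes the next r value
    return r


result = final_correction(500, 251)
```

The number of subtractions depends on how far the main rounds have already reduced r. As a general observation, if the modulus has bit length k and r is below 2^(k+1), then r is below four times the modulus, so at most three subtractions are needed.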
6. The modular multiplication assembly of claim 5, wherein the first-round main modular reduction module comprises:
a first operation sub-module, configured to make the assignment according to the window width w and to take the remainder of the current r value with respect to the preset modulus, the remainder being used as a first intermediate r value; wherein taking the remainder of the current r value with respect to the preset modulus specifically comprises: truncating the rear $n - w \cdot i$ bits of the current r value, retaining the front $n - w \cdot i$ bits of the current r value, and using the retained front $n - w \cdot i$ bits of the current r value as the first intermediate r value;
a second operation sub-module, configured to shift the current r value to the right by $n - w \cdot i$ bits and to use the latter $n - w \cdot i$ bits of the current r value as the address value of a standby value T_out in the fixed lookup table;
a third operation sub-module, configured to look up, according to the address value, the standby value T_out at that address value in the fixed lookup table T corresponding to the allocated window width w;
a fourth operation sub-module, configured to update the first intermediate r value based on the standby value T_out to obtain a second intermediate r value; wherein updating the first intermediate r value based on the standby value T_out to obtain the second intermediate r value specifically comprises: shifting the standby value T_out to the left by $k - w \cdot i$ bits, the shifted value being taken as the product of the standby value T_out and $2^{k - w \cdot i}$; and taking the sum of that product and the first intermediate r value as the second intermediate r value.
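The four sub-modules of claim 6 correspond to a mask, a right shift, a table read, and a shift-and-add. The Python sketch below shows one round under stated assumptions: the bit position `p` stands in for the shift amount that the claim expresses in terms of n, w, and i; the table entry at address j is assumed to hold j·2^k mod M; and the helper names `build_table` and `reduce_round` are hypothetical.

```python
def build_table(modulus, w):
    """Hypothetical table: entry j holds (j * 2**k) % modulus, k = bit length."""
    k = modulus.bit_length()
    return [(j << k) % modulus for j in range(1 << (w + 1))]


def reduce_round(r, p, k, table):
    """One reduction round at bit position p (assumes k <= p and r >> p < len(table))."""
    first_intermediate = r & ((1 << p) - 1)   # step 1: keep the low p bits (r mod 2**p)
    address = r >> p                          # step 2: high bits form the table address
    t_out = table[address]                    # step 3: look up the standby value
    # step 4: a left shift by (p - k) bits multiplies t_out by 2**(p - k)
    return first_intermediate + (t_out << (p - k))


modulus, w = 251, 4
k = modulus.bit_length()                      # 8 for this toy modulus
table = build_table(modulus, w)
r_next = reduce_round(60000, 12, k, table)    # 60000 = 14 * 2**12 + 2656
```

In this toy run the address is 14, the table entry is (14·256) mod 251 = 70, and the round returns 2656 + 70·16 = 3776, which leaves the same remainder modulo 251 as 60000 while being several bits shorter. Only shifts, a mask, a memory read, and one addition are used, which is the property the claim emphasises.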
7. The modular multiplication assembly of claim 6, wherein the main modular reduction loop module is specifically configured to:
cycle from the first operation sub-module to the fourth operation sub-module, wherein in the first operation sub-module, with i as the index, the assigned value is reduced by 1 relative to the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;
cycle in this way until the assigned value is k, that round being the last round;
and take the second intermediate r value obtained in the fourth operation sub-module of the last round as the updated r value.
8. A semi-custom circuit comprising a fixed lookup table generating component, a memory, and the modular multiplication assembly of any of claims 5-7; wherein:
the fixed lookup table generating component is configured to generate, by pre-calculation, a fixed lookup table T before receiving data to be converted for the first time, where the generating, by pre-calculation, the fixed lookup table T specifically includes:
setting, for the fixed lookup table T, a table-calculation modulus used in calculating the modular multiplication, and setting a plurality of window widths w;
for each window width w, generating, from the window width w and the table-calculation modulus, a corresponding fixed lookup table T for that window width, wherein the fixed lookup table T has $2^{w+1}$ standby values, each standby value being a value obtained by taking a remainder with respect to the table-calculation modulus;
wherein w represents the window width; i represents a cyclic index, i being a positive integer from 1 to $2^{w+1}$; k represents the highest bit number of the modulus; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign denotes a binary value;
the memory is used for storing a fixed lookup table T;
the modular multiplication assembly includes:
an initial assignment module, configured to assign V, for the modular multiplication result, as the current r value, wherein V is the product of the two multipliers in the modular multiplication and the subscript 2 denotes a binary value; and to make the initial assignment according to the allocated window width w and determine the preset modulus; wherein V belongs to one of a plurality of data segments of the data to be converted; the preset modulus belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of the preset modulus;
a first-round main modular reduction module, configured to look up, according to the allocated window width w, the current r value, and the preset modulus, a corresponding standby value from the fixed lookup table T corresponding to the allocated window width w, and to perform the modular reduction operation of the modular multiplication based on the standby value; wherein, during the modular reduction operation, a shift operation is used to implement the multiplication operation, and the first round of the main modular reduction operation is performed on the current r value by the shift operation and an addition operation;
a main modular reduction loop module, configured to, after the first round of the main modular reduction operation is completed, reduce the value assigned with i as the index by 1 relative to the previous round so as to perform the next round of the main modular reduction operation, the value being cycled in this way to k so as to complete multiple rounds of the main modular reduction operation and obtain the updated r value;
a modular multiplication result output module, configured to determine the relationship between the updated r value and the preset modulus; if the updated r value is less than the preset modulus, output the updated r value as the modular multiplication result; if the updated r value is greater than or equal to the preset modulus, perform the step of calculating the difference between the updated r value and the preset modulus, thereby performing a supplementary modular reduction operation on the r value and obtaining the next r value; determine whether the next r value is less than the preset modulus; if the next r value is less than the preset modulus, output the next r value as the modular multiplication result; and if the next r value is greater than or equal to the preset modulus, repeat the step of calculating the difference from the preset modulus to obtain the next r value, until the next r value is less than the preset modulus.
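The division of labour in claim 8, with tables pre-computed once per window width, stored, and then reused for every multiplication, can be sketched end to end in Python. This is an illustrative model under assumptions, not the patented circuit: the table formula (entry j holds j·2^k mod M, indexed from 0 rather than from 1), the stride of w bits per round, and the names `build_tables` and `mod_mul` are all hypothetical, and a dictionary stands in for the memory.

```python
def build_tables(modulus, widths):
    """Pre-compute one lookup table per window width (assumed entry: (j * 2**k) % M)."""
    k = modulus.bit_length()
    return {w: [(j << k) % modulus for j in range(1 << (w + 1))] for w in widths}


def mod_mul(a, b, modulus, w, tables):
    """Multiply, reduce with the stored table for width w, then correct by subtraction."""
    k = modulus.bit_length()
    table = tables[w]                   # read the table from the stand-in memory
    r = a * b                           # V: product of the two multipliers
    p = max(r.bit_length(), k)
    while p > k:                        # main reduction rounds
        p = max(p - w, k)               # assumed stride of w bits, last round at k
        r = (r & ((1 << p) - 1)) + (table[r >> p] << (p - k))
    while r >= modulus:                 # supplementary reduction by subtraction
        r -= modulus
    return r


M = (1 << 61) - 1                       # toy modulus chosen for the example only
tables = build_tables(M, [2, 4, 8])     # generated once, before any data arrives
product = mod_mul(123456789012345, 987654321098765, M, 8, tables)
```

A wider window needs fewer rounds but a larger table, since the table size doubles with each extra bit of w. That trade-off is consistent with the claim's idea of choosing the window width from the hardware resources available.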
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311709371.2A CN117406957B (en) | 2023-12-13 | 2023-12-13 | Modular multiplication method, modular multiplication assembly and semi-custom circuit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117406957A (en) | 2024-01-16 |
CN117406957B (en) | 2024-03-15 |
Family
ID=89498306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311709371.2A Active CN117406957B (en) | 2023-12-13 | 2023-12-13 | Modular multiplication method, modular multiplication assembly and semi-custom circuit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117406957B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117742664B (en) * | 2024-02-19 | 2024-07-19 | 粤港澳大湾区数字经济研究院(福田) | GPU-based modular method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101004798A (en) * | 2006-12-30 | 2007-07-25 | 凤凰微电子(中国)有限公司 | Smart card of supporting high performance computing, large capacity storage, high-speed transmission, and new type application |
CN105740730A (en) * | 2014-12-10 | 2016-07-06 | 上海华虹集成电路有限责任公司 | Method for realizing secure point multiplication in chips |
CN114338042A (en) * | 2021-12-31 | 2022-04-12 | 观源(上海)科技有限公司 | High-speed isochronous modular inverse algorithm for order n in SM2 algorithm curve |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040133788A1 (en) * | 2003-01-07 | 2004-07-08 | Perkins Gregory M. | Multi-precision exponentiation method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN117406957A (en) | 2024-01-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN117406957B (en) | Modular multiplication method, modular multiplication assembly and semi-custom circuit | |
CN115344237B (en) | Data processing method combining Karatsuba and Montgomery modular multiplication | |
US20200067695A1 (en) | Hardware masked substitution box for the data encryption standard | |
CN113467750A (en) | Large integer bit width division circuit and method for SRT algorithm with radix of 4 | |
CN102004627B (en) | Multiplication rounding implementation method and device | |
CN113837365A (en) | Model for realizing sigmoid function approximation, FPGA circuit and working method | |
CN116436709B (en) | Encryption and decryption method, device, equipment and medium for data | |
CN117331529A (en) | Divider logic circuit and method for realizing same | |
JP5175983B2 (en) | Arithmetic unit | |
CN115202616A (en) | Modular multiplier, security chip, electronic device and encryption method | |
CN113467752B (en) | Division operation device, data processing system and method for private calculation | |
CN102646033B (en) | Provide implementation method and the device of the RSA Algorithm of encryption and signature function | |
CN115270155A (en) | Method for obtaining maximum common divisor of big number expansion and hardware architecture | |
CN209560522U (en) | Obtain the hardware device of the intermediate result group in encryption and decryption operation | |
CN109947393B (en) | Operation method and device based on remainder device | |
JP2002287635A (en) | High-speed arithmetic circuit of sha arithmetic operation | |
CN104407837B (en) | A kind of device and its application process for realizing Galois Field multiplication | |
CN116820394B (en) | Scalar multiplication circuit oriented to elliptic curve encryption algorithm | |
CN214409954U (en) | Multiplication circuit in SSD master control chip | |
CN115765975B (en) | Low-power-consumption realization method of SHA-256 algorithm, chip, server and storage medium | |
CN101221555B (en) | Address generation method used for base-2 fast Fourier transform in-place computation | |
KR100858559B1 (en) | Method for adding and multipying redundant binary and Apparatus for adding and multipying redundant binary | |
CN108390761B (en) | Hardware implementation method of dual-domain modular inversion | |
CN111147390B (en) | Load sharing residual solving method and device | |
WO2001075635A2 (en) | Dsp execution unit for efficient alternate modes of operation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |