CN117406957B

CN117406957B - Modular multiplication method, modular multiplication assembly and semi-custom circuit

Info

Publication number: CN117406957B
Application number: CN202311709371.2A
Authority: CN
Inventors: 袁媛; 丁晓慧; 张海龙; 杨皓程
Original assignee: Beijing Rongshulianzhi Technology Co ltd
Current assignee: Beijing Rongshulianzhi Technology Co ltd
Priority date: 2023-12-13
Filing date: 2023-12-13
Publication date: 2024-03-15
Anticipated expiration: 2043-12-13
Also published as: CN117406957A

Abstract

The embodiments of the invention provide a modular multiplication method, a modular multiplication component and a semi-custom circuit, relating to the field of privacy computing. The modular multiplication method comprises: receiving data to be converted, and determining, according to the available hardware resources, the window width allocated to the data to be converted; and, according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, where the modular multiplication result is used to convert the data to be converted into converted data. After the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width used for window modular reduction. A larger window width allocates more resources and computes faster; a smaller window width allocates fewer resources and computes more slowly. The adjustable window width thus balances resource consumption against speed.

Description

Modular multiplication method, modular multiplication assembly and semi-custom circuit

Technical Field

The invention relates to the field of privacy computing, and in particular to a modular multiplication method, a modular multiplication component and a semi-custom circuit.

Background

For lattice cryptosystems operating over polynomial rings, many cryptographic constructions, such as hash computation, digital signatures and homomorphic encryption, involve a large number of modular multiplications, for example in polynomial multiplication and the Number-Theoretic Transform (NTT). Modular multiplication is therefore a principal arithmetic component: although it is not a large component by itself, it is used so intensively in homomorphic encryption algorithms that its efficiency directly determines the efficiency of the whole homomorphic encryption algorithm.

In carrying out the present invention, the applicant has found that at least the following problems exist in the prior art:

in hardware implementations of homomorphic encryption, modular multiplication typically uses Barrett modular reduction or Montgomery modular reduction. Barrett reduction (Barrett Reduction) is an algorithm for computing the remainder of a large integer divided by a modulus; Montgomery modular multiplication (Montgomery Modular Multiplication) is an algorithm for performing modular multiplication.

For computation- and data-intensive applications such as homomorphic encryption, CPU speed can hardly meet the requirements of the various applications and designs, so programmable logic devices, i.e. semi-custom FPGA (Field-Programmable Gate Array) circuits, have become the first choice for acceleration. Modular multiplication is a key link in such accelerators, and designing a suitable algorithm together with the corresponding FPGA hardware has become a key part of the design.

FPGA design always involves a trade-off between resource consumption and speed. Homomorphic encryption, for example, places heavy demands on FPGA resources, and in general the faster the hardware, the higher its resource consumption and the higher the cost of the design. In addition, when a design occupies too much of the FPGA's resources, design verification and timing closure become very difficult, so a good balance between resource consumption and speed is hard to achieve.

Disclosure of Invention

The embodiments of the invention provide a modular multiplication method, a modular multiplication component and a semi-custom circuit, which solve the technical problem in the prior art that the resource consumption and the speed of modular multiplication are difficult to balance.

To achieve the above object, in a first aspect, an embodiment of the present invention provides a modular multiplication method, including:

receiving data to be converted, and determining, according to the available hardware resources, the window width allocated to the data to be converted;

and, according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, where the modular multiplication result is used to convert the data to be converted into converted data.

In a second aspect, an embodiment of the present invention provides a method for generating a fixed lookup table, including:

setting a table calculation modulus q used for computing the modular multiplications of the fixed lookup table T, and setting a plurality of window widths w;

for each window width w, generating, from the window width w and the table calculation modulus q, a corresponding fixed lookup table T for that window width, where the fixed lookup table T has $2^{w+1}$ standby values, each given by

$$T[i] = (i \cdot 2^{k}) \,\%\, q, \qquad 0 \le i \le 2^{w+1} - 1, \qquad q = (q_{k-1} \cdots q_1 q_0)_2;$$

where $(i \cdot 2^{k}) \,\%\, q$ denotes the remainder of $i \cdot 2^{k}$ divided by the table calculation modulus q; w denotes the window width; i denotes the table index, an integer from 0 to $2^{w+1} - 1$; k denotes the bit length of the modulus q, i.e. the position of its highest bit; % denotes the remainder operation; and the subscript 2 on the right-hand side of $q = (q_{k-1} \cdots q_1 q_0)_2$ indicates that q is written in binary.
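The following derivation explains why a table of this form is enough for reduction. It is our reading, not something the document states. Write the value being reduced as a high part h above bit position p and a low part ℓ:

```latex
% Split the current value at a bit position p >= k (h: high part, \ell: low part)
r = h \cdot 2^{p} + \ell, \qquad 0 \le \ell < 2^{p}

% Since h \cdot 2^{p} = (h \cdot 2^{k}) \cdot 2^{p-k} and T[h] = (h \cdot 2^{k}) \,\%\, q:
r \equiv \ell + T[h] \cdot 2^{p-k} \pmod{q}

% Because T[h] \le q - 1 < 2^{k}, the reduced value gains at most one bit above p:
\ell + T[h] \cdot 2^{p-k} < 2^{p} + 2^{k} \cdot 2^{p-k} = 2^{p+1}
```

Taking $p = n - w \cdot i$ with n = 2k gives the shift $p - k = k - w \cdot i$. The reduced value stays below $2^{p+1}$, so the next address, taken above position $p - w$, stays below $2^{w+1}$. This would explain the table depth of $2^{w+1}$ entries.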

In a third aspect, an embodiment of the present invention provides a modular multiplication component applied in a semi-custom FPGA circuit, the modular multiplication component being configured for:

receiving data to be converted, and determining, according to the available hardware resources, the window width allocated to the data to be converted; and, according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, where the modular multiplication result is used to convert the data to be converted into converted data.

In a fourth aspect, an embodiment of the present invention provides a fixed lookup table generating component for:

generating a fixed lookup table T by pre-calculation before data to be converted is received for the first time, where generating the fixed lookup table T by pre-calculation specifically comprises:

In a fifth aspect, an embodiment of the present invention provides a semi-custom circuit comprising a fixed lookup table generating component, a memory and the foregoing modular multiplication component, wherein:

the fixed lookup table generating component is configured to generate a fixed lookup table T by pre-calculation before data to be converted is received for the first time, where generating the fixed lookup table T by pre-calculation specifically comprises:

where $T[i] = (i \cdot 2^{k}) \,\%\, q$ is the remainder of $i \cdot 2^{k}$ divided by the table calculation modulus q; w denotes the window width; i denotes the table index, an integer from 0 to $2^{w+1} - 1$; k denotes the bit length of the modulus q, i.e. the position of its highest bit; % denotes the remainder operation; and the subscript 2 in $q = (q_{k-1} \cdots q_1 q_0)_2$ indicates that q is written in binary;

the memory is used for storing a fixed lookup table T.

The above technical solution has the following beneficial effects: after the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width used for window modular reduction. A larger window width allocates more resources and computes faster; a smaller window width allocates fewer resources and computes more slowly. The adjustable window width thus balances resource consumption against speed.

Drawings

To describe the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for the embodiments or for the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art may derive other drawings from them without inventive effort.

FIG. 1 is a flow chart of a modular multiplication method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a modular multiplication assembly in accordance with an embodiment of the present invention;

FIG. 3 is a block diagram of a semi-custom circuit according to an embodiment of the present invention;

FIG. 4 is a circuit diagram of the window modular reduction design in an FPGA according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons skilled in the art from the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

As shown in fig. 1, in combination with an embodiment of the present invention, there is provided a modular multiplication method including:

S101: receiving data to be converted, and determining, according to the available hardware resources, the window width allocated to the data to be converted;

S102: according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, where the modular multiplication result is used to convert the data to be converted into converted data.

Modular multiplication with window modular reduction is adopted. The modular multiplication component is used in a semi-custom FPGA circuit to implement the modular multiplication logic in the FPGA. After the data to be converted is received, a resource analysis of the current FPGA chip is first performed to determine the window width for window modular reduction. A larger window width means more allocated resources and faster computation; a smaller window width means fewer allocated resources and slower computation. The adjustable window width thus balances resources against speed.

Preferably, the modular multiplication method may further include:

S103: generating a fixed lookup table T by pre-calculation before data to be converted is received for the first time;

In S103, generating the fixed lookup table T by pre-calculation specifically comprises:

S103-1: setting a table calculation modulus q used for computing the modular multiplications of the fixed lookup table T, and setting a plurality of window widths w;

S103-2: for each window width w, generating, from the window width w and the table calculation modulus q, a corresponding fixed lookup table T for that window width, where the fixed lookup table T has $2^{w+1}$ standby values, each given by

$$T[i] = (i \cdot 2^{k}) \,\%\, q, \qquad 0 \le i \le 2^{w+1} - 1, \qquad q = (q_{k-1} \cdots q_1 q_0)_2;$$

where $(i \cdot 2^{k}) \,\%\, q$ denotes the remainder of $i \cdot 2^{k}$ divided by the table calculation modulus q; w denotes the window width; i denotes the table index, an integer from 0 to $2^{w+1} - 1$; k denotes the bit length of the modulus q, i.e. the position of its highest bit; % denotes the remainder operation; and the subscript 2 on the right-hand side of $q = (q_{k-1} \cdots q_1 q_0)_2$ indicates that q is written in binary.

Once the window width w and the table calculation modulus q (of bit length k) are fixed, the value of every entry $T[i]$ of the fixed lookup table T is also fixed. For example, if w = 5 and k = 32, the fixed lookup table T has 64 entries, i runs from 0 to 63, and the standby values $T[i]$ are 0, 516095, 1032190, … in order. The fixed lookup table T is a precomputed constant list and is typically stored in the BRAM (Block RAM) of the FPGA. In modular multiplication based on window modular reduction, a single modular multiplication uses only one fixed lookup table T, and this table lets the reduction be carried out with shift and addition operations.
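Below is a minimal Python sketch of the pre-calculation in S103. It assumes the table entries are $T[i] = (i \cdot 2^{k}) \,\%\, q$, with k the bit length of q. The document does not state the modulus behind the w = 5, k = 32 example, so `Q_EXAMPLE = 2**32 - 516095` is an assumption. It is the only 32-bit modulus for which $2^{32} \,\%\, q = 516095$. The function name `build_fixed_lookup_table` is hypothetical.

```python
def build_fixed_lookup_table(q: int, w: int) -> list[int]:
    """Pre-calculate T[i] = (i * 2**k) % q for 0 <= i < 2**(w + 1).

    k is the bit length of the table calculation modulus q.
    """
    k = q.bit_length()
    depth = 1 << (w + 1)  # the table holds 2^(w+1) standby values
    return [(i << k) % q for i in range(depth)]


# Assumed modulus: the 32-bit q for which 2**32 % q == 516095.
Q_EXAMPLE = 2**32 - 516095  # 4294451201

T = build_fixed_lookup_table(Q_EXAMPLE, w=5)
print(len(T), T[:3])  # 64 [0, 516095, 1032190]
```

Because every entry depends only on q and w, the table can be computed once offline and loaded into BRAM unchanged.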

Preferably, in S102, performing modular multiplication on the data to be converted based on window modular reduction to obtain the modular multiplication result specifically comprises:

the modular multiplication result is expressed as $C = (A \cdot B) \,\%\, q$ with $q = (q_{k-1} \cdots q_1 q_0)_2$, where A and B are the two operands of the modular multiplication (multiplier one and multiplier two) and q is k bits wide. The window width is w bits. The window width w is related to the width of the preset modulus q of the data to be converted, and generally does not exceed that width.

S102-1: assigning V to a modular multiplication resultAs the current->A value; wherein (1)>V is the product of two multipliers in the modular multiplication, and subscript 2 indicates +.>All are binary values, and the subscript n-1 is the number of bits of V; wherein (1)>One of a plurality of data segments belonging to data to be converted; for example, n=64 bits indicates that V has 64 bits, which is 64 binary bits.

S102-2: according to the w of the allocated window widthAssign +.>Determining a preset modulus->，The method comprises the steps of carrying out a first treatment on the surface of the Wherein the preset modulus->And->Belongs to the same data segment, and->Is a bit length of a preset modulus +.>Between one and two times the bit length of (a);

S102-3: according to the allocated window width w, the current r value and the preset modulus q, look up the corresponding standby value in the fixed lookup table T for the allocated window width w. Then perform the modular reduction of the modular multiplication based on that standby value, implementing multiplications by shift operations, to carry out the first round of main modular reduction on the current r value. Here, taking V modulo q is referred to as modular reduction;

S102-4: after the first round of main modular reduction is completed, use i as the round index and update it so that the split position $n - w \cdot i$ is w bits lower than in the previous round, then perform the next round of main modular reduction. Loop in this way until $n - w \cdot i$ reaches k, completing multiple rounds of main modular reduction and obtaining the updated r value;

S102-5: compare the updated r value with the preset modulus q. If the updated r value is smaller than q, output it as the modular multiplication result, i.e. $C = r$. If the updated r value is greater than or equal to q, perform a supplementary modular reduction by computing the difference $r - q$ and taking it as the next r value. Then judge whether the next r value is smaller than q. If so, output it as the modular multiplication result; if not, repeat the subtraction $r \leftarrow r - q$ until the r value is smaller than q. Apart from these supplementary subtractions, no further subtraction is required.
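The supplementary modular reduction of S102-5 can be sketched in Python as below. `final_correction` is a hypothetical name, and it also counts the subtractions. Suppose the last round splits at bit k. Then the value left after that round is the low k bits plus one table entry, so it is at most $(2^{k} - 1) + (q - 1)$. A k-bit q satisfies $q \ge 2^{k-1}$, so that value is below 3q, and at most two subtractions occur. This bound is our derivation, not a statement from the document.

```python
def final_correction(r: int, q: int) -> tuple[int, int]:
    """Subtract q until r < q; return (result, number of subtractions)."""
    subtractions = 0
    while r >= q:
        r -= q  # supplementary modular reduction r <- r - q
        subtractions += 1
    return r, subtractions


q = 2**32 - 516095  # assumed example modulus, k = 32
k = q.bit_length()
worst_case = (1 << k) - 1 + (q - 1)  # low k bits all ones plus the largest entry
print(final_correction(worst_case, q))  # (516093, 2)
```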

In general, fields that rely heavily on modular multiplication, such as homomorphic encryption, need a large number of modular multiplication components. The prior art uses Barrett or Montgomery modular reduction. With either method, the FPGA's DSP (Digital Signal Processing) resources that implement multiplication are consumed in very large quantities, sometimes more than the device provides. Modular multiplication based on window modular reduction does not need the FPGA's DSP resources for the reduction. With the help of the fixed lookup table T, the modular reduction is realized with shifts and additions instead of multiplications. This saves the FPGA's scarce DSP resources and directly solves the problem of insufficient DSPs.

Preferably, S102-3 (looking up, according to the allocated window width w, the current r value and the preset modulus q, the corresponding standby value in the fixed lookup table T for the allocated window width w, performing the modular reduction based on that standby value with multiplications implemented by shift operations, and carrying out the first round of main modular reduction on the current r value) specifically comprises the following steps:

in the first step, according to the window width w and the current round index i, take the current r value modulo $2^{n - w \cdot i}$ and use the remainder as the first intermediate r value, i.e. compute $r_1 = r \,\%\, 2^{n - w \cdot i}$. This remainder is obtained simply by keeping the low $n - w \cdot i$ bits of the current r value and discarding the bits above them. The retained low $n - w \cdot i$ bits form the first intermediate r value, and no additional addition, subtraction, multiplication or division is needed;

in the second step, shift the current r value right by $n - w \cdot i$ bits, so that only the bits above the low $n - w \cdot i$ bits remain. Use them as the address of the standby value T_out in the fixed lookup table T, i.e. compute the address value $T_{in} = r \gg (n - w \cdot i)$, where "in" is short for input;

in the third step, according to the address value, look up the standby value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w, i.e. compute $T_{out} = T[T_{in}]$;

In the first and second steps, the current r value is thus divided into two parts. The bits above the low $n - w \cdot i$ bits serve as the address value T_in of the fixed lookup table T. The low $n - w \cdot i$ bits form the first intermediate r value, from which the second intermediate r value is obtained in the fourth step;

in the fourth step, update the first intermediate r value based on the standby value T_out to obtain the second intermediate r value $r_2 = r_1 + T_{out} \cdot 2^{k - w \cdot i}$. Specifically, shifting T_out left by $k - w \cdot i$ bits gives the product $T_{out} \cdot 2^{k - w \cdot i}$. The sum of this product and the first intermediate r value is taken as the second intermediate r value, where k is the bit length of the table calculation modulus q. Under window modular reduction, the multiplication $T_{out} \cdot 2^{k - w \cdot i}$ needs no multiplier. Shifting the binary value left by $k - w \cdot i$ bits is equivalent to the multiplication, which saves DSP resources.
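The four steps of one round of main modular reduction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented circuit. It assumes n = 2k, so that the left shift $n - w \cdot i - k$ needed for the congruence equals the $k - w \cdot i$ used above. It also assumes the table entries are $T[i] = (i \cdot 2^{k}) \,\%\, q$. The function names, the modulus and the operands are hypothetical.

```python
def build_fixed_lookup_table(q: int, w: int) -> list[int]:
    k = q.bit_length()
    return [(i << k) % q for i in range(1 << (w + 1))]


def main_reduction_round(r: int, i: int, k: int, w: int, table: list[int]) -> int:
    """Round i of main modular reduction for an n = 2k bit value r."""
    n = 2 * k
    split = n - w * i                # split position n - w*i
    if split < k:
        raise ValueError("split position fell below k")
    r1 = r & ((1 << split) - 1)      # step 1: r1 = r % 2^(n-w*i), keep the low bits
    t_in = r >> split                # step 2: remaining high bits are the address T_in
    t_out = table[t_in]              # step 3: T_out = T[T_in]
    return r1 + (t_out << (k - w * i))  # step 4: r2 = r1 + T_out * 2^(k-w*i), shift only


q = 2**32 - 516095                   # assumed example modulus, k = 32
k, w = q.bit_length(), 5
T = build_fixed_lookup_table(q, w)
v = 4_000_000_000 * 3_999_999_999    # V = A*B with A, B < q
r2 = main_reduction_round(v, 1, k, w, T)
print(r2 % q == v % q, r2 < 2**60)   # True True
```

The round preserves the value modulo q while cutting its width from 64 bits to at most 60. That is the invariant the later rounds build on.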

As the first to fourth steps show, a wider window width w makes the fixed lookup table T deeper (its depth is $2^{w+1}$), so more BRAM is required. In exchange, fewer rounds of the shift-and-add operations of the second and fourth steps are needed, so fewer LUTs are required and the computation is faster. LUT resources are the lookup tables (Look-Up Tables) of the FPGA; they are its basic units and are commonly used to implement the various logic functions of a circuit.

If the window width w is narrower, the fixed lookup table T is shallower and less BRAM is required. However, each round of main modular reduction retires only a few bits, so more rounds of the shift-and-add operations of the second and fourth steps are needed. More LUTs are then required, and the computation is slower.

Preferably, S102-4 (after the first round, updating the round index i and performing further rounds of main modular reduction until $n - w \cdot i$ reaches k, so as to obtain the updated r value) specifically comprises:

repeating the first to fourth steps. In the first step, the round index i is updated so that the split position $n - w \cdot i$ is w bits lower than in the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;

continuing in this way until $n - w \cdot i$ equals k, that round being the last round;

and taking the second intermediate r value obtained in the fourth step of the last round as the updated r value.
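The loop of S102-4 visits the split positions $n - w \cdot i$ for i = 1, 2, … until the position reaches k. The document does not say what happens when $n - k$ is not a multiple of w, as with n = 64, k = 32, w = 5. The hypothetical helper below assumes the last round is clamped to k.

```python
def split_positions(n: int, k: int, w: int) -> list[int]:
    """Split positions n - w*i for rounds i = 1, 2, ..., ending exactly at k."""
    positions = []
    i = 1
    while True:
        p = max(n - w * i, k)  # assumption: clamp the final round to k
        positions.append(p)
        if p == k:
            return positions
        i += 1


print(split_positions(64, 32, 4))  # [60, 56, 52, 48, 44, 40, 36, 32]
print(split_positions(64, 32, 5))  # [59, 54, 49, 44, 39, 34, 32]
```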

With a wider window width w, fewer rounds of main modular reduction are needed, more BRAM is required and fewer LUTs are required. With a narrower window width w, more rounds are needed, less BRAM is required and more LUTs are required.
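To make this trade-off concrete, the hypothetical helper below counts two quantities for a given w. The table depth serves as a proxy for BRAM; the number of main reduction rounds serves as a proxy for shift-and-add logic and latency. It assumes the last round is clamped to k when $n - k$ is not a multiple of w, so the round count is $\lceil (n - k) / w \rceil$. Actual BRAM and LUT figures depend on the FPGA device and on synthesis, and are not modeled here.

```python
def window_tradeoff(n: int, k: int, w: int) -> tuple[int, int]:
    """Return (table entries, number of main reduction rounds) for window width w."""
    entries = 1 << (w + 1)          # table depth 2^(w+1): grows with w
    rounds = (n - k + w - 1) // w   # ceil((n - k) / w): shrinks with w
    return entries, rounds


for w in (2, 4, 5, 8):
    print(w, window_tradeoff(64, 32, w))
# 2 (8, 16)
# 4 (32, 8)
# 5 (64, 7)
# 8 (512, 4)
```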

In combination with an embodiment of the present invention, there is provided a method for generating a fixed lookup table, including:

In connection with an embodiment of the present invention, a modular multiplication component applied in a semi-custom FPGA circuit is provided, the modular multiplication component being configured for:

receiving data to be converted, and determining, according to the available hardware resources, the window width allocated to the data to be converted; and, according to the allocated window width, performing modular multiplication on the data to be converted based on window modular reduction to obtain a modular multiplication result, where the modular multiplication result is used to convert the data to be converted into converted data.

Preferably, as shown in fig. 2, the modular multiplication assembly comprises:

a primary assignment module 21, configured to assign V to r, the variable that will hold the modular multiplication result, as the current r value, i.e. $r = V$. Here $V = A \cdot B = (v_{n-1} \cdots v_1 v_0)_2$ is the product of the two operands of the modular multiplication, the subscript 2 indicates that V is written in binary, and n is the number of bits of V. The module is further configured to initialize the round index i to 1 according to the allocated window width w, and to determine the preset modulus $q = (q_{k-1} \cdots q_1 q_0)_2$. V is one of a plurality of data segments of the data to be converted. The preset modulus q belongs to the same data segment as V, and the bit length of V lies between one and two times the bit length of q. The modular multiplication result C is expressed as $C = (A \cdot B) \,\%\, q$, where A and B are the two operands (multiplier one and multiplier two) and q is k bits wide. The window width is w bits. The window width w is related to the width of the preset modulus q of the data to be converted, and generally does not exceed that width.

a first-round main modular reduction module 22, configured to look up, according to the allocated window width w, the current r value and the preset modulus q, the corresponding standby value in the fixed lookup table T for the allocated window width w. The module then performs the modular reduction of the modular multiplication based on that standby value, implementing multiplications by shift operations, to carry out the first round of main modular reduction on the current r value. Here, taking V modulo q is referred to as modular reduction;

a main modular reduction loop module 23, configured to, after the first round of main modular reduction is completed, update the round index i so that the split position $n - w \cdot i$ is w bits lower than in the previous round, and perform the next round of main modular reduction. The module loops in this way until $n - w \cdot i$ reaches k, completing multiple rounds of main modular reduction and obtaining the updated r value;

a modular multiplication result output module 24, configured to compare the updated r value with the preset modulus q. If the updated r value is smaller than q, the module outputs it as the modular multiplication result, i.e. $C = r$. If the updated r value is greater than or equal to q, the module performs a supplementary modular reduction by computing the difference $r - q$ and taking it as the next r value, then judges whether the next r value is smaller than q. If so, it outputs that value as the modular multiplication result; if not, it repeats the subtraction $r \leftarrow r - q$ until the r value is smaller than q. Apart from these supplementary subtractions, no further subtraction is required.

Preferably, the first-round main modular reduction module 22 specifically comprises:

a first operation sub-module, configured to take, according to the window width w and the current round index i, the current r value modulo $2^{n - w \cdot i}$ and use the remainder as the first intermediate r value, i.e. compute $r_1 = r \,\%\, 2^{n - w \cdot i}$. This remainder is obtained simply by keeping the low $n - w \cdot i$ bits of the current r value and discarding the bits above them, so no additional addition, subtraction, multiplication or division is needed;

a second operation sub-module, configured to shift the current r value right by $n - w \cdot i$ bits and use the remaining high bits as the address value T_in of the standby value T_out in the fixed lookup table T, i.e. compute $T_{in} = r \gg (n - w \cdot i)$, where "in" is short for input;

a third operation sub-module, configured to look up, according to the address value, the standby value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w, i.e. compute $T_{out} = T[T_{in}]$;

In the first and second operation sub-modules, the current r value is thus divided into two parts. The bits above the low $n - w \cdot i$ bits serve as the address value T_in of the fixed lookup table T. The low $n - w \cdot i$ bits form the first intermediate r value, which is passed to the fourth operation sub-module to obtain the second intermediate r value;

a fourth operation sub-module, configured to update the first intermediate r value based on the standby value T_out to obtain the second intermediate r value $r_2 = r_1 + T_{out} \cdot 2^{k - w \cdot i}$. Specifically, shifting T_out left by $k - w \cdot i$ bits gives the product $T_{out} \cdot 2^{k - w \cdot i}$. The sum of this product and the first intermediate r value is taken as the second intermediate r value, where k is the bit length of the table calculation modulus q. Under window modular reduction, the multiplication $T_{out} \cdot 2^{k - w \cdot i}$ needs no multiplier. Shifting the binary value left by $k - w \cdot i$ bits is equivalent to the multiplication, which saves DSP resources.

As the first to fourth operation sub-modules show, a wider window width w makes the fixed lookup table T deeper (its depth is $2^{w+1}$). Each round of main modular reduction then retires more bits and more BRAM is required. In exchange, fewer rounds of the shift-and-add steps of the second and fourth operation sub-modules are needed, so fewer LUTs are required and the computation is faster. LUT resources are the lookup tables (Look-Up Tables) of the FPGA; they are its basic units and are commonly used to implement the various logic functions of a circuit.

If the window width w is narrower, the fixed lookup table T is shallower and less BRAM is required. However, each round of main modular reduction retires only a few bits, so more rounds of the shift-and-add steps of the second and fourth operation sub-modules are needed. More LUTs are then required, and the computation is slower.

Preferably, the main modular reduction loop module 23 is specifically configured for:

repeating the first to fourth operation sub-modules. In the first operation sub-module, the round index i is updated so that the split position $n - w \cdot i$ is w bits lower than in the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;

and taking the second intermediate r value obtained in the fourth operation sub-module of the last round as an updated r value.
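Putting the rounds together, the following Python sketch models the whole reduction, including the supplementary subtraction of M that brings the result below the modulus. It is an illustrative software model under stated assumptions: the window position is taken as $n - w \cdot i$ for i = 1, 2, ..., clamped at k for a final partial window; the table is indexed from 0 and holds $T_j = (j \cdot 2^{k}) \% M$; and window_mod_reduce and window_mod_mul are hypothetical names. How the source's round counter is assigned is not fully recoverable from the text, so the sketch indexes the window position directly.

```python
def window_mod_reduce(V: int, M: int, w: int, n: int) -> int:
    """Reduce an n-bit V modulo M by window modular reduction (software model)."""
    k = M.bit_length()                      # bit length of the modulus M
    if not (0 <= V < (1 << n) and k <= n <= 2 * k):
        raise ValueError("V must fit in n bits with k <= n <= 2k")
    # Fixed lookup table T: T[j] = (j * 2**k) % M for 2**(w+1) addresses.
    table = [(j << k) % M for j in range(1 << (w + 1))]

    r, i = V, 1
    while True:
        shift = max(n - w * i, k)           # window position, clamped at k (assumption)
        addr = r >> shift                   # high bits -> table address
        r = (r & ((1 << shift) - 1)) + (table[addr] << (shift - k))
        if shift == k:                      # last main reduction round done
            break
        i += 1

    # Supplementary reduction: here r < 2**k + M, so at most two subtractions.
    while r >= M:
        r -= M
    return r


def window_mod_mul(a: int, b: int, M: int, w: int) -> int:
    """Modular multiplication of a, b < M: V = a * b, then window reduction."""
    k = M.bit_length()
    return window_mod_reduce(a * b, M, w, 2 * k)


M = (1 << 32) - 516095                      # assumed 32-bit modulus
for w in (5, 8):
    for a, b in [(M - 1, M - 1), (123456789, 987654321), (0, 42)]:
        assert window_mod_mul(a, b, M, w) == (a * b) % M
```

Each round keeps r congruent to V modulo M because the dropped high part $addr \cdot 2^{shift}$ equals $addr \cdot 2^{k} \cdot 2^{shift-k}$, which the table replaces by $T_{addr} \cdot 2^{shift-k}$.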

The embodiment of the invention also provides a fixed lookup table generating component which is used for:

Before the data to be converted is received for the first time, the fixed lookup table T is generated through pre-calculation, each entry being computed as $T_i = (i \cdot 2^{k}) \,\%\, M$, where M is the table calculation modulus and k is its bit length.

It can be seen that once the window width w and the bit length k of the table calculation modulus are determined, the value of every entry $T_i$ of the fixed lookup table T is fixed. For example, if w = 5 and k = 32, the fixed lookup table T holds 64 entries, with i running from 0 to 63, and the standby values $T_i$ are, in order, 0, 516095, 1032190, and so on. The fixed lookup table T consists of precomputed values and is a constant list, typically stored in the BRAM (Block RAM) of the FPGA. In modular multiplication based on window modular reduction, one modular multiplication uses a single fixed lookup table T, which supports the shift and addition operations of the reduction.
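The pre-calculation of the fixed lookup table T can be sketched in Python as follows. The source does not state the modulus behind the w = 5, k = 32 example; since $T_1 = 2^{32} \bmod M = 516095$ and a 32-bit M satisfies $2^{31} < M < 2^{32}$, the sketch assumes $M = 2^{32} - 516095 = 4294451201$, which reproduces the listed values. The table is indexed from 0, as in the example, and build_fixed_table is a hypothetical name.

```python
def build_fixed_table(M: int, w: int) -> list[int]:
    """Pre-compute the fixed lookup table T with T[i] = (i * 2**k) % M.

    k is the bit length of the modulus M; the table has 2**(w+1) entries,
    indexed from 0 as in the w = 5, k = 32 example.
    """
    k = M.bit_length()
    return [(i << k) % M for i in range(1 << (w + 1))]


M = (1 << 32) - 516095      # assumed modulus, inferred from T[1] = 516095
T = build_fixed_table(M, w=5)
print(len(T), T[:3])        # 64 [0, 516095, 1032190]
```

Because every entry depends only on w and M, the table can be generated once, ahead of time, and loaded into BRAM; the multiplications by $2^{k}$ happen only at generation time, never in the reduction datapath.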

As shown in fig. 3, in connection with an embodiment of the present invention, a semi-custom circuit is provided that includes a fixed lookup table generation component 31, a memory 32, and any of the foregoing modular multiplication components 33; wherein:

a fixed lookup table generation component 31 for computing:

$T_i = (i \cdot 2^{k}) \,\%\, M = (t_{k-1} \cdots t_1 t_0)_2$

wherein $T_i$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus M; w represents the window width; i represents the cyclic index, a positive integer from 1 to $2^{w+1}$; k represents the bit length (the position of the highest-order bit) of the modulus M; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $T_i$ is a binary value;

a memory 32 for storing a fixed lookup table T.

In the embodiment of the present invention, different settings of the window width w lead to circuit implementations of different complexity but similar structure. FIG. 4 is an example circuit diagram of an FPGA implementation of the modular reduction in modular multiplication based on window modular reduction, with window width w = 8, V of 64 bits, and M and Z of 32 bits. The parameters and symbols in FIG. 4 are as follows: ROM0-ROM3 store the fixed lookup table T used by the modular multiplication based on window modular reduction; MUX is a multiplexer; CLK is the clock signal; ADDR is the address signal; DOUT is the data output; a block with a plus sign inside is an adder, which adds two numbers; a block with a minus sign inside is a subtractor, which subtracts one number from another; and a block with a greater-than sign inside is a comparator, which compares the magnitude of two numbers.

If BRAM resources in the FPGA are sufficient, the window width w can be enlarged appropriately to speed up the modular reduction in modular multiplication based on window modular reduction. For example, for Z = V mod M with V of 64 bits and M and Z of 32 bits, a window width w of 8 requires 5 clock cycles to reduce the 64-bit data to 32 bits; if the window width w is reduced to 5, 9 clock cycles are required, but far less BRAM is needed. Table 1 compares the FPGA resources required for different window widths w.

Table 1: Hardware operations and resource comparison on the FPGA for different window width settings

The embodiment of the invention has the following effects:

1. Modular multiplication is computed with window modular reduction, and the window width w can be set according to the available resources. Adjusting w adjusts the calculation speed of the modular multiplication, so different settings use different FPGA resources and run at different speeds. A wide window width w gives a fast calculation but consumes more storage for the fixed lookup table T; a narrow window width w gives a slower calculation but consumes less. If the FPGA has sufficient resources, a larger window width w can be chosen for faster calculation. The flexibly set window width w lets modular multiplication based on window modular reduction suit different FPGAs: w can be chosen according to the FPGA's current resource consumption and the required calculation speed, so that the user can strike the desired balance between resources and calculation speed. For example, when the window width w is greater than or equal to 5, the modular multiplication of the embodiment of the present invention is faster than conventional general-purpose Barrett and Montgomery reduction and requires fewer clock cycles, which raises the throughput of the whole hardware, as shown in Table 2.

Table 2: Calculation speed of modular reduction in modular multiplication based on window modular reduction compared with Barrett and Montgomery reduction

2. Modular multiplication based on window modular reduction can also be combined with other modular reduction methods to compensate for insufficient FPGA resources, alleviating resource shortage and timing-closure difficulty and improving the conversion speed of the data to be converted, although in that case the conversion speed is naturally no faster than that of the traditional modular multiplication algorithm.

3. Fast modular reduction by window modular reduction enables hardware acceleration of modular multiplication algorithms; for example, in the homomorphic encryption field of privacy computation it can accelerate homomorphic encryption algorithms and be used in polynomial multiplication and in algorithms such as NTT.

It should be understood that the specific order or hierarchy of steps in the disclosed processes is an example of exemplary approaches. Based on design preferences, the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order and are not meant to be limited to the specific order or hierarchy presented.

In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of the various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in the specification or claims, it is intended to be inclusive in a manner similar to the term "comprising" as interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or claims is intended to mean a "non-exclusive or".

Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments of the invention may be implemented by electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation is not to be understood as beyond the scope of the embodiments of the present invention.

The various illustrative logical blocks or units described in the embodiments of the invention may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a user terminal. In the alternative, the processor and the storage medium may reside as distinct components in a user terminal.

In one or more exemplary designs, the above-described functions of the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on a computer-readable medium or transmitted over it as one or more instructions or code. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general or special purpose computer or a general or special purpose processor. In addition, any connection is properly termed a computer-readable medium; for example, if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, these are also included in the definition of computer-readable medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included within the scope of computer-readable media.

The foregoing description of the embodiments has been provided to illustrate the general principles of the invention and is not intended to limit the scope of protection of the invention; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to be included within the scope of protection of the invention.

Claims

1. A modular multiplication method comprising:

performing, according to the allocated window width, modular multiplication based on window modular reduction on the data to be converted to obtain a modular multiplication result, wherein the modular multiplication result is used for converting the data to be converted into converted data;

wherein performing the modular multiplication based on window modular reduction on the data to be converted to obtain the modular multiplication result specifically comprises:

assigning V to the modular multiplication result r as the current r value, wherein $V = (v_{n-1} \cdots v_1 v_0)_2$, V is the product of the two multipliers in the modular multiplication, n is the bit length of V, and the subscript 2 indicates that V and r are binary values; wherein V belongs to one of a plurality of data segments of the data to be converted;

performing the initial assignment according to the allocated window width w, and determining a preset modulus M and its bit length k; wherein the preset modulus M belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of M;

searching, according to the allocated window width w, the current r value and the preset modulus M, the fixed lookup table T corresponding to the allocated window width w for the corresponding standby value, and performing the modular reduction operation of the modular multiplication based on the standby value, wherein during this modular reduction operation the multiplication is implemented by a shift operation, and the first round of main modular reduction is performed on the current r value by shift and addition operations;

after the first round of main modular reduction is completed, decreasing the assigned value by 1 relative to the previous round, with i as the index, to perform the next round of main modular reduction, and looping in this way down to k to complete multiple rounds of main modular reduction and obtain the updated r value;

judging the relationship between the updated r value and the preset modulus M;

if the updated r value is smaller than the preset modulus M, outputting the updated r value as the modular multiplication result;

if the updated r value is greater than or equal to the preset modulus M, performing a supplementary modular reduction operation that takes the difference between the updated r value and the preset modulus M as the next r value; judging whether the next r value is smaller than the preset modulus M; if it is, outputting the next r value as the modular multiplication result; if it is not, repeating the supplementary modular reduction operation, taking the difference between the current r value and the preset modulus M as the next r value, until the next r value is smaller than the preset modulus M.

2. The modular multiplication method of claim 1, further comprising:

generating the fixed lookup table T through pre-calculation before the data to be converted is received for the first time;

the generating of the fixed lookup table T through pre-calculation specifically comprises computing:

$T_i = (i \cdot 2^{k}) \,\%\, M = (t_{k-1} \cdots t_1 t_0)_2$

wherein $T_i$ is the value obtained by taking the remainder of $i \cdot 2^{k}$ modulo the table calculation modulus M; w represents the window width; i represents the cyclic index, a positive integer from 1 to $2^{w+1}$; k represents the bit length (the position of the highest-order bit) of the modulus M; % represents the remainder operation; and the subscript 2 on the right-hand side of the equals sign indicates that $T_i$ is a binary value.

3. The modular multiplication method of claim 1, wherein performing the assignment according to the allocated window width w, searching, according to the allocated window width w, the current r value and the preset modulus M, the fixed lookup table T corresponding to the allocated window width w for the corresponding standby value, performing the modular reduction operation of the modular multiplication based on the standby value with the multiplication implemented by a shift operation, and performing the first round of main modular reduction on the current r value specifically comprise:

a first step of performing the assignment according to the window width w, and taking the remainder of the current r value modulo $2^{n - w \cdot i}$ as a first intermediate r value, the remainder being obtained by truncating the current r value to its low-order $n - w \cdot i$ bits, which are retained as the first intermediate r value;

a second step of shifting the current r value right by $n - w \cdot i$ bits and taking the resulting high-order bits of the current r value as the address value of the standby value T_out in the fixed lookup table T;

a third step of looking up, according to the address value, the standby value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w;

a fourth step of updating the first intermediate r value based on the standby value T_out to obtain a second intermediate r value: $r \leftarrow r + T_{out} \cdot 2^{k - w \cdot i}$, specifically by shifting T_out left by $k - w \cdot i$ bits to obtain the product of T_out and $2^{k - w \cdot i}$, and taking the sum of this product and the first intermediate r value as the second intermediate r value.

4. The modular multiplication method of claim 3, wherein the step of, after the first round of main modular reduction is completed, decreasing the assigned value by 1 relative to the previous round, with i as the index, to perform the next round of main modular reduction, and looping in this way down to k to complete multiple rounds of main modular reduction and obtain the updated r value, specifically comprises:

5. A modular multiplication assembly for use in a semi-custom circuit (FPGA), the modular multiplication assembly being configured for:

receiving data to be converted, and determining the window width allocated to the data to be converted according to the available hardware resources; and performing, according to the allocated window width, modular multiplication based on window modular reduction on the data to be converted to obtain a modular multiplication result, wherein the modular multiplication result is used for converting the data to be converted into converted data;

the modular multiplication assembly includes:

the primary assignment module is used for assigning V to the modular multiplication result r as the current r value, wherein $V = (v_{n-1} \cdots v_1 v_0)_2$, V is the product of the two multipliers in the modular multiplication, n is the bit length of V, and the subscript 2 indicates that V and r are binary values; and for performing the initial assignment according to the allocated window width w and determining a preset modulus M and its bit length k; wherein V belongs to one of a plurality of data segments of the data to be converted, the preset modulus M belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of M;

a first-round main modular reduction module, used for searching, according to the allocated window width w, the current r value and the preset modulus M, the fixed lookup table T corresponding to the allocated window width w for the corresponding standby value, and performing the modular reduction operation of the modular multiplication based on the standby value, wherein during this modular reduction operation the multiplication is implemented by a shift operation, and the first round of main modular reduction is performed on the current r value by shift and addition operations;

a main modular reduction loop module, used for, after the first round of main modular reduction is completed, decreasing the assigned value by 1 relative to the previous round, with i as the index, to perform the next round of main modular reduction, and looping in this way down to k to complete multiple rounds of main modular reduction and obtain the updated r value;

the modular multiplication result output module is used for judging the relationship between the updated r value and the preset modulus M; if the updated r value is smaller than the preset modulus M, outputting the updated r value as the modular multiplication result; if the updated r value is greater than or equal to the preset modulus M, performing a supplementary modular reduction operation that takes the difference between the updated r value and the preset modulus M as the next r value; judging whether the next r value is smaller than the preset modulus M; if it is, outputting the next r value as the modular multiplication result; if it is not, repeating the supplementary modular reduction operation, taking the difference between the current r value and the preset modulus M as the next r value, until the next r value is smaller than the preset modulus M.

6. The modular multiplication assembly of claim 5, wherein the first-round main modular reduction module comprises:

a first operation sub-module, used for performing the assignment according to the window width w, and taking the remainder of the current r value modulo $2^{n - w \cdot i}$ as a first intermediate r value, the remainder being obtained by truncating the current r value to its low-order $n - w \cdot i$ bits, which are retained as the first intermediate r value;

a second operation sub-module, used for shifting the current r value right by $n - w \cdot i$ bits and taking the resulting high-order bits of the current r value as the address value of the standby value T_out in the fixed lookup table T;

a third operation sub-module, used for looking up, according to the address value, the standby value T_out stored at that address in the fixed lookup table T corresponding to the allocated window width w;

a fourth operation sub-module, used for updating the first intermediate r value based on the standby value T_out to obtain a second intermediate r value: $r \leftarrow r + T_{out} \cdot 2^{k - w \cdot i}$, specifically by shifting T_out left by $k - w \cdot i$ bits to obtain the product of T_out and $2^{k - w \cdot i}$, and taking the sum of this product and the first intermediate r value as the second intermediate r value.

7. The modular multiplication assembly of claim 6, wherein the main modular reduction loop module is specifically configured for:

cycling through the first to fourth operation sub-modules, wherein, in the first operation sub-module, with i as the index, the assigned value is decreased by 1 relative to the previous round, and the second intermediate r value obtained in the previous round is used as the current r value;

8. A semi-custom circuit comprising a fixed look-up table generating component, a memory, and the modular multiplication component of any of claims 5-7; wherein:

the memory is used for storing a fixed lookup table T;

the modular multiplication assembly includes:

the primary assignment module is used for assigning V to the modular multiplication result r as the current r value, wherein $V = (v_{n-1} \cdots v_1 v_0)_2$, V is the product of the two multipliers in the modular multiplication, n is the bit length of V, and the subscript 2 indicates that V and r are binary values; and for performing the initial assignment according to the allocated window width w and determining a preset modulus M and its bit length k; wherein V belongs to one of a plurality of data segments of the data to be converted, the preset modulus M belongs to the same data segment as V, and the bit length of V is between one and two times the bit length of M;

a first-round main modular reduction module, used for searching, according to the allocated window width w, the current r value and the preset modulus M, the fixed lookup table T corresponding to the allocated window width w for the corresponding standby value, and performing the modular reduction operation of the modular multiplication based on the standby value, wherein during this modular reduction operation the multiplication is implemented by a shift operation, and the first round of main modular reduction is performed on the current r value by shift and addition operations;

the modular multiplication result output module is used for judging the relationship between the updated r value and the preset modulus M; if the updated r value is smaller than the preset modulus M, outputting the updated r value as the modular multiplication result; if the updated r value is greater than or equal to the preset modulus M, performing a supplementary modular reduction operation that takes the difference between the updated r value and the preset modulus M as the next r value; judging whether the next r value is smaller than the preset modulus M; if it is, outputting the next r value as the modular multiplication result; if it is not, repeating the supplementary modular reduction operation, taking the difference between the current r value and the preset modulus M as the next r value, until the next r value is smaller than the preset modulus M.