CN114840174A

CN114840174A - System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Info

Publication number: CN114840174A
Application number: CN202210565348.XA
Authority: CN
Inventors: 王立峰; 张奇惠; 刘曼
Original assignee: Guangzhou Wise Security Technology Co Ltd
Current assignee: Guangzhou Wise Security Technology Co Ltd
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2022-08-02
Anticipated expiration: 2042-05-18
Also published as: CN114840174B

Abstract

The invention provides a system and a method for quickly realizing Montgomery modular multiplication by using multiple multipliers, and relates to the technical field of high-performance algorithms for security chips. The system performs combined operations based on existing point addition, point doubling, modular exponentiation, modular inversion, modular subtraction, modular addition and modular multiplication modules. It optimizes the loop-iterative calculation of the original Montgomery modular multiplication formula and uses several 64-bit multipliers in parallel. This greatly increases the speed at which an asymmetric-algorithm chip performs signing, signature verification, encryption, decryption and key generation, and improves the performance of the security chip.

Description

System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Technical Field

The invention relates to the technical field of security algorithms, and in particular to a system and a method for quickly realizing Montgomery modular multiplication by using multiple multipliers.

Background

At present, asymmetric cryptographic chips mostly use elliptic curves, and elliptic curve public-key cryptography relies on the following properties: 1. the points of an elliptic curve over a finite field form a finite abelian group under point addition, and the order of this group is close to the size of the underlying field; 2. analogous to exponentiation in the multiplicative group of a finite field, scalar multiplication (the multiple-point operation) on an elliptic curve is a one-way function.

For scalar multiplication, the problem of recovering the multiplier from a known multiple point and base point is called the elliptic curve discrete logarithm problem (ECDLP). For a general elliptic curve, only algorithms of exponential complexity are currently known for solving the ECDLP. Compared with the integer factorization problem and the discrete logarithm problem in a finite field, the ECDLP is much harder to solve.

Elliptic curve public-key cryptography is built from scalar (point) multiplication, point doubling, point addition and modular exponentiation in the curve arithmetic, all of which ultimately decompose into modular multiplication, modular addition and modular subtraction.
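As a minimal illustration of this decomposition, the Python sketch below implements affine point addition, point doubling and double-and-add scalar multiplication using only modular addition, subtraction and multiplication, plus an inversion computed by modular exponentiation (Fermat's little theorem). The toy curve y^2 = x^3 + 2x + 3 over GF(97), the base point (3, 6) and all function names are illustrative assumptions; a real implementation would use a standardized curve of 256 bits or more.

```python
P_FIELD, A_COEF, B_COEF = 97, 2, 3   # toy curve y^2 = x^3 + 2x + 3 over GF(97)
INF = None                           # point at infinity


def mod_add(x, y):
    return (x + y) % P_FIELD


def mod_sub(x, y):
    return (x - y) % P_FIELD


def mod_mul(x, y):
    return (x * y) % P_FIELD


def mod_inv(x):
    # Fermat's little theorem: x^(p-2) = x^-1 mod p (a modular exponentiation)
    return pow(x, P_FIELD - 2, P_FIELD)


def point_double(pt):
    if pt is INF or pt[1] == 0:
        return INF
    x1, y1 = pt
    lam = mod_mul(mod_add(mod_mul(3, mod_mul(x1, x1)), A_COEF), mod_inv(mod_add(y1, y1)))
    x3 = mod_sub(mod_mul(lam, lam), mod_add(x1, x1))
    y3 = mod_sub(mod_mul(lam, mod_sub(x1, x3)), y1)
    return (x3, y3)


def point_add(p1, p2):
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # either p2 = -p1 (sum is the point at infinity) or p2 = p1 (use doubling)
        return INF if mod_add(y1, y2) == 0 else point_double(p1)
    lam = mod_mul(mod_sub(y2, y1), mod_inv(mod_sub(x2, x1)))
    x3 = mod_sub(mod_sub(mod_mul(lam, lam), x1), x2)
    y3 = mod_sub(mod_mul(lam, mod_sub(x1, x3)), y1)
    return (x3, y3)


def scalar_mul(k, pt):
    # left-to-right double-and-add over the bits of k (k >= 1)
    acc = INF
    for bit in bin(k)[2:]:
        acc = point_double(acc)
        if bit == "1":
            acc = point_add(acc, pt)
    return acc


def on_curve(pt):
    if pt is INF:
        return True
    x, y = pt
    rhs = mod_add(mod_add(mod_mul(x, mod_mul(x, x)), mod_mul(A_COEF, x)), B_COEF)
    return mod_mul(y, y) == rhs
```

Every curve-level operation above bottoms out in `mod_add`, `mod_sub` and `mod_mul`, which is why the speed of modular multiplication dominates.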

In the prior art, large-number modular multiplication is mainly implemented with Montgomery modular multiplication. Because the Montgomery formula is evaluated by sequential loop iteration, it limits the achievable modular multiplication speed, so an asymmetric-algorithm chip based on elliptic curve arithmetic can perform only tens to hundreds of signing, signature verification, encryption, decryption or key generation operations per second. This has become the bottleneck of asymmetric encryption chips.
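For comparison, a textbook radix-2 (bit-serial) form of Montgomery multiplication is sketched below; it is a generic illustration and not necessarily the prior-art implementation referred to here. Each of the W loop iterations needs the r produced by the previous one, which is the kind of sequential dependency that limits speed. The function name is hypothetical, and the sketch assumes an odd modulus M with 0 <= A, B < M < 2^W.

```python
def mont_mul_radix2(A, B, M, W):
    """Return A*B*2^(-W) mod M, processing one bit of A per loop iteration.

    Assumes M is odd and 0 <= A, B < M < 2^W.
    """
    r = 0
    for i in range(W):
        r += ((A >> i) & 1) * B   # add B if bit i of A is set
        if r & 1:                 # make r even by adding the odd modulus
            r += M
        r >>= 1                   # exact division by 2
    # the loop keeps r < 2M, so one conditional subtraction suffices
    return r - M if r >= M else r
```

Processing one 64-bit word of A per iteration, as in the method below, shortens this loop from W iterations to W/64 rounds.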

Disclosure of Invention

The present invention aims to provide a system and a method for quickly implementing Montgomery modular multiplication by using multiple multipliers, so as to solve the foregoing problems in the prior art.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

A system for rapidly realizing Montgomery modular multiplication by using multiple multipliers comprises an asymmetric-algorithm chip and a host computer. The asymmetric-algorithm chip comprises a processor, an asymmetric hardware module and a random data module, all connected to a bus. The asymmetric hardware module comprises a register, a RAM and an algorithm module, the algorithm module comprising a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module. The processor writes the data and parameters to be processed into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module. After the asymmetric hardware module detects the enable bit in the register, it calls the corresponding operation module according to the operation mode, writes the result data into the RAM when the operation is finished, and at the same time sets an end flag in the register and raises an interrupt. On receiving the interrupt or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM through the bus and outputs them to the host computer.
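The register/RAM handshake described above can be modeled in software as follows. This is a behavioral sketch only: the class, field and mode names (for example `MODE_MODMUL`) are hypothetical, the RAM is a Python dict, and plain `a * b % m` stands in for the hardware modular multiplication module.

```python
from dataclasses import dataclass, field

MODE_MODMUL = 0x7  # hypothetical mode code selecting the modular multiplication module


@dataclass
class AsymmetricHardwareModule:
    """Toy model of the register/RAM interface (all names are illustrative)."""
    ram: dict = field(default_factory=dict)
    mode: int = 0
    enable: bool = False
    end_flag: bool = False
    irq: bool = False

    def tick(self):
        # Once the enable bit is set, dispatch on the operation mode.
        if not self.enable:
            return
        if self.mode == MODE_MODMUL:
            a, b, m = self.ram["A"], self.ram["B"], self.ram["M"]
            self.ram["R"] = a * b % m   # stand-in for the hardware modmul module
        self.enable = False
        self.end_flag = True            # end flag set in the register
        self.irq = True                 # interrupt raised to the processor


def processor_modmul(hw, a, b, m):
    """Processor side: write RAM and mode register, set enable, poll end flag."""
    hw.ram.update(A=a, B=b, M=m)
    hw.mode = MODE_MODMUL
    hw.end_flag = False
    hw.enable = True
    while not hw.end_flag:              # polling; an ISR could react to hw.irq instead
        hw.tick()
    hw.irq = False                      # acknowledge the interrupt
    return hw.ram["R"]                  # result to be forwarded to the host computer
```

Polling and interrupt handling are interchangeable here because both only observe state that the hardware sets when it finishes.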

Preferably, the processor is a microprocessor.

Another object of the present invention is to provide a method for quickly implementing Montgomery modular multiplication by using multiple multipliers, based on the above system, comprising the following steps:

the processor obtains the data and parameters to be processed, writes them into the RAM of the asymmetric hardware module, and writes the operation mode into the register of the asymmetric hardware module;

after detecting the enable bit in the register, the asymmetric hardware module calls the operation module corresponding to the determined operation mode to process the data and parameters;

when the operation is finished, the result data are written into the RAM, an end flag is set in the register and an interrupt is raised; on receiving the interrupt or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM through the bus and uploads them to the host computer.

Preferably, calling the corresponding operation module according to the determined operation mode to process the data and parameters specifically comprises the following steps:

S1: determine the data length W of the modular multiplication and the W-bit modulus M, where M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high; A and B are the W-bit operands, where A[0], A[1], A[2], …, A[e] and B[0], B[1], B[2], …, B[e] are the 64-bit words of A and B, respectively, from low to high; and define Md, T1, T2, u, W, V, W' and V' as intermediate result registers;

In step S1, e = W/64; the data length W may be, for example but not limited to, 256, 512 or 1024 bits.

S2: first, perform the first cycle of operation with one 64-bit multiplier to compute the precomputed value $Md = B[0] \cdot Mc$, where $Mc = -\{M[1], M[0]\}^{-1} \bmod P$ and $P = 2^{64}$;

S3: perform the second cycle of operation with two 64-bit multipliers and one adder, computing in parallel the parameter $u_1$ of the first round;

The parameter $u_1$ is computed as $u_1 = T1 + T2$, with $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$, where $r0 = r \bmod 2^{64}$, i.e. the lowest 64-bit word of the intermediate result r;

S4: perform the third cycle of operation with eight 64-bit multipliers, computing in parallel the intermediate result $r0_1$ from the lowest 64-bit word A[0] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$;

$r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$ and $W'[1] = u_1 \cdot M[3]$.

S5: perform the fourth cycle of operation with two 64-bit multipliers and one adder, computing in parallel the second-round parameter $u_2 = T1 + T2$, with $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

S6: perform the fifth cycle of operation with eight 64-bit multipliers, computing in parallel the intermediate result $r0_2$ from the second-lowest 64-bit word A[1] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$;

$r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.

S7: perform the sixth cycle of operation with two 64-bit multipliers and one adder, computing in parallel the third-round parameter $u_3 = T1 + T2$, with $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

S8: perform the seventh cycle of operation with eight 64-bit multipliers, computing in parallel the intermediate result $r0_3$ from the second-highest 64-bit word A[2] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the third-round parameter $u_3$;

$r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.

S9: repeat steps S7 and S8 in a loop until the fourth-round parameter $u_4$ has been computed in parallel;

S10: compute in parallel the final result $r0_4$ from the highest 64-bit word A[3] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the fourth-round parameter $u_4$.
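The cycle structure of steps S2 to S10 for W = 256 (e = 4) can be tabulated with the short Python sketch below. It is a reconstruction from the step descriptions, not a timing model of the chip; the function name is hypothetical, and it assumes 2e multipliers per product cycle, which equals the eight multipliers stated for W = 256.

```python
def multiplier_schedule(e):
    """Cycle-by-cycle multiplier usage reconstructed from steps S2 to S10.

    Returns a list of (cycle, multipliers_used, description) tuples.
    """
    rows = [(1, 1, "Md = B[0]*Mc")]
    cycle = 2
    for k in range(1, e + 1):
        # even cycles: two multipliers plus an adder produce u_k
        rows.append((cycle, 2, f"u{k} = T1 + T2 (plus one adder)"))
        # odd cycles: 2e independent products produce r0_k
        rows.append((cycle + 1, 2 * e,
                     f"r0_{k}: A[{k - 1}]*B[j] and u{k}*M[j] for j = 0..{e - 1}"))
        cycle += 2
    return rows


for cycle, mults, what in multiplier_schedule(4):
    print(f"cycle {cycle}: {mults} multiplier(s)  {what}")
```

For W = 256 this gives nine cycles with at most eight multipliers active in any one cycle.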


Preferably, W in step S1 is 256 bits, and the precomputed value in step S2 is $Md = B[0] \cdot Mc$, with $Mc = -\{M[1], M[0]\}^{-1} \bmod P$ and $P = 2^{64}$;

the parameter $u_1$ of the first round in step S3 is computed as $u_1 = T1 + T2$, with $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$, where $r0 = r \bmod 2^{64}$, i.e. the lowest 64-bit word of the intermediate result r;

the intermediate result $r0_1$ in step S4 is computed as $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$ and $W'[1] = u_1 \cdot M[3]$.
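The precomputation of step S2 can be sketched in Python as follows, assuming Python 3.8+ for `pow(x, -1, m)`. Because P = 2^64, the inverse of {M[1], M[0]} modulo P depends only on M[0], so Mc is the familiar Montgomery constant -M[0]^-1 mod 2^64. Reducing Md to 64 bits is an assumption of the sketch, justified because u is only needed modulo 2^64. The function name is hypothetical.

```python
WORD = 64
P = 1 << WORD            # P = 2^64
MASK = P - 1


def precompute_mc_md(M, B):
    """Return (Mc, Md) for an odd modulus M and operand B (step S2)."""
    if M % 2 == 0:
        raise ValueError("Montgomery multiplication requires an odd modulus")
    m10 = M & ((1 << (2 * WORD)) - 1)   # {M[1], M[0]}: the two lowest words of M
    mc = (-pow(m10, -1, P)) % P          # Mc = -{M[1], M[0]}^-1 mod P
    md = ((B & MASK) * mc) % P           # Md = B[0] * Mc, kept to 64 bits
    return mc, md
```

Mc depends only on the modulus and can be reused across many multiplications, while Md must be recomputed whenever B changes.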

Preferably, in step S5: $u_2 = T1 + T2$, with $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the intermediate result $r0_2$ in step S6 is computed as $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.
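The two-multiplier, one-adder step that produces a round parameter can be sketched as below (hypothetical function name). T1 and T2 do not depend on each other, which is why two multipliers can compute them in the same cycle. The sketch takes the word of A as a parameter `a_word`: the text writes A[0]·Md in every round, whereas the standard word-serial Montgomery recurrence uses the word of A processed in that round. The test checks the defining property of u, namely that r + a_word·B + u·M is divisible by 2^64.

```python
WORD = 64
MASK = (1 << WORD) - 1


def round_parameter(r, a_word, mc, md):
    """u = T1 + T2 mod 2^64 with T1 = r0*Mc and T2 = a_word*Md."""
    r0 = r & MASK              # r0 = r mod 2^64, the lowest word of r
    t1 = (r0 * mc) & MASK      # first 64-bit multiplier
    t2 = (a_word * md) & MASK  # second 64-bit multiplier, independent of t1
    return (t1 + t2) & MASK    # adder, result truncated to 64 bits
```

Since Md = B[0]·Mc, the sum equals (r0 + a_word·B[0])·Mc modulo 2^64, so u can start as soon as r0 is known instead of waiting for a separate a_word·B[0] product.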

Preferably, in step S7: $u_3 = T1 + T2$, with $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the intermediate result $r0_3$ in step S8 is computed as $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.
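The eight products of one round for W = 256 are mutually independent, which is what allows eight 64-bit multipliers to run in parallel. In the sketch below (hypothetical names), the notation {… : …} is read as placing the column sums V[0]+V[1], W[0]+W[1], V'[0]+V'[1] and W'[0]+W'[1] at bit offsets 0, 64, 128 and 192 and adding them, with carry propagation, to the previous intermediate result r, as in standard word-serial Montgomery multiplication. That reading is an assumption, since each 64x64-bit product is up to 128 bits wide and adjacent columns overlap.

```python
WORD = 64


def round_products(a_word, u, b, m):
    """Eight independent 64x64-bit products of one round for W = 256.

    Column j holds the pair (a_word*B[j], u*M[j]).
    """
    V = (a_word * b[0], u * m[0])    # V[0], V[1]
    Wc = (a_word * b[1], u * m[1])   # W[0], W[1]
    Vp = (a_word * b[2], u * m[2])   # V'[0], V'[1]
    Wp = (a_word * b[3], u * m[3])   # W'[0], W'[1]
    return [V, Wc, Vp, Wp]


def accumulate(r, columns):
    """Add r and each column sum x + y at bit offset 64*j, carries included."""
    acc = r
    for j, (x, y) in enumerate(columns):
        acc += (x + y) << (WORD * j)
    return acc
```

Summed this way, the four columns reproduce exactly r + a_word·B + u·M, which is the quantity the standard recurrence divides by 2^64.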

Preferably, in step S9: $u_4 = T1 + T2$, with $T1 = r0_3 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the final result $r0_4$ in step S10 is computed as $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$ and $W'[1] = u_4 \cdot M[3]$.
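Putting the steps together, a word-serial Montgomery multiplication for W = 256 (and, with the same code, 512 or 1024) can be sketched as follows. The sketch follows the standard recurrence, so round i uses A[i]·Md in u and divides by 2^64 after each round, and it ends with the usual single conditional subtraction of M. Both points are assumptions beyond what the text above states. The result is A·B·2^(-W) mod M, i.e. the Montgomery product rather than A·B mod M; function names are hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1


def words(x, e):
    """Split x into e 64-bit words, least significant first."""
    return [(x >> (WORD * k)) & MASK for k in range(e)]


def montgomery_multiply(A, B, M, W=256):
    """Word-serial Montgomery product A*B*2^(-W) mod M.

    Assumes M odd, 0 <= A, B < M < 2^W and W a multiple of 64.
    """
    e = W // WORD
    a, b, m = words(A, e), words(B, e), words(M, e)
    mc = (-pow(m[0], -1, 1 << WORD)) & MASK   # Mc = -M[0]^-1 mod 2^64
    md = (b[0] * mc) & MASK                   # Md = B[0]*Mc (step S2)
    r = 0
    for i in range(e):
        # u = r0*Mc + A[i]*Md mod 2^64: two multipliers and an adder
        u = ((r & MASK) * mc + a[i] * md) & MASK
        # 2e independent products A[i]*B[j] and u*M[j], accumulated by column
        acc = r
        for j in range(e):
            acc += (a[i] * b[j] + u * m[j]) << (WORD * j)
        assert acc & MASK == 0        # lowest word cancelled by the choice of u
        r = acc >> WORD               # exact division by 2^64
    return r - M if r >= M else r     # r < 2M, so one correction suffices
```

To obtain A·B mod M, operands are usually first converted into the Montgomery domain (multiplied by 2^W mod M); that conversion is standard practice and not described in the text.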

The invention has the following beneficial effects:

the invention provides a system and a method for rapidly realizing Montgomery modular multiplication by using multiple multipliers, which optimize the loop-iterative calculation of the original Montgomery modular multiplication formula and use several 64-bit multipliers in parallel, greatly increasing the speed at which an asymmetric-algorithm chip performs signing, signature verification, encryption, decryption and key generation, and improving the performance of the security chip.

Drawings

FIG. 1 shows the chip structure of the system for quickly implementing Montgomery modular multiplication using multiple multipliers provided in Example 1;

FIG. 2 is a schematic diagram of the principle of the method for quickly implementing Montgomery modular multiplication using multiple multipliers provided in Example 2.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are intended only to illustrate the invention and not to limit its scope.

Example 1

This example provides a system for rapidly realizing Montgomery modular multiplication by using multiple multipliers, comprising an asymmetric-algorithm chip and a host computer. As shown in FIG. 1, the asymmetric-algorithm chip comprises a processor, an asymmetric hardware module and a random data module, all connected to a bus. The asymmetric hardware module comprises a register, a RAM and an algorithm module, the algorithm module comprising a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module. The processor writes the data and parameters to be processed into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module. After the asymmetric hardware module detects the enable bit in the register, it calls the corresponding operation module according to the operation mode, writes the result data into the RAM when the operation is finished, and at the same time sets an end flag in the register and raises an interrupt. On receiving the interrupt or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM through the bus and outputs them to the host computer.

Example 2

This example provides a method for quickly implementing Montgomery modular multiplication by using multiple multipliers, based on the system described in Example 1, comprising the following steps:

In this example, the corresponding operation module is called according to the determined operation mode to process the data and parameters; the operating principle of the multipliers used in this example is shown in FIG. 2. The method specifically comprises the following steps:

S1: determine the data length W of the modular multiplication and the W-bit modulus M, where M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high; A and B are the W-bit operands, where A[0], A[1], A[2], …, A[e] and B[0], B[1], B[2], …, B[e] are the 64-bit words of A and B, respectively, from low to high; and define Md, T1, T2, u, W, V, W' and V' as intermediate result registers;

S2: first, perform the first cycle of operation with one 64-bit multiplier to compute the precomputed value Md;

S5: perform the fourth cycle of operation with two 64-bit multipliers and one adder, computing in parallel the second-round parameter $u_2$;

S7: perform the sixth cycle of operation with two 64-bit multipliers and one adder, computing in parallel the third-round parameter $u_3$;

S9: repeat steps S7 and S8 in a loop until the e-th round parameter $u_{e+1}$ has been computed in parallel;

S10: compute in parallel the final result $r0_{e+1}$ from the highest 64-bit word A[e] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the e-th round parameter $u_{e+1}$.

In this example, e = W/64 in step S1; the data length W may be, for example but not limited to, 256, 512 or 1024 bits.
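Splitting an operand into its 64-bit words can be sketched as below (hypothetical function names). With e = W/64 the words are indexed 0 to e-1, so for W = 256 there are four words A[0] to A[3], consistent with A[3] being the highest word in the 256-bit formulas.

```python
WORD = 64
MASK = (1 << WORD) - 1


def split_words(x, W):
    """Split a W-bit integer into e = W/64 words, least significant first."""
    if W % WORD:
        raise ValueError("W must be a multiple of 64")
    if not 0 <= x < (1 << W):
        raise ValueError("x does not fit in W bits")
    e = W // WORD
    return [(x >> (WORD * k)) & MASK for k in range(e)]


def join_words(words):
    """Inverse of split_words: reassemble the integer from its words."""
    return sum(w << (WORD * k) for k, w in enumerate(words))
```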

In a more preferred embodiment, W in step S1 is 256 bits, and the precomputed value in step S2 is $Md = B[0] \cdot Mc$;

the parameter $u_1$ of the first round in step S3 is computed as $u_1 = T1 + T2$, with $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$, where $r0 = r \bmod 2^{64}$, i.e. the lowest 64-bit word of the intermediate result r;

In step S5 of this example: $u_2 = T1 + T2$, with $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

In step S7 of this example: $u_3 = T1 + T2$, with $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

In step S9 of this example: $u_4 = T1 + T2$, with $T1 = r0_3 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the final result $r0_4$ in step S10 is computed as $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$ and $W'[1] = u_4 \cdot M[3]$.

The technical solution disclosed by the invention achieves the following beneficial effects:

the method optimizes the loop-iterative calculation of the original Montgomery modular multiplication formula and, after this transformation, uses several 64-bit multipliers in parallel, greatly increasing the speed at which an asymmetric-algorithm chip performs signing, signature verification, encryption, decryption and key generation, and improving the performance of the security chip.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the scope of the present invention.

Claims

1. A system for rapidly realizing Montgomery modular multiplication by using multiple multipliers, characterized by comprising an asymmetric-algorithm chip and a host computer, wherein the asymmetric-algorithm chip comprises a processor, an asymmetric hardware module and a random data module, all connected to a bus; the asymmetric hardware module comprises a register, a RAM and an algorithm module, the algorithm module comprising a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module; the processor writes the data and parameters to be processed into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module; after detecting the enable bit in the register, the asymmetric hardware module calls the corresponding operation module according to the operation mode, writes the result data into the RAM when the operation is finished, and at the same time sets an end flag in the register and raises an interrupt; on receiving the interrupt or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM through the bus and outputs them to the host computer.

2. The system according to claim 1, wherein said processor is a microprocessor.

3. A method for quickly implementing Montgomery modular multiplication by using multiple multipliers, based on the system according to claim 1 or 2, comprising the following steps:

4. The method according to claim 3, wherein calling the corresponding operation module according to the determined operation mode to process the data and parameters specifically comprises the following steps:

S6: perform the fifth cycle of operation with eight 64-bit multipliers, computing in parallel the intermediate result $r0_2$ from the second-lowest 64-bit word A[1] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$;

S8: perform the seventh cycle of operation with eight 64-bit multipliers, computing in parallel the intermediate result $r0_3$ from the second-highest 64-bit word A[2] of operand A, all words B[0], B[1], B[2], B[3] of B, and the third-round parameter $u_3$;

S10: compute in parallel the final result $r0_{e+1}$ from the highest 64-bit word A[e] of operand A, all words B[0], B[1], B[2], …, B[e] of B, and the e-th round parameter $u_{e+1}$.

5. The method according to claim 4, wherein in step S1, e = W/64, and the data length W may be, for example but not limited to, 256, 512 or 1024 bits.

6. The method according to claim 5, wherein W in step S1 is 256 bits, and the precomputed value in step S2 is $Md = B[0] \cdot Mc$, with $Mc = -\{M[1], M[0]\}^{-1} \bmod P$ and $P = 2^{64}$;

the parameter $u_1$ of the first round in step S3 is computed as $u_1 = T1 + T2$, with $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$, where $r0 = r \bmod 2^{64}$, i.e. the lowest 64-bit word of the intermediate result r;

the intermediate result $r0_1$ in step S4 is computed as $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$ and $W'[1] = u_1 \cdot M[3]$.

7. The method for quickly implementing Montgomery modular multiplication using multiple multipliers according to claim 6, wherein in step S5: $u_2 = T1 + T2$, with $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the intermediate result $r0_2$ in step S6 is computed as $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.

8. The method for quickly implementing Montgomery modular multiplication using multiple multipliers according to claim 7, wherein in step S7: $u_3 = T1 + T2$, with $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the intermediate result $r0_3$ in step S8 is computed as $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.

9. The method for quickly implementing Montgomery modular multiplication using multiple multipliers according to claim 8, wherein in step S9: $u_4 = T1 + T2$, with $T1 = r0_3 \cdot Mc$ and $T2 = A[0] \cdot Md$;

the final result $r0_4$ in step S10 is computed as $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$ and $W'[1] = u_4 \cdot M[3]$.