CN114840174B

CN114840174B - System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Info

Publication number: CN114840174B
Application number: CN202210565348.XA
Authority: CN
Inventors: 王立峰; 张奇惠; 刘曼
Original assignee: Guangzhou Wise Security Technology Co Ltd
Current assignee: Guangzhou Wise Security Technology Co Ltd
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2023-03-03
Anticipated expiration: 2042-05-18
Also published as: CN114840174A

Abstract

The invention provides a system and a method for rapidly implementing Montgomery modular multiplication using multiple multipliers, and relates to the technical field of high-performance algorithms for security chips. The system performs combined operations using the existing point addition, point doubling, modular exponentiation, modular inversion, modular subtraction, modular addition and modular multiplication modules. It restructures the loop-iterative computation of the original Montgomery modular multiplication formula so that several 64-bit multipliers operate in parallel. This greatly increases the speed at which an asymmetric algorithm chip performs signing, signature verification, encryption, decryption and key generation, and so improves the performance of the security chip.

Description

System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Technical Field

The invention relates to the technical field of security algorithms, and in particular to a system and a method for rapidly implementing Montgomery modular multiplication using multiple multipliers.

Background

At present, asymmetric cryptographic chips mostly use elliptic curves. Elliptic curve public-key cryptography relies on the following properties of the curves:

1. An elliptic curve over a finite field forms a finite abelian group under point addition, and the order of this group is close to the size of the underlying field.
2. Scalar (multiple-point) multiplication on an elliptic curve constitutes a one-way function, analogous to exponentiation in the multiplicative group of a finite field.

In scalar multiplication, the problem of recovering the multiplier k from the base point P and the resulting point kP is called the elliptic curve discrete logarithm problem (ECDLP). For the discrete logarithm problem on a general elliptic curve, the only solution methods currently known have exponential complexity. The ECDLP is therefore much harder to solve than either the integer factorization problem or the discrete logarithm problem in a finite field.

Elliptic curve public-key cryptography is built from scalar multiplication, point doubling, point addition and modular exponentiation. All of these ultimately decompose into modular multiplication, modular addition and modular subtraction.

In the prior art, large-number modular multiplication is mainly implemented with Montgomery modular multiplication. Because the Montgomery formula is evaluated by loop iteration, the formula itself limits the achievable modular multiplication speed. As a result, an asymmetric algorithm chip that uses elliptic curve computation performs only tens to hundreds of signing, signature verification, encryption, decryption or key generation operations per second. This has become the bottleneck of asymmetric encryption chips.
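The sketch below illustrates that decomposition. It builds affine point addition, point doubling and double-and-add scalar multiplication from modular addition, subtraction, multiplication and inversion alone. It uses the textbook short-Weierstrass formulas purely for illustration and does not describe the chip's own algorithm. The helper names are hypothetical, and the toy curve y^2 = x^3 + 2x + 3 over GF(97) is far too small for real use.

```python
def mod_add(x, y, p):
    return (x + y) % p


def mod_sub(x, y, p):
    return (x - y) % p


def mod_mul(x, y, p):
    return (x * y) % p


def mod_inv(x, p):
    return pow(x, -1, p)  # Python 3.8+; requires x != 0 mod p (p prime)


def point_add(P1, P2, a, p):
    """Affine addition on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and mod_add(y1, y2, p) == 0:
        return None  # P + (-P) = O; also covers doubling a point with y = 0
    if P1 == P2:
        # point doubling: lambda = (3*x1^2 + a) / (2*y1)
        num = mod_add(mod_mul(3, mod_mul(x1, x1, p), p), a, p)
        lam = mod_mul(num, mod_inv(mod_add(y1, y1, p), p), p)
    else:
        # point addition: lambda = (y2 - y1) / (x2 - x1)
        lam = mod_mul(mod_sub(y2, y1, p), mod_inv(mod_sub(x2, x1, p), p), p)
    x3 = mod_sub(mod_sub(mod_mul(lam, lam, p), x1, p), x2, p)
    y3 = mod_sub(mod_mul(lam, mod_sub(x1, x3, p), p), y1, p)
    return (x3, y3)


def scalar_mul(k, P, a, p):
    """Double-and-add scalar multiplication k*P built only on point_add."""
    R = None
    while k:
        if k & 1:
            R = point_add(R, P, a, p)
        P = point_add(P, P, a, p)
        k >>= 1
    return R
```

For example, with G = (3, 6) on that toy curve, `point_add(G, G, 2, 97)` gives (80, 10).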

Disclosure of Invention

The present invention is directed to a system and method for fast implementing Montgomery modular multiplication using multiple multipliers, so as to solve the foregoing problems in the prior art.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

A system for rapidly implementing Montgomery modular multiplication using multiple multipliers comprises an asymmetric algorithm chip and an upper computer. The asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random number module, all of which are connected to a bus. The asymmetric hardware module comprises a register, a RAM and an algorithm module. The algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module. The processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module via the bus, and writes the required operation mode into the register of the asymmetric hardware module. When the asymmetric hardware module detects the enable bit in the register, it calls the corresponding operation module according to the operation mode. After the operation finishes, it writes the result data into the RAM, sets an end flag in the register and raises an interrupt. On receiving the interrupt, or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM via the bus and outputs them to the upper computer.

Preferably, the processor is a microprocessor.

Another object of the present invention is to provide a method for rapidly implementing Montgomery modular multiplication using multiple multipliers. The method is based on the above system and comprises the following steps:

the processor acquires the data and parameters to be operated on, writes them into the RAM of the asymmetric hardware module, and writes the required operation mode into the register of the asymmetric hardware module;

after detecting the enable bit in the register, the asymmetric hardware module calls the corresponding operation module according to the determined operation mode to operate on the data and parameters;

after the operation finishes, the result data are written into the RAM, an end flag is set in the register and an interrupt is raised; on receiving the interrupt, or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM via the bus and uploads them to the upper computer.
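The register/RAM handshake described above can be modelled in software roughly as follows. This is a hypothetical Python model, not a driver for the actual chip. The class and function names, the string-valued mode register and the three modes shown are all assumptions, and plain Python arithmetic stands in for the hardware operation modules.

```python
from dataclasses import dataclass, field


@dataclass
class AsymmetricHardwareModule:
    """Hypothetical model of the register/RAM handshake (not a real driver API)."""
    ram: dict = field(default_factory=dict)
    mode: str = ""        # operation mode register (assumed string-encoded)
    enable: bool = False  # enable bit written by the processor
    done: bool = False    # end flag set by the hardware
    irq: bool = False     # interrupt line raised by the hardware

    def step(self):
        # Hardware side: act only after the enable bit has been detected
        if not self.enable:
            return
        self.enable = False
        x, y, m = self.ram["x"], self.ram["y"], self.ram["m"]
        ops = {  # stand-ins for the modular add/sub/mul modules
            "modadd": lambda: (x + y) % m,
            "modsub": lambda: (x - y) % m,
            "modmul": lambda: (x * y) % m,
        }
        self.ram["result"] = ops[self.mode]()  # write result data into RAM
        self.done = True                       # set the end flag
        self.irq = True                        # raise an interrupt


def processor_run(hw, mode, x, y, m):
    """Processor side: write operands and mode, set enable, then poll the end flag."""
    hw.ram.update(x=x, y=y, m=m)
    hw.mode = mode
    hw.done = False
    hw.enable = True
    while not hw.done:       # polling; an interrupt handler could react to hw.irq instead
        hw.step()
    hw.irq = False           # acknowledge the interrupt
    return hw.ram["result"]  # would then be forwarded to the upper computer
```

Polling and interrupts are interchangeable here: both wait for the same end flag. The model polls only to stay single-threaded.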

Preferably, calling the corresponding operation module according to the determined operation mode to operate on the data and parameters specifically comprises the following steps:

the method comprises the following steps:

S1: determine the data length W for the modular multiplication. The modulus is M, and M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high. A and B are the two operands of length W: A[0], A[1], A[2], …, A[e] are the 64-bit words of A from low to high, and B[0], B[1], B[2], …, B[e] are the 64-bit words of B from low to high. Md, T1, T2, u, W, V, W' and V' are defined as intermediate result registers;

In step S1, e = W/64. Possible values of W include, but are not limited to, 256, 512 and 1024 bits.
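Here is a minimal sketch of the grouping in step S1. It assumes W is a multiple of 64 and that each operand fits in W bits. The sketch produces exactly e = W/64 words, indexed 0 to e-1 with the lowest word first. The helper names are hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1


def split_words(x, w_bits):
    """Group x into e = w_bits/64 words, lowest first: x == sum(X[i] << 64*i)."""
    if w_bits % WORD:
        raise ValueError("W must be a multiple of 64")
    if x >> w_bits:
        raise ValueError("operand wider than W bits")
    e = w_bits // WORD
    return [(x >> (WORD * i)) & MASK for i in range(e)]


def join_words(words):
    """Inverse of split_words: reassemble the integer from its 64-bit words."""
    return sum(w << (WORD * i) for i, w in enumerate(words))
```

For W = 256 this yields four words. For example, the modulus 2^255 - 19 splits into a low word 0xFFFFFFFFFFFFFFED, two all-ones words and a top word 0x7FFFFFFFFFFFFFFF.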

S2: first, one 64-bit multiplier performs the first-cycle operation to compute the precomputed value Md, where $Md = B[0] \cdot Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$ and $P = 2^{64}$;
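The precomputation in step S2 can be sketched as follows. The sketch assumes Mc is the usual Montgomery constant, -M^(-1) mod 2^64. If {M1, M0} denotes the concatenation of M[1] and M[0], then only the low word M[0] affects the inverse modulo 2^64, so the sketch takes just that word. M must be odd for the inverse to exist. The function name is illustrative.

```python
WORD = 64
P = 1 << WORD  # P = 2^64


def precompute(m_low, b0):
    """Return (Mc, Md) for a modulus whose lowest 64-bit word is m_low (must be odd)."""
    mc = (-pow(m_low, -1, P)) % P  # Mc = -M[0]^(-1) mod 2^64 (Python 3.8+ pow)
    md = (b0 * mc) % P             # Md = B[0] * Mc mod 2^64, one 64x64 multiply
    return mc, md
```

The defining property is that Mc * M[0] = -1 mod 2^64. Md depends only on B[0], so it can be computed once before the rounds start.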

S3: two 64-bit multipliers and one adder perform the second-cycle operation, computing in parallel the parameter $u_1$ of the first round.

The parameter $u_1$ is computed as $u_1 = T1 + T2$, where $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$. Here $r0 = r \bmod 2^{64}$ is the lowest 64 bits of the intermediate data r;
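Splitting u into T1 + T2 lets both products start in the same cycle. Because Md = B[0]·Mc is precomputed, u = (r0 + A·B[0])·Mc mod 2^64 no longer has to wait for the sum r0 + A·B[0] before multiplying. The sketch below takes the addition modulo 2^64, an assumption consistent with 64-bit registers. The function name is hypothetical, and a_i is the word of A used in the round (A[0] for the first round).

```python
P = 1 << 64


def round_parameter(r0, a_i, mc, md):
    """u = T1 + T2 mod 2^64, using two independent 64x64 multiplies and one adder."""
    t1 = (r0 * mc) % P    # multiplier 1: T1 = r0 * Mc
    t2 = (a_i * md) % P   # multiplier 2: T2 = a_i * Md, where Md = B[0] * Mc
    return (t1 + t2) % P  # adder
```

The result equals ((r0 + a_i·B[0])·Mc) mod 2^64. With this u, the low word of r0 + a_i·B[0] + u·M[0] becomes zero, which is what the next step relies on.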

S4: eight 64-bit multipliers perform the third-cycle operation. They compute in parallel the intermediate result $r0_1$ from the lowest 64-bit word A[0] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the first-round parameter $u_1$:

$r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$ and $W'[1] = u_1 \cdot M[3]$.
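The sketch below models one such round for e = 4, producing the eight products V[0], V[1], W[0], W[1], V'[0], V'[1], W'[0] and W'[1]. Each 64x64 product can be up to 128 bits wide, so the pair sums overlap neighbouring words. The sketch therefore reads the ':' concatenation as placing pair sum j at word offset j and adding with carries. It also adds the previous intermediate result and drops the low word that u cancels. Both readings are assumptions about what the formula abbreviates, and the function names are hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1


def row_products(a_i, u, B, M):
    """The 2e products of one round, paired by word position j (eight for e = 4)."""
    pairs = []
    for j in range(len(B)):
        v0 = a_i * B[j]  # A[i]*B[j]: V[0], W[0], V'[0], W'[0] for j = 0..3
        v1 = u * M[j]    # u*M[j]:    V[1], W[1], V'[1], W'[1] for j = 0..3
        pairs.append(v0 + v1)
    return pairs


def combine(pairs, r_prev=0):
    """Place pair sum j at word offset j, add the previous result, drop the zero low word."""
    s = r_prev + sum(p << (WORD * j) for j, p in enumerate(pairs))
    assert (s & MASK) == 0, "u was not chosen to cancel the low word"
    return s >> WORD
```

All eight multiplies in `row_products` are independent of one another, which is why they can run in the same cycle. Only the carry-propagating addition in `combine` couples the word positions.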

S5: two 64-bit multipliers and one adder perform the fourth-cycle operation, computing in parallel the second-round parameter $u_2 = T1 + T2$, where $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

S6: eight 64-bit multipliers perform the fifth-cycle operation. They compute in parallel the intermediate result $r0_2$ from the second-lowest 64-bit word A[1] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the second-round parameter $u_2$:

$r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.

S7: two 64-bit multipliers and one adder perform the sixth-cycle operation, computing in parallel the third-round parameter $u_3 = T1 + T2$, where $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

S8: eight 64-bit multipliers perform the seventh-cycle operation. They compute in parallel the intermediate result $r0_3$ from the second-highest 64-bit word A[2] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the third-round parameter $u_3$:

$r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.

S9: repeat steps S7-S8 cyclically until the fourth-round parameter $u_4$ has been computed in parallel;

S10: compute the final result $r0_4$ in parallel from the highest 64-bit word A[3] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the fourth-round parameter $u_4$.
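Putting steps S2 to S10 together gives a word-serial Montgomery multiplication. The sketch below writes it as sequential Python rather than parallel hardware. It makes several assumptions beyond the text:

- The words are indexed 0 to e-1.
- T2 uses the word A[i] of the current round, as in the textbook recurrence. This is what makes the low word cancel in every round.
- The previous result is accumulated and shifted right by 64 bits each round.
- A final conditional subtraction brings the result below M.

Under these assumptions the result is A·B·2^(-W) mod M. The function names are hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1


def to_words(x, e):
    """Split x into e 64-bit words, least significant first."""
    return [(x >> (WORD * i)) & MASK for i in range(e)]


def mont_mul(a, b, m, w_bits=256):
    """Word-serial Montgomery product a*b*2^(-w_bits) mod m.

    Assumes m is odd, m < 2^w_bits, a < 2^w_bits and b < m.
    """
    e = w_bits // WORD
    A, B, M = to_words(a, e), to_words(b, e), to_words(m, e)
    mc = (-pow(M[0], -1, 1 << WORD)) & MASK  # Mc = -M^(-1) mod 2^64
    md = (B[0] * mc) & MASK                  # S2: Md = B[0]*Mc
    r = 0                                    # running intermediate result
    for i in range(e):
        r0 = r & MASK                        # lowest word of r
        t1 = (r0 * mc) & MASK                # T1 = r0*Mc
        t2 = (A[i] * md) & MASK              # T2 = A[i]*Md (current word, assumed)
        u = (t1 + t2) & MASK                 # u = T1 + T2 mod 2^64
        # Row step: the 2e products A[i]*B[j] and u*M[j], paired by word position j
        s = r
        for j in range(e):
            s += (A[i] * B[j] + u * M[j]) << (WORD * j)
        assert (s & MASK) == 0               # u makes the low word cancel
        r = s >> WORD                        # exact division by 2^64
    if r >= m:                               # r < 2m here, so one subtraction suffices
        r -= m
    return r
```

For example, with M = 2^255 - 19 and R = 2^256, `mont_mul(a*R % M, b*R % M, M)` returns a·b·R mod M. Operands can therefore stay in Montgomery form across a chain of multiplications.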


Preferably, W in step S1 is 256 bits. In step S2 the precomputed value is $Md = B[0] \cdot Mc$, with $Mc = -\{M1, M0\}^{-1} \bmod P$ and $P = 2^{64}$;

the parameter $u_1$ of the first round in step S3 is computed as $u_1 = T1 + T2$, where $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$. Here $r0 = r \bmod 2^{64}$ is the lowest 64 bits of the intermediate data r;

$r0_1$ in step S4 is computed as $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$ and $W'[1] = u_1 \cdot M[3]$.

Preferably, in step S5: $u_2 = T1 + T2$, where $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_2$ in step S6 is computed as $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.

Preferably, in step S7: $u_3 = T1 + T2$, where $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_3$ in step S8 is computed as $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.

Preferably, in step S9: $u_4 = T1 + T2$, where $T1 = r0_3 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_4$ in step S10 is computed as $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$ and $W'[1] = u_4 \cdot M[3]$.
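Suppose each numbered step occupies one cycle. Then the 256-bit schedule runs to nine cycles: one for Md, followed by alternating u-cycles (two multipliers and an adder) and row cycles. Steps S2 to S8 correspond to cycles 1 to 7, and S9/S10 are assumed to continue the same pattern. The sketch below generates this schedule for general e. It assumes a row cycle needs 2e multipliers, which gives the eight of the 256-bit embodiment but is not stated for wider W. The function name is hypothetical.

```python
def cycle_schedule(e):
    """Return (cycle, multipliers_used, operation) tuples for e = W/64 words.

    Assumes one cycle per step and 2e multipliers in each row cycle.
    """
    schedule = [(1, 1, "Md = B[0]*Mc")]
    for i in range(e):
        u_cycle = 2 + 2 * i
        schedule.append((u_cycle, 2, f"u{i + 1} = T1 + T2"))
        schedule.append((u_cycle + 1, 2 * e, f"r0_{i + 1} from A[{i}], B[0..{e - 1}], u{i + 1}"))
    return schedule


for cycle, mults, op in cycle_schedule(4):
    print(f"cycle {cycle}: {mults} multiplier(s), {op}")
```

For e = 4 this lists cycles 1 to 9. Each u-cycle depends on the low word produced by the preceding row cycle, so the rounds stay sequential, and the parallelism lies within each cycle.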

The invention has the beneficial effects that:

the invention provides a system and a method for rapidly implementing Montgomery modular multiplication using multiple multipliers. They restructure the loop-iterative computation of the original Montgomery modular multiplication formula and use several 64-bit multipliers in parallel. This greatly increases the speed at which an asymmetric algorithm chip performs signing, signature verification, encryption, decryption and key generation, and improves the performance of the security chip.

Drawings

Fig. 1 shows the chip structure of the system for rapidly implementing Montgomery modular multiplication using multiple multipliers provided in Example 1;

Fig. 2 is a schematic diagram of the principle of the method for rapidly implementing Montgomery modular multiplication using multiple multipliers provided in Example 2.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are given by way of illustration only.

Example 1

This example provides a system for rapidly implementing Montgomery modular multiplication using multiple multipliers. The system comprises an asymmetric algorithm chip and an upper computer. As shown in Fig. 1, the asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random number module, all of which are connected to a bus. The asymmetric hardware module comprises a register, a RAM and an algorithm module. The algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module. The processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module via the bus, and writes the required operation mode into the register of the asymmetric hardware module. When the asymmetric hardware module detects the enable bit in the register, it calls the corresponding operation module according to the operation mode. After the operation finishes, it writes the result data into the RAM, sets an end flag in the register and raises an interrupt. On receiving the interrupt, or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM via the bus and outputs them to the upper computer.

Example 2

This example provides a method for rapidly implementing Montgomery modular multiplication using multiple multipliers. The method is based on the system described in Example 1 and comprises the following steps:

the processor acquires the data and parameters to be operated on, writes them into the RAM of the asymmetric hardware module, and writes the required operation mode into the register of the asymmetric hardware module;

after detecting the enable bit in the register, the asymmetric hardware module calls the corresponding operation module according to the determined operation mode to operate on the data and parameters;

In this example, calling the corresponding operation module according to the determined operation mode to operate on the data and parameters specifically comprises the following steps (the operating principle of the multipliers used is shown in Fig. 2):

S1: determine the data length W for the modular multiplication. The modulus is M, and M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high. A and B are the two operands of length W: A[0], A[1], A[2], …, A[e] are the 64-bit words of A from low to high, and B[0], B[1], B[2], …, B[e] are the 64-bit words of B from low to high. Md, T1, T2, u, W, V, W' and V' are defined as intermediate result registers;

S2: first, one 64-bit multiplier performs the first-cycle operation to compute the precomputed value Md;

S3: two 64-bit multipliers and one adder perform the second-cycle operation, computing in parallel the parameter $u_1$ of the first round;

S4: eight 64-bit multipliers perform the third-cycle operation, computing in parallel the intermediate result $r0_1$ from the lowest 64-bit word A[0] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the first-round parameter $u_1$;

S5: two 64-bit multipliers and one adder perform the fourth-cycle operation, computing in parallel the second-round parameter $u_2$;

S6: eight 64-bit multipliers perform the fifth-cycle operation, computing in parallel the intermediate result $r0_2$ from the second-lowest 64-bit word A[1] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the second-round parameter $u_2$;

S7: two 64-bit multipliers and one adder perform the sixth-cycle operation, computing in parallel the third-round parameter $u_3$;

S8: eight 64-bit multipliers perform the seventh-cycle operation, computing in parallel the intermediate result $r0_3$ from the second-highest 64-bit word A[2] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the third-round parameter $u_3$;

S9: repeat steps S7-S8 cyclically until the last-round parameter $u_{e+1}$ has been computed in parallel;

S10: compute the final result $r0_{e+1}$ in parallel from the highest 64-bit word A[e] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the last-round parameter $u_{e+1}$.

In step S1 of this example, e = W/64. Possible values of W include, but are not limited to, 256, 512 and 1024 bits.

In a more preferred embodiment, W in step S1 is 256 bits, and the precomputed value in step S2 is $Md = B[0] \cdot Mc$;

the parameter $u_1$ of the first round in step S3 is computed as $u_1 = T1 + T2$, where $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$. Here $r0 = r \bmod 2^{64}$ is the lowest 64 bits of the intermediate data r;

In step S5 of this example: $u_2 = T1 + T2$, where $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_2$ in step S6 is computed as $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.

In step S7 of this example: $u_3 = T1 + T2$, where $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_3$ in step S8 is computed as $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.

In step S9 of this example: $u_4 = T1 + T2$, where $T1 = r0_3 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_4$ in step S10 is computed as $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$ and $W'[1] = u_4 \cdot M[3]$.

By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:

the method restructures the loop-iterative computation of the original Montgomery modular multiplication formula so that, after the transformation, several 64-bit multipliers operate in parallel. This greatly increases the speed at which an asymmetric algorithm chip performs signing, signature verification, encryption, decryption and key generation, and improves the performance of the security chip.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims

1. A method for rapidly implementing Montgomery modular multiplication using multiple multipliers, characterized in that the method is implemented on a system for rapidly implementing Montgomery modular multiplication using multiple multipliers; the system comprises an asymmetric algorithm chip and an upper computer; the asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random number module, and the processor, the asymmetric hardware module and the random number module are all connected to a bus; the asymmetric hardware module comprises a register, a RAM and an algorithm module, wherein the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module; the processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module via the bus, and writes the required operation mode into the register of the asymmetric hardware module; after detecting the enable bit in the register, the asymmetric hardware module calls the corresponding operation module according to the operation mode, writes the result data into the RAM after the operation finishes, sets an end flag in the register and raises an interrupt; on receiving the interrupt, or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM via the bus and outputs them to the upper computer;

the method comprises the following steps:

after the operation finishes, writing the result data into the RAM, setting an end flag in the register and raising an interrupt; on receiving the interrupt, or on polling the end flag in the register of the asymmetric hardware module, the processor reads the result data from the RAM via the bus and uploads them to the upper computer;

calling the corresponding operation module according to the determined operation mode to operate on the data and parameters specifically comprises the following steps:

S1: determining the data length W for the modular multiplication, wherein the modulus is M, and M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high; A and B are the two operands of length W, A[0], A[1], A[2], …, A[e] are the 64-bit words of A from low to high, and B[0], B[1], B[2], …, B[e] are the 64-bit words of B from low to high; and defining Md, T1, T2, u, W, V, W' and V' as intermediate result registers;

S4: using eight 64-bit multipliers in the third cycle, computing in parallel the intermediate result $r0_1$ from the lowest 64-bit word A[0] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the first-round parameter $u_1$;

S5: using two 64-bit multipliers and one adder in the fourth cycle, computing in parallel the second-round parameter $u_2$;

S6: using eight 64-bit multipliers in the fifth cycle, computing in parallel the intermediate result $r0_2$ from the second-lowest 64-bit word A[1] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the second-round parameter $u_2$;

S8: using eight 64-bit multipliers in the seventh cycle, computing in parallel the intermediate result $r0_3$ from the second-highest 64-bit word A[2] of operand A, the words B[0], B[1], B[2], B[3] of operand B, and the third-round parameter $u_3$;

S9: repeating steps S7-S8 cyclically until the last-round parameter $u_{e+1}$ has been computed in parallel;

S10: computing the final result $r0_{e+1}$ in parallel from the highest 64-bit word A[e] of operand A, all words B[0], B[1], B[2], …, B[e] of operand B, and the last-round parameter $u_{e+1}$.

2. The method of claim 1, wherein the processor is a microprocessor.

3. The method for rapidly implementing Montgomery modular multiplication using multiple multipliers according to claim 1, wherein e = W/64 in step S1, and possible values of W include, but are not limited to, 256, 512 and 1024 bits.

4. The method according to claim 3, wherein W in step S1 is 256 bits, and in step S2 the precomputed value is $Md = B[0] \cdot Mc$, with $Mc = -\{M1, M0\}^{-1} \bmod P$ and $P = 2^{64}$;

the parameter $u_1$ of the first round in step S3 is computed as $u_1 = T1 + T2$, where $T1 = r0 \cdot Mc$ and $T2 = A[0] \cdot Md$, and $r0 = r \bmod 2^{64}$ is the lowest 64 bits of the intermediate data r;

$r0_1$ in step S4 is computed as $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$ and $W'[1] = u_1 \cdot M[3]$.

5. The method for rapidly implementing Montgomery modular multiplication using multiple multipliers according to claim 4, wherein in step S5: $u_2 = T1 + T2$, where $T1 = r0_1 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_2$ in step S6 is computed as $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$ and $W'[1] = u_2 \cdot M[3]$.

6. The method for rapidly implementing Montgomery modular multiplication using multiple multipliers according to claim 5, wherein in step S7: $u_3 = T1 + T2$, where $T1 = r0_2 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_3$ in step S8 is computed as $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$ and $W'[1] = u_3 \cdot M[3]$.

7. The method for rapidly implementing Montgomery modular multiplication using multiple multipliers according to claim 6, wherein in step S9: $u_4 = T1 + T2$, where $T1 = r0_3 \cdot Mc$ and $T2 = A[0] \cdot Md$;

$r0_4$ in step S10 is computed as $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$ and $W'[1] = u_4 \cdot M[3]$.