CN114840174A - System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers - Google Patents

System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Info

Publication number
CN114840174A
Authority
CN
China
Prior art keywords
data
module
bit
multipliers
asymmetric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210565348.XA
Other languages
Chinese (zh)
Other versions
CN114840174B (en)
Inventor
王立峰
张奇惠
刘曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Wise Security Technology Co Ltd
Original Assignee
Guangzhou Wise Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Wise Security Technology Co Ltd
Priority to CN202210565348.XA
Publication of CN114840174A
Application granted
Publication of CN114840174B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/722 Modular multiplication
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a system and a method for rapidly realizing Montgomery modular multiplication by using multiple multipliers, and relates to the technical field of high-performance algorithms for security chips. The system performs combined operations based on the existing point addition, point doubling, modular exponentiation, modular inversion, modular subtraction, modular addition and modular multiplication modules, optimizes the loop-iteration calculation of the original Montgomery modular multiplication formula, and uses a plurality of 64-bit multipliers operating in parallel, which greatly improves the speed of signing, signature verification, encryption, decryption and key generation on an asymmetric algorithm chip and improves the performance of the security chip.

Description

System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers
Technical Field
The invention relates to the technical field of security algorithms, in particular to a system and a method for quickly realizing Montgomery modular multiplication by using multiple multipliers.
Background
At present, asymmetric cryptographic chips mostly use elliptic curves. Elliptic curve public key cryptography is based on the following curve properties: 1. The points of an elliptic curve over a finite field form a finite abelian (commutative) group under point addition, and the order of this group is close to the size of the base field. 2. Similar to exponentiation in the multiplicative group of a finite field, scalar (multiple-point) multiplication on an elliptic curve constitutes a one-way function.
In scalar multiplication, the problem of finding the multiplier when the resulting point and the base point are known is called the elliptic curve discrete logarithm problem. For the discrete logarithm problem on a general elliptic curve, only solution methods of exponential computational complexity are currently known. Compared with the integer factorization problem and the discrete logarithm problem in a finite field, the elliptic curve discrete logarithm problem is much harder to solve.
Elliptic curve public key cryptography is built from point multiplication, point doubling, point addition and modular exponentiation operations in curve calculation, which can ultimately be decomposed into modular multiplication, modular addition and modular subtraction.
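As a small illustration of this decomposition, the sketch below performs point addition, point doubling and double-and-add scalar multiplication using only modular multiplication, addition, subtraction and inversion. The toy curve y^2 = x^3 + 2x + 2 over F_17, the base point (5, 1) and all function names are assumptions chosen for readability; none of them come from the patent, and a real chip would use a standardized curve with a large prime.

```python
# Toy parameters (assumed for illustration): y^2 = x^3 + 2x + 2 over F_17.
P_FIELD = 17
A_COEF = 2

def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None  # p2 is the negative of p1
    if p1 == p2:
        # Point doubling: slope of the tangent line.
        lam = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_FIELD, -1, P_FIELD)
    else:
        # Point addition: slope of the chord.
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    y3 = (lam * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication k * point."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result
```

Every arithmetic step inside `point_add` is a modular multiplication, addition, subtraction or inversion, which is why the speed of the modular multiplier tends to dominate the cost of the curve operations.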
In the prior art, large-number modular multiplication is mainly implemented with Montgomery modular multiplication. Because the Montgomery modular multiplication formula is calculated by loop iteration, the current modular multiplication speed is limited by the formula, so that signing, signature verification, encryption, decryption and key generation on asymmetric algorithm chips that use elliptic curve calculation reach only tens to hundreds of operations per second, which becomes the bottleneck of asymmetric encryption chip operation.
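For reference, a textbook word-serial Montgomery multiplication in radix 2^64 can be sketched as follows. This is not the patent's circuit; the function name `mont_mul_serial`, the operand layout and the final conditional subtraction are assumptions of the sketch. It is included only to show the loop-carried dependency: the reduction factor `u` of each iteration needs the low word of the freshly updated accumulator.

```python
WORD = 64
MASK = (1 << WORD) - 1

def mont_mul_serial(a, b, m, n_words):
    """Return a * b * R^-1 mod m with R = 2**(64 * n_words).

    Assumes m is odd and 0 <= a, b < m.
    """
    m_neg_inv = (-pow(m, -1, 1 << WORD)) & MASK  # -m^-1 mod 2**64
    r = 0
    for i in range(n_words):
        a_i = (a >> (WORD * i)) & MASK           # i-th 64-bit word of a
        r += a_i * b                             # add the partial product
        u = ((r & MASK) * m_neg_inv) & MASK      # needs the updated r
        r = (r + u * m) >> WORD                  # low word is zero, shift is exact
    return r - m if r >= m else r
```

In this form the multiplication that produces `u` cannot start until the partial product has been accumulated, so the iterations are strictly sequential.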
Disclosure of Invention
The present invention is directed to a system and method for fast implementing Montgomery modular multiplication using multiple multipliers, so as to solve the foregoing problems in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A system for rapidly realizing Montgomery modular multiplication by using multiple multipliers comprises an asymmetric algorithm chip and an upper computer, wherein the asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random data module, all of which are connected to a bus; the asymmetric hardware module comprises a register, a RAM and an algorithm module, wherein the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module; the processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module; after the asymmetric hardware module detects the enable bit of the register, it calls the corresponding operation module according to the operation mode, writes the result data into the RAM after the operation is finished, and at the same time sets an end flag in the register and generates an interrupt; the processor receives the interrupt or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and outputs the result data to the upper computer.
Preferably, the processor is a microprocessor.
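The control flow between the processor and the asymmetric hardware module can be pictured with a small software model. The sketch below is only a simulation under stated assumptions: the class name, the RAM keys, the mode codes and the polling loop are all hypothetical, and the source specifies neither a register layout nor a mode encoding.

```python
from dataclasses import dataclass, field

# Hypothetical mode codes; the source does not define an encoding.
MODE_MOD_ADD, MODE_MOD_SUB, MODE_MOD_MUL = 1, 2, 3

@dataclass
class AsymmetricHardwareModule:
    ram: dict = field(default_factory=dict)  # operands, parameters, result
    mode: int = 0                            # operation-mode register field
    enable: bool = False                     # enable bit
    done: bool = False                       # end flag (stands in for the interrupt)

    def step(self):
        """Run the selected operation once the enable bit is set."""
        if not self.enable:
            return
        a, b, m = self.ram["a"], self.ram["b"], self.ram["m"]
        operations = {
            MODE_MOD_ADD: lambda: (a + b) % m,
            MODE_MOD_SUB: lambda: (a - b) % m,
            MODE_MOD_MUL: lambda: (a * b) % m,
        }
        self.ram["result"] = operations[self.mode]()
        self.enable = False
        self.done = True

def run_on_module(hw, mode, a, b, m):
    """Processor side: write RAM and register, start, poll, read back."""
    hw.ram.update(a=a, b=b, m=m)  # write data and parameters to RAM
    hw.mode = mode                # write the operation mode to the register
    hw.done = False
    hw.enable = True              # set the enable bit
    while not hw.done:            # poll the end flag
        hw.step()
    return hw.ram["result"]       # read the result from RAM
```

For example, `run_on_module(AsymmetricHardwareModule(), MODE_MOD_MUL, 7, 8, 13)` walks through the write, enable, poll and read-back sequence. Polling is used here only because it is easy to simulate; the text allows either an interrupt or a query of the end flag.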
Another object of the present invention is to provide a method for rapidly realizing Montgomery modular multiplication by using multiple multipliers, based on the above system for rapidly realizing Montgomery modular multiplication by using multiple multipliers, comprising the following steps:
the processor acquires the data and parameters to be operated on, writes them into the RAM of the asymmetric hardware module, and writes the operation mode into the register of the asymmetric hardware module;
after the asymmetric hardware module detects the enable bit of the register, it calls the corresponding operation module according to the determined operation mode to operate on the data and parameters;
the result data is written into the RAM after the operation is finished, and at the same time an end flag is set in the register and an interrupt is generated; the processor receives the interrupt or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and uploads the result data to the upper computer.
Preferably, calling the corresponding operation module according to the determined operation mode to operate on the data and parameters specifically includes the following steps:
S1, confirming the data length W of the modular multiplication to be realized, and confirming that the modulus of that data length is M, where M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high; A and B are the multipliers of that data length, where A[0], A[1], A[2], …, A[e] are the 64-bit words of A from low to high and B[0], B[1], B[2], …, B[e] are the 64-bit words of B from low to high; Md, T1, T2, u, W, V, W', V' are defined as intermediate result registers;
where e in step S1 is W/64; the data length W may be selected from, but is not limited to, 256 bits, 512 bits and 1024 bits.
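The word decomposition of step S1 can be sketched as below. The helper names `to_limbs` and `from_limbs` are hypothetical, and the sketch assumes that a W-bit operand occupies exactly W/64 words indexed from 0 (the source uses the symbol e both as W/64 and as the highest index, so the indexing here is an interpretation).

```python
WORD = 64
MASK = (1 << WORD) - 1

def to_limbs(x, width_bits):
    """Split x into width_bits // 64 words, least significant first."""
    if width_bits % WORD != 0:
        raise ValueError("width must be a multiple of 64 bits")
    if x < 0 or x >> width_bits:
        raise ValueError("value does not fit in the given width")
    return [(x >> (WORD * i)) & MASK for i in range(width_bits // WORD)]

def from_limbs(limbs):
    """Reassemble an integer from words ordered low to high."""
    return sum(limb << (WORD * i) for i, limb in enumerate(limbs))
```

With this layout a 256-bit operand has four words, a 512-bit operand eight and a 1024-bit operand sixteen.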
S2, first, using one 64-bit multiplier to perform the first-cycle operation and calculate the pre-operation value Md, where $Md = B[0] \times Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, and $P = 2^{64}$;
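A minimal sketch of this precomputation is given below. Two points are assumptions of the sketch rather than statements of the source: an inverse modulo 2^64 depends only on the lowest 64-bit word of the modulus, so only M[0] is passed in, and Md is truncated to 64 bits because only its low word can influence a value that is later reduced modulo 2^64. The function name `precompute` is hypothetical.

```python
WORD = 64
P = 1 << WORD  # P = 2**64

def precompute(m0, b0):
    """Return (Mc, Md) from the lowest words of the modulus and of B."""
    if m0 % 2 == 0:
        raise ValueError("the modulus must be odd for the inverse to exist")
    mc = (-pow(m0, -1, P)) % P  # Mc = -M^-1 mod 2**64
    md = (b0 * mc) % P          # Md = B[0] * Mc, kept to 64 bits (assumption)
    return mc, md
```

The defining property of Mc is that `(m0 * mc + 1) % P == 0`, which is what makes the later reduction step cancel the low word.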
S3, using two 64-bit multipliers and one adder to perform the second-cycle operation, and obtaining the parameter $u_1$ of the first round of operation by parallel calculation.
The parameter $u_1$ is calculated as follows: $u_1 = T1 + T2$, $T1 = r0 \times Mc$, $T2 = A[0] \times Md$, $r0 = r \bmod W$, where W in this formula denotes $2^{64}$ and r is the data taken modulo $2^{64}$;
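The reason two multipliers can work side by side here can be checked numerically. In the conventional form, u is obtained as (r0 + A[0]·B[0])·Mc mod 2^64, which needs one product before the other. Expanding the bracket gives r0·Mc + A[0]·(B[0]·Mc) = T1 + T2, and the two products no longer depend on each other once Md = B[0]·Mc is known. The sketch below compares the two forms; the function names are hypothetical, and the reduction of u modulo 2^64 is an assumption that the source does not state explicitly.

```python
WORD = 64
P = 1 << WORD

def u_two_multipliers(r0, a_word, mc, md):
    """u = T1 + T2 with two independent 64-bit products."""
    t1 = r0 * mc      # multiplier 1
    t2 = a_word * md  # multiplier 2, independent of t1
    return (t1 + t2) % P

def u_serial(r0, a_word, b0, mc):
    """Conventional form: the second product waits for the first."""
    return ((r0 + a_word * b0) * mc) % P
```

Both functions return the same value for any inputs when `md == (b0 * mc) % P`, since the two expressions are congruent modulo 2^64.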
S4, using eight 64-bit multipliers to perform the third-cycle operation: the lowest 64-bit word A[0] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$ are used to calculate the intermediate result $r0_1$ in parallel:
$r0_1 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[0] \times B[0]$, $V[1] = u_1 \times M[0]$, $W[0] = A[0] \times B[1]$, $W[1] = u_1 \times M[1]$, $V'[0] = A[0] \times B[2]$, $V'[1] = u_1 \times M[2]$, $W'[0] = A[0] \times B[3]$, $W'[1] = u_1 \times M[3]$.
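One possible reading of this step for four-word operands is sketched below. It forms the eight products named above (four of the form A[i]·B[j] and four of the form u·M[j]), adds each pair at its word offset, and drops the lowest word, which is zero when u was chosen as in step S3. Two things are assumptions of the sketch: the ":" notation is interpreted as placing each pair sum at a 64-bit offset with carries propagating upward, and the previous intermediate result r is added in (it is zero in the first round). The function name is hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1

def round_eight_products(r, a_word, u, b, m):
    """One round: return (r + a_word * B + u * M) / 2**64.

    b and m are lists of four 64-bit words, low to high.
    """
    acc = r
    for j in range(4):
        prod_ab = a_word * b[j]  # V[0], W[0], V'[0], W'[0] for j = 0..3
        prod_um = u * m[j]       # V[1], W[1], V'[1], W'[1] for j = 0..3
        acc += (prod_ab + prod_um) << (WORD * j)
    if acc & MASK:
        raise ValueError("u does not cancel the low word")
    return acc >> WORD
```

The eight products are independent of each other, so in hardware they could be issued to eight multipliers in the same cycle; only the additions need to combine them.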
S5, using two 64-bit multipliers and one adder to perform the fourth-cycle operation, and calculating the parameter $u_2$ of the second round of operation in parallel: $u_2 = T1 + T2$, $T1 = r0_1 \times Mc$, $T2 = A[0] \times Md$;
S6, using eight 64-bit multipliers to perform the fifth-cycle operation: the second-lowest 64-bit word A[1] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$ are used to calculate the intermediate result $r0_2$ in parallel:
$r0_2 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[1] \times B[0]$, $V[1] = u_2 \times M[0]$, $W[0] = A[1] \times B[1]$, $W[1] = u_2 \times M[1]$, $V'[0] = A[1] \times B[2]$, $V'[1] = u_2 \times M[2]$, $W'[0] = A[1] \times B[3]$, $W'[1] = u_2 \times M[3]$.
S7, using two 64-bit multipliers and one adder to perform the sixth-cycle operation, and calculating the parameter $u_3$ of the third round of operation in parallel: $u_3 = T1 + T2$, $T1 = r0_2 \times Mc$, $T2 = A[0] \times Md$;
S8, using eight 64-bit multipliers to perform the seventh-cycle operation: the second-highest 64-bit word A[2] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the third-round parameter $u_3$ are used to calculate the intermediate result $r0_3$ in parallel:
$r0_3 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[2] \times B[0]$, $V[1] = u_3 \times M[0]$, $W[0] = A[2] \times B[1]$, $W[1] = u_3 \times M[1]$, $V'[0] = A[2] \times B[2]$, $V'[1] = u_3 \times M[2]$, $W'[0] = A[2] \times B[3]$, $W'[1] = u_3 \times M[3]$.
S9, repeating steps S7 to S8 in a loop until the fourth-round parameter $u_4$ has been calculated in parallel;
S10, using the highest 64-bit word A[3] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the fourth-round parameter $u_4$ to calculate the final result $r0_4$ in parallel.
Preferably, W in step S1 is 256 bits, and the pre-operation value Md in step S2 is $Md = B[0] \times Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, where $P = 2^{64}$;
the parameter $u_1$ of the first round of operation in step S3 is calculated as follows: $u_1 = T1 + T2$, $T1 = r0 \times Mc$, $T2 = A[0] \times Md$, $r0 = r \bmod W$, where W in this formula denotes $2^{64}$ and r is the data taken modulo $2^{64}$;
$r0_1$ in step S4 is calculated as follows: $r0_1 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[0] \times B[0]$, $V[1] = u_1 \times M[0]$, $W[0] = A[0] \times B[1]$, $W[1] = u_1 \times M[1]$, $V'[0] = A[0] \times B[2]$, $V'[1] = u_1 \times M[2]$, $W'[0] = A[0] \times B[3]$, $W'[1] = u_1 \times M[3]$.
Preferably, in step S5: $u_2 = T1 + T2$, $T1 = r0_1 \times Mc$, $T2 = A[0] \times Md$;
$r0_2$ in step S6 is calculated as follows: $r0_2 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[1] \times B[0]$, $V[1] = u_2 \times M[0]$, $W[0] = A[1] \times B[1]$, $W[1] = u_2 \times M[1]$, $V'[0] = A[1] \times B[2]$, $V'[1] = u_2 \times M[2]$, $W'[0] = A[1] \times B[3]$, $W'[1] = u_2 \times M[3]$.
Preferably, in step S7: $u_3 = T1 + T2$, $T1 = r0_2 \times Mc$, $T2 = A[0] \times Md$;
$r0_3$ in step S8 is calculated as follows: $r0_3 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[2] \times B[0]$, $V[1] = u_3 \times M[0]$, $W[0] = A[2] \times B[1]$, $W[1] = u_3 \times M[1]$, $V'[0] = A[2] \times B[2]$, $V'[1] = u_3 \times M[2]$, $W'[0] = A[2] \times B[3]$, $W'[1] = u_3 \times M[3]$.
Preferably, in step S9: $u_4 = T1 + T2$, $T1 = r0_3 \times Mc$, $T2 = A[0] \times Md$;
$r0_4$ in step S10 is calculated as follows: $r0_4 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[3] \times B[0]$, $V[1] = u_4 \times M[0]$, $W[0] = A[3] \times B[1]$, $W[1] = u_4 \times M[1]$, $V'[0] = A[3] \times B[2]$, $V'[1] = u_4 \times M[2]$, $W'[0] = A[3] \times B[3]$, $W'[1] = u_4 \times M[3]$.
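Putting steps S2 to S10 together for W = 256 bits, a software model of the whole sequence might look like the sketch below. It is an interpretation, not the patented circuit, and it makes three assumptions that should be read as such: T2 in round i uses the word A[i] being processed in that round (the text writes A[0] in every round, while the conventional Montgomery recurrence requires the current word), the pair sums are accumulated at 64-bit offsets with carry propagation, and a final conditional subtraction brings the result below the modulus. The function names are hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1
N_WORDS = 4  # W = 256 bits

def words(x):
    """Split a 256-bit integer into four 64-bit words, low to high."""
    return [(x >> (WORD * j)) & MASK for j in range(N_WORDS)]

def mont_mul_256(a, b, m):
    """Return a * b * 2**-256 mod m for odd m and 0 <= a, b < m."""
    A, B, M = words(a), words(b), words(m)
    mc = (-pow(M[0], -1, 1 << WORD)) & MASK  # Mc = -M^-1 mod 2**64
    md = (B[0] * mc) & MASK                  # S2: Md = B[0] * Mc
    r = 0
    for i in range(N_WORDS):
        # S3 / S5 / S7 / S9: two independent products and one addition.
        # Assumption: T2 uses A[i], the word handled in this round.
        u = ((r & MASK) * mc + A[i] * md) & MASK
        # S4 / S6 / S8 / S10: eight independent products.
        acc = r
        for j in range(N_WORDS):
            acc += (A[i] * B[j] + u * M[j]) << (WORD * j)
        r = acc >> WORD                      # the low word of acc is zero
    return r - m if r >= m else r            # assumed final correction
```

In this model the value can be compared against `a * b * pow(1 << 256, -1, m) % m` for any odd 256-bit modulus, which is a convenient way to check an implementation of the schedule.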
The invention has the beneficial effects that:
The invention provides a system and a method for rapidly realizing Montgomery modular multiplication by using multiple multipliers, which optimize the loop-iteration calculation of the original Montgomery modular multiplication formula and use a plurality of 64-bit multipliers operating in parallel, thereby greatly improving the speed of signing, signature verification, encryption, decryption and key generation on an asymmetric algorithm chip and improving the performance of the security chip.
Drawings
FIG. 1 shows the chip structure in the system for rapidly realizing Montgomery modular multiplication by using multiple multipliers provided in embodiment 1;
FIG. 2 is a schematic diagram of the principle of the method for rapidly realizing Montgomery modular multiplication by using multiple multipliers provided in embodiment 2.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
This embodiment provides a system for rapidly realizing Montgomery modular multiplication by using multiple multipliers, which comprises an asymmetric algorithm chip and an upper computer. As shown in FIG. 1, the asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random data module, all of which are connected to a bus; the asymmetric hardware module comprises a register, a RAM and an algorithm module, wherein the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module; the processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module; after the asymmetric hardware module detects the enable bit of the register, it calls the corresponding operation module according to the operation mode, writes the result data into the RAM after the operation is finished, and at the same time sets an end flag in the register and generates an interrupt; the processor receives the interrupt or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and outputs the result data to the upper computer.
Example 2
This embodiment provides a method for rapidly realizing Montgomery modular multiplication by using multiple multipliers, based on the system for rapidly realizing Montgomery modular multiplication by using multiple multipliers described in embodiment 1, comprising the following steps:
the processor acquires the data and parameters to be operated on, writes them into the RAM of the asymmetric hardware module, and writes the operation mode into the register of the asymmetric hardware module;
after the asymmetric hardware module detects the enable bit of the register, it calls the corresponding operation module according to the determined operation mode to operate on the data and parameters;
the result data is written into the RAM after the operation is finished, and at the same time an end flag is set in the register and an interrupt is generated; the processor receives the interrupt or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and uploads the result data to the upper computer.
In this embodiment, a corresponding operation module is called according to a determined operation mode to operate on data and parameters to be operated, and an operation principle of a multiplier adopted in the embodiment is shown in fig. 2, and the method specifically includes the following steps:
S1, confirming the data length W of the modular multiplication to be realized, and confirming that the modulus of that data length is M, where M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high; A and B are the multipliers of that data length, where A[0], A[1], A[2], …, A[e] are the 64-bit words of A from low to high and B[0], B[1], B[2], …, B[e] are the 64-bit words of B from low to high; Md, T1, T2, u, W, V, W', V' are defined as intermediate result registers;
S2, first, using one 64-bit multiplier to perform the first-cycle operation and calculate the pre-operation value Md;
S3, using two 64-bit multipliers and one adder to perform the second-cycle operation, and obtaining the parameter $u_1$ of the first round of operation by parallel calculation;
S4, using eight 64-bit multipliers to perform the third-cycle operation: the lowest 64-bit word A[0] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$ are used to calculate the intermediate result $r0_1$ in parallel;
S5, using two 64-bit multipliers and one adder to perform the fourth-cycle operation, and calculating the parameter $u_2$ of the second round of operation in parallel;
S6, using eight 64-bit multipliers to perform the fifth-cycle operation: the second-lowest 64-bit word A[1] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$ are used to calculate the intermediate result $r0_2$ in parallel;
S7, using two 64-bit multipliers and one adder to perform the sixth-cycle operation, and calculating the parameter $u_3$ of the third round of operation in parallel;
S8, using eight 64-bit multipliers to perform the seventh-cycle operation: the second-highest 64-bit word A[2] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the third-round parameter $u_3$ are used to calculate the intermediate result $r0_3$ in parallel;
S9, repeating steps S7 to S8 in a loop until the e-th round parameter $u_{e+1}$ has been calculated in parallel;
S10, using the highest 64-bit word A[e] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the e-th round parameter $u_{e+1}$ to calculate the final result $r0_{e+1}$ in parallel.
In the present embodiment, e in step S1 is W/64; the data length W may be selected from, but is not limited to, 256 bits, 512 bits and 1024 bits.
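The cycle structure of steps S2 to S10 can be tabulated with a short helper. The sketch below is an extrapolation under stated assumptions: it takes the pattern given for four words (one cycle for Md, then alternating cycles with two multipliers for u and eight multipliers for the products) and assumes that an n-word operand would need n rounds and 2n multipliers in each product cycle. The source states the multiplier count only as eight, so the figures for 512 and 1024 bits are hypothetical, as is the function name.

```python
def cycle_schedule(width_bits):
    """Return a list of (label, multipliers_used) tuples, one per cycle."""
    n = width_bits // 64            # number of 64-bit words, assumed n rounds
    steps = [("Md", 1)]             # S2: one multiplier
    for i in range(1, n + 1):
        steps.append((f"u_{i}", 2))        # two multipliers and one adder
        steps.append((f"r0_{i}", 2 * n))   # 2n products, eight when n = 4
    return steps

def total_cycles(width_bits):
    return len(cycle_schedule(width_bits))
```

For W = 256 this reproduces the nine cycles of steps S2 to S10, with 1 + 4 × 2 + 4 × 8 = 41 multiplier activations in total.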
In a more preferred embodiment, W in step S1 is 256 bits, and the pre-operation value Md in step S2 is $Md = B[0] \times Mc$;
the parameter $u_1$ of the first round of operation in step S3 is calculated as follows: $u_1 = T1 + T2$, $T1 = r0 \times Mc$, $T2 = A[0] \times Md$, $r0 = r \bmod W$, where W in this formula denotes $2^{64}$ and r is the data taken modulo $2^{64}$;
$r0_1$ in step S4 is calculated as follows: $r0_1 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[0] \times B[0]$, $V[1] = u_1 \times M[0]$, $W[0] = A[0] \times B[1]$, $W[1] = u_1 \times M[1]$, $V'[0] = A[0] \times B[2]$, $V'[1] = u_1 \times M[2]$, $W'[0] = A[0] \times B[3]$, $W'[1] = u_1 \times M[3]$.
In step S5 in the present embodiment: $u_2 = T1 + T2$, $T1 = r0_1 \times Mc$, $T2 = A[0] \times Md$;
$r0_2$ in step S6 is calculated as follows: $r0_2 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[1] \times B[0]$, $V[1] = u_2 \times M[0]$, $W[0] = A[1] \times B[1]$, $W[1] = u_2 \times M[1]$, $V'[0] = A[1] \times B[2]$, $V'[1] = u_2 \times M[2]$, $W'[0] = A[1] \times B[3]$, $W'[1] = u_2 \times M[3]$.
In step S7 in the present embodiment: $u_3 = T1 + T2$, $T1 = r0_2 \times Mc$, $T2 = A[0] \times Md$;
$r0_3$ in step S8 is calculated as follows: $r0_3 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[2] \times B[0]$, $V[1] = u_3 \times M[0]$, $W[0] = A[2] \times B[1]$, $W[1] = u_3 \times M[1]$, $V'[0] = A[2] \times B[2]$, $V'[1] = u_3 \times M[2]$, $W'[0] = A[2] \times B[3]$, $W'[1] = u_3 \times M[3]$.
In step S9 in the present embodiment: $u_4 = T1 + T2$, $T1 = r0_3 \times Mc$, $T2 = A[0] \times Md$;
$r0_4$ in step S10 is calculated as follows: $r0_4 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[3] \times B[0]$, $V[1] = u_4 \times M[0]$, $W[0] = A[3] \times B[1]$, $W[1] = u_4 \times M[1]$, $V'[0] = A[3] \times B[2]$, $V'[1] = u_4 \times M[2]$, $W'[0] = A[3] \times B[3]$, $W'[1] = u_4 \times M[3]$.
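A Montgomery multiplier of this kind returns A·B·R⁻¹ mod M with R = 2^W rather than A·B mod M, so callers usually convert operands into and out of Montgomery form. The sketch below illustrates that general usage pattern; it is not described in the source. `mont_mul` is a plain-integer reference standing in for the hardware modular multiplication module, and all names are hypothetical.

```python
R_BITS = 256  # W = 256, so R = 2**256

def mont_mul(a, b, m):
    """Reference Montgomery product a * b * R^-1 mod m (m must be odd)."""
    return a * b * pow(1 << R_BITS, -1, m) % m

def mod_mul_via_montgomery(a, b, m):
    """Compute a * b mod m using only Montgomery products."""
    r2 = pow(1 << R_BITS, 2, m)   # R^2 mod m, computed once per modulus
    a_mont = mont_mul(a, r2, m)   # a * R mod m
    b_mont = mont_mul(b, r2, m)   # b * R mod m
    c_mont = mont_mul(a_mont, b_mont, m)  # a * b * R mod m
    return mont_mul(c_mont, 1, m)         # back to a * b mod m
```

In a long computation such as a scalar multiplication, the two conversions are paid once at the start and once at the end, while every intermediate product stays in Montgomery form.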
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
The method optimizes the loop-iteration calculation of the original Montgomery modular multiplication formula and, after this transformation, uses a plurality of 64-bit multipliers operating in parallel, which greatly improves the speed of signing, signature verification, encryption, decryption and key generation on an asymmetric algorithm chip and improves the performance of the security chip.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (9)

1. A system for rapidly realizing Montgomery modular multiplication by using multiple multipliers, characterized by comprising an asymmetric algorithm chip and an upper computer, wherein the asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random data module, all of which are connected to a bus; the asymmetric hardware module comprises a register, a RAM and an algorithm module, wherein the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module; the processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module; after the asymmetric hardware module detects the enable bit of the register, it calls the corresponding operation module according to the operation mode, writes the result data into the RAM after the operation is finished, and at the same time sets an end flag in the register and generates an interrupt; the processor receives the interrupt or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and outputs the result data to the upper computer.
2. The system according to claim 1, wherein said processor is a microprocessor.
3. A method for rapidly realizing Montgomery modular multiplication by using multiple multipliers, which is based on the system for rapidly realizing Montgomery modular multiplication by using multiple multipliers according to any one of claims 1-2, and comprises the following steps:
the processor acquires the data and parameters to be operated on, writes them into the RAM of the asymmetric hardware module, and writes the operation mode into the register of the asymmetric hardware module;
after the asymmetric hardware module detects the enable bit of the register, it calls the corresponding operation module according to the determined operation mode to operate on the data and parameters;
the result data is written into the RAM after the operation is finished, and at the same time an end flag is set in the register and an interrupt is generated; the processor receives the interrupt or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and uploads the result data to the upper computer.
4. The method according to claim 3, wherein the corresponding operation module is invoked to operate on the data and parameters to be operated according to the determined operation mode, and the method specifically comprises the following steps:
S1, confirming the data length W of the modular multiplication to be realized, and confirming that the modulus of that data length is M, where M[0], M[1], M[2], …, M[e] are the 64-bit words of M from low to high; A and B are the multipliers of that data length, where A[0], A[1], A[2], …, A[e] are the 64-bit words of A from low to high and B[0], B[1], B[2], …, B[e] are the 64-bit words of B from low to high; Md, T1, T2, u, W, V, W', V' are defined as intermediate result registers;
S2, first, using one 64-bit multiplier to perform the first-cycle operation and calculate the pre-operation value Md;
S3, using two 64-bit multipliers and one adder to perform the second-cycle operation, and obtaining the parameter $u_1$ of the first round of operation by parallel calculation;
S4, using eight 64-bit multipliers to perform the third-cycle operation: the lowest 64-bit word A[0] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$ are used to calculate the intermediate result $r0_1$ in parallel;
S5, using two 64-bit multipliers and one adder to perform the fourth-cycle operation, and calculating the parameter $u_2$ of the second round of operation in parallel;
S6, using eight 64-bit multipliers to perform the fifth-cycle operation: the second-lowest 64-bit word A[1] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$ are used to calculate the intermediate result $r0_2$ in parallel;
S7, using two 64-bit multipliers and one adder to perform the sixth-cycle operation, and calculating the parameter $u_3$ of the third round of operation in parallel;
S8, using eight 64-bit multipliers to perform the seventh-cycle operation: the second-highest 64-bit word A[2] of multiplier A, all words B[0], B[1], B[2], B[3] of B, and the third-round parameter $u_3$ are used to calculate the intermediate result $r0_3$ in parallel;
S9, repeating steps S7 to S8 in a loop until the e-th round parameter $u_{e+1}$ has been calculated in parallel;
S10, using the highest 64-bit word A[e] of multiplier A, all words B[0], B[1], B[2], …, B[e] of B, and the e-th round parameter $u_{e+1}$ to calculate the final result $r0_{e+1}$ in parallel.
5. The method according to claim 4, wherein in step S1, e is W/64; w may be selected to have a data length including, but not limited to, 256 bits, 512 bits, 1024 bits.
6. The method of claim 5, wherein W in step S1 is 256 bits, and the pre-operation value Md in step S2 is $Md = B[0] \times Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, where $P = 2^{64}$;
the parameter $u_1$ of the first round of operation in step S3 is calculated as follows: $u_1 = T1 + T2$, $T1 = r0 \times Mc$, $T2 = A[0] \times Md$, $r0 = r \bmod W$, where W in this formula denotes $2^{64}$ and r is the data taken modulo $2^{64}$;
$r0_1$ in step S4 is calculated as follows: $r0_1 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[0] \times B[0]$, $V[1] = u_1 \times M[0]$, $W[0] = A[0] \times B[1]$, $W[1] = u_1 \times M[1]$, $V'[0] = A[0] \times B[2]$, $V'[1] = u_1 \times M[2]$, $W'[0] = A[0] \times B[3]$, $W'[1] = u_1 \times M[3]$.
7. The method for rapidly realizing Montgomery modular multiplication by using multiple multipliers of claim 6, wherein in step S5: $u_2 = T1 + T2$, $T1 = r0_1 \times Mc$, $T2 = A[0] \times Md$;
$r0_2$ in step S6 is calculated as follows: $r0_2 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[1] \times B[0]$, $V[1] = u_2 \times M[0]$, $W[0] = A[1] \times B[1]$, $W[1] = u_2 \times M[1]$, $V'[0] = A[1] \times B[2]$, $V'[1] = u_2 \times M[2]$, $W'[0] = A[1] \times B[3]$, $W'[1] = u_2 \times M[3]$.
8. The method for rapidly realizing Montgomery modular multiplication by using multiple multipliers of claim 7, wherein in step S7: $u_3 = T1 + T2$, $T1 = r0_2 \times Mc$, $T2 = A[0] \times Md$;
$r0_3$ in step S8 is calculated as follows: $r0_3 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[2] \times B[0]$, $V[1] = u_3 \times M[0]$, $W[0] = A[2] \times B[1]$, $W[1] = u_3 \times M[1]$, $V'[0] = A[2] \times B[2]$, $V'[1] = u_3 \times M[2]$, $W'[0] = A[2] \times B[3]$, $W'[1] = u_3 \times M[3]$.
9. The method for rapidly realizing Montgomery modular multiplication by using multiple multipliers of claim 8, wherein in step S9: $u_4 = T1 + T2$, $T1 = r0_3 \times Mc$, $T2 = A[0] \times Md$;
$r0_4$ in step S10 is calculated as follows: $r0_4 = \{(W'[0] + W'[1]) : (V'[0] + V'[1]) : (W[0] + W[1]) : (V[0] + V[1])\}$, where $V[0] = A[3] \times B[0]$, $V[1] = u_4 \times M[0]$, $W[0] = A[3] \times B[1]$, $W[1] = u_4 \times M[1]$, $V'[0] = A[3] \times B[2]$, $V'[1] = u_4 \times M[2]$, $W'[0] = A[3] \times B[3]$, $W'[1] = u_4 \times M[3]$.
CN202210565348.XA 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers Active CN114840174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210565348.XA CN114840174B (en) 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210565348.XA CN114840174B (en) 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Publications (2)

Publication Number Publication Date
CN114840174A (en) 2022-08-02
CN114840174B (en) 2023-03-03

Family

ID=82571290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210565348.XA Active CN114840174B (en) 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Country Status (1)

Country Link
CN (1) CN114840174B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240601A (en) * 2023-11-09 2023-12-15 深圳大普微电子股份有限公司 Encryption processing method, encryption processing circuit, processing terminal, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020172355A1 (en) * 2001-04-04 2002-11-21 Chih-Chung Lu High-performance booth-encoded montgomery module
EP1471420A2 (en) * 2003-04-25 2004-10-27 Samsung Electronics Co., Ltd. Montgomery modular multiplier and method thereof using carry save addition
CN1786900A (en) * 2005-10-28 2006-06-14 清华大学 Multiplier based on improved Montgomey's algorithm
CN102609239A (en) * 2011-09-01 2012-07-25 北京华大信安科技有限公司 ECC (elliptic curve cryptography) coprocessor
CN104765586A (en) * 2015-04-15 2015-07-08 深圳国微技术有限公司 Embedded security chip and Montgomery modular multiplication operational method thereof
CN109145616A (en) * 2018-08-01 2019-01-04 上海交通大学 The realization method and system of SM2 encryption, signature and key exchange based on efficient modular multiplication
CN110460443A (en) * 2019-08-09 2019-11-15 南京秉速科技有限公司 The high speed point add operation method and apparatus of elliptic curve cipher

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIA-HONG ZHANG et al.: "Hardware Implementation of Improved Montgomery Modular Multiplication Algorithm", 2009 WRI INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND MOBILE COMPUTING *
TRIO ADIONO et al.: "Full custom design of adaptable montgomery modular multiplier for asymmetric RSA cryptosystem", 2017 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS) *
Shu Yan et al.: "VLSI design of RSA based on a Booth-encoded modular multiplication module", Journal of Xidian University *
Xu Wei: "Fast FPGA-based implementation of modular exponentiation and modular multiplication for the RSA cryptographic algorithm", China Master's Theses Full-text Database *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240601A (en) * 2023-11-09 2023-12-15 深圳大普微电子股份有限公司 Encryption processing method, encryption processing circuit, processing terminal, and storage medium
CN117240601B (en) * 2023-11-09 2024-03-26 深圳大普微电子股份有限公司 Encryption processing method, encryption processing circuit, processing terminal, and storage medium

Also Published As

Publication number Publication date
CN114840174B (en) 2023-03-03

Similar Documents

Publication Publication Date Title
US7904498B2 (en) Modular multiplication processing apparatus
Öztürk et al. Low-power elliptic curve cryptography using scaled modular arithmetic
WO2015164996A1 (en) Elliptic domain curve operational method and elliptic domain curve operational unit
WO2020006692A1 (en) Fully homomorphic encryption method and device and computer readable storage medium
US9268564B2 (en) Vector and scalar based modular exponentiation
US8862651B2 (en) Method and apparatus for modulus reduction
CN109145616B (en) SM2 encryption, signature and key exchange implementation method and system based on efficient modular multiplication
CN106712965B (en) Digital signature method and device and password equipment
EP3570488A1 (en) Online/offline signature system and method based on multivariate cryptography
CN113010142B (en) Novel pulse node type scalar dot multiplication double-domain implementation system and method
US20220166614A1 (en) System and method to optimize generation of coprime numbers in cryptographic applications
Zhang et al. Efficient prime-field arithmetic for elliptic curve cryptography on wireless sensor nodes
CN114840174B (en) System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers
JP4180024B2 (en) Multiplication remainder calculator and information processing apparatus
JP3542278B2 (en) Montgomery reduction device and recording medium
Moon et al. Fast VLSI arithmetic algorithms for high-security elliptic curve cryptographic applications
JP4170267B2 (en) Multiplication remainder calculator and information processing apparatus
US7319750B1 (en) Digital circuit apparatus and method for accelerating preliminary operations for cryptographic processing
CN112737778A (en) Digital signature generation and verification method and device, electronic equipment and storage medium
CN113114462A (en) Small-area scalar multiplication circuit applied to ECC (error correction code) safety hardware circuit
CN116527274A (en) Elliptic curve signature verification method and system based on multi-scalar multiplication rapid calculation
CN114238205A (en) High-performance ECC coprocessor system resisting power consumption attack
CN114594925A (en) Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof
Arazi et al. On calculating multiplicative inverses modulo $2^{m} $
CN117992990B (en) Efficient homomorphic encryption method for power data, processor and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant