CN114840174B - System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Info

Publication number
CN114840174B
Authority
CN
China
Prior art keywords
data
bit
module
multipliers
asymmetric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210565348.XA
Other languages
Chinese (zh)
Other versions
CN114840174A (en)
Inventor
王立峰
张奇惠
刘曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Wise Security Technology Co Ltd
Original Assignee
Guangzhou Wise Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Wise Security Technology Co Ltd
Priority to CN202210565348.XA
Publication of CN114840174A
Application granted
Publication of CN114840174B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60: Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72: Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/722: Modular multiplication
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a system and a method for rapidly implementing Montgomery modular multiplication using multiple multipliers, and relates to the technical field of high-performance algorithms for security chips. The system performs combined operations based on existing point addition, point doubling, modular exponentiation, modular inversion, modular subtraction, modular addition, and modular multiplication modules. It optimizes the loop-iteration calculation of the original Montgomery modular multiplication formula and uses multiple 64-bit multipliers operating in parallel, which greatly increases the speed of signing, signature verification, encryption, decryption, and key generation on an asymmetric algorithm chip and improves the performance of the security chip.

Description

System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers
Technical Field
The invention relates to the technical field of security algorithms, in particular to a system and a method for quickly realizing Montgomery modular multiplication by using multiple multipliers.
Background
At present, most asymmetric cryptographic chips use elliptic curves. Elliptic curve public key cryptography rests on the following properties of the curve: 1. The points of an elliptic curve over a finite field form a finite abelian group under point addition, and the order of this group is close to the size of the base field. 2. Analogous to exponentiation in the multiplicative group of a finite field, the multiple-point operation (scalar multiplication) on an elliptic curve constitutes a one-way function.
In the multiple-point operation, the problem of finding the multiplier given the multiple point and the base point is called the elliptic curve discrete logarithm problem. For the discrete logarithm problem on a general elliptic curve, only solving methods of exponential computational complexity are currently known. Compared with the integer factorization problem and the discrete logarithm problem in a finite field, the elliptic curve discrete logarithm problem is much harder to solve.
Elliptic curve public key cryptography is built from point multiplication, point doubling, point addition, and modular exponentiation in the curve calculation, all of which can ultimately be decomposed into modular multiplication, modular addition, and modular subtraction.
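By way of illustration only, the decomposition of a curve operation into modular operations can be sketched as follows. The toy curve y^2 = x^3 + 2x + 2 over GF(17), the base point (5, 1), and all function names are assumptions chosen for readability; they do not come from the invention, and the parameters are far too small for real use. The sketch also uses a modular inversion, which corresponds to the modular inversion module named later in the description.

```python
# Toy short-Weierstrass curve y^2 = x^3 + a*x + b over GF(p).
# Illustrative parameters only (assumption, not from the source).
P_MOD, A_COEF = 17, 2

def mod_add(x, y):
    return (x + y) % P_MOD

def mod_sub(x, y):
    return (x - y) % P_MOD

def mod_mul(x, y):
    return (x * y) % P_MOD

def mod_inv(x):
    return pow(x, -1, P_MOD)  # modular inverse, Python 3.8+

def point_add(p1, p2):
    """Add two affine points; assumes neither is the point at infinity and p1 != -p2."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        # Point doubling: slope = (3*x1^2 + a) / (2*y1)
        num = mod_add(mod_mul(3, mod_mul(x1, x1)), A_COEF)
        den = mod_mul(2, y1)
    else:
        # Point addition: slope = (y2 - y1) / (x2 - x1)
        num = mod_sub(y2, y1)
        den = mod_sub(x2, x1)
    lam = mod_mul(num, mod_inv(den))
    x3 = mod_sub(mod_sub(mod_mul(lam, lam), x1), x2)
    y3 = mod_sub(mod_mul(lam, mod_sub(x1, x3)), y1)
    return (x3, y3)

G = (5, 1)
print(point_add(G, G))                # (6, 3)
print(point_add(G, point_add(G, G)))  # (10, 6)
```

Every arithmetic step in `point_add` is a modular multiplication, addition, subtraction, or inversion, which is why the speed of modular multiplication dominates the cost of the curve operations.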
In the prior art, large-number modular multiplication is mainly implemented with Montgomery modular multiplication. Because the Montgomery modular multiplication formula is computed by loop iteration, the speed of modular multiplication is limited by the formula. As a result, an asymmetric algorithm chip that uses elliptic curve computation performs signing, signature verification, encryption, decryption, and key generation only tens to hundreds of times per second, and this becomes the bottleneck of the asymmetric encryption chip.
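For orientation, the loop iteration referred to above can be sketched with the textbook word-serial form of Montgomery multiplication. This is a minimal sketch of the standard algorithm, not the prior-art circuit itself; the function name `mont_mul`, the 64-bit word size, and the use of Python big integers are assumptions.

```python
WORD = 64
MASK = (1 << WORD) - 1

def mont_mul(a, b, m, n_words):
    """Return a * b * 2^(-64 * n_words) mod m for odd m and a, b < m.

    One 64-bit word of a is consumed per loop iteration, and each
    iteration depends on the result of the previous one.
    """
    mc = (-pow(m, -1, 1 << WORD)) & MASK      # -m^(-1) mod 2^64
    r = 0
    for i in range(n_words):
        a_i = (a >> (WORD * i)) & MASK
        r += a_i * b
        u = ((r & MASK) * mc) & MASK          # makes the low word of r + u*m zero
        r = (r + u * m) >> WORD               # exact division by 2^64
    return r - m if r >= m else r             # final conditional subtraction
```

The data dependency between iterations (each `u` needs the previous `r`) is what serializes the computation and limits throughput when only a single multiplier is available.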
Disclosure of Invention
The present invention is directed to a system and method for fast implementing Montgomery modular multiplication using multiple multipliers, so as to solve the foregoing problems in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A system for rapidly implementing Montgomery modular multiplication using multiple multipliers comprises an asymmetric algorithm chip and an upper computer. The asymmetric algorithm chip comprises a processor, an asymmetric hardware module, and a random data module, all of which are connected to a bus. The asymmetric hardware module comprises a register, a RAM, and an algorithm module; the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module, and a modular multiplication module. The processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module. After the asymmetric hardware module detects the enable bit of the register, it calls the operation module corresponding to the operation mode, writes the result data into the RAM when the operation finishes, and at the same time sets an end flag in the register and generates an interrupt. The processor receives the interrupt, or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and outputs the result data to the upper computer.
Preferably, the processor is a microprocessor.
Another object of the present invention is to provide a method for rapidly implementing Montgomery modular multiplication using multiple multipliers, based on the system described above, comprising the following steps:
the processor acquires data and parameters to be operated, writes the data and the parameters into the RAM of the asymmetric hardware module, and writes an operation mode to be operated into a register of the asymmetric hardware module;
after the asymmetric hardware module detects an enable bit of a register, calling a corresponding operation module according to a determined operation mode to operate data and parameters to be operated;
writing result data into the RAM after the operation is finished, and simultaneously setting an end mark in the register and generating interruption; the processor receives the interrupt or inquires the end mark in the asymmetric hardware module register, reads the result data in the RAM through the bus, and uploads the result data to the upper computer.
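The three steps above can be sketched as a small software model of the processor and the asymmetric hardware module. This is a hedged illustration only: the class name, the field names, the mode code `MODE_MODMUL`, and the RAM keys are all hypothetical, and the modular multiplication module is replaced by a plain `(a * b) % m` stand-in rather than the multi-multiplier Montgomery datapath.

```python
from dataclasses import dataclass, field

MODE_MODMUL = 0x1  # hypothetical operation-mode code

@dataclass
class AsymmetricHardwareModule:
    ram: dict = field(default_factory=dict)  # models the module's RAM
    mode: int = 0                            # operation-mode register field
    enable: bool = False                     # enable bit
    done: bool = False                       # end flag

    def step(self):
        """Model one hardware pass: run the selected operation if enabled."""
        if not self.enable:
            return
        if self.mode == MODE_MODMUL:
            a, b, m = self.ram["A"], self.ram["B"], self.ram["M"]
            self.ram["RESULT"] = (a * b) % m  # stand-in for the modular multiplication module
        self.enable = False
        self.done = True  # real hardware would also raise an interrupt here

def processor_run(hw, a, b, m):
    hw.ram.update({"A": a, "B": b, "M": m})  # write data and parameters over the bus
    hw.mode = MODE_MODMUL                    # write the operation mode
    hw.done = False
    hw.enable = True                         # set the enable bit
    while not hw.done:                       # poll the end flag
        hw.step()
    return hw.ram["RESULT"]                  # read the result for the upper computer

hw = AsymmetricHardwareModule()
print(processor_run(hw, 7, 8, 13))  # 4
```

The model uses polling; the source allows either polling of the end flag or an interrupt, and an interrupt-driven variant would replace the `while` loop with a callback.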
Preferably, calling the corresponding operation module to operate on the data and parameters according to the determined operation mode specifically comprises the following steps:
S1. Confirm the data length W of the modular multiplication to be performed, and let M be the modulus of that data length, where M[0], M[1], M[2], …, M[e] are the 64-bit groups of M from low to high. A and B are the multipliers of that data length, where A[0], A[1], A[2], …, A[e] are the 64-bit groups of A from low to high and B[0], B[1], B[2], …, B[e] are the 64-bit groups of B from low to high. Define Md, T1, T2, u, W, V, W', V' as intermediate result registers.
In step S1, e = W/64; the data length W may be, but is not limited to, 256 bits, 512 bits, or 1024 bits.
S2. In the first cycle, use one 64-bit multiplier to calculate the pre-operation value Md, where $Md = B[0] \cdot Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, and $P = 2^{64}$.
S3. In the second cycle, use two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_1$ of the first round of operation.
The parameter $u_1$ is calculated as: $u_1 = T1 + T2$, $T1 = r0 \cdot Mc$, $T2 = A[0] \cdot Md$, $r0 = r \bmod W$, where W here denotes $2^{64}$ and r is the modulo 64-bit data.
S4. In the third cycle, use eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_1$ from the lowest 64-bit data A[0] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$:
$r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$, $W'[1] = u_1 \cdot M[3]$.
S5. In the fourth cycle, use two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_2$ of the second round of operation: $u_2 = T1 + T2$, $T1 = r0_1 \cdot Mc$, $T2 = A[0] \cdot Md$.
S6. In the fifth cycle, use eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_2$ from the second-lowest 64-bit data A[1] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$:
$r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$, $W'[1] = u_2 \cdot M[3]$.
S7. In the sixth cycle, use two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_3$ of the third round of operation: $u_3 = T1 + T2$, $T1 = r0_2 \cdot Mc$, $T2 = A[0] \cdot Md$.
S8. In the seventh cycle, use eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_3$ from the second-highest 64-bit data A[2] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the third-round parameter $u_3$:
$r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$, $W'[1] = u_3 \cdot M[3]$.
S9. Repeat steps S7-S8 until the parameter $u_4$ of the fourth round has been calculated in parallel.
S10. Using the highest 64-bit data A[3] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the fourth-round parameter $u_4$, calculate the final result $r0_4$ in parallel.
Preferably, W in step S1 is 256 bits, and in step S2 the pre-operation value is $Md = B[0] \cdot Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, where $P = 2^{64}$;
the parameter $u_1$ of the first round of operation in step S3 is calculated as: $u_1 = T1 + T2$, $T1 = r0 \cdot Mc$, $T2 = A[0] \cdot Md$, $r0 = r \bmod W$, where W here denotes $2^{64}$ and r is the modulo 64-bit data;
the formula for $r0_1$ in step S4 is: $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$, $W'[1] = u_1 \cdot M[3]$.
Preferably, in step S5: $u_2 = T1 + T2$, $T1 = r0_1 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_2$ in step S6 is: $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$, $W'[1] = u_2 \cdot M[3]$.
Preferably, in step S7: $u_3 = T1 + T2$, $T1 = r0_2 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_3$ in step S8 is: $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$, $W'[1] = u_3 \cdot M[3]$.
Preferably, in step S9: $u_4 = T1 + T2$, $T1 = r0_3 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_4$ in step S10 is: $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$, $W'[1] = u_4 \cdot M[3]$.
The beneficial effects of the invention are as follows:
The invention provides a system and a method for rapidly implementing Montgomery modular multiplication using multiple multipliers. It optimizes the loop-iteration calculation of the original Montgomery modular multiplication formula and uses multiple 64-bit multipliers operating in parallel, which greatly increases the speed of signing, signature verification, encryption, decryption, and key generation on an asymmetric algorithm chip and improves the performance of the security chip.
Drawings
FIG. 1 shows the chip structure in the system for rapidly implementing Montgomery modular multiplication using multiple multipliers provided in Example 1;
FIG. 2 is a schematic diagram of the principle of the method for rapidly implementing Montgomery modular multiplication using multiple multipliers provided in Example 2.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are given by way of illustration only.
Example 1
This embodiment provides a system for rapidly implementing Montgomery modular multiplication using multiple multipliers, comprising an asymmetric algorithm chip and an upper computer. As shown in FIG. 1, the asymmetric algorithm chip comprises a processor, an asymmetric hardware module, and a random data module, all of which are connected to a bus. The asymmetric hardware module comprises a register, a RAM, and an algorithm module; the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module, and a modular multiplication module. The processor writes the data and parameters to be operated on into the RAM of the asymmetric hardware module through the bus, and writes the operation mode into the register of the asymmetric hardware module. After the asymmetric hardware module detects the enable bit of the register, it calls the operation module corresponding to the operation mode, writes the result data into the RAM when the operation finishes, and at the same time sets an end flag in the register and generates an interrupt. The processor receives the interrupt, or polls the end flag in the register of the asymmetric hardware module, reads the result data from the RAM through the bus, and outputs the result data to the upper computer.
Example 2
This embodiment provides a method for rapidly implementing Montgomery modular multiplication using multiple multipliers, based on the system described in Example 1, comprising the following steps:
the processor acquires data and parameters to be operated, writes the data and the parameters into an RAM of the asymmetric hardware module, and writes an operation mode to be operated into a register of the asymmetric hardware module;
after the asymmetric hardware module detects the enable bit of the register, calling a corresponding operation module according to a determined operation mode to operate data and parameters to be operated;
writing result data into the RAM after the operation is finished, and simultaneously setting an end mark in the register and generating interruption; the processor receives the interrupt or inquires the end mark in the asymmetric hardware module register, reads the result data in the RAM through the bus, and uploads the result data to the upper computer.
In this embodiment, the corresponding operation module is called according to the determined operation mode to operate on the data and parameters; the operating principle of the multipliers used is shown in FIG. 2. The method specifically comprises the following steps:
S1. Confirm the data length W of the modular multiplication to be performed, and let M be the modulus of that data length, where M[0], M[1], M[2], …, M[e] are the 64-bit groups of M from low to high. A and B are the multipliers of that data length, where A[0], A[1], A[2], …, A[e] are the 64-bit groups of A from low to high and B[0], B[1], B[2], …, B[e] are the 64-bit groups of B from low to high. Define Md, T1, T2, u, W, V, W', V' as intermediate result registers.
S2. In the first cycle, use one 64-bit multiplier to calculate the pre-operation value Md.
S3. In the second cycle, use two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_1$ of the first round of operation.
S4. In the third cycle, use eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_1$ from the lowest 64-bit data A[0] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$.
S5. In the fourth cycle, use two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_2$ of the second round of operation.
S6. In the fifth cycle, use eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_2$ from the second-lowest 64-bit data A[1] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$.
S7. In the sixth cycle, use two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_3$ of the third round of operation.
S8. In the seventh cycle, use eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_3$ from the second-highest 64-bit data A[2] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the third-round parameter $u_3$.
S9. Repeat steps S7-S8 until the parameter $u_{e+1}$ of the e-th round has been calculated in parallel.
S10. Using the highest 64-bit data A[e] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the e-th round parameter $u_{e+1}$, calculate the final result $r0_{e+1}$ in parallel.
In this embodiment, e = W/64 in step S1; the data length W may be, but is not limited to, 256 bits, 512 bits, or 1024 bits.
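The grouping of step S1 can be illustrated with a short sketch that splits an operand into 64-bit groups from low to high. The helper names `split_words` and `join_words` are hypothetical. For W = 256 the split yields four groups, indexed 0 to 3, which matches the indices A[0] to A[3] used in the 256-bit formulas of this embodiment.

```python
WORD = 64
MASK = (1 << WORD) - 1

def split_words(x, w_bits):
    """Split x into w_bits/64 groups of 64 bits, lowest group first."""
    assert w_bits % WORD == 0 and 0 <= x < (1 << w_bits)
    return [(x >> (WORD * i)) & MASK for i in range(w_bits // WORD)]

def join_words(words):
    """Inverse of split_words: reassemble the integer from its groups."""
    return sum(w << (WORD * i) for i, w in enumerate(words))

A = 0x0123456789ABCDEF_FEDCBA9876543210_0F1E2D3C4B5A6978_8796A5B4C3D2E1F0
groups = split_words(A, 256)
print(len(groups), hex(groups[0]))
```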
In a more preferred embodiment, W in step S1 is 256 bits, and in step S2 the pre-operation value is $Md = B[0] \cdot Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, where $P = 2^{64}$;
the parameter $u_1$ of the first round of operation in step S3 is calculated as: $u_1 = T1 + T2$, $T1 = r0 \cdot Mc$, $T2 = A[0] \cdot Md$, $r0 = r \bmod W$, where W here denotes $2^{64}$ and r is the modulo 64-bit data;
the formula for $r0_1$ in step S4 is: $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$, $W'[1] = u_1 \cdot M[3]$.
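One way to read steps S2 and S3 is that precomputing Md = B[0]·Mc lets the round parameter u be formed from two independent products, T1 and T2, without first waiting for A[0]·B[0] to be accumulated. The sketch below checks that reading against the textbook serial form. It is an interpretation, not a statement of the circuit: the function names are hypothetical, Mc is computed from the lowest group of M only, and the truncation of T1, T2, and u to 64 bits is an assumption.

```python
WORD = 64
P = 1 << WORD
MASK = P - 1

def precompute(m0, b0):
    """Return (Mc, Md): Mc = -m0^(-1) mod 2^64, Md = b0 * Mc mod 2^64 (m0 must be odd)."""
    mc = (-pow(m0, -1, P)) & MASK
    md = (b0 * mc) & MASK
    return mc, md

def u_parallel(r0, a_word, mc, md):
    """u = T1 + T2 mod 2^64; T1 and T2 are independent, so two multipliers can run at once."""
    t1 = (r0 * mc) & MASK
    t2 = (a_word * md) & MASK
    return (t1 + t2) & MASK

def u_serial(r0, a_word, b0, mc):
    """Textbook form: accumulate a_word * b0 into the low word first, then multiply by Mc."""
    return (((r0 + a_word * b0) & MASK) * mc) & MASK

m0, b0 = 0xFFFFFFFFFFFFFFED, 0x123456789ABCDEF1
r0, a0 = 0x0FEDCBA987654321, 0xDEADBEEFCAFEBABE
mc, md = precompute(m0, b0)
u1 = u_parallel(r0, a0, mc, md)
print(u1 == u_serial(r0, a0, b0, mc))
```

Both forms agree because (r0 + a·B[0])·Mc ≡ r0·Mc + a·(B[0]·Mc) modulo 2^64, and the resulting u makes the low 64 bits of r0 + a·B[0] + u·M[0] vanish.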
In step S5 in this embodiment: $u_2 = T1 + T2$, $T1 = r0_1 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_2$ in step S6 is: $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$, $W'[1] = u_2 \cdot M[3]$.
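A single eight-multiplier round such as step S6 can be sketched as follows, under stated assumptions. The concatenation {… : … : … : …} is read here as a carry-propagating sum in which the pair for index j is weighted by 2^(64·j), the previous intermediate result is added in, and the result is shifted down by one word; none of this carry handling is spelled out in the source. The function name and argument layout are hypothetical.

```python
WORD = 64
MASK = (1 << WORD) - 1

def round_eight_products(r, a_word, b_words, m_words, u):
    """One round for W = 256: eight 64x64 products, summed by column, then shifted one word."""
    prods_ab = [a_word * b_j for b_j in b_words]  # A[i]*B[0..3]: four multipliers
    prods_um = [u * m_j for m_j in m_words]       # u*M[0..3]:    four multipliers
    acc = r                                       # previous intermediate result (assumption)
    for j in range(4):
        # Column j holds the pair (A[i]*B[j] + u*M[j]) with weight 2^(64*j).
        acc += (prods_ab[j] + prods_um[j]) << (WORD * j)
    assert acc & MASK == 0                        # u was chosen to clear the low word
    return acc >> WORD
```

The eight products are mutually independent, which is what allows them to be issued to eight multipliers in the same cycle; only the column sums and carries depend on more than one product.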
In step S7 in this embodiment: $u_3 = T1 + T2$, $T1 = r0_2 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_3$ in step S8 is: $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$, $W'[1] = u_3 \cdot M[3]$.
In step S9 in this embodiment: $u_4 = T1 + T2$, $T1 = r0_3 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_4$ in step S10 is: $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$, $W'[1] = u_4 \cdot M[3]$.
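Taken together, steps S2 to S10 for W = 256 can be sketched as an end-to-end software model and checked against a reference value. This is a hedged sketch, not the claimed circuit. Three points are assumptions: the text writes T2 = A[0]·Md in every round, whereas the sketch uses the current group A[i], which is what the textbook algorithm needs for the check to pass; a final conditional subtraction is added, as in standard Montgomery multiplication; and the cycle count idealizes each step as a single cycle, as the step descriptions suggest. The function names are hypothetical.

```python
WORD = 64
P = 1 << WORD
MASK = P - 1

def words(x, n):
    """Split x into n 64-bit groups, lowest first."""
    return [(x >> (WORD * i)) & MASK for i in range(n)]

def mont_mul_multi(a, b, m, w_bits=256):
    """Return (a*b*2^(-w_bits) mod m, cycles) using the two-phase round structure."""
    n = w_bits // WORD
    A, B, M = words(a, n), words(b, n), words(m, n)
    mc = (-pow(M[0], -1, P)) & MASK
    md = (B[0] * mc) & MASK                 # S2: one multiplier, cycle 1
    cycles, r = 1, 0
    for i in range(n):
        # u phase (S3, S5, S7, S9): two multipliers and an adder
        u = ((r & MASK) * mc + A[i] * md) & MASK
        cycles += 1
        # product phase (S4, S6, S8, S10): 2*n multipliers, eight when n = 4
        acc = r
        for j in range(n):
            acc += (A[i] * B[j] + u * M[j]) << (WORD * j)
        r = acc >> WORD
        cycles += 1
    if r >= m:
        r -= m                              # final conditional subtraction (assumption)
    return r, cycles

m = 2**255 - 19                             # any odd 256-bit modulus works
a, b = 3**150, 5**100                       # operands smaller than m
result, cycles = mont_mul_multi(a, b, m)
print(result == (a * b * pow(1 << 256, -1, m)) % m, cycles)
```

Under these assumptions a 256-bit multiplication takes one cycle for Md plus two cycles per round, nine in total. For larger W the product phase of this model would need 2·W/64 multipliers rather than eight, or additional cycles per round; the source does not state which.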
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
The method optimizes the loop-iteration calculation of the original Montgomery modular multiplication formula and, after this conversion, uses multiple 64-bit multipliers operating in parallel, which greatly increases the speed of signing, signature verification, encryption, decryption, and key generation on an asymmetric algorithm chip and improves the performance of the security chip.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (7)

1. A method for rapidly realizing Montgomery modular multiplication by using a multi-multiplier is characterized in that a system for rapidly realizing Montgomery modular multiplication based on the multi-multiplier is realized, the system comprises an asymmetric algorithm chip and an upper computer, the asymmetric algorithm chip comprises a processor, an asymmetric hardware module and a random number module, and the processor, the asymmetric hardware module and the random number module are all connected with a bus; the asymmetric hardware module comprises a register, a RAM and an algorithm module, wherein the algorithm module comprises a point addition module, a point doubling module, a modular exponentiation module, a modular inversion module, a modular subtraction module, a modular addition module and a modular multiplication module; the processor writes data and parameters to be operated into an RAM of the asymmetric hardware module through a bus, and writes a mode to be operated into a register of the asymmetric hardware module; after the asymmetric hardware module detects the enable bit of the register, calling a corresponding operation module to perform operation according to the operation mode, writing result data into the RAM after the operation is finished, and simultaneously setting an end mark in the register and generating interruption; the processor receives the interrupt or inquires an end mark in the asymmetric hardware module register, reads out result data in the RAM through the bus, and outputs the result data to the upper computer;
the method comprises the following steps:
the processor acquires data and parameters to be operated, writes the data and the parameters into an RAM of the asymmetric hardware module, and writes an operation mode to be operated into a register of the asymmetric hardware module;
after the asymmetric hardware module detects an enable bit of a register, calling a corresponding operation module according to a determined operation mode to operate data and parameters to be operated;
writing result data into the RAM after the operation is finished, and simultaneously setting an end mark in the register and generating interruption; the processor receives the interrupt or inquires an end mark in the asymmetric hardware module register, reads the result data in the RAM through the bus, and uploads the result data to the upper computer;
calling a corresponding operation module according to the determined operation mode to operate the data and the parameters to be operated, and specifically comprising the following steps of:
S1. confirming the data length W of the modular multiplication to be performed, and letting M be the modulus of that data length, wherein M[0], M[1], M[2], …, M[e] are the 64-bit groups of M from low to high; A and B are the multipliers of that data length, wherein A[0], A[1], A[2], …, A[e] are the 64-bit groups of A from low to high and B[0], B[1], B[2], …, B[e] are the 64-bit groups of B from low to high; defining Md, T1, T2, u, W, V, W' and V' as intermediate result registers;
S2. in the first cycle, using one 64-bit multiplier to calculate the pre-operation value Md;
S3. in the second cycle, using two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_1$ of the first round of operation;
S4. in the third cycle, using eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_1$ from the lowest 64-bit data A[0] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the first-round parameter $u_1$;
S5. in the fourth cycle, using two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_2$ of the second round of operation;
S6. in the fifth cycle, using eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_2$ from the second-lowest 64-bit data A[1] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the second-round parameter $u_2$;
S7. in the sixth cycle, using two 64-bit multipliers and an adder to calculate, in parallel, the parameter $u_3$ of the third round of operation;
S8. in the seventh cycle, using eight 64-bit multipliers to calculate, in parallel, the intermediate result $r0_3$ from the second-highest 64-bit data A[2] of the multiplier A, all data B[0], B[1], B[2], B[3] of B, and the third-round parameter $u_3$;
S9. repeating steps S7-S8 until the parameter $u_{e+1}$ of the e-th round has been calculated in parallel;
S10. using the highest 64-bit data A[e] of the multiplier A, all data B[0], B[1], B[2], …, B[e] of B, and the e-th round parameter $u_{e+1}$, calculating the final result $r0_{e+1}$ in parallel.
2. The method of claim 1, wherein the processor is a microprocessor.
3. The method for fast Montgomery modular multiplication using multiple multipliers of claim 1, wherein e = W/64 in step S1; the data length W may be, but is not limited to, 256 bits, 512 bits, or 1024 bits.
4. The method as claimed in claim 3, wherein W in step S1 is 256 bits, and in step S2 the pre-operation value is $Md = B[0] \cdot Mc$, $Mc = -\{M1, M0\}^{-1} \bmod P$, where $P = 2^{64}$;
the parameter $u_1$ of the first round of operation in step S3 is calculated as: $u_1 = T1 + T2$, $T1 = r0 \cdot Mc$, $T2 = A[0] \cdot Md$, $r0 = r \bmod W$, where W here denotes $2^{64}$ and r is the modulo 64-bit data;
the formula for $r0_1$ in step S4 is: $r0_1 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[0] \cdot B[0]$, $V[1] = u_1 \cdot M[0]$, $W[0] = A[0] \cdot B[1]$, $W[1] = u_1 \cdot M[1]$, $V'[0] = A[0] \cdot B[2]$, $V'[1] = u_1 \cdot M[2]$, $W'[0] = A[0] \cdot B[3]$, $W'[1] = u_1 \cdot M[3]$.
5. The method for fast Montgomery modular multiplication using multiple multipliers as claimed in claim 4, wherein in step S5: $u_2 = T1 + T2$, $T1 = r0_1 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_2$ in step S6 is: $r0_2 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[1] \cdot B[0]$, $V[1] = u_2 \cdot M[0]$, $W[0] = A[1] \cdot B[1]$, $W[1] = u_2 \cdot M[1]$, $V'[0] = A[1] \cdot B[2]$, $V'[1] = u_2 \cdot M[2]$, $W'[0] = A[1] \cdot B[3]$, $W'[1] = u_2 \cdot M[3]$.
6. The method for fast Montgomery modular multiplication using multiple multipliers according to claim 5, wherein in step S7: $u_3 = T1 + T2$, $T1 = r0_2 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_3$ in step S8 is: $r0_3 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[2] \cdot B[0]$, $V[1] = u_3 \cdot M[0]$, $W[0] = A[2] \cdot B[1]$, $W[1] = u_3 \cdot M[1]$, $V'[0] = A[2] \cdot B[2]$, $V'[1] = u_3 \cdot M[2]$, $W'[0] = A[2] \cdot B[3]$, $W'[1] = u_3 \cdot M[3]$.
7. The method for fast Montgomery modular multiplication using multiple multipliers according to claim 6, wherein in step S9: $u_4 = T1 + T2$, $T1 = r0_3 \cdot Mc$, $T2 = A[0] \cdot Md$;
the formula for $r0_4$ in step S10 is: $r0_4 = \{(W'[0]+W'[1]) : (V'[0]+V'[1]) : (W[0]+W[1]) : (V[0]+V[1])\}$, where $V[0] = A[3] \cdot B[0]$, $V[1] = u_4 \cdot M[0]$, $W[0] = A[3] \cdot B[1]$, $W[1] = u_4 \cdot M[1]$, $V'[0] = A[3] \cdot B[2]$, $V'[1] = u_4 \cdot M[2]$, $W'[0] = A[3] \cdot B[3]$, $W'[1] = u_4 \cdot M[3]$.
CN202210565348.XA 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers Active CN114840174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210565348.XA CN114840174B (en) 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Publications (2)

Publication Number Publication Date
CN114840174A (en) 2022-08-02
CN114840174B (en) 2023-03-03

Family

ID=82571290

Family Applications (1)

Application Number Priority Date Filing Date Title
CN202210565348.XA Active CN114840174B (en) 2022-05-18 2022-05-18 System and method for rapidly realizing Montgomery modular multiplication by using multiple multipliers

Country Status (1)

Country Link
CN (1) CN114840174B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240601B (en) * 2023-11-09 2024-03-26 深圳大普微电子股份有限公司 Encryption processing method, encryption processing circuit, processing terminal, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020172355A1 (en) * 2001-04-04 2002-11-21 Chih-Chung Lu High-performance booth-encoded montgomery module
JP2004326112A (en) * 2003-04-25 2004-11-18 Samsung Electronics Co Ltd Multiple modulus selector, accumulator, montgomery multiplier, method of generating multiple modulus, method of producing partial product, accumulating method, method of performing montgomery multiplication, modulus selector, and booth recorder
CN100470464C (en) * 2005-10-28 2009-03-18 清华大学 Multiplier based on improved Montgomey's algorithm
CN102279725A (en) * 2011-09-01 2011-12-14 北京华大信安科技有限公司 Elliptic curve cipher (ECC) co-processor
CN104765586B (en) * 2015-04-15 2018-09-28 深圳国微技术有限公司 A kind of embedded security chip and its montgomery modulo multiplication operation method
CN109145616B (en) * 2018-08-01 2022-03-22 上海交通大学 SM2 encryption, signature and key exchange implementation method and system based on efficient modular multiplication
CN110460443A (en) * 2019-08-09 2019-11-15 南京秉速科技有限公司 The high speed point add operation method and apparatus of elliptic curve cipher

Also Published As

Publication number Publication date
CN114840174A (en) 2022-08-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant