CN111694541B

CN111694541B - Base 32 operation circuit for number theory transformation multiplication

Info

Publication number: CN111694541B
Application number: CN202010371312.9A
Authority: CN
Inventors: 华斯亮; 张惠国; 刘玉申; 徐健; 卞九辉; 张静亚
Original assignee: Changshu Institute of Technology
Current assignee: Changshu Institute of Technology
Priority date: 2020-05-06
Filing date: 2020-05-06
Publication date: 2023-04-21
Anticipated expiration: 2040-05-06
Also published as: CN111694541A

Abstract

The invention discloses a base 32 operation circuit for number theory transformation multiplication. It comprises 32 operand generation modules. After high-order zero padding, each of the 32 input data is divided into 11 words of 6 bits each, and the words are merged into operands: 1 module outputs 32 96-bit operands, 16 modules each output 11 192-bit operands, 3 modules each output 16 192-bit operands, and 12 modules each output 12 192-bit operands. Each operand generation module is connected to an operand modulo addition module, which performs modular addition on the operands output by that operand generation module. A modulo p module reduces the data output by each operand modulo addition module modulo the prime p and outputs the result, where p = 2^64 - 2^32 + 1. By merging, the invention reduces the number of operands from 1024 in the prior art to 400, which greatly reduces the computation cost and improves the efficiency of the base 32 operation.

Description

Base 32 operation circuit for number theory transformation multiplication

Technical Field

The present invention relates to an arithmetic circuit, and more particularly, to a base 32 arithmetic circuit for multiplication by number-theory transformation.

Background

Besides traditional long multiplication, large integer multiplication can also be performed with the Schönhage-Strassen algorithm.

The core idea of the Schönhage-Strassen algorithm is as follows: an FFT over a ring is applied to each of two large integers of length n, converting them into frequency-domain representations; the two frequency-domain representations are multiplied pointwise to obtain the frequency-domain representation of the product; an IFFT over the ring is then applied to that representation to obtain the product. Using a number theory transformation (NTT) instead of a discrete Fourier transform replaces floating-point arithmetic with modular arithmetic and thereby avoids rounding errors. Number theory transformation multiplication (NTT multiplication) refers specifically to multiplication that uses a number theory transformation within the Schönhage-Strassen algorithm. The number theory transformation and its inverse are the computational core of NTT multiplication and account for more than 90% of its operation count and run time, so optimizing the speed, area and power consumption of the number theory transformation has a critical influence on the overall performance of NTT multiplication.
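As a rough illustration of the three steps described above (forward transform, pointwise product, inverse transform), the following Python sketch multiplies two integers with a 32-point number theory transformation. It assumes the prime p = 2^64 - 2^32 + 1 and the root of unity 2^6 that the Detailed Description introduces; the function names, the 16-bit digit size and the direct O(N^2) transform are illustrative choices and are not taken from the patent.

```python
# Minimal sketch of NTT multiplication (illustrative only, not the patented circuit).
P = 2**64 - 2**32 + 1   # prime modulus
N = 32                  # transform length
W32 = 2**6              # 32nd root of unity modulo P
DIGIT_BITS = 16         # assumed digit size; keeps every convolution term below P

def ntt(a, w):
    """Direct O(N^2) transform: A[k] = sum_n a[n] * w^(n*k) mod P."""
    return [sum(a[n] * pow(w, n * k, P) for n in range(N)) % P for k in range(N)]

def intt(A):
    """Inverse transform: use w^-1 and scale by N^-1 (inverses via Fermat)."""
    w_inv = pow(W32, P - 2, P)
    n_inv = pow(N, P - 2, P)
    return [(v * n_inv) % P for v in ntt(A, w_inv)]

def digits(v):
    """16 data digits followed by 16 zeros so the cyclic convolution cannot wrap."""
    mask = (1 << DIGIT_BITS) - 1
    half = N // 2
    return [(v >> (DIGIT_BITS * i)) & mask for i in range(half)] + [0] * half

def ntt_multiply(x, y):
    """Multiply two non-negative integers below 2^256."""
    fx, fy = ntt(digits(x), W32), ntt(digits(y), W32)
    conv = intt([(a * b) % P for a, b in zip(fx, fy)])
    # Carries between digits are resolved by the shifted integer sum
    return sum(c << (DIGIT_BITS * i) for i, c in enumerate(conv))
```

The zero padding in `digits` is what turns the cyclic convolution of the transform into an ordinary (acyclic) product of the digit sequences.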

A 1048576-point number theory transformation (1048576 = 32^4) can be decomposed into 4 levels of base 32 operation units and twiddle factor multiplications. The twiddle factors can be calculated in advance, stored in ROM, and read directly when they are needed. The base 32 operation accounts for more than 90% of the computation of the number theory transformation, so optimizing it is of great importance.

Xie Xing et al., "FPGA Design and Implementation of Large Integer Multiplier", Journal of Electronics & Information Technology, 2019, describes a large integer multiplier hardware architecture based on the Schönhage-Strassen algorithm. The paper decomposes the 65536-point number theory transformation into 64-point and 1024-point transforms, and the 1024-point transform uses a structure built from 2 levels of base 32 operations connected in series. The base 32 operation includes 32 shift units and a tree-shaped large-number summation unit. Because of the zero padding approach adopted in the paper, each tree-shaped large-number summation unit has to process 32 operands of 192 bits, so the whole base 32 operation has to process 32 × 32 = 1024 operands. This base 32 operation circuit is not efficient enough, so the implemented circuit requires relatively high power consumption and resources.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a base 32 operation circuit for number theory transformation multiplication, which solves the problems of high power consumption and high resource expense of the base 32 operation circuit.

The technical scheme of the invention is as follows: a base 32 operation circuit for number theory transformation multiplication, comprising:

32 operand generation modules, numbered Xk, k = 0, 1, 2, ..., 31, each of which includes a dividing circuit, a merging circuit and a zero padding circuit. The dividing circuit performs high-order zero padding on each of the 32 input data and divides it into 11 words of 6 bits each; the divided input data are denoted x_{n,m}, 0 ≤ n < 32, 0 ≤ m < 11. The merging circuit forms operands from the 32 × 11 words of divided input data and outputs them: among the merging circuits of the 32 operand generation modules, 1 outputs 32 96-bit operands, 16 each output 11 192-bit operands, 3 each output 16 192-bit operands, and 12 each output 12 192-bit operands. The zero padding circuit fills with 0 the vacant positions left when the merging circuit outputs the operands;

an operand modulo adding module for modulo adding the operands output by each operand generating module;

and

a modulo p module for reducing the data output by each operand modulo addition module modulo a prime p and outputting the result, wherein the prime p = 2^64 - 2^32 + 1.

Further, the operand generation module outputting 32 96-bit operands is numbered X0, the last 11 words of each 96-bit operand are input data, and the first 5 words are assigned zero.

Further, the operand generation modules that output 11 192-bit operands are numbered Xk with k odd. Each operand OP_m is formed by merging 32 words x_{n,m}, 0 ≤ n < 32, taken from the 32 different input data at the same word index m, 0 ≤ m < 11. The position of the lowest bit of x_{n,m} in OP_m is given by 6 × (m + nk) (mod 192).

Further, the operand generation modules that output 16 192-bit operands are numbered X8, X16 and X24. The 16 operands are divided into 8 groups of 2 operands each: OP_0 and OP_1 form one group, OP_2 and OP_3 form one group, and so on. The operands OP_{2j} and OP_{2j+1} of each group are formed by merging 44 different input data words x_{n,m}, 4j ≤ n ≤ 4j+3, 0 ≤ m < 11. The position of the lowest bit of x_{n,m} in OP_{2j} or OP_{2j+1} is given by 6 × (m + nk) (mod 192). x_{n,m} is placed preferentially in OP_{2j}; if that position in OP_{2j} is already occupied, it is placed at the corresponding position in OP_{2j+1}.

Further, the operand generation modules that output 12 192-bit operands are numbered Xk with k even, excluding X0, X8, X16 and X24. The 12 operands are divided into 2 groups: OP_0 to OP_5 form one group and OP_6 to OP_11 form the other. The operands OP_{6j} to OP_{6j+5} of each group are formed by merging 176 different input data words x_{n,m}, 16j ≤ n ≤ 16j+15, 0 ≤ m < 11. The position of the lowest bit of x_{n,m} in OP_{6j} to OP_{6j+5} is given by 6 × (m + nk) (mod 192). The words x_{n,m} are merged into operands with a period of 2 words and are placed preferentially in the operand with the smaller index among OP_{6j} to OP_{6j+5}.

The technical scheme provided by the invention has the advantages that:

By exploiting the zero padding vacancies left after the operands are shifted, the operands of the base 32 operation in number theory transformation multiplication are merged. This reduces their number from 1024 in the prior art to 400, greatly reduces the computation cost, and improves the efficiency of the base 32 operation.

Drawings

FIG. 1 is a schematic diagram of the general structure of the base 32 operation circuit for number theory transformation multiplication according to the present invention.

FIG. 2 is a schematic diagram of how the dividing circuit in an operand generation module zero-pads and divides the input data.

FIG. 3 is a schematic diagram of the dividing circuit in an operand generation module.

FIG. 4 is a schematic diagram of output data obtained by the merging circuit of the X0 operand generation module.

FIG. 5 is a schematic diagram of a merging circuit of an X0 operand generation module.

FIG. 6 is a diagram of the merged operands of the merging circuit of the X1 operand generation module.

FIG. 7 is a schematic diagram of the merging circuit for operand OP0 in the X1 operand generation module.

FIG. 8 is a diagram of the merged operands of the merging circuit of the X3 operand generation module.

FIG. 9 is a diagram of the merged operands of the merging circuit of the X16 operand generation module.

FIG. 10 is a diagram of the merged operands of the merging circuit of the X2 operand generation module.

FIG. 11 is a circuit schematic of a 32-operand modulo addition module.

FIG. 12 is a circuit schematic of an 11-operand modulo addition module.

FIG. 13 is a circuit schematic of a 16-operand modulo addition module.

FIG. 14 is a circuit schematic of a 12-operand modulo addition module.

Detailed Description

The present invention is further described below with reference to embodiments. These embodiments are to be understood as merely illustrating the invention and not as limiting its scope. After reading this disclosure, modifications of equivalent form made by those skilled in the art fall within the scope defined by the appended claims.

The base 32 operation is given by

$$X_k = \sum_{n=0}^{31} x_n W_{32}^{nk} \bmod p$$

where 0 ≤ k < 32, p is a prime, and W_32 is a 32nd root of unity modulo p.

Here the prime p is the Solinas prime p = 2^64 - 2^32 + 1. This prime supports efficient modular arithmetic: 2^192 mod p = 1, 2^96 mod p = -1, and 2^64 mod p = 2^32 - 1. With this prime, the root of unity is W_32 = 2^6. Because it is a power of 2, the multiply-accumulate operations can conveniently be converted into shift and modulo addition operations, which reduces the computational complexity of the number theory transformation. The base 32 operation can thus be written as

$$X_k = \sum_{n=0}^{31} x_n 2^{6nk} \bmod p$$

Each x_n is divided into 11 words in units of 6 bits, denoted x_{n,m}, 0 ≤ m < 11. x_n can be expressed as

$$x_n = \sum_{m=0}^{10} x_{n,m} 2^{6m}$$

where m denotes the m-th word, x_n is 64 bits wide, each x_{n,m} is 6 bits wide, and x_{n,10} has 4 valid data bits. After the input data are divided, the base 32 operation can be written as the following formula, and the zero padding of the shifted operands can be used to merge them, which reduces the number of operands of the modulo addition:

$$X_k = \sum_{n=0}^{31} \sum_{m=0}^{10} x_{n,m} 2^{6(m+nk)} \bmod p$$
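The identities stated for p, and the claim that multiplication by a power of W_32 reduces to a shift, can be checked numerically. The short Python sketch below does so; the helper names `rotl192` and `times_root_power` are hypothetical and serve only this illustration. It relies on the fact that 2^192 ≡ 1 (mod p), so a left rotation inside a 192-bit word preserves the residue modulo p.

```python
P = 2**64 - 2**32 + 1
MASK192 = (1 << 192) - 1

# Identities stated in the description
assert pow(2, 192, P) == 1
assert pow(2, 96, P) == P - 1            # i.e. -1 modulo P
assert pow(2, 64, P) == 2**32 - 1
# 2^6 has order exactly 32: (2^6)^32 = 1 and (2^6)^16 = -1
assert pow(2**6, 32, P) == 1 and pow(2**6, 16, P) == P - 1

def rotl192(v, s):
    """Rotate a 192-bit value left by s bits, 0 <= s < 192."""
    return ((v << s) | (v >> (192 - s))) & MASK192

def times_root_power(x, n, k):
    """x * W32^(n*k) mod P realised as a rotation by 6*n*k mod 192 bits."""
    return rotl192(x, (6 * n * k) % 192)
```

Under this view the prior-art circuit sums 32 rotated 192-bit operands per output X_k, most of whose bits are zero; those zero regions are the vacancies that the merging exploits.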

Referring to FIG. 1, the base 32 operation circuit for number theory transformation multiplication of this embodiment comprises 32 operand generation modules, operand modulo addition modules and modulo p modules. According to the number of input operands, the operand modulo addition modules are divided into 32-operand, 11-operand, 16-operand and 12-operand modulo addition modules. In the circuit structure, the 32 input 64-bit data serve as the input of every operand generation module; each operand generation module is connected to an operand modulo addition module, and each operand modulo addition module is connected to a modulo p module.

The operand generation module comprises a dividing circuit, a merging circuit and a zero padding circuit, which in turn divide the 32 64-bit data, merge the resulting words, and fill the vacancies with zeros to form the operands. Referring to FIG. 2 and FIG. 3, the dividing circuit pads each 64-bit input x_n with 2 zero bits at the high end to form 66-bit data and then divides it into 11 words of 6 bits each; because the highest 2 bits are padded zeros, the 11th word contains only 4 bits of valid data. The division can be implemented easily in hardware with almost no overhead.
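The division step can be sketched in a few lines of Python. The function names `split_words` and `join_words` are illustrative assumptions; in hardware the operation is pure wiring.

```python
def split_words(x):
    """Divide a 64-bit value into 11 six-bit words, least significant word first.

    Zero-extending x to 66 bits does not change its integer value, so the
    padding is implicit here; word 10 therefore carries only 4 valid bits.
    """
    assert 0 <= x < 1 << 64
    return [(x >> (6 * m)) & 0x3F for m in range(11)]

def join_words(words):
    """Inverse of split_words: x = sum of x_m * 2^(6m)."""
    return sum(w << (6 * m) for m, w in enumerate(words))
```

For the all-ones input the first ten words are 63 and the last word is 15, which reflects the 4 valid bits of the 11th word.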

The operand generation modules are numbered Xk, k = 0, 1, 2, ..., 31. The merging circuit differs from module to module, but the modules can be divided by type into 4 groups, and the circuits within each group are similar.

Group one: X0, 1 module in total. Group two: Xk with k odd, such as X1, X3 and X5, 16 modules in total. Group three: X8, X16 and X24, 3 modules in total. Group four: Xk with k even other than those in group one and group three, such as X2, X4 and X6, 12 modules in total.
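The operand counts of the four groups can be reproduced with a simple slot model, sketched below in Python. The model is an assumption made for illustration: it treats each 192-bit operand as 32 six-bit slots and observes that words landing in the same slot must go to different operands, so the number of operands a module needs equals the largest number of words competing for one slot.

```python
from collections import Counter

def operand_count(k):
    """Operands needed by module Xk under the slot model (illustrative)."""
    # Word x[n][m] starts at bit 6*(m + n*k) mod 192, i.e. slot (m + n*k) mod 32.
    slots = Counter((m + n * k) % 32 for n in range(32) for m in range(11))
    # Words that share a slot cannot share an operand
    return max(slots.values())

counts = [operand_count(k) for k in range(32)]
total = sum(counts)
```

Under this model `counts` gives 32 for X0, 11 for every odd k, 16 for X8, X16 and X24, and 12 for the remaining even k, for a total of 400 operands. The model only counts operands; it does not reflect that the X0 operands are 96 bits wide.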

The data merging operation of each group is explained below:

Group one: the merging circuit of the X0 operand generation module.

The operands are simply the aligned input data; that is, each operand consists of the 11 consecutive words that the dividing circuit produces from one input. The merging circuit outputs 32 96-bit operands. Each 96-bit operand consists of 16 words: the last 11 words are the input data and the first 5 words are set to zero. As shown in FIG. 4, operand OP_n has 96 bits and is obtained by placing x_n in the low 66 bits and filling the high 30 bits with zeros. The merging circuit is shown in FIG. 5.

Group two: the merging circuits of the odd-numbered operand generation modules X1, X3, X5 and so on.

For the merging circuit of an operand generation module Xk with k odd, the input is 32 64-bit data and the output is 11 192-bit operands. Each operand OP_m is formed by merging 32 words x_{n,m}, 0 ≤ n < 32, taken from the 32 different input data at the same word index m, 0 ≤ m < 11. The position of the lowest bit of x_{n,m} in OP_m is given by 6 × (m + nk) (mod 192). The operands output by X1 and X3 are taken as examples below.

The operands merged by the merging circuit of the X1 operand generation module are shown in FIG. 6. There are 11 operands, each formed from 32 words x_{n,m}, 0 ≤ n < 32, with the same word index m, 0 ≤ m < 11. The lowest bit of x_{0,0} is at position 6 × (0 + 0 × 1) (mod 192) = 0 in OP_0, and the lowest bit of x_{1,0} is at position 6 × (0 + 1 × 1) (mod 192) = 6 in OP_0. The lowest bit of x_{0,1} is at position 6 × (1 + 0 × 1) (mod 192) = 6 in OP_1, and the lowest bit of x_{31,1} is at position 6 × (1 + 31 × 1) (mod 192) = 0 in OP_1. The merging circuit of operand OP_0 in the X1 operand generation module is shown in FIG. 7.

The operands merged by the merging circuit of the X3 operand generation module are shown in FIG. 8. The lowest bit of x_{0,0} is at position 6 × (0 + 0 × 3) (mod 192) = 0 in OP_0, and the lowest bit of x_{1,0} is at position 6 × (0 + 1 × 3) (mod 192) = 18 in OP_0. The lowest bit of x_{0,1} is at position 6 × (1 + 0 × 3) (mod 192) = 6 in OP_1, and the lowest bit of x_{31,1} is at position 6 × (1 + 31 × 3) (mod 192) = 180 in OP_1.

The operands output by the merging circuits of the remaining odd-numbered operand generation modules follow by analogy.
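The group two merge can be modelled in software as follows. This is a minimal Python sketch under the position rule 6 × (m + nk) (mod 192) stated above; the function name `merge_odd` and the list-of-integers representation are assumptions made for illustration.

```python
P = 2**64 - 2**32 + 1

def merge_odd(x, k):
    """Build the 11 operands OP_m of module Xk (k odd) from 32 64-bit inputs."""
    assert k % 2 == 1 and len(x) == 32
    ops = [0] * 11
    for m in range(11):
        seen = set()                          # bit positions already used in OP_m
        for n in range(32):
            word = (x[n] >> (6 * m)) & 0x3F   # word x[n][m]
            pos = (6 * (m + n * k)) % 192     # lowest bit position in OP_m
            assert pos not in seen            # for odd k the 32 slots are distinct
            seen.add(pos)
            ops[m] |= word << pos
    return ops
```

Because 2^192 ≡ 1 (mod p), the sum of the 11 operands is congruent modulo p to X_k computed directly from its definition, although only 11 values have to be added instead of 32.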

Group three: the merging circuits of the X8, X16 and X24 operand generation modules.

The input is 32 64-bit data and the output is 16 192-bit operands. The 16 operands are divided into 8 groups of 2 operands each: OP_0 and OP_1 form one group, OP_2 and OP_3 form one group, and so on. The operands OP_{2j} and OP_{2j+1} of each group are formed by merging 44 different words x_{n,m}, 4j ≤ n ≤ 4j+3, 0 ≤ m < 11. The position of the lowest bit of x_{n,m} in OP_{2j} or OP_{2j+1} is given by 6 × (m + nk) (mod 192). x_{n,m} is placed preferentially in OP_{2j}; if that position in OP_{2j} is already occupied, it is placed at the corresponding position in OP_{2j+1}. The remaining vacancies are all filled with 0. Taking the output of the merging circuit of the X16 operand generation module as an example, as shown in FIG. 9, there are 8 groups of operands, each containing 2 merged operands. Each 192-bit operand consists of 32 words and is formed from 2 different input data, each of which provides 11 consecutive words. Of the 192 bits, the upper 30 bits and the 30 bits between the two runs of 11 consecutive words are filled with 0.
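The placement rule of group three (prefer OP_{2j}, fall back to OP_{2j+1}) can be sketched in Python as below. The function name `merge_group_three`, the occupancy bitmasks and the order in which words are visited are assumptions made for illustration; the patent fixes the result by the figures rather than by a processing order.

```python
P = 2**64 - 2**32 + 1

def merge_group_three(x, k):
    """Sketch of the X8/X16/X24 merge: 32 inputs -> 16 operands of 192 bits."""
    assert k in (8, 16, 24) and len(x) == 32
    ops = [0] * 16
    used = [0] * 16                          # occupancy bitmask per operand
    for j in range(8):                       # group j holds OP_2j and OP_2j+1
        for n in range(4 * j, 4 * j + 4):
            for m in range(11):
                pos = (6 * (m + n * k)) % 192
                slot = 0x3F << pos
                # Prefer OP_2j; use OP_2j+1 only if the slot is already taken
                t = 2 * j if not (used[2 * j] & slot) else 2 * j + 1
                assert not (used[t] & slot), "two operands per group must suffice"
                used[t] |= slot
                ops[t] |= ((x[n] >> (6 * m)) & 0x3F) << pos
    return ops
```

For k = 16 this reproduces the layout described for FIG. 9: OP_0 holds x_0 in its low bits and x_1 starting at bit 96, and OP_1 holds x_2 and x_3 in the same way.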

Group four: the merging circuits of the even-numbered operand generation modules other than those in group one and group three.

For the merging circuit of an operand generation module Xk where k is an even number other than 0, 8, 16 or 24, the input is 32 64-bit data and the output is 12 192-bit operands. The 12 operands are divided into 2 groups of 6 operands each: OP_0 to OP_5 form one group and OP_6 to OP_11 form the other. The operands OP_{6j} to OP_{6j+5} of each group are formed by merging 176 different words x_{n,m}, 16j ≤ n ≤ 16j+15, 0 ≤ m < 11. The position of the lowest bit of x_{n,m} in OP_{6j} to OP_{6j+5} is given by 6 × (m + nk) (mod 192). The words x_{n,m} are merged into operands with a period of 2 words and are placed preferentially in the operand with the smaller index among OP_{6j} to OP_{6j+5}. The remaining vacancies are all filled with 0. Taking the output of the merging circuit of the X2 operand generation module as an example, as shown in FIG. 10, there are 2 groups of operands, each containing 6 merged operands. The first group comprises OP_0 to OP_5; the second group comprises OP_6 to OP_11. Each 192-bit operand consists of 32 words and is formed from 16 different input data, each of which provides 2 consecutive words.
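A software model of the group four merge is sketched below in Python. It assumes a first-fit placement that visits the words in word-major order (all n for m = 0, then all n for m = 1, and so on); this order is an assumption chosen because it reproduces the 2-word period described for X2, and the function name `merge_group_four` is hypothetical.

```python
P = 2**64 - 2**32 + 1

def merge_group_four(x, k):
    """Sketch of the group four merge: 32 inputs -> 12 operands of 192 bits."""
    assert k % 2 == 0 and k not in (0, 8, 16, 24) and len(x) == 32
    ops = [0] * 12
    used = [0] * 12                                # occupancy bitmask per operand
    for j in range(2):                             # group j holds OP_6j .. OP_6j+5
        for m in range(11):                        # word-major order (assumption)
            for n in range(16 * j, 16 * j + 16):
                pos = (6 * (m + n * k)) % 192
                slot = 0x3F << pos
                # First operand of the group, lowest index first, with a free slot
                t = next(i for i in range(6 * j, 6 * j + 6) if not (used[i] & slot))
                used[t] |= slot
                ops[t] |= ((x[n] >> (6 * m)) & 0x3F) << pos
    return ops
```

With k = 2 the first operand of each group receives words 0 and 1 of its 16 inputs and is completely filled, the second receives words 2 and 3, and so on, so that the last operand of the group holds only word 10 of each input.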

The number of operands depends on the group to which the operand generation module belongs. Accordingly, the operand modulo addition modules comprise 32-operand, 11-operand, 16-operand and 12-operand modulo addition modules.

The 32-operand modulo addition module is shown in FIG. 11, where CSA denotes a carry-save adder, CPA denotes a carry-propagate adder, and "<<1" denotes shifting the carry output of a carry-save adder left by 1 bit. Of the 32 operands, those at positions 4i, i = 1, 2, ..., 8, are held back, and the remaining operands are input to the first-layer CSAs in groups of three. The carry output of each first-layer CSA shifted left by 1 bit, its sum output, and the operand at position 4i, i = 1, 2, ..., 8, are input to a second-layer CSA. For every two second-layer CSAs, their sum outputs and the carry output of one of them shifted left by 1 bit are input to a third-layer CSA. The carry output of the third-layer CSA shifted left by 1 bit, the sum output of the third-layer CSA, and the carry output of the other of the two second-layer CSAs shifted left by 1 bit are input to a fourth-layer CSA. For every two fourth-layer CSAs, their sum outputs and the carry output of one of them shifted left by 1 bit are input to a fifth-layer CSA. The carry output of the fifth-layer CSA shifted left by 1 bit, the sum output of the fifth-layer CSA, and the carry output of the other of the two fourth-layer CSAs shifted left by 1 bit are input to a sixth-layer CSA. The sixth layer has two CSAs: the carry output of the second CSA shifted left by 1 bit, the sum output of the second CSA, and the sum output of the first CSA are input to the seventh-layer CSA (1 in total). The carry output of the seventh-layer CSA shifted left by 1 bit, the sum output of the seventh-layer CSA, and the carry output of the first sixth-layer CSA shifted left by 1 bit are input to the eighth-layer CSA. The carry output of the eighth-layer CSA shifted left by 1 bit and its sum output are input to the CPA, and the result is input to the modulo addition module. The modulo addition module takes the 193-bit input, adds the low 192 bits to the 193rd bit, and outputs a result that is congruent to the input modulo the prime p.

The 11-operand modulo addition module is shown in FIG. 12, where CSA denotes a carry-save adder, CPA denotes a carry-propagate adder, and "ROL 1-bit" denotes cyclically shifting the carry output of a carry-save adder left by 1 bit. Operands 1, 2, 3; 5, 6, 7; and 9, 10, 11 of the 11 operands are input to the three first-layer CSAs respectively. The sum output of the first first-layer CSA, operand 4, and the carry output of the second first-layer CSA cyclically shifted left by 1 bit are input to the first second-layer CSA. The sum output of the third first-layer CSA, operand 8, and the carry output of the third first-layer CSA cyclically shifted left by 1 bit are input to the second second-layer CSA. The carry output of the first first-layer CSA cyclically shifted left by 1 bit, and the sum output and the carry output cyclically shifted left by 1 bit of the first second-layer CSA, are input to the first third-layer CSA. The sum output of the second first-layer CSA, and the sum output and the carry output cyclically shifted left by 1 bit of the second second-layer CSA, are input to the second third-layer CSA. The sum output of the first third-layer CSA, the carry output of the second third-layer CSA cyclically shifted left by 1 bit, and the sum output of the second third-layer CSA are input to the fourth-layer CSA. The carry output of the first third-layer CSA cyclically shifted left by 1 bit, the carry output of the fourth-layer CSA cyclically shifted left by 1 bit, and the sum output of the fourth-layer CSA are input to the fifth-layer CSA. The carry output of the fifth-layer CSA cyclically shifted left by 1 bit and its sum output are input to the CPA, and the result is input to the modulo addition module. The modulo addition module takes the 193-bit input, adds the low 192 bits to the 193rd bit, and outputs a result that is congruent to the input modulo the prime p.

The 16-operand modulo addition module is shown in FIG. 13, where CSA denotes a carry-save adder, CPA denotes a carry-propagate adder, and "<<1" denotes shifting the carry output of a carry-save adder left by 1 bit. Of the 16 operands, those at positions 4i, i = 1, 2, 3, 4, are held back, and the remaining operands are input to the first-layer CSAs in groups of three. The carry output of each first-layer CSA shifted left by 1 bit, its sum output, and the operand at position 4i, i = 1, 2, 3, 4, are input to a second-layer CSA. For every two second-layer CSAs, their sum outputs and the carry output of one of them shifted left by 1 bit are input to a third-layer CSA. The carry output of the third-layer CSA shifted left by 1 bit, the sum output of the third-layer CSA, and the carry output of the other of the two second-layer CSAs shifted left by 1 bit are input to a fourth-layer CSA. The fourth layer has two CSAs: the carry output of the second CSA shifted left by 1 bit, the sum output of the second CSA, and the sum output of the first CSA are input to the fifth-layer CSA (1 in total). The carry output of the fifth-layer CSA shifted left by 1 bit, the sum output of the fifth-layer CSA, and the carry output of the first fourth-layer CSA shifted left by 1 bit are input to the sixth-layer CSA. The carry output of the sixth-layer CSA shifted left by 1 bit and its sum output are input to the CPA, and the result is input to the modulo addition module. The modulo addition module takes the 193-bit input, adds the low 192 bits to the 193rd bit, and outputs a result that is congruent to the input modulo the prime p.

The 12-operand modulo addition module is shown in FIG. 14, where CSA denotes a carry-save adder, CPA denotes a carry-propagate adder, and "ROL 1-bit" denotes cyclically shifting the carry output of a carry-save adder left by 1 bit. The 12 operands are input to the first-layer CSAs in groups of three. For every two first-layer CSAs, their sum outputs and the carry output of one of them cyclically shifted left by 1 bit are input to a second-layer CSA. The carry output of the second-layer CSA cyclically shifted left by 1 bit, the sum output of the second-layer CSA, and the carry output of the other of the two first-layer CSAs cyclically shifted left by 1 bit are input to a third-layer CSA. The third layer has two CSAs: the carry output of the second CSA cyclically shifted left by 1 bit, the sum output of the second CSA, and the sum output of the first CSA are input to the fourth-layer CSA (1 in total). The carry output of the fourth-layer CSA cyclically shifted left by 1 bit, the sum output of the fourth-layer CSA, and the carry output of the first third-layer CSA cyclically shifted left by 1 bit are input to the fifth-layer CSA. The carry output of the fifth-layer CSA cyclically shifted left by 1 bit and its sum output are input to the CPA, and the result is input to the modulo addition module. The modulo addition module takes the 193-bit input, adds the low 192 bits to the 193rd bit, and outputs a result that is congruent to the input modulo the prime p.
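The principle behind the modulo addition modules with cyclic shift can be sketched in Python as follows. The sketch uses a generic three-to-two reduction order, not the exact wiring of FIG. 12 or FIG. 14, and the names `csa_rol`, `reduce_operands` and `modulo_add` are hypothetical. A carry-save adder satisfies a + b + c = sum + 2 × carry, and because 2^192 ≡ 1 (mod p), rotating the carry left inside 192 bits instead of shifting it preserves the residue modulo p.

```python
P = 2**64 - 2**32 + 1
WIDTH = 192
MASK = (1 << WIDTH) - 1

def rol1(v):
    """Cyclic left shift of a 192-bit value by 1 bit."""
    return ((v << 1) & MASK) | (v >> (WIDTH - 1))

def csa_rol(a, b, c):
    """Carry-save adder: three operands in, sum and rotated carry out."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, rol1(carry)

def reduce_operands(ops):
    """Apply CSAs until two values remain (generic order, not the figures' wiring)."""
    ops = list(ops)
    assert len(ops) >= 2
    while len(ops) > 2:
        s, c = csa_rol(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    return ops

def modulo_add(ops):
    """CPA followed by folding bit 192 back onto the low 192 bits."""
    s, c = reduce_operands(ops)
    total = s + c                            # at most 193 bits wide
    return (total & MASK) + (total >> WIDTH)
```

The value returned by `modulo_add` is congruent to the plain sum of the operands modulo p, which is what the subsequent modulo p module relies on.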

The modulo p module reduces its input data modulo p.
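The description does not detail the internal structure of the modulo p module. One possible way to realise it, sketched below in Python purely as an assumption, folds the 32-bit limbs of the input using the identities 2^64 ≡ 2^32 - 1, 2^96 ≡ -1 and 2^192 ≡ 1 (mod p), from which 2^128 ≡ -2^32 and 2^160 ≡ 1 - 2^32 follow. The function name `reduce_mod_p` is hypothetical, and the final correction uses Python's `%` operator for brevity where hardware would use a few conditional additions or subtractions of p.

```python
P = 2**64 - 2**32 + 1

def reduce_mod_p(v):
    """Reduce a value below 2^193 modulo P by folding 32-bit limbs (sketch)."""
    assert 0 <= v < 1 << 193
    limb = [(v >> (32 * i)) & 0xFFFFFFFF for i in range(6)]
    top = v >> 192                       # weight 2^192 = 1 (mod P)
    # Weights mod P: 1, 2^32, 2^32 - 1, -1, -2^32, 1 - 2^32
    low = limb[0] - limb[2] - limb[3] + limb[5] + top
    high = limb[1] + limb[2] - limb[4] - limb[5]
    # Small signed value; Python's % returns the non-negative residue
    return (low + (high << 32)) % P
```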

Claims

1. A base 32 operation circuit for number theory transformation multiplication, comprising 32 operand generation modules, numbered Xk, k = 0, 1, 2, ..., 31, each of which includes a dividing circuit, a merging circuit and a zero padding circuit, wherein the dividing circuit performs high-order zero padding on each of the 32 input data and divides it into 11 words of 6 bits each, the divided input data being denoted x_{n,m}, 0 ≤ n < 32, 0 ≤ m < 11; the merging circuit forms operands from the 32 × 11 words of divided input data and outputs them, wherein among the merging circuits of the 32 operand generation modules, 1 outputs 32 96-bit operands, 16 each output 11 192-bit operands, 3 each output 16 192-bit operands, and 12 each output 12 192-bit operands; and the zero padding circuit fills with 0 the vacant positions left when the merging circuit outputs the operands;

the method comprises the steps of,

2. The base 32 operation circuit for number theory transformation multiplication according to claim 1, wherein the operand generation module that outputs 32 96-bit operands is numbered X0, the last 11 words of each 96-bit operand are the input data, and the first 5 words are set to zero.

3. The base 32 operation circuit for number theory transformation multiplication according to claim 1, wherein the operand generation modules that output 11 192-bit operands are numbered Xk with k odd, each operand OP_m is formed by merging 32 words x_{n,m}, 0 ≤ n < 32, taken from the 32 different input data at the same word index m, 0 ≤ m < 11, and the position of the lowest bit of x_{n,m} in OP_m is given by 6 × (m + nk) (mod 192).

4. The base 32 operation circuit for number theory transformation multiplication according to claim 1, wherein the operand generation modules that output 16 192-bit operands are numbered X8, X16 and X24; the 16 operands are divided into 8 groups of 2 operands each, OP_0 and OP_1 forming one group, OP_2 and OP_3 forming one group, and so on; the operands OP_{2j} and OP_{2j+1} of each group are formed by merging 44 different input data words x_{n,m}, 4j ≤ n ≤ 4j+3, 0 ≤ m < 11; the position of the lowest bit of x_{n,m} in OP_{2j} or OP_{2j+1} is given by 6 × (m + nk) (mod 192); and x_{n,m} is placed preferentially in OP_{2j} and, if that position in OP_{2j} is already occupied, is placed at the corresponding position in OP_{2j+1}.

5. The base 32 operation circuit for number theory transformation multiplication according to claim 1, wherein the operand generation modules that output 12 192-bit operands are numbered Xk with k even, excluding X0, X8, X16 and X24; the 12 operands are divided into 2 groups, OP_0 to OP_5 forming one group and OP_6 to OP_11 forming the other; the operands OP_{6j} to OP_{6j+5} of each group are formed by merging 176 different input data words x_{n,m}, 16j ≤ n ≤ 16j+15, 0 ≤ m < 11; the position of the lowest bit of x_{n,m} in OP_{6j} to OP_{6j+5} is given by 6 × (m + nk) (mod 192); and the words x_{n,m} are merged into operands with a period of 2 words and are placed preferentially in the operand with the smaller index among OP_{6j} to OP_{6j+5}.