KR102498133B1

KR102498133B1 - Apparatus for processing modular multiply operation and methods thereof

Info

Publication number: KR102498133B1
Application number: KR1020210032033A
Authority: KR
Inventors: 천정희
Original assignee: 주식회사 크립토랩
Priority date: 2020-03-12
Filing date: 2021-03-11
Publication date: 2023-02-09
Also published as: KR20210116299A

Abstract

A computing device is disclosed. The computing device includes a memory that stores at least one instruction and a processor that executes the at least one instruction. By executing the at least one instruction, the processor stores predetermined base prime number information, bit-inverts the stored base prime number information to generate first prime number information different from the base prime number information, and performs a modular operation on a plurality of ciphertexts using the generated first prime number information.

Description

Computing device and method for performing a modular multiplication operation {APPARATUS FOR PROCESSING MODULAR MULTIPLY OPERATION AND METHODS THEREOF}

The present disclosure relates to a computing device and method for performing a modular multiplication operation and, more particularly, to a computing device and method that use pre-stored base prime number information to generate, every cycle, the prime number information (or square root information) required for each modulus and perform a modular operation with it.

Machine learning is attracting considerable attention as an excellent solution for a variety of applications such as speech recognition, image classification, and precision medicine. Conventional machine learning services require large data sets for both training and inference in order to obtain meaningful results. Privacy protection is therefore a major concern when providing cloud-based data analysis services.

Homomorphic encryption (HE), an encryption scheme that allows computation on encrypted data, is an ideal solution for such privacy protection because operations can be carried out while the data remains encrypted.

Homomorphic encryption schemes include somewhat homomorphic encryption (SHE), which permits only a limited number of operations, and fully homomorphic encryption (FHE), which permits an unlimited number of operations. FHE achieves unlimited computation by using bootstrapping, a method that resets the error in encrypted data.

However, bootstrapping requires heavy homomorphic computation and large parameters such as a large polynomial degree (N), which lowers the overall processing speed. A method of improving the speed of bootstrapping for homomorphic encryption has therefore been required.

Accordingly, the present disclosure has been devised to solve the problems described above, and its object is to provide a computing device and method that use pre-stored base prime number information to generate, every cycle, the prime number information (or square root information) required for each modulus and perform a modular operation with it.

To achieve the above object, the computing device of the present disclosure includes a memory that stores at least one instruction and a processor that executes the at least one instruction. By executing the at least one instruction, the processor stores predetermined base prime number information, bit-inverts the stored base prime number information to generate first prime number information different from the base prime number information, and performs a modular operation on a plurality of ciphertexts using the generated first prime number information.

Here, the base prime number information and the first prime number information may each be a value obtained by adding and subtracting three, four, or five powers of 2 having mutually different exponents.
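As an illustration of the form described above, the following minimal Python sketch evaluates a signed sum of powers of 2 from a list of (sign, exponent) pairs. The function name `signed_power_sum` and the two example values (2^64 - 2^32 + 1 and 2^31 - 2^27 + 1, both well-known three-term primes) are assumptions chosen for illustration and are not taken from the prime sets of the present disclosure.

```python
def signed_power_sum(terms):
    """Evaluate sum(sign * 2**exponent) over (sign, exponent) pairs.

    Hypothetical helper: the exponents must be distinct and each sign is +1 or -1.
    """
    exponents = [e for _, e in terms]
    assert len(set(exponents)) == len(exponents), "exponents must be distinct"
    assert all(s in (1, -1) for s, _ in terms), "signs must be +1 or -1"
    return sum(s * (1 << e) for s, e in terms)


# Illustrative three-term primes (assumed examples, not from the disclosure).
P_64 = signed_power_sum([(+1, 64), (-1, 32), (+1, 0)])  # 2^64 - 2^32 + 1
P_31 = signed_power_sum([(+1, 31), (-1, 27), (+1, 0)])  # 2^31 - 2^27 + 1
```

Storing only the exponent list and signs, rather than the full integer, is one way such a prime could be held compactly; whether the disclosed device does so is not stated here.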

Meanwhile, the processor may include an internal memory that stores the base prime number information; a GBU including a plurality of BUs, each BU including a plurality of operators that perform different predetermined homomorphic operations; and a prime number generator that reads the base prime number information from the internal memory, bit-inverts the base prime number information to generate the prime number information required by each of the plurality of BUs, and provides it to each of the plurality of BUs.

In this case, the prime number generator may generate the prime number information by switching the bit value at the k-th bit of the base prime number information using a (log h)-bit integer.
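The idea of deriving a different prime from a stored base prime by inverting a bit can be sketched as follows. This is a toy illustration under stated assumptions: it flips a single bit with XOR and uses trial division to show which flips happen to yield primes. The names `flip_bit`, `is_prime`, and `derived_primes`, and the small base value 97, are hypothetical; the exact rule involving k and log h in the disclosure is not reproduced.

```python
def flip_bit(value, k):
    """Invert bit k of value (bit 0 is the least significant bit)."""
    return value ^ (1 << k)


def is_prime(n):
    """Trial-division primality test, adequate only for small toy values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def derived_primes(base, width):
    """Map each bit position k in [1, width) to flip_bit(base, k) when that value is prime."""
    return {
        k: flip_bit(base, k)
        for k in range(1, width)
        if is_prime(flip_bit(base, k))
    }


# Toy base prime 97 = 0b1100001 (assumed example): flipping bit 2 or bit 4 gives 101 or 113.
candidates = derived_primes(97, 7)
```

In this sketch only the base value and the usable bit positions would need to be stored, which mirrors the stated aim of keeping a single base prime in memory and generating the others on demand.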

Meanwhile, the prime number generator may generate first prime number information required for a first cycle using the base prime number information, and may generate second prime number information required for a second cycle using the generated first prime number information and the base prime number information.

Meanwhile, the processor may include a plurality of the GBUs arranged in series, and may further include a reordering buffer (RB) that stores the output values of one GBU and provides the stored output values to another GBU in an order different from the order in which they were stored.
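The behavior of a reordering buffer, storing values in arrival order and releasing them in a different order, can be sketched in software as follows. The class name `ReorderBuffer` and the stride-2 read order used in the example are assumptions for illustration; the actual read order of the RB in the disclosure is not specified in this passage.

```python
class ReorderBuffer:
    """Toy buffer: values are written in arrival order and drained in a fixed permuted order."""

    def __init__(self, read_order):
        # read_order[i] is the slot index emitted at output position i (hypothetical parameter).
        self.read_order = list(read_order)
        self.slots = []

    def write(self, value):
        """Store one output value of the upstream unit."""
        self.slots.append(value)

    def drain(self):
        """Return the stored values in read_order and empty the buffer."""
        assert len(self.slots) == len(self.read_order), "buffer not full"
        out = [self.slots[i] for i in self.read_order]
        self.slots = []
        return out


# Example: eight values written in order, read back with a stride of 2 (assumed pattern).
rb = ReorderBuffer([0, 2, 4, 6, 1, 3, 5, 7])
for v in range(8):
    rb.write(v)
reordered = rb.drain()
```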

Meanwhile, the GBU may constitute a plurality of stages, and a plurality of BUs may be arranged in parallel in each of the plurality of stages.

In this case, at least two of the plurality of BUs in one GBU may perform homomorphic operations using the same prime number information.

Meanwhile, each BU may include a modular subtractor that receives two homomorphic ciphertexts and outputs their difference, a modular adder that receives two homomorphic ciphertexts and outputs their sum, and a modular multiplier that performs a modular multiplication operation using the output value of the modular subtractor and the prime number information.
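The data path of one BU as described above (a modular adder, a modular subtractor, and a modular multiplier fed by the subtractor) can be sketched in Python as follows. This is a minimal sketch under assumptions: the inputs are treated as single coefficients rather than whole ciphertexts, `q` stands for the prime modulus, and the multiplicand `w` is a hypothetical parameter introduced for illustration, since this passage does not say what the difference is multiplied by.

```python
def butterfly(a, b, w, q):
    """Toy BU: return ((a + b) mod q, ((a - b) * w) mod q).

    a, b : input coefficients in [0, q)
    w    : hypothetical multiplicand applied to the subtractor output
    q    : prime modulus
    """
    total = (a + b) % q          # modular adder
    diff = (a - b) % q           # modular subtractor
    product = (diff * w) % q     # modular multiplier on the subtractor output
    return total, product


# Example with small assumed values.
out = butterfly(5, 9, 4, 17)
```

Because the adder and the subtractor consume the same two inputs, the two outputs can be produced side by side, which is the usual reason for grouping these three operators into one unit.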

In this case, the modular multiplier may perform a separate shift operation for the exponent of each of the plurality of powers of 2 constituting the prime number information, and may add or subtract the shift results to perform the modular multiplication operation.
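To illustrate why a modulus built from a few powers of 2 permits reduction with only shifts, additions, and subtractions, the following sketch reduces a value modulo p = 2^a - 2^b + 1 by repeatedly applying the congruence 2^a ≡ 2^b - 1 (mod p). The function names and the three-term form of p are assumptions for illustration; the product itself is formed with Python's `*`, so only the reduction step models the shift-and-add structure.

```python
def reduce_shift(x, a, b):
    """Reduce x >= 0 modulo p = 2**a - 2**b + 1 (0 < b < a) using shifts, adds, subtracts."""
    mask = (1 << a) - 1
    p = (1 << a) - (1 << b) + 1
    while x >> a:
        hi, lo = x >> a, x & mask
        # hi * 2**a is congruent to hi * (2**b - 1), i.e. (hi << b) - hi.
        x = (hi << b) - hi + lo
    # Now x < 2**a < 2 * p, so one conditional subtraction completes the reduction.
    return x - p if x >= p else x


def mul_mod(x, y, a, b):
    """Multiply x and y, then reduce modulo 2**a - 2**b + 1 with reduce_shift."""
    return reduce_shift(x * y, a, b)


# Example with the assumed modulus p = 2^64 - 2^32 + 1.
p = (1 << 64) - (1 << 32) + 1
result = mul_mod(p - 1, p - 2, 64, 32)
```

Each loop iteration strictly decreases x, so the loop terminates; in hardware the iterations would correspond to a fixed, small number of shift-and-add stages.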

Meanwhile, the processor may be a Field Programmable Gate Array (FPGA).

Meanwhile, a ciphertext operation method according to an embodiment of the present disclosure includes receiving a modular operation command for a plurality of ciphertexts, performing a modular operation on the plurality of ciphertexts using prime number information expressed as a combination of powers of 2, and outputting the operation result. The performing of the modular operation may include storing base prime number information, bit-inverting the base prime number information to generate first prime number information different from the base prime number information, and performing the modular operation on the plurality of ciphertexts using the generated first prime number information.

In this case, the base prime number information and the first prime number information may each be a value obtained by adding and subtracting three, four, or five powers of 2 having mutually different exponents.

Meanwhile, the performing of the modular operation may generate the first prime number information by switching the bit value at the k-th bit of the base prime number information using a (log h)-bit integer.

Meanwhile, the performing of the modular operation may generate first prime number information required for a first cycle using the base prime number information, and may generate second prime number information required for a second cycle using the generated first prime number information and the base prime number information.

According to the various embodiments of the present disclosure described above, the ciphertext operation method performs the modular operation using prime number information expressed as a combination of powers of 2, and can therefore operate quickly. In addition, rather than storing all of the prime number information needed for the operation, only the base prime number information is stored, and the prime number information (or square root information) needed for the modular operation is generated every cycle. The modular operation can therefore be performed at high speed on hardware with a small internal memory.

FIG. 1 is a diagram for explaining the structure of a network system according to an embodiment of the present disclosure;
FIG. 2 is a block diagram showing the configuration of a computing device according to an embodiment of the present disclosure;
FIG. 3 is a flowchart for explaining a ciphertext operation method according to an embodiment of the present disclosure;
FIG. 4 is a diagram for explaining an INTT algorithm according to a first embodiment of the present disclosure;
FIG. 5 is a diagram showing an example of a first prime number set according to an embodiment of the present disclosure;
FIG. 6 is a diagram showing an example of a second prime number set according to an embodiment of the present disclosure;
FIG. 7 is a diagram for explaining an INTT algorithm according to a second embodiment of the present disclosure;
FIG. 8 is a diagram showing the configuration of a BU according to the first embodiment of the present disclosure;
FIG. 9 is a diagram for explaining the operation timing of the BU of FIG. 8;
FIG. 10 is a diagram for explaining the operation timing when a BU is operated with the algorithm of FIG. 7;
FIG. 11 is a diagram for explaining the operation timing when a plurality of BUs are parallelized;
FIG. 12 is a diagram showing the configuration of a GBU according to an embodiment of the present disclosure;
FIG. 13 is a diagram for explaining the operation timing when the INTT is designed with SET B of Table 1;
FIG. 14 is a diagram showing the configuration of an RB according to an embodiment of the present disclosure;
FIG. 15 is a diagram showing the configuration of a prime number generator according to an embodiment of the present disclosure;
FIG. 16 is a diagram for explaining an example of data stored in an internal memory according to an embodiment of the present disclosure; and
FIG. 17 is a diagram for explaining a processor structure according to an embodiment of the present disclosure.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings. Encryption and decryption may be applied as needed to the information (data) transmission processes performed in the present disclosure, and every expression in the present disclosure and the claims that describes an information (data) transmission process should be interpreted as also covering the encrypted or decrypted case, even where this is not separately stated. In the present disclosure, expressions such as "transmitted (delivered) from A to B" or "A receives from B" include transmission (delivery) or reception through another intermediary, and do not refer only to direct transmission (delivery) or reception between A and B.

In the description of the present disclosure, the order of the steps should be understood as non-limiting unless a preceding step must, logically and temporally, be performed before a following step. That is, apart from such exceptional cases, performing a process described as a later step before a process described as an earlier step does not affect the essence of the disclosure, and the scope of rights should likewise be defined regardless of the order of the steps. In this specification, "A or B" is defined to mean not only either one of A and B selectively but also both A and B. In addition, the term "include" in the present disclosure covers the case where further components are included in addition to the elements listed.

The present disclosure describes only the components essential to its explanation and does not mention components unrelated to its essence. It should not be interpreted in an exclusive sense as including only the components mentioned, but in a non-exclusive sense in which other components may also be included.

In the present disclosure, a "value" is defined as a concept that covers not only scalar values but also vectors and polynomials.

The mathematical operations and calculations of each step of the present disclosure described below may be implemented as computer operations by known coding methods for the operation or calculation concerned and/or by coding devised to suit the present disclosure.

The specific equations described below are given as examples among several possible alternatives, and the scope of rights of the present disclosure should not be construed as limited to the equations mentioned herein.

For convenience of explanation, the following notation is used in the present disclosure.

a ← D : the element a is chosen according to the distribution D

s₁, s₂ ∈ R : s₁ and s₂ are each elements of the set R

mod(q) : modular operation with respect to the element q

⌊·⌉ : rounds the enclosed value

Hereinafter, various embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the structure of a network system according to an embodiment of the present disclosure.

Referring to FIG. 1, the network system may include a plurality of electronic devices 100-1 to 100-n, a first server device 200, and a second server device 300, and these components may be connected to one another through a network 10.

The network 10 may be implemented as various types of wired or wireless communication network, broadcast communication network, optical communication network, cloud network, and the like, and the devices may also be connected without a separate medium by means such as Wi-Fi, Bluetooth, or NFC (Near Field Communication).

Although FIG. 1 shows a plurality of electronic devices 100-1 to 100-n, a plurality of electronic devices need not be used, and a single device may be used. For example, the electronic devices 100-1 to 100-n may be implemented as various types of devices such as smartphones, tablets, game players, PCs, laptop PCs, home servers, and kiosks, and may also be implemented as home appliances with IoT functionality.

A user may enter various information through the electronic device 100-1 to 100-n that he or she uses. The entered information may be stored in the electronic device 100-1 to 100-n itself, or may be transmitted to and stored in an external device for reasons such as storage capacity and security. In FIG. 1, the first server device 200 may serve to store such information, and the second server device 300 may serve to use some or all of the information stored in the first server device 200.

Each of the electronic devices 100-1 to 100-n may homomorphically encrypt the entered information and transmit the resulting homomorphic ciphertext to the first server device 200.

Each of the electronic devices 100-1 to 100-n may include in the ciphertext the encryption noise, that is, the error, produced in the course of homomorphic encryption. For example, the homomorphic ciphertext generated by each of the electronic devices 100-1 to 100-n may be generated in such a form that, when it is later decrypted with a secret key, a result value containing the message and an error value is recovered.

For example, the homomorphic ciphertext generated by the electronic devices 100-1 to 100-n may be generated in a form that satisfies the following property when decrypted with a secret key.

[Equation 1]

Dec(ct, sk) = <ct, sk> = M + e (mod q)

Here, < , > denotes the usual inner product, ct the ciphertext, sk the secret key, M the plaintext message, e the encryption error value, and mod q the modulus of the ciphertext. q must be chosen to be larger than M, the result of multiplying the message by the scaling factor Δ. If the absolute value of the error e is sufficiently small compared with M, the decrypted value M + e can replace the original message with the same precision in significant-figure arithmetic. In the decrypted data, the error may be placed on the least significant bit (LSB) side and M on the next-lowest bit side.
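The property in Equation 1 can be checked with a deliberately insecure toy example. In the sketch below, the secret key is the pair (1, s) and the ciphertext is the pair (b, a) with b = M + e - a·s (mod q), so the inner product of the two returns M + e. The two-component layout, the function names, and all numeric values are assumptions for illustration and do not describe the actual encryption scheme of the present disclosure.

```python
def inner_product_mod(ct, sk, q):
    """Return <ct, sk> mod q for equal-length integer tuples."""
    return sum(c * s for c, s in zip(ct, sk)) % q


def toy_encrypt(m, s, a, e, q):
    """Toy ciphertext (b, a) with b = m + e - a*s mod q; a and e are passed in, not sampled."""
    b = (m + e - a * s) % q
    return (b, a)


def toy_decrypt(ct, sk, q):
    """Dec(ct, sk) = <ct, sk> mod q, which equals m + e when m + e < q."""
    return inner_product_mod(ct, sk, q)


# Assumed toy parameters: modulus q, secret s, mask a, small error e, scaled message m.
q = 1 << 20
sk = (1, 12345)
ct = toy_encrypt(7168, 12345, 67890, 3, q)
decrypted = toy_decrypt(ct, sk, q)
```

With m = 7168 = 7 · 2^10 and e = 3, the decrypted value 7171 keeps the error in the low-order bits and the message in the bits above them, matching the bit placement described in the text.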

If the message is too small or too large, its size may be adjusted using a scaling factor. Using a scaling factor makes it possible to encrypt not only integer messages but also real-valued messages, which greatly increases usability. In addition, by adjusting the size of the message with the scaling factor, the size of the region in which the messages reside in the ciphertext after an operation, that is, the effective region, can also be adjusted.
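The role of the scaling factor can be illustrated with a short sketch in which a real number is multiplied by Δ and rounded to an integer before encryption, and divided by Δ after decryption. The function names `encode` and `decode` and the choice Δ = 2^20 are assumptions for illustration.

```python
def encode(x, delta):
    """Scale a real number by delta and round it to the nearest integer message."""
    return round(x * delta)


def decode(m, delta):
    """Undo the scaling to recover an approximation of the real number."""
    return m / delta


# Assumed example: Δ = 2^20 gives about 20 bits of fractional precision.
delta = 1 << 20
message = encode(3.14159, delta)
recovered = decode(message, delta)
noisy = decode(message + 3, delta)  # a small additive error e = 3 shifts the result by 3/Δ
```

The rounding step contributes an error of at most 1/(2Δ), and an encryption error e adds e/Δ, which is why a small e relative to Δ leaves the significant digits of the message intact.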

Depending on the embodiment, the ciphertext modulus q may be set and used in various forms. For example, the modulus of the ciphertext may be set as a power of the scaling factor Δ, in the form q = Δ^L. If Δ is 2, q may be set to a value such as q = 2^10. Alternatively, q may be expressed as a combination of powers of 2 satisfying certain conditions, as shown in FIG. 8.

As another example, the ciphertext modulus may be set to the product of a plurality of mutually different scaling factors. The factors may be set to values within a similar range, that is, to values of similar size. For example, the modulus may be set as q = q₁ q₂ q₃ … q_x, where each of q₁, q₂, q₃, …, q_x is similar in size to the scaling factor Δ and the values are pairwise coprime.

When the scaling factors are set in this way, the whole operation can be split into a plurality of modular operations according to the Chinese Remainder Theorem (CRT), which reduces the computational burden.
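The CRT decomposition mentioned above can be sketched as follows: a value modulo q = q₁ q₂ … q_x is represented by its residues modulo each factor, arithmetic is carried out per residue, and the result is recombined. The function names and the small coprime moduli 97, 101, and 113 are assumptions for illustration; the sketch relies on Python 3.8 or later for `math.prod` and for `pow` with a negative exponent.

```python
from math import prod


def to_residues(x, moduli):
    """Split x into its residues modulo each pairwise-coprime modulus."""
    return [x % m for m in moduli]


def from_residues(residues, moduli):
    """Recombine residues into the unique value modulo the product of the moduli (CRT)."""
    q = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        n = q // m
        # pow(n, -1, m) is the modular inverse of n modulo m.
        x += r * n * pow(n, -1, m)
    return x % q


# Assumed toy moduli; a multiplication modulo q is done as three small multiplications.
moduli = [97, 101, 113]
q = prod(moduli)
x, y = 123456, 654
rx, ry = to_residues(x, moduli), to_residues(y, moduli)
rz = [(a * b) % m for a, b, m in zip(rx, ry, moduli)]
z = from_residues(rz, moduli)
```

Each per-residue multiplication involves only numbers the size of one factor, which is the sense in which the decomposition lightens the computation.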

In addition, because factors of similar size are used, a rounding step performed later yields almost the same result as in the preceding example.

The first server device 200 may store the received homomorphic ciphertext as ciphertext, without decrypting it.

The second server device 300 may request from the first server device 200 a specific processing result for the homomorphic ciphertext. The first server device 200 may perform the specific operation at the request of the second server device 300 and then transmit the result to the second server device 300.

For example, when ciphertexts ct₁ and ct₂ transmitted by two electronic devices 100-1 and 100-2 are stored in the first server device 200, the second server device 300 may request from the first server device 200 the sum of the information provided by the two electronic devices 100-1 and 100-2. In response to the request, the first server device 200 may perform an operation that adds the two ciphertexts and then transmit the result (ct₁ + ct₂) to the second server device 300.
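The addition of two ciphertexts without decryption can be illustrated with an insecure toy scheme in which a ciphertext is a pair (b, a) with b = M + e - a·s (mod q). Adding two such pairs component-wise gives a ciphertext that decrypts to the sum of the two messages plus the sum of the two errors. The scheme, the function names, and all numeric values are assumptions for illustration and do not describe the actual scheme of the present disclosure.

```python
def toy_encrypt(m, s, a, e, q):
    """Toy ciphertext (b, a) with b = m + e - a*s mod q; a and e are passed in, not sampled."""
    return ((m + e - a * s) % q, a)


def toy_decrypt(ct, s, q):
    """Return b + a*s mod q, which equals message plus error when that sum is below q."""
    b, a = ct
    return (b + a * s) % q


def add_ciphertexts(ct1, ct2, q):
    """Component-wise addition modulo q, performed without the secret key."""
    return tuple((c1 + c2) % q for c1, c2 in zip(ct1, ct2))


# Assumed toy parameters: two devices encrypt 1000 and 2000 under the same secret s.
q = 1 << 20
s = 12345
ct1 = toy_encrypt(1000, s, 111, 2, q)
ct2 = toy_encrypt(2000, s, 222, 1, q)
ct_sum = add_ciphertexts(ct1, ct2, q)
decrypted_sum = toy_decrypt(ct_sum, s, q)
```

The decrypted sum is 3003, that is, 1000 + 2000 plus the accumulated error 2 + 1; the growth of the error term under repeated operations is what bootstrapping is meant to reset.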

Owing to the nature of homomorphic ciphertext, the first server device 200 can perform the operation without decryption, and the result is also in ciphertext form. The first server device 200 may then bootstrap the operation result, and in doing so may apply the algorithm described below to perform bootstrapping at high speed. The fast bootstrapping method according to the present disclosure is described later with reference to FIG. 4.

The first server device 200 may transmit the resulting ciphertext to the second server device 300. The second server device 300 may decrypt the received ciphertext to obtain the result of the operation on the data contained in the individual homomorphic ciphertexts. The first server device 200 may perform operations several times in response to user requests.

Meanwhile, although FIG. 1 shows the case where the first and second electronic devices perform encryption and the second server device performs decryption, the present disclosure is not limited thereto.

FIG. 2 is a block diagram showing the configuration of a computing device according to an embodiment of the present disclosure.

For example, in the system of FIG. 1, a device that performs homomorphic encryption, such as the first or second electronic device, a device that operates on homomorphic ciphertext, such as the first server device, and a device that decrypts homomorphic ciphertext, such as the second server device, may each be referred to as a computing device. Such a computing device may be any of various devices such as a PC (personal computer), laptop, smartphone, tablet, or server.

Referring to FIG. 2, the computing device 400 may include a communication device 410, a memory 420, a display 430, a manipulation input device 440, and a processor 450.

The communication device 410 is provided to connect the computing device 400 to an external device (not shown). It may connect to the external device through a local area network (LAN) or the Internet, or through a USB (Universal Serial Bus) port or a wireless communication port (for example, WiFi 802.11a/b/g/n, NFC, or Bluetooth). The communication device 410 may also be referred to as a transceiver.

The communication device 410 may receive a public key from an external device, and may transmit a public key generated by the computing device 400 itself to an external device.

The communication device 410 may also receive a message from an external device and transmit a generated homomorphic ciphertext to an external device.

In addition, the communication device 410 may receive from an external device the various parameters required to generate a ciphertext. In some implementations, the parameters may instead be entered directly by a user through the manipulation input device 440 described later.

The communication device 410 may also receive from an external device a request for an operation on a homomorphic ciphertext, and may transmit the computed result to the external device. The requested operation may be an operation such as addition, subtraction, or multiplication (for example, a modular multiplication operation). Here, a modular multiplication operation means a modular operation with respect to the element q. The element q may be a value expressed as a combination of powers of 2, as shown in FIG. 5 or FIG. 6.

The memory 420 may store at least one instruction for the computing device 400. For example, the memory 420 may store various programs (or software) with which the computing device 400 operates according to the various embodiments of the present disclosure.

The memory 420 may be implemented in various forms, such as RAM, ROM, a buffer, a cache, flash memory, an HDD, external memory, or a memory card, and is not limited to any one of these.

The memory 420 may store a message to be encrypted. The message may be credit information or personal information used by the user, or information related to usage history, such as location information and Internet usage time information used in the computing device 400.

The memory 420 may store a public key. When the computing device 400 generates the public key itself, the memory 420 may store the secret key as well as the various parameters required to generate the public key and the secret key.

The memory 420 may also store a plurality of pieces of prime information. Each piece of prime information may be expressed as a combination of powers of 2. Specifically, the prime information stored in the memory 420 may be base prime information that can be used to generate other prime information, as described below. The memory 420 may also store, together with each piece of prime information, the reciprocal information corresponding to it.

The memory 420 may store a homomorphic ciphertext generated in the process described below, and may also store a homomorphic ciphertext transmitted from an external device. In addition, the memory 420 may store an operation result ciphertext produced by the operation process described below.

The display 430 displays a user interface window for selecting a function supported by the computing device 400. The display 430 may be a monitor such as an LCD (liquid crystal display) or OLED (organic light emitting diode) display, and may also be implemented as a touch screen that simultaneously performs the function of the manipulation input device 440 described below.

The display 430 may display a message requesting input of the parameters required to generate the secret key and the public key. The display 430 may also display a message for selecting the message to be encrypted. In an implementation, the encryption target may be selected directly by the user or selected automatically. That is, personal information that requires encryption may be set automatically even if the user does not select the message directly.

The manipulation input device 440 may receive, from the user, a selection of a function of the computing device 400 and a control command for that function. For example, the manipulation input device 440 may receive from the user the parameters required to generate the secret key and the public key, and may receive from the user the message to be encrypted.

The processor 450 controls the overall operation of the computing device 400. Specifically, the processor 450 may control the overall operation of the computing device 400 by executing at least one instruction stored in the memory 420. The processor 450 may be a single device, such as a CPU (central processing unit) or an ASIC (application-specific integrated circuit), or may consist of a plurality of devices, such as a CPU and a GPU (graphics processing unit).

When a message to be transmitted is input, the processor 450 may store it in the memory 420. The processor 450 may then homomorphically encrypt the message using the various setting values and programs stored in the memory 420. A public key may be used for this.

The processor 450 may generate the public key required for encryption itself, or may receive it from an external device. For example, the second server device 300, which performs decryption, may distribute the public key to other devices.

When generating the key itself, the processor 450 may generate the public key using the Ring-LWE technique. For example, the processor 450 may first set various parameters and a ring and store them in the memory 420. Examples of the parameters include the bit length of the plaintext message and the sizes of the public key and the secret key. Examples of the parameters used in the present disclosure, and their values, are described in detail with reference to FIG. 4.

The ring may be expressed as Equation 2 below.

[Equation 2]

R = Z_q[x] / (f(x))

Here, R is the ring, Z_q is the set from which the coefficients are taken, and f(x) is a polynomial of degree n.

A ring is a set of polynomials with predetermined coefficients in which addition and multiplication are defined between elements and which is closed under addition and multiplication. In Korean, a ring is also called 환 (hwan).

As an example, the ring is the set of polynomials with coefficients in Z_q, reduced modulo a polynomial f(x) of degree n. For example, when n is Φ(N), f(x) may be the N-th cyclotomic polynomial. (f(x)) denotes the ideal of Z_q[x] generated by f(x). The Euler totient function Φ(N) is the number of natural numbers that are smaller than N and coprime to N. If Φ_N(x) denotes the N-th cyclotomic polynomial, the ring may also be expressed as Equation 3 below. Here, 2^17 may be used for N.

[Equation 3]

R = Z_q[x] / (Φ_N(x))
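The relationship between N, Φ(N), and the cyclotomic polynomial can be checked numerically. The following is a minimal sketch, assuming N is a power of two (as with N = 2^17), in which case Φ(N) = N/2 and Φ_N(x) = x^(N/2) + 1. The function names and the small value N = 16 are illustrative assumptions and do not come from the disclosure.

```python
from math import gcd

def euler_totient(N):
    """Count the integers k in [1, N] that are coprime to N."""
    return sum(1 for k in range(1, N + 1) if gcd(k, N) == 1)

def power_of_two_cyclotomic(N):
    """Coefficients (lowest degree first) of Phi_N(x) = x^(N/2) + 1.

    Only valid when N is a power of two and N >= 2 (assumption).
    """
    assert N >= 2 and N & (N - 1) == 0
    coeffs = [0] * (N // 2 + 1)
    coeffs[0] = 1          # constant term
    coeffs[N // 2] = 1     # leading term x^(N/2)
    return coeffs

N = 16                               # toy value; the text mentions N = 2^17
n = euler_totient(N)                 # degree of the cyclotomic polynomial
phi = power_of_two_cyclotomic(N)
```

With this choice the ring degree n equals the degree of Φ_N(x), so each ring element can be stored as n coefficients modulo q.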

The ring of Equation 3 has complex numbers in its plaintext space. To improve the speed of operations on homomorphic ciphertexts, only the subset of the ring whose plaintext space consists of real numbers may be used.

Once the ring is set, the processor 450 may calculate a secret key (sk) from the ring. The secret key (sk) may be expressed as follows.

[Equation 4]

sk ← (1, s(x)), s(x) ∈ R

Here, s(x) is a polynomial generated randomly with small coefficients.

The processor 450 may then calculate a first random polynomial a(x) from the ring. The first random polynomial may be expressed as follows.

[Equation 5]

a(x) ← R

The processor 450 may also calculate an error. For example, the processor 450 may sample the error from a discrete Gaussian distribution or from a distribution that is statistically close to it. The error may be expressed as follows.

[Equation 6]

e(x) ← D^n_{αq}

Once the error has been calculated, the processor 450 may calculate a second random polynomial by combining the first random polynomial, the secret key, and the error under a modular operation. The second random polynomial may be expressed as follows.

[Equation 7]

b(x) = -a(x)s(x) + e(x) (mod q)

Finally, the public key (pk) is set as follows, in a form that includes the first random polynomial and the second random polynomial.

[Equation 8]

pk = (b(x), a(x))

When the computing device 400 supports RNS (Residue Number System) HEAAN (Homomorphic Encryption for Approximate Number, or HEaaN™), the processor 450 may generate a plurality of public keys, one for each of a plurality of pairwise coprime integers.

RNS-HEAAN addresses the problem that techniques such as the Chinese remainder theorem could not be applied to the original HEAAN scheme. It replaces the original ciphertext space R_{q_i} (q_i = Δ^i) with R_{q_i} (q_i = Π p_i ≈ Δ^i, p_i ≈ Δ). As a result, the approximate computation result has an error that is about 5 to 10 bits larger, but the computation speed may improve by a factor of 3 to 10. Ciphertext operations using RNS-HEAAN are described later with reference to FIG. 4.

The key generation method described above is only an example. The disclosure is not limited to it, and the public key and the secret key may of course be generated by other methods.
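The key generation steps of Equations 4 to 8 can be sketched as follows. This is a toy illustration only: the ring is assumed to be Z_q[x]/(x^n + 1), the parameters n = 8 and q = 12289 are far too small for real use, the secret is assumed to be ternary, and a rounded continuous Gaussian stands in for the discrete Gaussian of Equation 6. All names are hypothetical.

```python
import random

n, q = 8, 12289   # toy ring degree and modulus (assumptions, not secure)

def poly_mul(a, b):
    """Multiply two polynomials in Z_q[x]/(x^n + 1)."""
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:
                # x^n = -1 in this ring, so the term wraps with a sign flip
                res[k - n] = (res[k - n] - ai * bj) % q
    return res

def keygen(rng):
    s = [rng.choice([-1, 0, 1]) for _ in range(n)]       # Equation 4: small secret
    a = [rng.randrange(q) for _ in range(n)]             # Equation 5: uniform a(x)
    e = [round(rng.gauss(0, 3.2)) for _ in range(n)]     # Equation 6: small error (stand-in)
    a_s = poly_mul(a, s)
    b = [(-x + y) % q for x, y in zip(a_s, e)]           # Equation 7: b = -a*s + e mod q
    return (1, s), (b, a), e                             # sk, pk (Equation 8), error

sk, pk, err = keygen(random.Random(0))
```

The defining property of the key pair is that b(x) + a(x)s(x) equals the small error e(x) modulo q, which is what later allows decryption to recover the message up to a small error.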

Once the public key is generated, the processor 450 may control the communication device 410 to transmit it to other devices.

The processor 450 may then generate a homomorphic ciphertext for a message. For example, the processor 450 may generate the homomorphic ciphertext by applying the previously generated public key to the message. In generating the homomorphic ciphertext, the processor 450 may perform the encryption operation using prime information such as that shown in FIG. 5 or FIG. 6.

The message to be encrypted may be received from an external source, or may be input from an input device provided in or connected directly to the computing device 400. For example, when the computing device 400 includes a touch screen or a keypad, the processor 450 may store the data that the user inputs through the touch screen or keypad in the memory 420 and then encrypt it. The generated homomorphic ciphertext may be in a form that, when decrypted, is restored to the message multiplied by a scaling factor plus an error. A value that has been input and set in advance may be used as the scaling factor.

When the computing device 400 supports RNS-HEAAN, the processor 450 may generate a homomorphic ciphertext expressed in a plurality of bases by applying to the message a plurality of public keys, one for each of a plurality of pairwise coprime integers.

Alternatively, the processor 450 may multiply the message by the scaling factor and then encrypt the result directly with the public key. In this case, the error produced during encryption is added to the product of the message and the scaling factor.
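The role of the scaling factor can be illustrated without any encryption. The sketch below is an assumption-laden toy: it uses a scaling factor Δ = 2^20 and a hypothetical integer error of 5 to show that a value encoded as round(message × Δ) plus a small error still decodes to approximately the original message.

```python
DELTA = 1 << 20          # scaling factor (assumed value)

def encode(message):
    """Scale a real message to an integer."""
    return round(message * DELTA)

def decode(value):
    """Undo the scaling; the small error remains as approximation noise."""
    return value / DELTA

message = 3.14159
error = 5                # hypothetical small error added during encryption
noisy = encode(message) + error
recovered = decode(noisy)
```

The larger the scaling factor relative to the error, the more precision survives in the recovered value.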

The processor 450 may also generate the ciphertext so that its length corresponds to the size of the scaling factor.

Once the homomorphic ciphertext is generated, the processor 450 may store it in the memory 420, or may control the communication device 410 to transmit it to another device in response to a user request or a preset default command.

According to an embodiment of the present disclosure, packing may also be performed. Packing in homomorphic encryption makes it possible to encrypt multiple messages into a single ciphertext. In this case, when the computing device 400 performs an operation between ciphertexts, the operation is effectively processed in parallel over the multiple messages, which greatly reduces the computational burden.

For example, when a message consists of a plurality of message vectors, the processor 450 may convert the message vectors into a polynomial in a form that allows them to be encrypted in parallel, multiply that polynomial by the scaling factor, and homomorphically encrypt the result using the public key. The processor 450 may thereby generate a ciphertext in which the plurality of message vectors are packed.

When a homomorphic ciphertext needs to be decrypted, the processor 450 may apply the secret key to the homomorphic ciphertext to generate a decrypted text in polynomial form, and may decode that polynomial to generate the message. The resulting message may include an error, as noted for Equation 1 above.

The processor 450 may also perform operations on ciphertexts. For example, the processor 450 may perform operations such as addition, subtraction, or multiplication on a homomorphic ciphertext while it remains encrypted. The multiplication may be a modular operation and may be performed in the manner described below.

When the homomorphic ciphertext has been generated using the RNS method described above, the processor 120 may perform addition and multiplication for each basis in the generated homomorphic ciphertext.

When the operation is complete, the terminal device 100 may detect the data of the valid region from the operation result data. For example, the terminal device 100 may detect the data of the valid region by rounding the operation result data.

Here, rounding means rounding off the message while it remains encrypted, and may also be called rescaling. For example, the terminal device 100 may remove the noise region by multiplying each component of the ciphertext by Δ^-1, the reciprocal of the scaling factor, and rounding the result. The noise region may be determined so as to correspond to the size of the scaling factor. As a result, the message of the valid region, with the noise region excluded, can be detected. Because this is done in the encrypted state, an additional error occurs, but it is small enough to be ignored.
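The effect of multiplying by Δ^-1 and rounding can be sketched on plain scaled integers, which is an assumption made only for illustration (the disclosure applies the step to each ciphertext component). After two values at scale Δ are multiplied, the product sits at scale Δ², and rescaling brings it back to scale Δ. The names and the value of Δ are hypothetical.

```python
DELTA = 1 << 20                      # scaling factor (assumed value)

def encode(x):
    return round(x * DELTA)

def rescale(v):
    """Multiply by 1/DELTA and round to the nearest integer."""
    return (v + DELTA // 2) // DELTA

x, y = 1.5, 2.25
product = encode(x) * encode(y)      # scale is now DELTA squared
rescaled = rescale(product)          # back to scale DELTA
approx = rescaled / DELTA            # decoded value
```

Without the rescale step the scale would grow with every multiplication and quickly exhaust the available modulus.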

The modular multiplication operation described above may be used for this rounding.

If the terminal device 100 supports RNS-HEAAN, the processor 120 may rescale the homomorphic ciphertext by rounding the message for each of the plurality of bases in the generated homomorphic ciphertext when the proportion of any one of the bases exceeds a threshold.

In addition, when the proportion of the approximate message in the operation result ciphertext exceeds a threshold, the terminal device 100 may expand the plaintext space of the operation result ciphertext. For example, if q is smaller than M in Equation 1 above, M+e (mod q) has a different value from M+e, and decryption becomes impossible. The value of q must therefore always remain larger than M. However, q gradually decreases as operations proceed. Expanding the plaintext space means converting the ciphertext ct into a ciphertext with a larger modulus. This operation may also be called rebooting (commonly known as bootstrapping in the homomorphic encryption literature). After rebooting, the ciphertext is again in a state in which operations can be performed on it.
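The condition that q must stay larger than M can be seen with two small moduli. The numbers below are arbitrary illustrative assumptions: with q = 2^12 the value M + e survives the reduction, while with q = 2^9 it wraps around and the original value is lost.

```python
def reduce_mod(M, e, q):
    """Value seen after decryption: (M + e) mod q."""
    return (M + e) % q

M, e = 1000, 3
intact = reduce_mod(M, e, 1 << 12)    # q = 4096 > M + e
wrapped = reduce_mod(M, e, 1 << 9)    # q = 512 < M
```

This is why a ciphertext whose modulus has shrunk too far must be moved to a larger modulus before further operations.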

Encryption, decryption, addition, multiplication, rescaling, and rebooting of a homomorphic encryption scheme based on the Ring-LWE problem may all be composed of operations on elements of a polynomial ring.

Among these operations, polynomial multiplication is the most time-consuming step in encryption, decryption, ciphertext multiplication, and rebooting. In particular, about five polynomial multiplications are performed during the most frequently used Mult algorithm, so techniques that speed up this operation are very important.

FIG. 3 is a flowchart illustrating a ciphertext operation method according to an embodiment of the present disclosure.

Referring to FIG. 3, a modular operation command for a plurality of ciphertexts may be received (S310). The command may be input from an external device or input directly at the computing device. The command may be a command for message encryption or for a homomorphic ciphertext operation.

A modular operation may then be performed on the plurality of ciphertexts using a plurality of pieces of predetermined prime information (S320). Each piece of prime information may be expressed as a combination of powers of 2. Examples of prime information are shown in FIG. 5 and FIG. 6. Storing all of the prime information used in the modular operations in memory requires a large amount of memory. Therefore, only some of the prime information may be stored, and in each cycle the prime information required for the next cycle may be generated from the stored prime information and the previously used prime information. The generation of this prime information (or root-of-unity information) is described later with reference to FIG. 7.

The operation result may then be output (S330). For example, the operation result may be output to the device that requested the operation. When the operation command is only part of a larger command, such as message encryption, the operation result may be passed to another operator (or operation program).

As described above, the ciphertext operation method according to the present disclosure performs operations using prime information expressed as a combination of powers of 2, and can therefore perform operations quickly. In addition, in an implementation, not all of the prime information is stored. Only some of it is stored, and the rest is calculated in each cycle from the stored prime information, so the operations can be performed with a small amount of memory.

A first modular operation method for homomorphic ciphertexts is described below.

The first modular operation method (ModMult) reduces a number A modulo q by subtracting from A the product of ⌊A/q⌋ and q, as shown in Equation 9 below.

[Equation 9]

A mod q = A - ⌊A/q⌋ · q

Here, A is a ciphertext (or polynomial), and q is the element used as the modulus.

To perform this operation, ModMult (or the modulus operator) may include a first multiplier, a second multiplier, a third multiplier, a shifter, and a subtractor. The modulus operator may be the computing device of FIG. 2, or may be one operation module in an FPGA (Field Programmable Gate Array). For ease of explanation, a modular multiplication of two ciphertexts is described below, but in an implementation the modular multiplication may be applied to polynomials rather than ciphertexts. It may also be applied to equations other than Equation 9 above (operations that include multiplication for homomorphic encryption).

The first multiplier may perform a first multiplication operation on a first ciphertext (A) (or first polynomial) and a second ciphertext (B) (or second polynomial). The first multiplier may be a full multiplier (Full-IntMult) that takes the n-bit first ciphertext (A) and the n-bit second ciphertext (B) and outputs a multiplication result (U) of 2n bits.

The second multiplier may perform a second multiplication operation on the first multiplication result (U) and the reciprocal information (T) corresponding to one piece of prime information (q) among the plurality of pieces of prime information. Specifically, the second multiplier (520, IntMult2) may multiply the upper bits of the output of the first multiplier by T, which is scaled by 1/q.

For example, because the coefficient (q) of the third multiplier described below is applied only to the upper bits of the output of the second multiplier, the second multiplier may be an Upper Half (UH)-IntMult that takes two n-bit inputs and outputs a multiplication result (W) of n bits. The reciprocal information is the number that gives 1 when multiplied by the prime information, that is, the inverse of the prime (1/q). This value may be stored in advance in a lookup table, or may be calculated from the base prime information (or base root-of-unity information).

The third multiplier may perform a third multiplication operation on the second multiplication result (W) and the one piece of prime information (q). For example, because only the lower bits of the output of the third multiplier are combined with the output bits of the shifter, the third multiplier may be implemented as a Lower Half (LH)-IntMult that takes two n-bit inputs and outputs a multiplication result (V) of n bits.

The shifter may delay the output of the first multiplier and provide it to the subtractor. For example, the shifter may delay the lower bits of the output of the first multiplier, and may be implemented with flip-flops (FF). The subtractor may then subtract the output of the third multiplier from the output of the shifter and output the result.
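The three-multiplier structure above resembles a Barrett-style reduction, and the following minimal sketch models it in software under that assumption. The bit widths (n + 2 low bits kept for the subtraction), the scaling T = ⌊2^(2n)/q⌋, and the final correction loop are choices made for this illustration and are not taken from the disclosure. The names U, W, and V follow the multiplier outputs described above.

```python
def barrett_setup(q):
    """Return the bit length n of q and the scaled reciprocal T ~ 2^(2n) / q."""
    n = q.bit_length()
    T = (1 << (2 * n)) // q
    return n, T

def mod_mult(A, B, q, n, T):
    """Compute (A * B) mod q for 0 <= A, B < q."""
    U = A * B                                  # first multiplier: full 2n-bit product
    W = ((U >> (n - 1)) * T) >> (n + 1)        # second multiplier: upper bits of U times T
    mask = (1 << (n + 2)) - 1                  # only the low bits are needed from here on
    V = (W * q) & mask                         # third multiplier: low bits of W * q
    R = ((U & mask) - V) & mask                # delayed low bits of U minus V
    while R >= q:                              # W may underestimate the quotient by up to 2
        R -= q
    return R

q = 12289
n, T = barrett_setup(q)
result = mod_mult(1234, 5678, q, n, T)
```

Because the quotient estimate W is never too large and is at most 2 too small, the remainder before correction is below 3q, which is why keeping n + 2 low bits is sufficient in this sketch.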

As described above, the second multiplier and the third multiplier perform multiplication operations using the reciprocal information (T) and the prime information (q), respectively.

In the RNS-HEAAN scheme, three types of moduli are used: base moduli, rescale moduli, and mod-up moduli. When the degree of the polynomial is N-1, each prime must be congruent to 1 modulo 2N. Primes for which both the prime (q) and its corresponding reciprocal (T) have a low Hamming weight can be written as sums and differences of three, four, or five powers of 2 with distinct exponents, as shown in FIG. 5 or FIG. 6.

Because the primes used in the present disclosure are expressed as combinations of powers of 2, a multiplication by such a prime, or by the reciprocal value of such a prime, can be performed using only shift operations and additions or subtractions.

That is, each of the second multiplier and the third multiplier may perform a separate shift operation for the exponent of each power of 2, and may add or subtract the shift results to carry out the second or third multiplication operation.

Because a complicated multiplication by a prime can be performed with only shifts and additions or subtractions, the operation can be made fast.
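The shift-and-add idea can be sketched as follows. The two primes used here, 12289 = 2^13 + 2^12 + 1 and 7681 = 2^13 - 2^9 + 1, are well-known NTT-friendly primes chosen for illustration. They are assumptions and are not the values listed in FIG. 5 or FIG. 6, and the signed-term representation and function names are hypothetical.

```python
# Each prime is written as a list of (sign, exponent) terms: sum of sign * 2^exponent.
PRIME_TERMS = {
    12289: [(+1, 13), (+1, 12), (+1, 0)],
    7681:  [(+1, 13), (-1, 9), (+1, 0)],
}

def sparse_value(terms):
    """Evaluate the signed sum of powers of two."""
    return sum(sign * (1 << exp) for sign, exp in terms)

def shift_add_mul(x, terms):
    """Multiply x by the sparse constant using only shifts, adds, and subtracts."""
    acc = 0
    for sign, exp in terms:
        shifted = x << exp                   # one shift per power of two
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc

N = 256                                      # toy ring dimension (assumption)
x = 4321
products = {p: shift_add_mul(x, terms) for p, terms in PRIME_TERMS.items()}
```

With three terms, a multiplication by the prime costs three shifts and two additions or subtractions, regardless of the operand width.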

한편, 이상에서는 모듈러 곱셈 연산이 암호문을 입력받아 처리하는 것으로 도시하고 설명하였지만, 구현시에 모듈러 곱셈 연산의 입력은 다양한 값이 이용될 수 있다. 즉, 모듈러 곱셈 연산은 암호문 연산뿐만 아니라, 암호 과정에 필요한 값들을 산출하거나, 스케일링 또는 복호화 과정에서도 이용 가능한데 이러한 과정 중에 이용되는 값이라면 암호문이 아니어도 무방하다. Meanwhile, although the modular multiplication operation has been illustrated and described as receiving and processing ciphertext in the above, various values may be used as inputs of the modular multiplication operation during implementation. That is, the modular multiplication operation can be used not only in the ciphertext operation, but also in calculating values necessary for the encryption process or in the scaling or decryption process.

Hereinafter, a second modular operation method for homomorphic ciphertexts is described.

The algorithm of the second modular operation method (ModMult) is similar to that of the first modular operation method, but differs in that it uses a pre-calculated value. Specifically, a pre-calculated value (B'), obtained by multiplying the reciprocal corresponding to one piece of prime information by the second ciphertext, may be stored and used. This pre-calculated value (B') is an approximation of B/q, and by using B', W can be obtained as an approximation of A x B/q.
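The role of the pre-calculated value can be illustrated with a small sketch. It assumes that B' is stored as floor(B·2^k / q) for some scale exponent k, which is a common way of representing an approximation of B/q with integers; the scale factor, the function names, and the small modulus 97 are assumptions of this example, not values taken from the disclosure.

```python
def precompute(b, q, k):
    """B' = floor(B * 2**k / q): a scaled integer approximation of B/q."""
    return (b << k) // q


def mod_mult_precomputed(a, b, b_prime, q, k):
    """Return a*b mod q for 0 <= a < 2**k and 0 <= b < q, using B'."""
    w = (a * b_prime) >> k      # W approximates floor(a*b / q), at most 1 too small
    r = a * b - w * q           # candidate remainder, guaranteed to lie in [0, 2q)
    return r - q if r >= q else r


q, k = 97, 7                    # 2**7 = 128 > q, so every a < q satisfies a < 2**k
b = 45
b_prime = precompute(b, q, k)   # computed once, reused for every a
result = mod_mult_precomputed(60, b, b_prime, q, k)
```

Because B' depends only on B and q, it can be computed once and reused, which is where the speed gain comes from; the cost is one stored value per (B, q) pair.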

Meanwhile, the second modular operation method was described above as a way of improving operation speed by calculating and storing the values required for an operation in advance and using them at operation time. Although this approach increases operation speed, it requires a large amount of storage space. Accordingly, a method of performing the modular operation that increases operation speed while using only a relatively small amount of storage space is described below. First, before describing the algorithm, the relationship between the modular operation described above and the NTT and INTT operations is described.

Hereinafter, w is referred to as an N-th root of unity modulo the prime p, meaning that it satisfies $w^N \equiv 1 \pmod{p}$. A primitive N-th root of unity is an N-th root of unity whose powers generate all N-th roots of unity. By definition, a primitive N-th root of unity is required in order to perform a DFT on a vector of size N. It is known that a primitive N-th root of unity modulo p exists if $p \equiv 1 \pmod{N}$.

In lattice-based cryptography, including homomorphic encryption, operations are performed on the ring $\mathbb{Z}_p[X]/(X^N+1)$ (where N is a power of 2 and p is a prime). Multiplication in this ring corresponds to a negative wrapped convolution, whereas the NTT-multiply-INTT paradigm corresponds to multiplication in the ring $\mathbb{Z}_p[X]/(X^N-1)$, that is, to an ordinary (cyclic) convolution.

With minor modifications to the NTT and INTT algorithms, multiplication in $\mathbb{Z}_p[X]/(X^N+1)$ can be performed efficiently. To use this modification, the modulus p must satisfy $p \equiv 1 \pmod{2N}$, whereas the ordinary NTT/INTT requires only $p \equiv 1 \pmod{N}$. The present disclosure therefore describes the modified framework for efficiency, and it is hereinafter referred to as the modified NTT/INTT algorithm.
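The difference between the two convolutions, and the reason a (2N)-th root of unity is needed, can be seen in a toy example. The sketch below uses a naive O(N²) transform for readability rather than a fast NTT; the parameters p = 17, N = 4, and ψ = 2 (a primitive 8th root of unity modulo 17) and all function names are assumptions chosen for illustration.

```python
def negacyclic_schoolbook(a, b, p):
    """Multiply a and b in Z_p[X]/(X^N + 1): wrapped terms pick up a minus sign."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % p
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % p
    return c


def dft(a, w, p):
    """Naive length-N transform over Z_p with root of unity w."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, p) for j in range(n)) % p for i in range(n)]


def negacyclic_via_transform(a, b, psi, p):
    """Same product, obtained by twisting with powers of psi (psi^2 = w)."""
    n = len(a)
    w = psi * psi % p
    at = [a[i] * pow(psi, i, p) % p for i in range(n)]   # pre-twist by psi^i
    bt = [b[i] * pow(psi, i, p) % p for i in range(n)]
    ct = [x * y % p for x, y in zip(dft(at, w, p), dft(bt, w, p))]
    c = dft(ct, pow(w, -1, p), p)                        # inverse transform
    n_inv, psi_inv = pow(n, -1, p), pow(psi, -1, p)
    return [c[i] * n_inv * pow(psi_inv, i, p) % p for i in range(n)]


p, psi = 17, 2            # 17 = 1 mod 8, and 2 has order 8 modulo 17
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
direct = negacyclic_schoolbook(a, b, p)
twisted = negacyclic_via_transform(a, b, psi, p)
```

The twist by powers of ψ is what turns the cyclic convolution computed by the plain transform into the negative wrapped one, which is why p must be congruent to 1 modulo 2N rather than modulo N.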

An efficient INTT for the negative wrapped convolution is shown in Algorithm 4. This efficient INTT operation is described below with reference to FIG. 4.

FIG. 4 is a diagram for explaining the INTT algorithm according to the first embodiment of the present disclosure. The rescale step is omitted from FIG. 4 to simplify the description, but it may be added in an actual implementation.

Referring to FIG. 4, the algorithm takes as input a list of the negative powers of a fixed primitive (2N)-th root of unity (Ψ), arranged in bit-reversed order. More specifically, entry i of the list holds $\Psi^{-j}$, where j is the bit reversal of i.

In general, the NTT/INTT can be performed using butterfly units (BUs) as building blocks. They are referred to below as BUs, but may also be called function blocks, building blocks, or the like. The function block of FIG. 4 (Function ButterflyUnit) takes a[j], a[j+t], W, and p as inputs, calculates a[j]+a[j+t] (mod p) and (a[j]-a[j+t])·W (mod p), and stores the results in a[j] and a[j+t], respectively.
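For orientation, an in-place INTT built from such butterflies can be sketched as follows. FIG. 4 is not reproduced in this text, so the loop structure below follows the widely used Gentleman-Sande form for the negative wrapped convolution and should be read as an assumption, not as the figure itself; the function names and the toy parameters p = 17, ψ = 2, N = 4 are likewise illustrative. The final multiplication by N⁻¹ is the scaling step that the description says is omitted from the figure.

```python
def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def intt_gs(a_hat, psi, p):
    """Inverse negacyclic NTT: bit-reversed-order input, natural-order output."""
    n = len(a_hat)
    bits = n.bit_length() - 1
    psi_inv = pow(psi, -1, p)
    # Table of negative powers of psi in bit-reversed order: entry i is psi^(-j).
    psi_inv_rev = [pow(psi_inv, bit_reverse(i, bits), p) for i in range(n)]
    a = list(a_hat)
    t, m = 1, n
    while m > 1:
        h = m // 2
        j1 = 0
        for i in range(h):
            w = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t]
                a[j] = (u + v) % p            # butterfly: sum
                a[j + t] = (u - v) * w % p    # butterfly: difference times root
            j1 += 2 * t
        t *= 2
        m = h
    n_inv = pow(n, -1, p)                     # scaling step
    return [x * n_inv % p for x in a]


def forward_naive(a, psi, p):
    """Reference forward transform: evaluate a(X) at psi^(2*bitrev(i)+1)."""
    n = len(a)
    bits = n.bit_length() - 1
    return [sum(c * pow(psi, (2 * bit_reverse(i, bits) + 1) * j, p)
                for j, c in enumerate(a)) % p for i in range(n)]


p, psi = 17, 2
coeffs = [3, 1, 4, 1]
recovered = intt_gs(forward_naive(coeffs, psi, p), psi, p)
```

Each pass of the outer loop is one stage, and each iteration of the innermost loop is one butterfly, so the sketch performs (N/2)·log N butterflies in total.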

When the number of input samples is N, the NTT has log N stages, and each stage may consist of N/2 BUs. The total number of BUs required is therefore (N/2)·log N. For example, when N is 8 and the number of stages is 3, 12 BUs are required. Here, a sample refers to input data provided to an operator (or BU), and may be a homomorphic ciphertext, a polynomial, or the like.

Hereinafter, the RNS homomorphic operation (hereinafter referred to as RNS-HEAAN) is described.

RNS-HEAAN addresses the problem that techniques such as the Chinese remainder theorem could not be applied to the original HEAAN scheme. It does so by replacing the original ciphertext space $R_{q_i}$ (with $q_i = \Delta^i$) with $R_{q_i}$ (with $q_i = \prod p_i \approx \Delta^i$ and $p_i \approx \Delta$). Because RNS-HEAAN supports approximate computation on fixed-point numbers, it is an important solution for homomorphic encryption. In particular, RNS-HEAAN performs its operations by splitting the large coefficients of a polynomial into small ones, which makes parallel computation possible.
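The splitting of a large coefficient into small residues can be illustrated with a toy residue number system. The moduli 13, 17, and 19 and the helper names below are assumptions made for this sketch; an actual parameter set would use the much larger primes discussed in this description.

```python
from math import prod


def to_rns(x, moduli):
    """Represent x by its residues modulo each small modulus."""
    return [x % m for m in moduli]


def from_rns(residues, moduli):
    """Recombine residues with the Chinese remainder theorem."""
    big_q = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        q_hat = big_q // m
        x += r * q_hat * pow(q_hat, -1, m)   # pow(..., -1, m) is the modular inverse
    return x % big_q


moduli = [13, 17, 19]        # pairwise coprime toy moduli, product 4199
a, b = 1234, 3
ra, rb = to_rns(a, moduli), to_rns(b, moduli)
# Each residue channel is multiplied independently, so the channels can run in parallel.
rc = [(x * y) % m for x, y, m in zip(ra, rb, moduli)]
c = from_rns(rc, moduli)
```

No channel ever handles a number larger than its own small modulus, which is what allows the word-sized modular operators described in this disclosure to be used.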

Homomorphic multiplication (HomeMult) is used frequently in homomorphic computation, but because it takes a long time, it is the greatest obstacle to practical use of applications based on homomorphic encryption. The main bottleneck is that multiplication in a polynomial ring of large degree remains slow even when NTT/INTT is used.

The same holds for RNS-HEAAN, but RNS-HEAAN has an additional characteristic that distinguishes it from the conventional setting. For efficient homomorphic operations, the input coefficients of the polynomials in RNS-HEAAN are basically converted to the NTT domain in advance. However, homomorphic multiplication is also required for coefficients that have not been converted.

Hereinafter, it is assumed that two ciphertexts, $ct_1 = (a_1, b_1 = a_1 s + m_1 + e_1)$ and $ct_2 = (a_2, b_2 = a_2 s + m_2 + e_2)$, on the cyclotomic ring $R_Q^2$ are multiplied. Here, s, $m_i$, $e_i$, and Q are, respectively, a polynomial sampled from $\chi_{key}$, the message, the error, and the large modulus (the product of the partial moduli $q_i$).

When the secret key is set to (-s, 1), the product of the ciphertexts can be calculated using the following equation.

[Equation 11]

$\langle ct_{mult}, sk \rangle = a_1 a_2 s^2 - (a_1 b_2 + a_2 b_1)s + b_1 b_2$

Here, $\langle \cdot, \cdot \rangle$ denotes the dot product of two vectors.
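The origin of each term can be checked with a short derivation that uses only the definitions above, namely sk = (-s, 1) and $ct_i = (a_i, b_i)$. The intermediate lines are added here for clarity and are not part of the original text.

```latex
\begin{aligned}
\langle ct_i, sk \rangle &= b_i - a_i s = m_i + e_i, \\
\langle ct_1, sk \rangle \cdot \langle ct_2, sk \rangle
  &= (b_1 - a_1 s)(b_2 - a_2 s) \\
  &= a_1 a_2 s^2 - (a_1 b_2 + a_2 b_1)\, s + b_1 b_2 .
\end{aligned}
```

The product therefore decrypts to approximately $m_1 m_2$, and the only term that is not linear in s is $a_1 a_2 s^2$, which is the term the switching key is needed for.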

When the first term of Equation 11 is linearized and the large error ($a_1 b_1 e_{swk}$) is scaled down by 1/P (where P is the product of the mod-up moduli $p_i$), the switching key (swk) on the cyclotomic ring $R_{PQ}^2$ can be defined as in Equation 12 below.

[Equation 12]

Here, $e_{swk}$ may be referred to as the error introduced when decrypting with the switching key. To multiply by the switching key, $a_1 a_2$, which lies in the domain over $R_Q^2$, may be converted to the $R_{PQ}^2$ domain. This conversion process may be referred to as basis conversion, and it requires an INTT that inverse-transforms $a_1 a_2$ from the NTT domain. After this conversion, the NTT is applied again to the converted $a_1 a_2$.

The partial moduli ($q_i$, $p_i$) can be classified into the following three types.

1. Base modulus ($q_0$): Each time a homomorphic multiplication is performed, the number of $q_i$ decreases by one and the circuit depth decreases by one; the base modulus is the last modulus remaining.

2. Rescale modulus ($q_i$, where 1 ≤ i ≤ l): The number of rescale moduli represents the depth of the circuit. It is generally advantageous to make this number large so that bootstrapping is needed as rarely as possible.

3. Mod-up modulus ($p_i$, where 1 ≤ i ≤ k): Used to reduce the size of the error generated during homomorphic multiplication.

Hereinafter, parameters for bootstrapping of RNS-HEAAN are described.

A homomorphic encryption scheme encrypts a message using an error. Each time an operation is performed on a homomorphic ciphertext, the internal error grows, and it grows especially rapidly each time a homomorphic multiplication is performed. Moreover, once the error exceeds a certain level, the correct message can no longer be obtained at decryption. The number of homomorphic multiplications that can be performed before this level (or threshold) is reached is referred to as the circuit depth.

When bootstrapping is performed to reset the error and the circuit depth, an unlimited number of homomorphic operations can be performed on a homomorphic ciphertext. However, because bootstrapping runs very slowly, it is not a practical operation as it stands. It is therefore necessary to speed it up, and two approaches can be considered: first, accelerating the processing speed of bootstrapping itself; second, increasing the interval between bootstrapping operations (for example, the circuit depth). The second approach is described first below.

Typical bootstrapping consumes a circuit depth of 15 to 20. When bootstrapping is performed, the circuit depth required for bootstrapping is deducted from the initial depth. For a practical design, the initial circuit depth should be set to approximately 40 so that the circuit depth remaining after bootstrapping is 20 to 25. The parameters of the present disclosure for obtaining such an initial circuit depth are described below with reference to Table 1.

[Table 1]
Scheme      | λ     | dnum | N   | l+1 | k  | logQ | logP | logPQ | log q₀ | log q_i | log p_i
RNS-HEAAN 1 | 73    | 1    | 2¹⁵ | 11  | 12 | 611  | 660  | 1271  | 62     | 55      | 55
RNS-HEAAN 2 | 108   | 4    | 2¹⁶ | 24  | 6  | 1090 | 273  | 1363  | 62     | 45      | -
RNS-HEAAN 3 | 105   | 7    | 2¹⁶ | 28  | -  | 1270 | 182  | 1452  | 62     | 45      | -
HEAX set-A  | 128.1 | -    | 2¹² | 2   | -  | -    | -    | 109   | -      | -       | -
HEAX set-B  | 128.5 | -    | 2¹³ | 4   | -  | -    | -    | 218   | -      | -       | -
HEAX set-C  | 128.1 | -    | 2¹⁴ | 8   | -  | -    | -    | 438   | -      | -       | -
Our SET-A   | 129.8 | 2    | 2¹⁷ | 36  | 16 | 1882 | 992  | 2874  | 62     | 52      | 62
Our SET-B   | 127.3 | 3    | 2¹⁷ | 42  | 12 | 2194 | 744  | 2938  | 62     | 52      | 62

Referring to Table 1, a security parameter (λ) of approximately 80 is widely used in existing techniques. However, as research involving personal data continues to diversify, the security parameter needs to be raised to 128. Specifically, Table 1 shows that the security parameters of the existing RNS-HEAAN schemes do not satisfy 128-bit security. The existing HEAX scheme satisfies 128-bit security, but because it does not consider bootstrapping, it allows only 8 homomorphic multiplications.

Meanwhile, the parameters of the present disclosure that differ most from the existing ones are the number of evaluation keys and dnum. Referring to the second row, the size of logP and the size of logQ are set to similar values. To increase the initial circuit depth to approximately 40, logQ must be made larger, but the size of logPQ is limited for security reasons.

To resolve this, dnum can be increased so that the ciphertext is decomposed. As a result, logQ is set to logP×dnum. However, as dnum increases, the memory needed to store the evaluation keys also increases, so the evaluation keys can no longer be stored in internal memory. In addition, the NTT must be performed dnum times as often, which causes a large delay. Therefore, in the present disclosure, a dnum value of 2 or 3 was selected as one that balances the increase in initial circuit depth against the increase in evaluation-key size.

In addition, in the present disclosure, the base modulus size (log q₀) was set to 62 in order to preserve the precision of the message at decryption, and the rescale modulus size (log q_i) was set to 52 in order to satisfy the following two criteria. First, it must be large enough for the approximate computation of RNS-HEAAN to be carried out. Second, it must allow a sufficient number of lightweight primes to be found. By using such primes, multiplication can be replaced with bit shifts and additions, which increases the speed of the modular multiplication (modMult).

There are few constraints when determining the size of the mod-up modulus (log p_i); the main requirement is that the product of the mod-up moduli be larger than a certain value. That is, if each mod-up modulus is made small, the number of mod-up moduli must be increased. Because a 62-bit modular operator is already available for the base modulus, a size of 62 was selected for the mod-up modulus.
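The sizes chosen above can be cross-checked against the totals listed in Table 1 for the two parameter sets of the present disclosure. The relations tested below (logQ as one base modulus plus l rescale moduli, logP as k mod-up moduli, and logPQ as their sum) are an interpretation made for this sketch, and the dictionary and function names are hypothetical; the numbers themselves are copied from Table 1.

```python
# (l+1, k, logQ, logP, logPQ, log q0, log qi, log pi), copied from Table 1.
PARAMS = {
    "Our SET-A": (36, 16, 1882, 992, 2874, 62, 52, 62),
    "Our SET-B": (42, 12, 2194, 744, 2938, 62, 52, 62),
}


def check(levels, k, log_q, log_p, log_pq, log_q0, log_qi, log_pi):
    """Return three booleans, one per assumed relation between the table columns."""
    return (
        log_q0 + (levels - 1) * log_qi == log_q,   # one base + l rescale moduli
        k * log_pi == log_p,                       # k mod-up moduli
        log_q + log_p == log_pq,                   # total modulus size
    )


results = {name: check(*row) for name, row in PARAMS.items()}
```

All three relations hold for both sets, which suggests that the 62-bit and 52-bit sizes account for the whole of logQ and logP in these rows.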

The prime information used for the base/rescale moduli and the mod-up moduli is shown in FIGS. 5 and 6.

FIG. 5 is a diagram illustrating an example of a first prime set according to an embodiment of the present disclosure.

Referring to FIG. 5, 42 primes are shown, and each of the 42 primes is expressed as a combination of powers of 2 with exponents of at most 61. The first prime (i = 0) is the prime used for the base modulus and has a size of at most 62 bits, and the primes with 1 ≤ i ≤ l are the primes used for the rescale moduli. It can be seen that all the primes with i ≥ 1 are smaller than 2⁵². Because the present disclosure thus uses primes that can be expressed as combinations of powers of 2, multiplication by such a prime can be performed using only shift and addition/subtraction operations.

Meanwhile, when information on these primes is stored, the prime value itself need not be stored; only information on the powers of 2 that make up the prime may be stored. For the prime with i = 0, for example, the information that the exponents 51 and 0 carry a +1 sign and the exponent 26 carries a -1 sign can be stored as the prime information. Storing the prime information in this way allows the prime to be stored in fewer bits than its full binary value requires. This representation is only an example, and the prime information may be stored in other ways. In particular, because the present disclosure uses primes composed of only three to five powers of 2, only a small amount of resources is needed to store the prime information.
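One possible compact encoding of such prime information is sketched below. The 7-bit field layout (a 6-bit exponent followed by one sign bit) and the function names are assumptions made for this example; the description does not specify a storage format. The exponents 51, 26, and 0 are the ones given in the text.

```python
def pack_terms(terms):
    """Pack (exponent, sign) pairs into one integer, 7 bits per term.

    Hypothetical layout: 6-bit exponent (0..63), then 1 sign bit (1 means minus).
    """
    packed = 0
    for exp, sign in terms:
        if not 0 <= exp < 64:
            raise ValueError("exponent does not fit in 6 bits")
        packed = (packed << 7) | (exp << 1) | (1 if sign < 0 else 0)
    return packed, len(terms)


def unpack_value(packed, count):
    """Rebuild the integer value from the packed signed powers of 2."""
    value = 0
    for _ in range(count):
        field = packed & 0x7F
        packed >>= 7
        exp, negative = field >> 1, field & 1
        value += -(1 << exp) if negative else (1 << exp)
    return value


terms = [(51, +1), (26, -1), (0, +1)]     # +2^51 - 2^26 + 2^0, as in the text
packed, count = pack_terms(terms)
value = unpack_value(packed, count)
```

With this layout a three-term value occupies 21 bits and a five-term value 35 bits, compared with up to 62 bits for the plain binary value.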

FIG. 6 is a diagram illustrating an example of a second prime set according to an embodiment of the present disclosure.

Referring to FIG. 6, 16 primes are shown, and each of the 16 primes is expressed as a combination of powers of 2 with exponents of at most 61. Because the present disclosure thus uses primes that can be expressed as combinations of powers of 2, multiplication by such a prime during the mod-up operation can be performed using only shift and addition/subtraction operations.

FIGS. 5 and 6 show only the prime values and do not show the scaled value (that is, the reciprocal) for each prime; nevertheless, for each prime there exists a value (that is, a reciprocal), expressed in powers of 2, whose product with the prime yields 1. Because these primes and their reciprocals have Hamming weights of 5 or less, they enable a space-efficient hardware design.

Returning to Table 1, the present disclosure uses a value of 2¹⁷ for the parameter N. Because N is larger than in the existing schemes, the execution times of the NTT and the INTT may increase. Accordingly, a hardware system design that achieves faster NTT and INTT operation than before, despite the larger N, is described below.

As described above, roots of unity are required to perform the NTT. For fast computation, all the roots could be stored and used. However, this approach has the problem that the required memory grows linearly with N and with (l+1)·k.

That is, when N and/or (l+1)·k becomes very large, it may become impossible to store all the primes (or roots of unity) in internal memory. In particular, because the internal memory of an FPGA is limited, a method of storing the necessary prime information within the internal memory size that the FPGA allows is required. For example, when SET-B of Table 1 is used for the INTT, storing all the roots of unity requires about 400 Mb of internal memory in total ((62b×17 + 52b×41)×2¹⁷).
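The expression quoted above can be evaluated directly. The sketch below reads "b" as bits, which is an assumption of this example, and the function name is hypothetical.

```python
def root_table_bits(n, widths):
    """Total bits needed to store n roots for each (bit_width, modulus_count) pair."""
    return sum(width * count for width, count in widths) * n


N = 2**17
total_bits = root_table_bits(N, [(62, 17), (52, 41)])   # (62b*17 + 52b*41) * 2^17
total_megabits = total_bits / 1e6
total_megabytes = total_bits / 8 / 1e6
```

Under this reading the total is 417,595,392 bits, that is, on the order of 400 megabits (roughly 52 megabytes), which is large compared with the on-chip memory typically available on an FPGA.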

Therefore, a method is needed in which, rather than storing and using all the prime information (or all the root-of-unity information), only part of it is stored and the prime information (or root-of-unity information) needed during an operation is calculated from the stored part. A specific configuration and method for this are described below. The approach of the present disclosure balances computation against storage. Moreover, the modification does not increase the amount of computation asymptotically: even with the modified algorithm, the computational cost remains O(N log N), the same as before. The storage space, by contrast, is reduced from O(N) bits to O(log N) bits.

Hereinafter, the method described above is explained in detail with reference to FIG. 7.

FIG. 7 is a diagram for explaining an INTT algorithm according to a second embodiment of the present disclosure. As in FIG. 4, the rescale step is omitted from FIG. 7 to simplify the description, but it may be added in an actual implementation.

Referring to FIG. 7, the algorithm takes as input a list of the $(-2^i)$-th powers of the fixed primitive (2N)-th root of unity (Ψ). More specifically, the list contains the values $\Psi^{-2^i}$. BitReverse(k, log h) in FIG. 7 reverses the bits of k, treating k as a (log h)-bit integer.

The differences from Algorithm 1 shown in FIG. 4 are as follows: i) to reduce the size of the input, the list of $(-2^i)$-th powers of Ψ is used instead of the full bit-reversed list of negative powers of Ψ; ii) instead of fetching a pre-stored root of unity, the bit-reversal processing of line 7 of FIG. 7 is performed; and iii) the roots of unity are not all calculated in advance, but the required roots are generated and updated as they are needed.
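That a logarithmic-size table is sufficient can be illustrated as follows. This sketch is not the procedure of FIG. 7, which the text describes as generating and updating roots as they are needed; it only shows, under assumed names and toy parameters (p = 17, ψ = 2, N = 4), that any negative power of Ψ can be rebuilt from the stored values $\Psi^{-2^i}$.

```python
def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def stored_powers(psi, n, p):
    """Keep only psi^(-2^i) for i = 0 .. log2(2n) - 1: O(log N) values."""
    count = (2 * n).bit_length() - 1
    table = [pow(psi, -1, p)]
    for _ in range(count - 1):
        table.append(table[-1] * table[-1] % p)   # each entry is the previous one squared
    return table


def root_from_table(e, table, p):
    """Rebuild psi^(-e) by multiplying the stored powers selected by the bits of e."""
    r, i = 1, 0
    while e:
        if e & 1:
            r = r * table[i] % p
        e >>= 1
        i += 1
    return r


p, psi, n = 17, 2, 4
table = stored_powers(psi, n, p)                  # 3 values instead of a full table
psi_inv = pow(psi, -1, p)
# Entry i of the bit-reversed list is psi^(-bitrev(i)); it is rebuilt on demand here.
rebuilt = [root_from_table(bit_reverse(i, 2), table, p) for i in range(n)]
```

Rebuilding each root from scratch in this way costs up to log(2N) multiplications per root; an incremental update between consecutive roots, as the text indicates, avoids that overhead.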

Meanwhile, a different root of unity is required for each INTT stage, and the present disclosure generates the roots required for the stages in parallel. This operation is described below.

The NTT and the INTT have almost the same system design, except that they proceed in opposite directions and that the INTT has an additional scaling step. Accordingly, the NTT and the INTT can use the same circuit, and only an implementation of the INTT is described below.

FIG. 8 is a diagram showing the configuration of a BU according to the first embodiment of the present disclosure. Specifically, FIG. 8 shows a radix-2 BU for the INTT.

Referring to FIG. 8, the BU 800 may include a modular subtractor 810, a modular adder 820, and a modular multiplier 830. A and B denote input samples, A' and B' denote output samples, and W denotes root-of-unity information.

The modular subtractor 810 may receive A and B and output the result of the modular subtraction of the two input samples to the modular multiplier.

The modular adder 820 may receive A and B and output the result of the modular addition of the two input samples as A'.

The modular subtractor 810 and the modular adder 820 have the same system design as a general subtractor and adder, and the operation result of each is output after a delay of 2 cycles.

The modular multiplier 830 receives the output of the modular subtractor 810 and W, and outputs the result of their modular multiplication. The modular multiplier 830 may use a fully pipelined system design with lightweight moduli. The specific configuration of the modular multiplier was described above with reference to FIG. 3, so a redundant description is omitted.

Because the maximum Hamming weight of the moduli and scaled inverse values in the present disclosure is one greater than in the conventional design, the modular multiplier 830 used in the present disclosure requires one more cycle than before, and its result is output after a delay of 21 cycles. This delay is only an example, and the number of delay cycles may differ from this value depending on the hardware environment and the implementation algorithm.

Meanwhile, when the BU described above is used for the NTT operation, prime information may be provided to the modular multiplier instead of root-of-unity information, and the operation result of the modular multiplier may be applied to the modular subtractor or the modular adder.

Hereinafter, the BU operation described above is explained in detail with reference to an operation timing diagram.

FIG. 9 is a diagram for explaining the operation timing of the BU of FIG. 8.

Referring to FIG. 9, the first output value (A') is output 2 cycles after the two input values (A, B) are input, and the second output value (B') is output 21 cycles after the result of the modular subtractor 810 is input to the modular multiplier 830.
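A behavioral model of the BU and its output timing can be sketched as follows. The latencies of 2 and 21 cycles are the example values given in the text; the function name, the tuple-based return format, and the toy modulus are assumptions of this sketch, which models only when each result becomes available and not the pipeline registers themselves.

```python
ADD_SUB_LATENCY = 2     # cycles for the modular adder and subtractor (from the text)
MULT_LATENCY = 21       # cycles for the modular multiplier (from the text)


def butterfly_unit(a, b, w, p, t_in=0):
    """Radix-2 INTT butterfly with the cycle at which each output is ready.

    Returns ((A', ready_cycle), (B', ready_cycle)).
    """
    a_out = (a + b) % p                          # modular adder 820
    diff = (a - b) % p                           # modular subtractor 810
    b_out = diff * w % p                         # modular multiplier 830
    a_ready = t_in + ADD_SUB_LATENCY
    b_ready = t_in + ADD_SUB_LATENCY + MULT_LATENCY
    return (a_out, a_ready), (b_out, b_ready)


(a_prime, a_cycle), (b_prime, b_cycle) = butterfly_unit(5, 3, 2, 17)
```

With these example latencies, A' is ready at cycle 2 and B' at cycle 23 after the inputs arrive, so a following stage that consumes B' cannot start before 23 cycles have passed.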

Meanwhile, because the BU according to the present disclosure is fully pipelined, two input samples are input continuously in every cycle, and after a fixed delay the outputs are likewise produced in every cycle.

Meanwhile, when several BUs are connected in series, the output sample of one BU may become the input sample of the next BU.

Hereinafter, the case in which a plurality of BUs are grouped is described.

To increase the speed of the INTT on an FPGA, multiple BUs need to be used simultaneously. However, because each BU contains expensive modular operators, it is difficult to employ (N/2)·log N BUs when N is very large.
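The scale of the problem can be seen by evaluating (N/2)·log N for the values of N mentioned in this description. The helper name below is hypothetical, and N is assumed to be a power of 2.

```python
def bu_count(n):
    """Number of butterflies in a full radix-2 transform: (N/2) * log2(N)."""
    stages = n.bit_length() - 1      # log2(N) for a power of 2
    return (n // 2) * stages


small = bu_count(8)          # the N = 8 example from the text
large = bu_count(2**17)      # the N used by the parameter sets of the disclosure
```

For N = 8 this gives the 12 BUs mentioned earlier, while for N = 2¹⁷ it gives 1,114,112, far more than can be instantiated as hardware units at once.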

It is therefore necessary to use a reasonable number of BUs, and reasonable ways of arranging them are described below. The first method is to arrange a plurality of BUs in parallel within the same stage. The second is to place a single BU (or a few BUs) in each stage and to arrange the BUs of the different stages in series.

The first method is intuitive, and the ordering of the intermediate data is simple. However, because the BUs are arranged in parallel, high input/output and memory bandwidth is required for short periods. The present disclosure therefore describes an example that uses the second method. In an environment that can provide the required input/output and memory bandwidth, however, the first method may also be used.

FIG. 10 is a diagram for explaining the operation timing when BUs are operated using the algorithm of FIG. 7. Specifically, FIG. 10 shows the operation timing when a plurality of BUs are arranged in series and N is 32.

Referring to FIG. 10, the stage order is shown in the first row, and the first and second columns of each stage show the indices of the input samples.

The third column of each stage shows the square root by its exponent. The exponent increases in fixed units, and this unit is referred to as the update constant. It can be seen that the update constant increases exponentially as the stage number increases.

Comparing the first stage with the second stage, the arrow indicates the first case in which an output of the first stage is input to the second stage.

Because each stage depends on the previous one in this way, delay accumulates. To address this delay, additional BUs may be arranged in each stage. Specifically, since the number of DSP slices is constrained by the look-up tables and flip-flops, the number of BUs per stage (hereinafter, c) may be determined based on the total number of available DSP slices. The input sample sequence of each stage may then be divided into c partial sequences, and each partial sequence may be input to one BU.

FIG. 11 is a diagram for explaining operation timing when a plurality of BUs are parallelized.

Referring to FIG. 11, c is 4 in the illustrated example, and Ci denotes the i-th BU core. The input samples of stage 1 indexed 0, 2, 4, and 5 are processed by the modAdd of C1, C3, C2, and C4, respectively, so that C5 and C5 of stage 2 start after a delay of 2 cycles. Meanwhile, the input samples indexed 1, 3, 5, and 7 are applied to modSub and modMult, so that C7 and C8 of stage 2 start after a delay of 23 cycles. The BU cores of the subsequent stage 3 may operate in the same manner.

Because the aim is a large N as described above, the cumulative delay in stages 1 to 3 is negligible, and the throughput is 8 samples/cycle.

Meanwhile, the BU cores of the fourth stage receive input samples whose indexes differ by 8, but each such input sample can be computed only after N/(2*2*4) cycles (for N = 2¹⁷). Therefore, a reordering buffer is required to change the order. The BU cores between two reordering buffers may form a BU group (GBU). The number of stages in a single GBU and the number of GBUs in the entire INTT design may be calculated as 1+logc and ⌈logN/(1+logc)⌉, respectively.
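As a quick check of these two expressions, the following minimal sketch evaluates them for the example parameters. It assumes that the logarithms are base 2, that N and c are powers of two, and that the GBU count is rounded up. The function name `gbu_layout` is hypothetical and not part of the disclosure.

```python
def gbu_layout(n, c):
    """Return (stages per GBU, number of GBUs) for transform size n and c BUs per stage.

    Assumes n and c are powers of two and that logarithms are base 2.
    """
    log_n = n.bit_length() - 1          # log2(n) for a power of two
    log_c = c.bit_length() - 1          # log2(c) for a power of two
    stages_per_gbu = 1 + log_c
    # Round up so that every one of the log2(n) stages is covered by some GBU.
    num_gbus = -(-log_n // stages_per_gbu)
    return stages_per_gbu, num_gbus
```

For N = 2¹⁷ and c = 4, `gbu_layout(2**17, 4)` gives 3 stages per GBU and 6 GBUs, that is, 18 stage slots for a 17-stage transform.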

FIG. 12 is a diagram showing the configuration of a GBU according to an embodiment of the present disclosure. Specifically, FIG. 12 shows the GBU configuration when c is 4. Although this example describes the case where c is 4, the GBU may be configured with a different value of c in an actual implementation.

Referring to FIG. 12, one GBU 1200 includes 12 BUs. Specifically, the GBU consists of three stages, and each stage may consist of four BUs. This 3*4 arrangement is merely an example; an implementation may use a different number of stages and a different number of BUs per stage depending on the design parameters.

The output of the modular multiplication (ModMult) of each BU is drawn with a bold line. The GBU receives 8 input samples and 12 square roots every cycle. After a delay of one cycle, 8 samples are generated and may be delivered to the RB every cycle.
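The data flow through one GBU can be sketched in software as follows. This is a behavioural sketch only, not the hardware design: it assumes each BU computes a modular sum on one path and a modular difference multiplied by a square root on the other (matching the modAdd, modSub, and modMult blocks named above), and that the pairing distance doubles from stage to stage (1, 2, 4), as the timing description suggests. The function names and the order in which the 12 square roots are consumed are hypothetical.

```python
def butterfly(a, b, w, q):
    """One BU: modular add on one path, modular subtract then multiply on the other."""
    return (a + b) % q, ((a - b) * w) % q


def gbu(samples, roots, q, c=4):
    """Pass 2*c samples through 1 + log2(c) stages of c BUs each.

    `roots` holds one value per BU, stage by stage (an assumed ordering).
    """
    stages = c.bit_length()  # equals 1 + log2(c) when c is a power of two
    assert len(samples) == 2 * c and len(roots) == stages * c
    data = list(samples)
    for s in range(stages):
        stride = 1 << s              # assumed pairing distance: 1, 2, 4, ...
        out = list(data)
        bu = 0
        for i in range(2 * c):
            if i & stride == 0:      # i is the lower index of a pair
                w = roots[s * c + bu]
                out[i], out[i + stride] = butterfly(data[i], data[i + stride], w, q)
                bu += 1
        data = out
    return data
```

With c = 4 the function consumes 8 samples and 12 square roots per call and returns 8 samples, which mirrors the per-cycle interface described for the GBU.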

To further improve throughput, additional parallelization may be used. This is described with reference to FIG. 13.

FIG. 13 is a diagram for explaining operation timing when the INTT is designed with Set-B of Table 1.

Referring to FIG. 13, in the homomorphic multiplication of RNS-HEAAN, the base moduli and the rescale moduli are used only in the INTT, and 42 moduli may be used as in FIG. 5. To compute the INTT of a polynomial, each pipeline time requires approximately 16K cycles (about 16K*(5+42)).
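The cycle estimate can be converted into a time estimate. The sketch below assumes that 16K means 16,000 cycles, that (5+42) counts the pipeline passes, and that the clock is the 200 MHz figure quoted in the performance comparison. The function name and these readings are assumptions, not statements from the disclosure.

```python
def intt_time_ms(cycles_per_pipe, num_passes, clock_hz):
    """Total INTT time in milliseconds for the given number of pipeline passes."""
    total_cycles = cycles_per_pipe * num_passes
    return total_cycles / clock_hz * 1e3
```

Under these assumptions, `intt_time_ms(16_000, 5 + 42, 200e6)` comes to about 3.76 ms, which is in line with the FPGA execution time reported for Set-B in Table 4.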

Hereinafter, the reordering buffer is described with reference to FIG. 14.

FIG. 14 is a diagram illustrating the configuration of an RB according to an embodiment of the present disclosure.

Referring to FIG. 14, the i-th RB may store the output samples generated by the i-th GBU and deliver the reordered samples to the (i+1)-th GBU.

In FIGS. 11 and 12 above, 8 samples may be generated in each cycle of the first GBU. When reordering is performed, these samples may be stored in the buffers within each RB. Each of the four BU cores of stage 4 may then read samples whose indexes differ by 8. For example, the samples indexed 0, 8, ..., 48, 56 may be read in the first cycle.
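The reordering can be illustrated by comparing the order in which samples arrive with the order in which they are read. The sketch below assumes that each GBU cycle writes 8 consecutive indexes and that each read cycle fetches 8 samples spaced 8 apart. Only the first read cycle (0, 8, ..., 56) is stated in the text; the pattern for later cycles and both function names are assumptions.

```python
def write_indices(cycle, width=8):
    """Indexes of the samples written in one cycle (assumed to be consecutive)."""
    start = cycle * width
    return list(range(start, start + width))


def read_indices(cycle, width=8, stride=8):
    """Indexes read in one cycle: `width` samples spaced `stride` apart."""
    block, offset = divmod(cycle, stride)
    base = block * width * stride + offset
    return [base + j * stride for j in range(width)]
```

In this model, 8 read cycles cover exactly the 64 samples written during 8 write cycles, only in a different order, which is why a buffer between the two GBUs is needed.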

If the samples generated by a BU core were stored in a BRAM for use by that same BU core, a high-bandwidth BRAM would be required, which reduces the utilization efficiency of the BRAM. Therefore, the output sample sequence from each BU core may be written to 8 separate BRAM buffers. Here, the BRAM serves as an internal cache within the FPGA and offers faster read/write speeds than typical DDR memory.

Although not shown, a double buffering technique that allows reading and writing at the same time may be used. Accordingly, each RB may include 128 (= 2*8*8) BRAM buffers, each of size 62-bit * 2K.
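The buffer count and the resulting storage can be worked out with a few lines of arithmetic. The sketch below assumes that 2K means 2048 entries and that the three factors are the double-buffering factor and an 8-by-8 grid of buffers; the parameter names are hypothetical.

```python
def rb_bram(double_buffer=2, rows=8, cols=8, word_bits=62, depth=2 * 1024):
    """Return (number of BRAM buffers, total bytes) for one reordering buffer."""
    buffers = double_buffer * rows * cols
    total_bits = buffers * word_bits * depth
    return buffers, total_bits // 8
```

Under these assumptions one RB holds 128 buffers and about 1984 KB, so five RBs come to roughly 9.7 MB, which is consistent with the figure of about 10 MB given for the five RBs in the resource discussion.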

When delivering to the 8 BU cores in the fourth stage, the 8 samples may be read horizontally as shown in FIG. 14. The next RB may read 8^(i-1) samples vertically from the same buffer and then move horizontally to the next buffer.

Hereinafter, the operation of the prime number generator according to an embodiment of the present disclosure is described.

FIG. 15 is a diagram illustrating the configuration of a prime number generator according to an embodiment of the present disclosure. For ease of explanation, the following description refers to generating prime numbers, but the same generator may also be used to generate the square roots corresponding to those prime numbers (that is, during the INTT operation). In that case, the prime number generator may also be referred to as a square root generator.

For reference, FIG. 15 shows an example of the prime number generator for N = 2¹⁷ and c = 4, but in an implementation the configuration of the prime number generator may be changed to support other values of N and c.

Referring to FIG. 15, the prime number generator 1500 may generate all square roots (or all prime numbers) from O(logN) base square roots (or base prime numbers). Each GBU requires 12 square roots. Specifically, because C5 and C7, C6 and C8, and C9 to C12 each use the same square root, the prime number generator may generate seven square roots. These square roots are denoted W_C1, W_C2, W_C3, W_C4, W_C5&7, W_C6&8, and W_C9-12.

The seven square roots form a square root group (W_Gi), which may be delivered to the i-th GBU. At the same time, they may be provided to the modular multiplications (ModMults) in the RUG, which, once the current square roots have been generated, may generate the square roots (or prime numbers) required for the next cycle. Specifically, the prime number generator 1500 may generate the square root information (or prime number information) to be used in the next cycle from the square roots generated in the current cycle and the base square root information (or base prime number information).
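The cycle-by-cycle generation described here can be pictured as a running modular product. The sketch below is one plausible reading, assuming that the value for the next cycle is the current value multiplied by a fixed update constant modulo q. The function name and the toy parameters (q = 17 with update constant 3) are illustrative and are not taken from the disclosure.

```python
def root_stream(base_root, update, q, count):
    """Yield `count` values, each the previous one times `update` modulo q."""
    w = base_root % q
    for _ in range(count):
        yield w
        w = (w * update) % q   # one modular multiplication per cycle
```

Only the starting value and the update constant need to be stored; every later value is derived on the fly, which is the source of the memory saving reported for the proposed method.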

FIG. 16 is a diagram for explaining an example of data stored in an internal memory according to an embodiment of the present disclosure.

Referring to FIG. 16, each hatching pattern represents the base square roots used for a different module. As described above, each LUG requires seven base square roots. However, depending on the hardware design of the ModMults in the RUG, a 21-cycle delay may occur.

(i) During the delay, the square roots stored in the ROM are used as the input operands of the ModMults, which increases the number of square roots that must be stored. (ii) After the delay, the square roots generated by the ModMults are used as the input operands. Accordingly, because the square roots for the first GBU change every cycle, 21 base square roots are required. Because the square roots for the second GBU change every 8 cycles, three base square roots may be stored. Finally, because the square roots for the third to sixth GBUs change every 64 cycles, only one base square root is required. The 21 base square roots for the first GBU may be passed directly to the ModMults.
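The three counts above are consistent with a simple rule: one stored base square root for each change of value that falls within the 21-cycle delay. This rule is an inference from the three stated cases, not a formula given in the disclosure, and the function name is hypothetical.

```python
def base_roots_needed(delay_cycles, update_period):
    """Number of values that must be pre-stored to cover the multiplier delay.

    Computed as ceil(delay_cycles / update_period) using integer arithmetic.
    """
    return -(-delay_cycles // update_period)
```

With a 21-cycle delay, update periods of 1, 8, and 64 cycles give 21, 3, and 1 stored values, matching the figures stated for the first, second, and third to sixth GBUs.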

Meanwhile, to minimize BRAM bandwidth, the base square roots for the other GBUs are stored in the registers denoted R1 and may be used for the next modulus during the next pipeline time. Similarly, the update constants are read from the ROM (or an internal memory in the FPGA) and may be stored in the registers denoted R2. Because the BUs receive seven base square roots simultaneously, the base square roots may be stored in seven separate ROMs (or internal memories, internal registers, internal buffers, or the like). By default, the base square roots for the mod-up and base moduli may be stored in 62-bit ROMs, and the base square roots for the scale moduli may be stored in 52-bit ROMs.

However, the base square roots for q₁ to q₅ may also be stored in the 62-bit ROMs to increase BRAM utilization. A different configuration may be used for Set-A of Table 1. Specifically, p₁ to p₁₆, q₀, and q₁ may be stored in the 62-bit ROMs, and q₂ to q₃₅ may be stored in the 52-bit ROMs.

Meanwhile, the bootstrapping parameter set of the present disclosure has 50 or more moduli and scaling inverse values. These values are stored in a modulus table (MT), and the pair corresponding to the pipeline time may be selected according to the selection signal for the first GBU and RUG. The pair is then delayed in registers and provided to the next GBU and RUG.

FIG. 17 is a diagram for explaining the structure of a processor according to an embodiment of the present disclosure.

Although FIG. 17 shows an example in which c is 4, an implementation may use a different value of c. Specifically, a higher c yields higher throughput, shorter latency, and lower BRAM usage, but requires more DSP slices.

The hardware system 1700 that performs the INTT according to the present disclosure may consist of an internal memory 1710, six GBUs 1720, five RBs 1740, six RGUs 1730, and one MT 1750. In particular, because the INTT stages use only the six GBUs, the last stage may be used for scaling. Specifically, the BUs in the last stage are replaced by two ModMults, and a scaling constant, instead of a square root, may be input to the ModMults.

Hereinafter, the performance of the homomorphic operation according to the present disclosure is described.

The target platform provides 1800 DSP slices, 132.9 Mbit of BRAM, 1M LUTs, and 2M FFs. It is assumed that input samples are fed continuously into the INTT design and that the data transfer time over the I/O interface is hidden by pipeline scheduling.

Design | Chen | Roy | Ozturk | Proposed
Device | xc6slx100 | xczu9eg | xc7vc690t | xcvu190
No. of samples | 2¹¹ | 2¹² | 2¹⁵ | 2¹⁷
No. of moduli | 1 | 6 | 41 | ~42
Max. bit-width | 58 | 30 | 32 | 62
fmax (MHz) | 210 | 200 | 250 | 200
kLUT | 6 | 55 | 219 | 365
kFF | 19 | 22 | 91 | 335
DSP | 64 | 182 | 768 | 1332
BRAM (KB) | 113 | 1746 | 869 | 10163
Gbps | 4.43 | 1.45 | 20.60 | 88.65
Mbps/DSP | 69.20 | 7.94 | 26.82 | 66.55
Kbps/LUT | 703.59 | 26.01 | 93.98 | 242.68

Table 2 compares the proposed INTT design with existing designs.

Referring to Table 2, the second row lists the Xilinx™ FPGA devices. The existing designs were built for larger functions such as polynomial multiplication, but they were adopted for this evaluation because they reuse the same circuit for the INTT and other functions.

Referring to Table 2, because Chen deploys only two BUs on the FPGA, that design uses the fewest resources and shows the second-lowest throughput among the four designs. For this reason, it cannot be used in an RNS-based homomorphic encryption scheme. Roy shows the lowest throughput in the table, but its throughput could be improved further by placing more core processors on the FPGA.

In addition, it can be seen that the normalized throughput of the present disclosure is 2 to 3 times higher than that of the existing designs. This result arises because the hardware design of the present disclosure uses a high degree of parallelism.
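The normalized throughput can be recomputed from the Gbps and DSP rows of Table 2. The sketch below is only a check on the arithmetic; the dictionary layout and the function name are hypothetical, and the values are copied from the table.

```python
# Throughput (Gbps) and DSP slice counts as listed in Table 2.
designs = {
    "Chen": {"gbps": 4.43, "dsp": 64},
    "Roy": {"gbps": 1.45, "dsp": 182},
    "Ozturk": {"gbps": 20.60, "dsp": 768},
    "Proposed": {"gbps": 88.65, "dsp": 1332},
}


def mbps_per_dsp(gbps, dsp):
    """Throughput normalized by the number of DSP slices, in Mbps per slice."""
    return gbps * 1000 / dsp


normalized = {name: mbps_per_dsp(**row) for name, row in designs.items()}
```

With these figures the proposed design delivers about 2.5 times the Mbps/DSP of Ozturk and about 8 times that of Roy, while Chen's much smaller design is roughly on par.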

Referring to the FPGA resource breakdown in Table 2, it can be seen that, in the method according to an embodiment of the present disclosure, the six GBUs account for most of the resources other than BRAM. Specifically, they use 50% of the LUTs and 68% of the DSP slices. As for BRAM, the five RBs use 10 MB, which is most of the BRAM in the overall design. This size can be reduced by increasing the number of BUs, which consume DSP slices, so the trade-off can be chosen according to the available resources.

Parameter | w/o our method | w/ our method | Improvement
Set-A | 44.91 MB | 64.76 KB | 99.86%
Set-B | 45.91 MB | 70.29 KB | 99.85%

Table 3 shows the reduction in internal memory size when the prime number information is computed every cycle instead of being stored in full. The first column lists the parameter sets used in the present disclosure, and the second and third columns show the memory size needed to store the square root information in the existing method and the proposed method, respectively. As shown, the method according to the present disclosure reduces the memory size by more than 99%.
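The improvement column of Table 3 follows directly from the two memory sizes. The sketch below recomputes it, assuming 1 MB = 1024 KB; the function name is hypothetical.

```python
def reduction_percent(before_mb, after_kb):
    """Percentage of memory saved when going from `before_mb` to `after_kb`."""
    before_kb = before_mb * 1024
    return (1 - after_kb / before_kb) * 100
```

Rounded to two decimals, the Set-A and Set-B rows give 99.86% and 99.85%, in agreement with the table.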

To confirm the effect of hardware acceleration, the software implementation of the INTT is compared with the FPGA implementation.

Parameter | Software impl. | FPGA impl.
Set-A | 387 ms | 3.28 ms
Set-B | 446 ms | 3.76 ms

Table 4 shows the execution times of the algorithm of the present disclosure when implemented in software and when implemented on an FPGA. The second and third rows of Table 4 show the results for parameter Set-A and Set-B. At a frequency of 200 MHz, the execution times of the FPGA implementation for Set-A and Set-B are 3.28 ms and 3.76 ms, respectively, which is about 115 times faster than the software implementation.
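The speed-up can be recomputed from the Table 4 timings. The sketch below is only a check on the arithmetic; the dictionary and the function name are hypothetical, and the timings are those listed in the table.

```python
# (software time, FPGA time) in milliseconds, as listed in Table 4.
timings_ms = {"Set-A": (387, 3.28), "Set-B": (446, 3.76)}


def speedup(software_ms, fpga_ms):
    """How many times faster the FPGA implementation is than the software one."""
    return software_ms / fpga_ms


speedups = {name: speedup(sw, hw) for name, (sw, hw) in timings_ms.items()}
```

Both ratios come out near 118, which is of the same order as the stated figure of about 115 times.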

Meanwhile, the ciphertext processing method according to the various embodiments described above may be implemented as program code for performing each step, stored in a recording medium, and distributed. In this case, a device in which the recording medium is mounted may perform the encryption or ciphertext processing operations described above.

The recording medium may be any of various types of computer-readable media, such as a ROM, a RAM, a memory chip, a memory card, an external hard drive, a hard drive, a CD, a DVD, a magnetic disk, or a magnetic tape.

Although the present disclosure has been described above with reference to the accompanying drawings, the scope of the present disclosure is determined by the claims below and should not be construed as limited to the foregoing embodiments and/or drawings. It should also be clearly understood that improvements, changes, and modifications of the disclosure recited in the claims that are obvious to those skilled in the art fall within the scope of the present disclosure.

100: electronic device 200: first server device
300: second server device 400: arithmetic device
410: communication device 420: memory
430: display 440: operation input device
450: processor

Claims

An arithmetic device comprising:
a memory storing at least one instruction; and
a processor executing the at least one instruction,
wherein the processor, by executing the at least one instruction,
stores predetermined base prime number information, performs bit inversion processing on the stored base prime number information to generate first prime number information different from the base prime number information, and performs a modular operation on a plurality of ciphertexts using the generated first prime number information, and
wherein each of the base prime number information and the first prime number information
is a value obtained by adding or subtracting three, four, or five powers of 2 having mutually different exponents.

(Deleted)

An arithmetic device comprising:
a memory storing at least one instruction; and
a processor executing the at least one instruction,
wherein the processor, by executing the at least one instruction,
stores predetermined base prime number information, performs bit inversion processing on the stored base prime number information to generate first prime number information different from the base prime number information, and performs a modular operation on a plurality of ciphertexts using the generated first prime number information, and
wherein the processor comprises:
an internal memory storing the base prime number information;
a GBU including a plurality of BUs, each BU including a plurality of operators that perform mutually different predetermined homomorphic operations; and
a prime number generator that reads the base prime number information from the internal memory, bit-inverts the base prime number information to generate the prime number information required by each of the plurality of BUs, and provides the generated prime number information to each of the plurality of BUs.

The arithmetic device of claim 3,
wherein the prime number generator
generates the prime number information by switching the bit value of the k-th bit of the base prime number information to a log h-th bit integer.

The arithmetic device of claim 3,
wherein the prime number generator
generates first prime number information required for a first cycle using the base prime number information, and generates second prime number information required for a second cycle using the generated first prime number information and the base prime number information.

The arithmetic device of claim 3,
wherein the processor
includes a plurality of the GBUs,
the plurality of GBUs are arranged in series, and
the processor
further comprises a reordering buffer (RB) that stores the output values of one GBU and provides the stored output values to another GBU in an order different from the order in which they were stored.

The arithmetic device of claim 3,
wherein the GBU
comprises a plurality of stages, and a plurality of BUs are arranged in parallel in each of the plurality of stages.

The arithmetic device of claim 3,
wherein at least two of the plurality of BUs in one GBU perform homomorphic operations using the same prime number information.

The arithmetic device of claim 3,
wherein each of the BUs comprises:
a modular subtractor that receives two homomorphic ciphertexts and outputs their difference;
a modular adder that receives two homomorphic ciphertexts and outputs their sum; and
a modular multiplier that performs a modular multiplication using the output value of the modular subtractor and the prime number information.

The arithmetic device of claim 9,
wherein the modular multiplier
performs an individual shift operation based on the exponent of each of the plurality of powers of 2 constituting the prime number information, and performs the modular multiplication by adding or subtracting the results of the shift operations.

The arithmetic device of claim 1,
wherein the processor
is a field programmable gate array (FPGA).

A ciphertext operation method comprising:
receiving a modular operation command for a plurality of ciphertexts;
performing a modular operation on the plurality of ciphertexts using prime number information expressed as a combination of powers of 2; and
outputting a result of the operation,
wherein the performing of the modular operation comprises
storing base prime number information, performing bit inversion processing on the base prime number information to generate first prime number information different from the base prime number information, and performing the modular operation on the plurality of ciphertexts using the generated first prime number information, and
wherein each of the base prime number information and the first prime number information
is a value obtained by adding or subtracting three, four, or five powers of 2 having mutually different exponents.

(Deleted)

A ciphertext operation method comprising:
receiving a modular operation command for a plurality of ciphertexts;
performing a modular operation on the plurality of ciphertexts using prime number information expressed as a combination of powers of 2; and
outputting a result of the operation,
wherein the performing of the modular operation comprises
storing base prime number information, performing bit inversion processing on the base prime number information to generate first prime number information different from the base prime number information, and performing the modular operation on the plurality of ciphertexts using the generated first prime number information, and
wherein the performing of the modular operation further comprises
generating the first prime number information by switching the bit value of the k-th bit of the base prime number information to a log h-th bit integer.

A ciphertext operation method comprising:
receiving a modular operation command for a plurality of ciphertexts;
performing a modular operation on the plurality of ciphertexts using prime number information expressed as a combination of powers of 2; and
outputting a result of the operation,
wherein the performing of the modular operation comprises
storing base prime number information, performing bit inversion processing on the base prime number information to generate first prime number information different from the base prime number information, and performing the modular operation on the plurality of ciphertexts using the generated first prime number information, and
wherein the performing of the modular operation further comprises
generating first prime number information required for a first cycle using the base prime number information, and generating second prime number information required for a second cycle using the generated first prime number information and the base prime number information.