TWI828934B - Processor, method for operating the same, and electronic device including the same

Info

Publication number: TWI828934B
Application number: TW109126747A
Authority: TW
Inventors: 金東奎; 崔必周
Original assignee: 南韓商Ｉｃｔｋ控股有限公司; 漢陽大學校産學協力團
Priority date: 2019-08-06
Filing date: 2020-08-06
Publication date: 2024-01-11
Also published as: TW202127291A; KR20210018130A

Abstract

A processor, a method of operating the same, and an electronic device including the same are disclosed. The method includes: identifying an instruction that instructs execution of a first operation and address information of an operand corresponding to the instruction; and executing the instruction based on whether the address information of the operand satisfies a predetermined condition. In executing the instruction, when the address information of the operand satisfies the predetermined condition, a second operation set for the instruction is performed on the operand; when it does not, the first operation is performed on the operand.

Description

Processor, method of operating the processor, and electronic device including the same

The present invention relates to a processor, a method of operating the processor, and an electronic device including the same.

Block cipher algorithms are essential for providing basic security services such as confidentiality and integrity. These algorithms can be implemented in software or in hardware. A software implementation offers flexibility, whereas a hardware implementation supports high performance. To obtain both benefits at once, programmable processors for block ciphers are being studied extensively.

A first approach is to use a programmable cryptographic coprocessor, such as the coprocessors proposed in CryptoManiac, Cryptonite, Cryptoraptor, and SPARX. These coprocessors not only execute block ciphers faster through custom design and parallel processing, but can also be programmed using their own instruction set architecture (ISA). However, they require additional hardware resources, such as control logic gates, register files, instruction memory, and data memory, which may greatly increase cost.

A second approach is to extend an existing CPU core to support cryptographic operations. In general, a CPU core can be extended for block ciphers by extending its ISA. ISA extension was originally proposed with the idea that a CPU core provides additional instructions dedicated to a number of cryptographic algorithms. However, most of the supported algorithms are obsolete and may rarely be used today. A CPU core can provide instructions that perform the byte substitution (SubByte) and column mixing (MixColumns) transformations of AES. CryptoBlaze, a commercial core from Xilinx, is an 8-bit microcontroller core that can support operations for AES and RSA, such as inversion and convolution over finite fields (Galois fields, GF).

A method of operating a processor according to an embodiment includes: identifying an instruction that instructs execution of a first operation and address information of an operand corresponding to the instruction; and executing the instruction based on whether the address information of the operand satisfies a predetermined condition. In executing the instruction, when the address information of the operand satisfies the predetermined condition, a second operation set for the instruction is performed on the operand, and when the address information of the operand does not satisfy the predetermined condition, the first operation is performed on the operand.

In the method of operating a processor according to an embodiment, the predetermined condition may correspond to whether the address information of the operand falls within a preset address range.

In the method of operating a processor according to an embodiment, the first operation may be an operation that is executed less frequently in the processor than the second operation.

In the method of operating a processor according to an embodiment, the address range for the predetermined condition may be registered in the processor in advance, before the instruction is executed.

In the method of operating a processor according to an embodiment, the second operation may be an operation that is not included in the ISA of the processor.

In the method of operating a processor according to an embodiment, the operand may be loaded from a memory connected to the processor and stored in a dedicated buffer in the processor, and the address information of the operand may indicate the address in the dedicated buffer at which the operand is stored.

In the method of operating a processor according to an embodiment, when the address information of the operand satisfies the predetermined condition, the operand may be stored in a data buffer (data-buffer) of the processor, and the address information of the operand may be stored in a configuration buffer (configuration-buffer) of the processor.

In the method of operating a processor according to an embodiment, the configuration buffer may be a memory-mapped buffer.

In the method of operating a processor according to an embodiment, whether the address information of the operand satisfies the predetermined condition may be indicated by flag information associated with a general-purpose register in the processor.

In the method of operating a processor according to an embodiment, when the address information of the operand satisfies the predetermined condition, a round counter and a round-key pointer used in the operation of the processor may be stored in the data buffer of the processor, and the address information of the round counter and the round-key pointer may be stored in the configuration buffer of the processor.

A processor according to an embodiment includes: a data buffer (data-buffer) that stores an operand; a configuration buffer (configuration-buffer) that stores address information of the operand; and a processor unit that identifies an instruction that instructs execution of a first operation and the address information of the operand corresponding to the instruction, and executes the instruction based on whether the address information of the operand satisfies a predetermined condition. In the processor unit, when the address information of the operand satisfies the predetermined condition, a second operation set for the instruction is performed on the operand, and when the address information of the operand does not satisfy the predetermined condition, the first operation is performed on the operand.

In the processor according to an embodiment, the predetermined condition may correspond to whether the address information of the operand falls within a preset address range.

In the processor according to an embodiment, the first operation may be an operation that is executed less frequently in the processor than the second operation.

In the processor according to an embodiment, the address range for the predetermined condition may be registered in the processor in advance, before the instruction is executed.

In the processor according to an embodiment, the second operation may be an operation that is not included in the ISA of the processor.

In the processor according to an embodiment, the operand may be loaded from a memory connected to the processor and stored in a dedicated buffer in the processor, and the address information of the operand may indicate the address in the dedicated buffer at which the operand is stored.

In the processor according to an embodiment, when the address information of the operand satisfies the predetermined condition, the operand may be stored in the data buffer of the processor, and the address information of the operand may be stored in the configuration buffer of the processor.

The processor according to an embodiment may further include a dedicated operator that performs the second operation.

An electronic device according to an embodiment includes: a memory that stores an instruction and an operand corresponding to the instruction; and a processor that executes the instruction. The processor includes: a buffer that stores the operand, received from the memory for executing the instruction, and address information of the operand; and a processor unit that identifies an instruction that instructs execution of a first operation and the address information of the operand corresponding to the instruction, and executes the instruction based on whether the address information of the operand satisfies a predetermined condition. In the processor unit, when the address information of the operand satisfies the predetermined condition, a second operation set for the instruction is performed on the operand, and when the address information of the operand does not satisfy the predetermined condition, the first operation is performed on the operand.

According to an embodiment, by interpreting the same instruction differently depending on the data address and performing a different operation accordingly, new operations can be supported efficiently without changing the ISA or the compiler of an existing CPU core. Cryptographic techniques developed after the CPU has been distributed can therefore be implemented efficiently and securely. Because a new operation is executed by a unit within the CPU core that is dedicated to that operation, fast computation can be expected.

According to an embodiment, by adding a dedicated cryptographic buffer for block encryption to the CPU core, the memory accesses for loads and stores, which account for a large part of a program, can be greatly reduced. Depending on whether an operand is loaded from the dedicated cryptographic buffer, either the existing operation or a newly defined cryptographic operation is performed. This yields fewer lines of code, fewer memory accesses, fast computation, and flexibility, so complex cryptographic operations can be accelerated without modifying the existing ISA or compiler framework.

According to an embodiment, an arithmetic operation can execute faster than an access to a table stored in memory, and a transformation can be expressed more simply with overloaded arithmetic instructions. As a result, new AES code that uses instruction overloading can achieve faster execution and a smaller memory footprint than conventional AES software code.

According to an embodiment, the overhead of memory loads can be reduced effectively by using additional buffers. By extending and redefining the ALU operations, the speed of block cipher computation can be improved, and the flexibility to support the operations of various block cipher algorithms can be provided. Resistance to power analysis can be improved through automatic masking.

According to an embodiment, by applying a masking technique to the CPU core at the hardware level, the data in the dedicated cryptographic buffer can be masked automatically, and operations can be performed on the masked values. That is, even code developed without regard to power-analysis resistance can be protected effectively against power analysis attacks at the same processing speed.

100: Electronic device

110: Memory

120: Processor

200: Processor

210: Buffer

220: Processor unit

211: General-purpose buffer

213: Configuration buffer

215: Data buffer

410: General-purpose register

420: Configuration buffer

430: Data buffer

440: Memory load unit

450: Memory store unit

710: RowCalc_SB

720: RowCalc_SB_SR

730: RowCalc_MC

FIG. 1 is a diagram showing an electronic device according to an embodiment.

FIG. 2 is a diagram showing a processor according to an embodiment.

FIG. 3 is a diagram illustrating a RISC-V core according to an embodiment.

FIG. 4 is a diagram illustrating buffers in a processor according to an embodiment.

FIG. 5 is a diagram showing an example of arithmetic instruction overloading according to an embodiment.

FIG. 6 is a diagram showing an example of C expressions for overloaded instructions according to an embodiment.

FIG. 7 is a diagram showing an example of transformations in AES code performed by instruction overloading using macros, according to an embodiment.

FIG. 8 is a diagram showing an example of macros for AES transformations that use overloaded instructions, according to an embodiment.

FIG. 9 is a diagram showing an example of variable registration and release according to an embodiment.

FIG. 10 is a diagram showing an example of inversion with masking according to an embodiment.

FIG. 11 is a diagram illustrating a method of operating a processor according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Various changes can be made to the embodiments, however, and the scope of rights of the present application is not restricted or limited by the following embodiments. All modifications of the embodiments, and their equivalents and substitutes, fall within the scope of the claims.

The terms used in the embodiments serve only to describe specific embodiments and are not intended to limit them. Unless the context clearly indicates otherwise, a singular expression includes the plural. In this specification, terms such as "include" or "have" indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not exclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the meanings commonly understood by a person of ordinary skill in the art. Commonly used terms that are defined in dictionaries should be understood as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this application, should not be interpreted in an idealized or overly formal sense.

In the description with reference to the drawings, the same components are given the same reference numerals regardless of the figure number, and repeated description of them is omitted. In describing the embodiments, detailed description of related known technology is omitted when it is judged that it would unnecessarily obscure the embodiments.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, arrangement, or order of the corresponding components. When one component is described as being "connected", "coupled", or "in contact" with another component, the component may be directly connected to or in contact with the other component, or a further component may be "connected", "coupled", or "in contact" between them.

A component that has a function in common with a component included in one embodiment is described using the same name in other embodiments. Unless otherwise stated, the description given for one embodiment applies to other embodiments, and detailed description is omitted where it would be repetitive.

Referring to FIG. 1, an electronic device 100 according to an embodiment includes a memory 110 and a processor 120. In an embodiment, the electronic device 100 may be any of various computing devices (e.g., smartphones, tablets, notebook computers, personal computers), wearable devices (e.g., smart watches, smart glasses), household appliances (e.g., smart speakers, smart TVs, smart refrigerators), smart cars, smart kiosks, Internet of Things (IoT) devices, drones, robots, and the like.

The memory 110 stores instructions to be executed by the processor 120 and the operands corresponding to those instructions. For example, the memory 110 may be a volatile memory or a non-volatile memory.

The processor 120 is a device that executes instructions or programs or controls the electronic device 100; for example, it may be a central processing unit (CPU), a graphics processing unit (GPU), or a neural processing unit (NPU). The processor 120 can read instructions and/or operands stored in the memory 110 and store them in internal buffers, and can quickly perform the operations specified by the instructions on the data stored in those internal buffers. As described above, the instructions executable by the processor 120 are defined by an instruction set architecture (ISA), and in various situations the processor 120 may execute new instructions that are not defined in the ISA. Adding new instructions to the ISA for this purpose may require an ISA extension and compiler changes. To avoid this, only the execution of existing instructions can be extended, without adding new instructions to the ISA. In other words, the processor 120 can perform different operations for the same instruction depending on the data address of the operand. In this specification, this new computer architecture concept is called instruction overloading. It is described in detail below with reference to the drawings.

Referring to FIG. 2, a processor 200 includes a buffer 210 and a processor unit 220. The buffer 210 may include a general-purpose buffer 211, a configuration buffer 213, and a data buffer 215. Here, the general-purpose buffer 211 may be the general-purpose registers (GPR) normally included in the processor 200, and the configuration buffer 213 and the data buffer 215 may be dedicated buffers for the instruction overloading described above. The buffer 210 is described in detail with reference to FIG. 4. The processor unit 220 is a device that performs operations according to instructions and may include, for example, a RISC-V (reduced instruction set computer-V) core. The processor unit 220 may also be referred to as a processing core. The processor unit 220 is described in detail with reference to FIG. 3.

In an embodiment, the operations performed by the processor 200 may include frequently used operations and rarely used operations. For example, when block ciphers are executed on the processor 200, operations such as multiplication (*) and division (/) are rarely used, whereas their finite-field counterparts, such as convolution and multiplicative inversion, are used frequently but may not be defined in the ISA of the processor 200. In this case, by interpreting an instruction that instructs a rarely used operation (e.g., multiplication (*) or division (/)) differently depending on the situation, a new operation can be supported in the processor 200 without changing the ISA of the existing processor 200. This instruction overloading is distinct from operator overloading in object-oriented languages, in which an operator is interpreted as a different operation depending on the data type of its operands.

The processor 200 can perform different operations for the same instruction depending on the address information of the operand. The addresses of such variables are registered in the processor 200 in advance, so that the variables to be handled differently can be specified. In other words, the execution of an instruction in the processor 200 can differ depending on whether the address of the operand has been registered.
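The address-dependent behavior described above can be modeled in software as a rough sketch. Everything named here is hypothetical and chosen for illustration: `reg_range_t`, `exec_mul`, the half-open range check, and the use of carry-less multiplication as a stand-in for the "second operation". The source does not specify any of these details.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an address range registered in the processor. */
typedef struct {
    uint32_t base;    /* first registered address */
    uint32_t length;  /* size of the registered region (assumed half-open) */
} reg_range_t;

/* The "predetermined condition": does the operand address fall in the range? */
static bool is_registered(const reg_range_t *r, uint32_t addr) {
    return addr >= r->base && (addr - r->base) < r->length;
}

/* Stand-in for the second operation: carry-less multiplication of two bytes.
 * The real second operation is whatever the hardware assigns to the instruction. */
static uint32_t clmul8(uint32_t a, uint32_t b) {
    uint32_t acc = 0;
    a &= 0xFFu;
    b &= 0xFFu;
    for (int i = 0; i < 8; i++) {
        if ((b >> i) & 1u) {
            acc ^= a << i;  /* XOR instead of add: no carries propagate */
        }
    }
    return acc;
}

/* One multiply instruction, two meanings, selected by the operand address. */
static uint32_t exec_mul(const reg_range_t *r, uint32_t operand_addr,
                         uint32_t a, uint32_t b) {
    if (is_registered(r, operand_addr)) {
        return clmul8(a, b);  /* second operation for registered variables */
    }
    return a * b;             /* first operation: ordinary integer multiply */
}
```

With a range registered at `0x1000` of length 16, the same call yields a carry-less product for an operand at `0x1004` and the ordinary integer product for an operand at `0x2000`, without any new opcode.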

In an embodiment, three different methods of extending a RISC-V core with instruction overloading may be used.

The first method is to extend the core to support a specific block cipher. For example, an Advanced Encryption Standard (AES) extension can be implemented. For this purpose there can be two types of instruction overloading: load/store instruction overloading and arithmetic instruction overloading. These instructions can provide the following functions for registered variables:
- execution of a series of routine operations by a single overloaded load instruction;
- variable management inside the CPU core;
- cryptographic operations and transformations.

First, each routine operation on certain variables, such as the round counter and the round-key pointer, can be handled as a single overloaded instruction, which reduces the number of instructions.

Second, registered variables can be managed in the dedicated buffers of the processor 200 (e.g., the configuration buffer 213 and the data buffer 215), and can be loaded and stored with overloaded load and store instructions. This increases the speed of access to block cipher variables.

Third, overloaded arithmetic instructions can support cryptographic operations and transformations that may not be easy to handle using only the instructions normally provided by the processor 200. In legacy software, such transformations are usually handled with large precomputed tables. However, an overloaded arithmetic operation executes faster than an access to a table stored in memory, and the transformation can be expressed more simply with overloaded arithmetic instructions. As a result, new AES code that uses instruction overloading can achieve faster execution and a smaller memory footprint than conventional AES software code. This AES extension can also be applied to block ciphers based on add/AND, rotate, and XOR (ARX-based), such as SIMON. SIMON is a lightweight block cipher, and its memory footprint remains almost the same, but instruction overloading can effectively reduce the execution time of SIMON code.

The second method of extending the RISC-V core is to provide general cryptographic operations, supporting block ciphers that perform operations such as convolution and multiplicative inversion in GF(2⁸). This means that, where necessary, various block ciphers, including ciphers developed after the processor 200 has been distributed, can be supported efficiently and with less overhead than with the first method.
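For readers unfamiliar with these operations, the following is a minimal software sketch of multiplication and multiplicative inversion in GF(2⁸). It assumes the AES reduction polynomial x⁸ + x⁴ + x³ + x + 1 (0x11B); the source does not state which polynomial the hardware uses, and the function names `gf256_mul` and `gf256_inv` are illustrative only. The dedicated hardware in the patent would compute such results directly rather than with loops like these.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8), reducing modulo the AES polynomial
 * x^8 + x^4 + x^3 + x + 1 (assumed here; low byte 0x1B). */
static uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1u) {
            p ^= a;                    /* add (XOR) the current multiple of a */
        }
        uint8_t carry = a & 0x80u;     /* will the shift overflow degree 7? */
        a = (uint8_t)(a << 1);
        if (carry) {
            a ^= 0x1Bu;                /* reduce by the field polynomial */
        }
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse computed as a^254, since a^255 = 1 for a != 0.
 * By the usual AES convention, 0 maps to 0. */
static uint8_t gf256_inv(uint8_t a) {
    uint8_t result = 1;
    uint8_t base = a;
    unsigned e = 254;
    while (e != 0) {                   /* square-and-multiply */
        if (e & 1u) {
            result = gf256_mul(result, base);
        }
        base = gf256_mul(base, base);
        e >>= 1;
    }
    return result;
}
```

As a check against the worked example in FIPS 197, `gf256_mul(0x57, 0x83)` gives `0xC1`, and multiplying any nonzero element by its `gf256_inv` gives 1.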

In the third method, hardware masking is additionally applied to the second method. Masking is one of the widely used countermeasures against power analysis attacks. It is described in detail with reference to FIG. 10.

Referring to FIG. 3, the operation of a RISC-V core according to an embodiment can be divided into an instruction fetch (IF) stage, an instruction decode (ID) stage, a DP stage, an execute (EX) stage, and a writeback (WB) stage. RISC-V can be used to design a processor to which instruction overloading is applied. RISC-V is an open-source ISA based on RISC principles. It targets a wide range of applications, from small embedded systems to personal computers and supercomputers, and takes both high performance and low power consumption into account.

A custom CPU core can be designed based on the RISC-V RV32IM ISA, which covers the 32-bit base integer instructions and the instructions for integer multiplication and division (M extension). Specifically, the microarchitecture of the base core supports single-issue, out-of-order execution and an AXI4-Lite interface, and the M extension may include a 32-bit × 8-bit multiplier and a 32-bit × 1-bit divider.

The gray areas between the stages of the 5-stage base core shown in FIG. 3 indicate the parts newly proposed for instruction overloading. Arithmetic instruction overloading requires changes to the arithmetic logic unit (ALU), the multiplier, and the divider, while load/store instruction overloading may require changes to the memory load and store units. Some components, such as the jump-and-branch unit and the re-order unit, may be omitted from FIG. 3.

Referring to FIG. 4, a general-purpose register (GPR) 410, a configuration buffer 420, and a data buffer 430 included in a processor according to an embodiment are shown. The configuration buffer 420 may include srcFlags, TextAddr, TextNum, RndAddr, and KeyAddr. The data buffer 430 may include Text0-Text7, Round, KeyPointer, and KeyConfig. As described in detail later, srcFlags, TextAddr, and TextNum can be used for arithmetic instruction overloading, while RndAddr, KeyAddr, Text0-Text7, Round, KeyPointer, and KeyConfig can be used for load/store instruction overloading. The memory load unit 440 may be a device that loads instructions and/or operands stored in memory, and the memory store unit 450 may be a device that stores data from the GPR 410 into memory and/or the data buffer 430.

Arithmetic instruction overloading

Arithmetic operations such as integer multiplication (*) and integer division (/) may rarely be used in block ciphers. The instructions for these operations can therefore be reassigned to cryptographic transformations and operations, such as SubBytes of the Advanced Encryption Standard (AES) and convolution. In addition, the shift instructions (e.g., << and >>) can be reassigned to rotation, which may be used more often than shifting in block ciphers. Extended operations can thus be provided for registered variables by overloading the instructions that correspond to the operators *, /, << and >>.

For variable registration, the processor can include configuration buffers (Config-Buffers) comprising TextAddr, TextNum and srcFlags, as shown in Figure 4. TextAddr can store the start address of the intermediate value variables to be registered, and TextNum can store the number of intermediate value variables. For example, TextAddr and TextNum can each be 4 bits wide.

srcFlags can be connected to a0-a7 (i.e., x10-x17) in the GPR 410, which are used to hold function arguments or internal variables in a RISC-V core. Each flag can be set to 1 when the address of the loaded data falls within the address range defined by TextAddr and TextNum. In other words, if the address of the data holding an operand lies in the range [TextAddr, TextAddr+TextNum], then when that operand is loaded into the GPR 410, srcFlags can set to 1 the flag of each register among a0-a7 that holds the data. For example, srcFlags can consist of 8 flags of 1 bit each.
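The flag update described above can be sketched as a small software model. This is only an illustration under stated assumptions: the struct `config_buffers`, the functions `is_registered` and `on_load`, the use of full 32-bit addresses, the half-open range, and the word-sized stride are all hypothetical and are not taken from the figures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the Config-Buffers used for registration. */
typedef struct {
    uint32_t text_addr;  /* TextAddr: start address of the registered variables */
    uint32_t text_num;   /* TextNum: number of registered 32-bit variables */
    uint8_t  src_flags;  /* srcFlags: bit i tracks register a<i> (x10 + i) */
} config_buffers;

/* Assumption: the range is half-open and TextNum counts 4-byte words. */
static bool is_registered(const config_buffers *cfg, uint32_t addr)
{
    return addr >= cfg->text_addr &&
           addr < cfg->text_addr + 4u * cfg->text_num;
}

/* Model of a load into a0-a7: set or clear the destination register's flag. */
static void on_load(config_buffers *cfg, unsigned a_index, uint32_t addr)
{
    uint8_t bit = (uint8_t)(1u << a_index);
    if (is_registered(cfg, addr))
        cfg->src_flags |= bit;
    else
        cfg->src_flags &= (uint8_t)~bit;
}
```

In this model a later arithmetic instruction would only need to inspect the flag of its source register, not the original address.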

Referring to Figure 5, an example is shown in which the execution of the shift (<<), multiply (*) and divide (/) instructions is extended to support AES. Because a right rotation can be expressed as a left rotation, the right shift (>>) need not be extended separately. When an operand has been loaded from a registered variable (e.g., from the address range [TextAddr, TextAddr+TextNum]), that is, when the srcFlag bit of the operand is 1, the extended operations can be performed as follows.

First, the shift instruction can be interpreted as a rotation. The symbol << can therefore be used to rotate a word.

Next, the multiply instruction can be interpreted as a parallel convolution of the 4 bytes in the selected word. The C expression is as follows: [Math. 1] w * 0xA₁A₀1B₁B₀000 Here, w is a word from a registered variable, 0xA₁A₀ is the polynomial by which each byte of w is multiplied, and 0x1B₁B₀ can be the reduction polynomial of GF(2⁸). That is, if the 4 bytes of w are denoted b₀, b₁, b₂ and b₃, Math. 1 can mean b_i ← b_i × 0xA₁A₀ mod 0x1B₁B₀ (i = 0, ..., 3).
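As an illustration of the byte-wise convolution, the following sketch models the overloaded multiply in software. It assumes that the constant packs the multiplier in bits 31-24 and the 9-bit reduction polynomial in bits 20-12, which is one reading of the 0xA₁A₀1B₁B₀000 layout; the function names `gf_mul` and `overloaded_mul` are hypothetical.

```c
#include <stdint.h>

/* Multiply two bytes as polynomials over GF(2), reduced by the 9-bit `poly`. */
static uint8_t gf_mul(uint8_t a, uint8_t b, uint16_t poly)
{
    uint16_t acc = 0, x = a;
    for (int i = 0; i < 8; i++) {
        if (b & 1u) acc ^= x;         /* add the current multiple of a */
        b >>= 1;
        x <<= 1;                      /* multiply by x ... */
        if (x & 0x100u) x ^= poly;    /* ... and reduce when degree 8 appears */
    }
    return (uint8_t)acc;
}

/* Software model of w * 0xA1A0_1B1B0_000 applied to a registered word. */
static uint32_t overloaded_mul(uint32_t w, uint32_t k)
{
    uint8_t  m    = (uint8_t)(k >> 24);              /* multiplier 0xA1A0 */
    uint16_t poly = (uint16_t)((k >> 12) & 0x1FFu);  /* polynomial 0x1B1B0 */
    uint32_t out  = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(w >> (8 * i));
        out |= (uint32_t)gf_mul(b, m, poly) << (8 * i);
    }
    return out;
}
```

For example, `overloaded_mul(w, 0x0211B000)` would double each byte of `w` in the AES field, since the constant selects the multiplier 0x02 and the polynomial 0x11B.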

Finally, the division instruction can be overloaded. With the division instruction, SubBytes or inverse SubBytes can be performed on the 4 bytes of the selected word at once. The C expression is as follows: [Math. 2] ±1/w Here, the minus sign can denote inverse SubBytes, which is used for decryption.

回到圖4，將詳細描述載入/存儲指令超載。 Returning to Figure 4, load/store instruction overloading will be described in detail.

Load/store instruction overloading

As mentioned above, block ciphers typically use a round counter, a round key pointer and intermediate value variables, and operations on these variables are also common. Overloading the load/store instructions for these variables can improve performance in two ways.

First, by using dedicated buffers, the values of common block cipher variables can be accessed faster. If the data of a running program resides in memory, memory access can be very slow. With no memory wait states on the base RISC-V core, a load instruction may take one or two clock cycles. Depending on the system configuration, however, the number of cycles can grow to dozens. This problem can be solved by adding dedicated buffers for the block cipher variables. With the proposed instruction overloading, these buffers can serve as a means of fast access. Here, the dedicated buffers can be called data buffers.

As mentioned above, the addresses of the intermediate value variables are registered using TextAddr and TextNum, and these values can be used to set srcFlags for the overloaded arithmetic instructions. The same two configuration buffers (i.e., TextAddr and TextNum) can also be used by the overloaded load and store instructions to load intermediate value variables from Text0-Text7, the data buffers for intermediate values, or to store them in Text0-Text7. Other block cipher variables (e.g., the round counter and the round key pointer) can reside in their own data buffers (e.g., Round and KeyPointer), and their addresses can be registered in additional Config-Buffers such as RndAddr and KeyAddr. The data buffers can provide fast data access regardless of memory latency, which means that accessing each data buffer may always take only one clock cycle. Other approaches, such as using a cache or the register keyword, can also speed up data access, but because not all required variables are always in the GPR or the cache, they may not guarantee fast and constant access times.

在一實施例中，RndAddr可以存儲其中存儲輪次計數器變數的位址，而KeyAddr可以存儲其中存儲輪金鑰指標的位址。例如，RndAddr和KeyAddr都可以分別具有4位元。此外，Text0-Text7存儲中間值變數(如，運算元等)，例如，每個變數可以具有32位元。輪次(Round)可以存儲輪次計數器，金鑰指標(KeyPointer)可以存儲輪金鑰指標，KeyConfig可以存儲金鑰指標的遞增(increment)(如，0、1、-1等)。例如，輪次可以具有8位元，金鑰指標可以具有32位元，KeyConfig可以具有2位元。 In one embodiment, RndAddr may store the address where the round counter variable is stored, and KeyAddr may store the address where the round key pointer is stored. For example, both RndAddr and KeyAddr can have 4 bits each. In addition, Text0-Text7 store intermediate value variables (eg, operands, etc.), for example, each variable may have 32 bits. Round can store the round counter, KeyPointer can store the round key pointer, and KeyConfig can store the increment of the key pointer (e.g., 0, 1, -1, etc.). For example, Round can have 8 bits, KeyPointer can have 32 bits, and KeyConfig can have 2 bits.

Second, routine operations such as increment, decrement and comparison can be performed automatically whenever data in the data buffers is loaded. Specifically, when the round counter is read, its value is automatically decremented, and the result of comparing the updated round counter with 0 can be returned. When the round key pointer is read, the value it points to is returned instead of the pointer value, which saves one load instruction. At the same time, the pointer can be incremented or decremented for encryption or decryption, respectively. The flag indicating whether the round key pointer is incremented or decremented can be stored in KeyConfig, an additional buffer. When an overloaded load instruction is executed, data buffers such as Round and KeyPointer can thus provide routine operations as well as fast data access.
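The side effects of the overloaded loads can be modeled in software as follows. This is a sketch under assumptions: the struct `data_buffers`, the functions `load_round` and `load_key`, and the decoding of KeyConfig into a signed word step are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the Data-Buffers touched by overloaded loads. */
typedef struct {
    uint8_t         round;        /* Round: remaining round count */
    const uint32_t *key_pointer;  /* KeyPointer: current round-key location */
    int             key_step;     /* KeyConfig decoded as 0, +1 or -1 words */
} data_buffers;

/* Reading the round counter decrements it and returns 1 once it reaches 0. */
static uint32_t load_round(data_buffers *db)
{
    db->round--;
    return db->round == 0;
}

/* Reading the key pointer returns the round key it points to,
   then steps the pointer according to KeyConfig. */
static uint32_t load_key(data_buffers *db)
{
    uint32_t key = *db->key_pointer;
    db->key_pointer += db->key_step;
    return key;
}
```

A single read therefore replaces what would otherwise be a load, an add and a compare for the round counter, or two loads and an add for the round key.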

AES code using instruction overloading

In AES, the 16-byte intermediate value, including the input plaintext, can be processed as a (4×4)-byte array called the state. Let b_i be the i-th byte of the intermediate value. The state can then correspond to b_i (0 ≤ i < 16).

每一輪AES可以由AddRoundKey、SubBytes、ShiftRows及MixColumns等變換組成。AddRoundKey是帶有輪金鑰的簡單XOR，其他三個變換為如下。 Each round of AES can be composed of AddRoundKey, SubBytes, ShiftRows and MixColumns transformations. AddRoundKey is a simple XOR with a round key, and the other three transformations are as follows.

SubBytes (SBs): SubByte(b_i) can perform a nonlinear substitution on each byte b_i. This nonlinear operation is usually defined as a multiplicative inverse followed by a matrix multiplication with a predefined vector and an affine transformation using XOR (⊕). However, by using the reduction polynomial 0x101, the matrix multiplication can be expressed as a convolution. Therefore, b_i' = SubByte(b_i) can be written as follows: [Math. 3] b_i' ← ((b_i^-1 mod 0x11B) * 0x1F mod 0x101) ⊕ 0x63
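The SubByte formula can be checked with a short software model. The sketch below is not the hardware implementation; it finds the inverse by exhaustive search for clarity, and the function names `gf_mul`, `gf_inv` and `aes_sub_byte` are hypothetical.

```c
#include <stdint.h>

/* Multiply two bytes as polynomials over GF(2), reduced by the 9-bit `poly`. */
static uint8_t gf_mul(uint8_t a, uint8_t b, uint16_t poly)
{
    uint16_t acc = 0, x = a;
    for (int i = 0; i < 8; i++) {
        if (b & 1u) acc ^= x;
        b >>= 1;
        x <<= 1;
        if (x & 0x100u) x ^= poly;
    }
    return (uint8_t)acc;
}

/* Multiplicative inverse by exhaustive search; zero maps to zero. */
static uint8_t gf_inv(uint8_t a, uint16_t poly)
{
    if (a == 0) return 0;
    for (unsigned x = 1; x < 256; x++)
        if (gf_mul(a, (uint8_t)x, poly) == 1) return (uint8_t)x;
    return 0; /* not reached when poly is irreducible */
}

/* b' = ((b^-1 mod 0x11B) * 0x1F mod 0x101) XOR 0x63 */
static uint8_t aes_sub_byte(uint8_t b)
{
    return (uint8_t)(gf_mul(gf_inv(b, 0x11B), 0x1F, 0x101) ^ 0x63);
}
```

With this model, `aes_sub_byte(0x53)` yields 0xED, matching the S-box example in FIPS-197.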

ShiftRows(SRs)：ShiftRow(r _j)可以將每行r _j向左旋轉j個位元組。 ShiftRows(SRs): ShiftRow( r _j ) can rotate each row r _j to the left by j bytes.
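With a row-wise arrangement, ShiftRows reduces to word rotations. The following sketch assumes that each row of the state is packed into one 32-bit word with the leftmost column in the most significant byte; that packing and the names `rotl32` and `shift_rows` are assumptions made for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32). */
static uint32_t rotl32(uint32_t w, unsigned n)
{
    n &= 31u;
    return n ? (w << n) | (w >> (32u - n)) : w;
}

/* ShiftRows on a row-packed state: row j is rotated left by j bytes. */
static void shift_rows(uint32_t row[4])
{
    for (unsigned j = 1; j < 4; j++)
        row[j] = rotl32(row[j], 8u * j);
}
```

Row 0 is left untouched, so only three rotations are needed per round.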

MixColumns (MCs): MixColumn(c_j) can mix the column c_j. Each byte of MixColumn(c_j) can be defined as follows: [Math. 4] y_i ← (0x02 * x_i) ⊕ (0x03 * x_((i+1) mod 4)) ⊕ x_((i+2) mod 4) ⊕ x_((i+3) mod 4) Here, x_i and y_i (0 ≤ i < 4) are the i-th bytes of the input and output columns, respectively, and * can denote convolution using the polynomial 0x11B.
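For reference, the standard AES MixColumns transformation of FIPS-197 can be sketched per column as below. This is a plain software model, not the row-wise macro of the figures, and the names `xtime` and `mix_column` are hypothetical.

```c
#include <stdint.h>

/* Multiply a byte by 0x02 in GF(2^8) with the polynomial 0x11B. */
static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80u) ? 0x1Bu : 0u));
}

/* y_i = 2*x_i ^ 3*x_(i+1) ^ x_(i+2) ^ x_(i+3), indices taken modulo 4. */
static void mix_column(const uint8_t x[4], uint8_t y[4])
{
    for (int i = 0; i < 4; i++) {
        uint8_t a = x[i], b = x[(i + 1) % 4];
        y[i] = (uint8_t)(xtime(a) ^ xtime(b) ^ b ^
                         x[(i + 2) % 4] ^ x[(i + 3) % 4]);
    }
}
```

Because every output byte uses the same two multipliers, a row-wise layout lets one byte-parallel multiply cover all four columns at once.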

In summary, the AES transformations may require convolution and multiplicative inversion over GF(2⁸), XOR, and rotation. Of these operations, only XOR is supported as a single instruction on general-purpose processors, while the others may require multiple instructions. In particular, convolution and multiplicative inversion can be complex to compute and operate byte-wise rather than word-wise. Applying instruction overloading to such AES code can overcome these problems effectively.

Referring to Figure 6, an example of macros for the extended operations of overloaded instructions according to an embodiment is shown. The macros can be defined as C expressions. Defining these macros, instead of using the extended operations *, / and << directly, makes debugging possible. The definitions in the last column of Figure 6 can be used for debugging with normal operations, without instruction overloading. The functions SB, invSB and convolution can perform SubBytes, inverse SubBytes and convolution, respectively, on the 4 bytes of the selected word. The macros ROL, MLT and INV of Figure 6 can be translated into shift, multiply and divide instructions at compile time. However, if the operands were loaded from the data buffers, the processor can execute these instructions as rotation, convolution and SubBytes, respectively.

參照圖7，作為根據一實施例的使用巨集的指令超載來在AES代碼中進行變換的示例，示出了RowCalc_SB 710、RowCalc_SB_SR 720、RowCalc_MC 730。 Referring to FIG. 7 , RowCalc_SB 710 , RowCalc_SB_SR 720 , RowCalc_MC 730 are shown as an example of using instruction overloading of macros to perform transformations in AES code according to an embodiment.

使用上述巨集的AES加密代碼可以如下所示。 The AES encryption code using the above macro can be as shown below.

This code has the following three advantages. First, precomputed tables may no longer be needed, while the encryption speed is maintained. The column-level macros in lines 10-18 of the earlier code (e.g., ColumnCalc and ColumnCalc_Last) can be changed to row-level macros (e.g., RowCalc_SB, RowCalc_SB_SR and RowCalc_MC).

As shown in Figure 8, described below, these macros can be defined in terms of ROL, MLT and INV and can be applied to each row of the state, as shown in Figure 7. A row-wise arrangement can be used instead of a column-wise arrangement so that the extended operations are exploited fully. For example, three ROL operations can complete ShiftRows for one round. In addition, RowCalc_MC 730 shows that a single MLT operation multiplies the 4 bytes of a row by the same operand.

參照圖8，示出了根據一實施例的用於使用超載指令的AES變換的巨集的示例。可以藉由擴展運算(如，ROL、MLT及INV)快速執行AES變換，而無需對預先計算的表進行記憶體存取。當變換被分離時，while迴圈的每個反覆運算都可以對應於一個輪次(round)。 Referring to FIG. 8 , an example of a macro for AES transformation using overload instructions is shown, according to an embodiment. AES transformations can be performed quickly through extended operations such as ROL, MLT, and INV without requiring memory access to precomputed tables. When transformations are separated, each iteration of the while loop can correspond to a round.

Second, the C expressions associated with the round counter and the round key pointer can be simplified. In lines 5-8, 15-18 and 20-23 of the preceding AES encryption code, the macro LoadKey expresses, as a C expression, only a load of the pointer rk. However, the processor actually loads the round key pointed to by rk and, as shown in Figure 6, can increment or decrement rk. In line 14, the macro LoadRound expresses only a load of rnd as a C expression. However, the processor can actually load rnd, decrement it by 1, and return 1 when rnd = 0.

第三，由於有資料緩衝區，可以更快地存取區塊編碼器變數。儘管這未出現在C運算式中，但rnd、rk、r0-r3及rr0-rr3等區塊編碼器變數可以駐留在資料緩衝區中而不是記憶體中。無論記憶體配置如何，這都可以提供快速且恒定的存取區塊編碼器變數的速度。 Third, block encoder variables can be accessed faster due to the data buffer. Although this does not appear in the C expression, block encoder variables such as rnd, rk, r0-r3, and rr0-rr3 can reside in the data buffer instead of memory. This provides fast and constant access to block encoder variables regardless of memory configuration.

參照圖9，示出了根據一實施例的用於變數註冊及釋放的巨集的示例。 Referring to FIG. 9 , an example of a macro for variable registration and release is shown according to an embodiment.

To use overloaded instructions, the addresses of the block cipher variables must be registered in the configuration buffers in advance. The configuration buffers can be memory-mapped buffers and can therefore be accessed through predefined pointers (e.g., TEXT_ADDR, TEXT_NUM, RND_ADDR, KEY_ADDR and KEY_CONFIG). The pointers can be defined as volatile pointer types. For example, TEXT_ADDR can be defined as: #define TEXT_ADDR (volatile u32 *)0x10000000 This kind of method is commonly used in device drivers to control peripherals or to configure a processor. For example, the control and status registers of a RISC-V core and the system control segment of an ARM core can be accessed to read status and change configuration. A similar approach can be used here. The macros for registering and releasing variables with the five pointers above are shown in Figure 9. With the macro SetText, the start address and the number of intermediate value variables, such as r0-r3 and rr0-rr3 in the preceding AES encryption code, can be recorded in TextAddr and TextNum, which are pointed to by TEXT_ADDR and TEXT_NUM, respectively.
Similarly, the macros SetRound and SetKeyPointer can be used to register the addresses of the round counter and the round key pointer in RndAddr and KeyAddr, which are pointed to by RND_ADDR and KEY_ADDR, respectively. In addition, because the round key pointer is automatically incremented or decremented each time it is loaded, the macro SetKeyPointer can include writing a flag that indicates increment or decrement into KeyConfig, which is pointed to by KEY_CONFIG. After registration, and until the macro ReleaseAddr is used to release them, the registered variables are loaded from and stored to the processor's data buffers rather than memory. The definition of this macro only resets TEXT_NUM to 0, but the processor can actually release all registered variables.
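A possible shape for the registration macros is sketched below. The macro bodies are hypothetical and may differ from Figure 9. So that the sketch can run on a host machine, the five pointers are redirected to a simulated array `cfg_regs`, which is also an assumption; on the target they would be fixed addresses such as (volatile u32 *)0x10000000.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Host-side stand-in for the memory-mapped Config-Buffers (assumption). */
static volatile u32 cfg_regs[5];
#define TEXT_ADDR  (&cfg_regs[0])
#define TEXT_NUM   (&cfg_regs[1])
#define RND_ADDR   (&cfg_regs[2])
#define KEY_ADDR   (&cfg_regs[3])
#define KEY_CONFIG (&cfg_regs[4])

/* Hypothetical macro bodies; addresses are truncated to 32 bits on a host. */
#define SetText(addr, num) \
    do { *TEXT_ADDR = (u32)(uintptr_t)(addr); *TEXT_NUM = (u32)(num); } while (0)
#define SetRound(addr) \
    do { *RND_ADDR = (u32)(uintptr_t)(addr); } while (0)
#define SetKeyPointer(addr, inc) \
    do { *KEY_ADDR = (u32)(uintptr_t)(addr); *KEY_CONFIG = (u32)(inc); } while (0)
/* Only TEXT_NUM is reset here; the processor releases the other registrations. */
#define ReleaseAddr() \
    do { *TEXT_NUM = 0; } while (0)
```

Typical use would register the variables once before the round loop, for example `SetText(r, 8); SetRound(&rnd); SetKeyPointer(&rk, 1);`, and call `ReleaseAddr();` after the last round.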

Instruction overloading for multiple block ciphers

Some recent block ciphers, such as SM4, SEED and ARIA, are designed using several operations over GF(2⁸). The main operations these ciphers have in common can be convolution and multiplicative inversion. However, the reduction polynomial may differ from cipher to cipher. Therefore, to support a variety of block ciphers, a multiplicative inverse with an arbitrary reduction polynomial is needed. This capability is particularly useful for implementing flexible S-boxes. Instruction overloading can thus be applied to support a wider range of block ciphers defined over GF(2⁸).

When the srcFlag bit of the operand is 1, the extended behavior of the division (/) instruction can be redefined so that it computes the multiplicative inverse. The corresponding C expression is as follows: [Math. 5] 0x1A₁A₀/w Here, 0x1A₁A₀ can be the reduction polynomial of GF(2⁸). In other words, if the 4 bytes of w are denoted b₀, b₁, b₂ and b₃, Math. 5 can mean b_i ← b_i^-1 mod 0x1A₁A₀ (i = 0, ..., 3) for nonzero b_i. A zero b_i can map to itself. The hardware logic for this operation may require a binary inversion and may take three clock cycles to compute for a given reduction polynomial.

With the redefined division instruction, the definition of the debugging macro INV of Figure 6 may have to be changed to a multiplicative inverse with the reduction polynomial n, regardless of encryption or decryption. By combining the convolution of Math. 1 with the multiplicative inverse of Math. 5, RowCalc_SB(r) of Figure 8 can be redefined as ((0x11B/r)*A)^C, where A = 0x1F101000 and C = 0x63636363. This can be equivalent to applying the SubBytes of Math. 2 to a 4-byte row. The overloaded multiply instruction of Math. 1 already supports arbitrary reduction polynomials.

現在，可以舉出在SM4中使用超載指令的示例。SM4的位元組代替可以定義如下。 Now, an example of using the overload instruction in SM4 can be given. The byte replacement for SM4 can be defined as follows.

[Math. 6] b_i' ← (((b_i * 0xCB) ⊕ 0xD3)^-1 * 0xCB) ⊕ 0xD3 Here, 0x1F5 and 0x101 can be the reduction polynomials of the multiplicative inverse and of the convolution, respectively. The SubBytes of the 4 bytes of the selected word in SM4 can therefore be defined as (((0x1F5/((r*A)^C))*A)^C), where A = 0xCB101000 and C = 0xD3D3D3D3. The inverse SubBytes of AES and SM4 can be defined similarly using the multiplicative inverse. That is, the overloaded division instruction may need only the same multiplicative-inverse logic circuit for both encryption and decryption.
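The SM4 formula can be modeled for a single byte in the same way. The sketch below uses exhaustive search for the inverse and is only meant to check the formula; the function names `gf_mul`, `gf_inv` and `sm4_sub_byte` are hypothetical.

```c
#include <stdint.h>

/* Multiply two bytes as polynomials over GF(2), reduced by the 9-bit `poly`. */
static uint8_t gf_mul(uint8_t a, uint8_t b, uint16_t poly)
{
    uint16_t acc = 0, x = a;
    for (int i = 0; i < 8; i++) {
        if (b & 1u) acc ^= x;
        b >>= 1;
        x <<= 1;
        if (x & 0x100u) x ^= poly;
    }
    return (uint8_t)acc;
}

/* Multiplicative inverse by exhaustive search; zero maps to zero. */
static uint8_t gf_inv(uint8_t a, uint16_t poly)
{
    if (a == 0) return 0;
    for (unsigned x = 1; x < 256; x++)
        if (gf_mul(a, (uint8_t)x, poly) == 1) return (uint8_t)x;
    return 0; /* not reached when poly is irreducible */
}

/* b' = ((((b * 0xCB) ^ 0xD3)^-1 mod 0x1F5) * 0xCB mod 0x101) ^ 0xD3 */
static uint8_t sm4_sub_byte(uint8_t b)
{
    uint8_t t = (uint8_t)(gf_mul(b, 0xCB, 0x101) ^ 0xD3);  /* first affine map */
    t = gf_inv(t, 0x1F5);                                   /* field inversion */
    return (uint8_t)(gf_mul(t, 0xCB, 0x101) ^ 0xD3);        /* second affine map */
}
```

For the input 0x00 this model returns 0xD6, the first entry of the SM4 S-box.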

Instruction overloading for masking

A power analysis attack can be a technique for recovering a secret key by statistically analyzing power consumption traces collected during key-dependent cryptographic operations. Masking can be a countermeasure that protects block ciphers against such attacks. It can be regarded as a secret sharing technique that splits the sensitive intermediate variables of a cryptographic operation into shares by using randomizers called masks.

To add hardware masking to the processor, the following four elements can be considered: a random number generator; buffers for mask values; an extension of the load and store units; and an extension of the ALU, multiplier and divider for masked operations.

First, the mask values can be set randomly to resist power analysis attacks. A true random number generator (TRNG) is used for this purpose, and the random sequence it generates can pass the random test suites of the National Institute of Standards and Technology (NIST). A 32-bit random number generated by 4 pairs of a TRNG and an 8-bit linear feedback shift register (LFSR) can be used as the initial mask value when a general word is first stored in Text0-Text7, and as the new value when the current mask value is updated during a masked operation.

Second, to hold the mask values, buffers called Mask-Buffers can be added. They include TextMask0-TextMask7 and GprMask0-GprMask7, which are connected to Text0-Text7 and to a0-a7 in the GPR, respectively. The mask values are stored in the Mask-Buffers, and when they are in use, the values of the corresponding Text0-Text7 and a0-a7 can be XOR-masked.

Third, the load and store instructions can be redefined to support the transfer of mask values. That is, when a masked value is transferred between Text0-Text7 and a0-a7, the overloaded load and store instructions can transfer the corresponding mask between TextMask0-TextMask7 and GprMask0-GprMask7.

最後，為了支援對AES、SM4及SIMON遮罩的運算，可以修改以下5種運算的執行：乘法逆元素(/)、AND(&)、捲積(*)、旋轉(<<)及XOR(^)。 Finally, in order to support the operation of AES, SM4 and SIMON masks, the execution of the following five operations can be modified: multiplicative inverse element (/), AND (&), convolution (*), rotation (<<) and XOR ( ^).

The first modified instruction can be the overloaded division instruction, that is, the multiplicative inverse. The masking method is shown in Figure 10, where M_a is the mask value of a value a. Given an XOR-masked value a ⊕ M_a, it can be converted into a multiplicatively masked value a × M_a' by multiplication with a new mask value M_a'. After the inverse is computed, the result can be converted back into an XOR-masked value a^-1 ⊕ M_a' by further multiplication.

The second instruction to be modified can be AND. Suppose that c = a & b is to be computed, where & denotes the bitwise AND operation. If the two operands of the AND instruction are the XOR-masked values A = a ⊕ M_a and B = b ⊕ M_b, then C = (A & B) ⊕ (M_a & B) ⊕ (M_b & A) equals c ⊕ M_c, where M_c = M_a & M_b. The masked AND applies only to SIMON, but if masking is additionally applied, SPECK protected against power analysis can also be supported. Finally, even when masking is applied, the ALU and multiplier operations for XOR, convolution and rotation do not need to change. However, the same operation that is applied to the masked value may also need to be applied to the mask value. For example, if a masked value in the GPR is rotated by 8 bits, its mask value in the Mask-Buffer may also need to be rotated by 8 bits.
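The masked AND identity can be checked with a few lines of C. This is a sketch of the arithmetic only and does not model mask refreshing or the Mask-Buffers; the type `masked_word` and the function `masked_and` are hypothetical names.

```c
#include <stdint.h>

/* An XOR-masked word: value holds x ^ mask, never the plain x. */
typedef struct {
    uint32_t value;
    uint32_t mask;
} masked_word;

/* C = (A & B) ^ (M_a & B) ^ (M_b & A) carries the mask M_c = M_a & M_b. */
static masked_word masked_and(masked_word A, masked_word B)
{
    masked_word C;
    C.value = (A.value & B.value) ^ (A.mask & B.value) ^ (B.mask & A.value);
    C.mask  = A.mask & B.mask;
    return C;
}
```

Expanding A = a ⊕ M_a and B = b ⊕ M_b shows that all cross terms cancel except (a & b) ⊕ (M_a & M_b), so unmasking the result with M_c recovers a & b without ever forming it directly.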

在步驟1110中，處理器識別用於指示執行第一運算的指令以及與指令相對應的運算元的位址資訊。例如，運算元從連接到處理器的記憶體中被載入，並被存儲在處理器中的專用緩衝區中，運算元的位址資訊可以表示存儲運算元的專用緩衝區中的位址。當運算元的位址資訊滿足預定條件時，運算元可以被存儲在處理器的資料緩衝區中；運算元的位址資訊可以被存儲在處理器的配置緩衝區中。配置緩衝區可以是記憶體映射緩衝區。 In step 1110, the processor identifies an instruction indicating execution of a first operation and address information of an operand corresponding to the instruction. For example, an operand is loaded from a memory connected to the processor and stored in a dedicated buffer in the processor. The address information of the operand may represent the address in the dedicated buffer where the operand is stored. When the address information of the operand meets the predetermined condition, the operand can be stored in the data buffer of the processor; the address information of the operand can be stored in the configuration buffer of the processor. The configuration buffer can be a memory mapped buffer.

在步驟1120中，處理器基於運算元的位址資訊是否滿足預定條件來執行指令。當運算元的位址資訊滿足預定條件時，處理器對運算元執行設定在指令中的第二運算；當運算元的位址資訊不滿足預定條件時，處理器對運算元執行第一運算。 In step 1120, the processor executes the instruction based on whether the address information of the operand meets a predetermined condition. When the address information of the operand meets the predetermined condition, the processor performs the second operation set in the instruction on the operand; when the address information of the operand does not meet the predetermined condition, the processor performs the first operation on the operand.

For example, the predetermined condition can correspond to whether the address information of the operand belongs to a preset address range. Here, the first operation can be an operation that is executed less often in the processor than the second operation. The address range for the predetermined condition can be registered in the processor before the instruction is executed. The second operation can be an operation that is not included in the ISA of the processor.
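The selection between the first and the second operation can be illustrated for the shift instruction. In this sketch the flag is assumed to have been derived earlier from the operand's address, and the function name `exec_shift_left` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Execute "operand << n": a plain shift normally, a rotation when the
   operand came from a registered address (src_flag set). */
static uint32_t exec_shift_left(uint32_t operand, unsigned n, bool src_flag)
{
    n &= 31u;
    if (!src_flag || n == 0)
        return operand << n;                         /* first operation */
    return (operand << n) | (operand >> (32u - n));  /* second operation */
}
```

The instruction encoding is identical in both cases; only the source of the operand decides which behavior is carried out.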

Whether the address information of the operand meets the predetermined condition can be indicated by flag information connected to the general-purpose registers in the processor. When the address information of the operand meets the predetermined condition, the round counter and the round key pointer used in the operation of the processor are stored in the data buffer of the processor, and the address information of the round counter and the round key pointer can be stored in the configuration buffer of the processor.

Because the description given above with reference to Figures 1 to 10 applies as it is to each step shown in Figure 11, a more detailed description is omitted.

The method according to the embodiments can be embodied in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium can include program instructions, data files, data structures and the like, alone or in combination. The program instructions recorded on the medium can be specially designed and constructed for the embodiments, or can be known and available to those of ordinary skill in the field of computer software. Computer-readable recording media can include magnetic media such as hard disks, floppy disks and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially constructed to store and execute program instructions, such as read-only memory (ROM), random access memory (RAM) and flash memory. Examples of program instructions include not only machine language code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. To perform the operations of the embodiments, the hardware devices can be configured to operate as one or more software modules, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. In order to be interpreted by the processing device, or to provide instructions or data to the processing device, software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave. The software may be distributed over computer systems connected through a network and may be stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

As described above, although the embodiments have been described with reference to a limited number of embodiments, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, the same effect can be obtained even if the described techniques are performed in an order different from that of the described method, and/or the described components are coupled or combined in a form different from that of the described method, or are replaced or substituted by other components or equivalents.

Accordingly, other implementations, other embodiments, and equivalents of the claims all fall within the scope of the claims.

410: General-purpose register

420: Configuration buffer

430: Data buffer

440: Memory load unit

450: Memory store unit

Claims

A method of operating a processor, comprising the steps of: identifying an instruction that instructs execution of a first operation and address information of an operand corresponding to the instruction; and executing the instruction based on whether the address information of the operand satisfies a predetermined condition; wherein, in the step of executing the instruction: when the address information of the operand satisfies the predetermined condition, a second operation set in the instruction is performed on the operand; when the address information of the operand does not satisfy the predetermined condition, the first operation is performed on the operand; and when the address information of the operand satisfies the predetermined condition, a round counter and a round-key pointer used in the operation of the processor are stored in a data buffer of the processor, and address information of the round counter and the round-key pointer is stored in a configuration buffer of the processor.

The method of operating a processor of claim 1, wherein the predetermined condition is whether the address information corresponding to the operand belongs to a preset address range.

The method of operating a processor of claim 1, wherein the first operation is an operation that is executed less frequently in the processor than the second operation.

The method of operating a processor of claim 1, wherein, before the instruction is executed, the address range according to the predetermined condition is registered in the processor in advance.

The method of operating a processor of claim 1, wherein the second operation is an operation not included in the instruction set architecture (ISA) of the processor.

The method of operating a processor of claim 1, wherein: the operand is loaded from a memory connected to the processor and stored in a dedicated buffer in the processor; and the address information of the operand indicates an address in the dedicated buffer at which the operand is stored.

The method of operating a processor of claim 1, wherein, when the address information of the operand satisfies the predetermined condition, the operand is stored in the data buffer of the processor, and the address information of the operand is stored in the configuration buffer of the processor.

The method of operating a processor of claim 1, wherein the configuration buffer is a memory-mapped buffer.

The method of operating a processor of claim 1, wherein whether the address information of the operand satisfies the predetermined condition is indicated by flag information connected to a general-purpose register in the processor.

A processor, comprising: a data buffer that stores an operand; a configuration buffer that stores address information of the operand; and a processor unit that identifies an instruction that instructs execution of a first operation and the address information of the operand corresponding to the instruction, and executes the instruction based on whether the address information of the operand satisfies a predetermined condition; wherein the processor unit performs a second operation set in the instruction on the operand when the address information of the operand satisfies the predetermined condition, and performs the first operation on the operand when the address information of the operand does not satisfy the predetermined condition; a round counter and a round-key pointer are stored in the data buffer of the processor and are used in the operation of the processor; and address information of the round counter and the round-key pointer is stored in the configuration buffer of the processor.

The processor of claim 10, wherein the predetermined condition is whether the address information corresponding to the operand belongs to a preset address range.

The processor of claim 10, wherein the first operation is an operation that is executed less frequently in the processor than the second operation.

The processor of claim 10, wherein, before the instruction is executed, the address range according to the predetermined condition is registered in the processor in advance.

The processor of claim 10, wherein the second operation is an operation not included in the instruction set architecture (ISA) of the processor.

The processor of claim 10, wherein the operand is loaded from a memory connected to the processor and stored in a dedicated buffer in the processor, and the address information of the operand indicates an address in the dedicated buffer at which the operand is stored.

The processor of claim 10, wherein, when the address information of the operand satisfies the predetermined condition, the operand is stored in the data buffer of the processor, and the address information of the operand is stored in the configuration buffer of the processor.

The processor of claim 10, further comprising a dedicated operator that performs the second operation.

An electronic device, comprising: a memory that stores an instruction and an operand corresponding to the instruction; and a processor that executes the instruction; the processor comprising: a buffer that stores the operand received from the memory for executing the instruction and address information of the operand; and a processor unit that identifies an instruction that instructs execution of a first operation and the address information of the operand corresponding to the instruction, and executes the instruction based on whether the address information of the operand satisfies a predetermined condition; wherein the processor unit performs a second operation set in the instruction on the operand when the address information of the operand satisfies the predetermined condition, and performs the first operation on the operand when the address information of the operand does not satisfy the predetermined condition; a round counter and a round-key pointer are stored in a data buffer of the processor and are used in the operation of the processor; and address information of the round counter and the round-key pointer is stored in a configuration buffer of the processor.