TW539965B - Automated processor generation system for designing a configurable processor and method for the same - Google Patents


Info

Publication number
TW539965B
Authority
TW
Taiwan
Prior art keywords
user
processor
instructions
instruction
patent application
Prior art date
Application number
TW089102150A
Other languages
Chinese (zh)
Inventor
Earl A Killian
Ricardo E Gonzulez
Ashish B Dixit
Monica Lam
Walter D Lichtenstein
Original Assignee
Tensilica Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/246,047 (US6477683B1)
Priority claimed from US09/323,161 (US6701515B1)
Priority claimed from US09/322,735 (US6477697B1)
Application filed by Tensilica Inc
Application granted
Publication of TW539965B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/28 Error detection; Error correction; Monitoring by checking the correct order of processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10 Program control for peripheral devices
    • G06F13/12 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G06F30/3308 Design verification, e.g. functional simulation or model checking using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/37 Compiler construction; Parser generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Abstract

In a first aspect of the invention, a configurable RISC processor implements an instruction set that provides good code density in a fixed-length, high-performance encoding based on RISC principles, including general registers and a load/store architecture. The processor also implements a simple variable-length encoding that maintains high performance. In a second aspect of the invention, when selecting and building a processor configuration, a user creates a new set of user-defined instructions, places them in a file directory, and invokes a tool that processes the user instructions and transforms them into a form usable by the software development tools. In this way, the user can customize a processor configuration by adding new instructions and, within minutes, evaluate that feature. The user can keep multiple sets of potential instructions and easily switch between them when evaluating their application. In a third aspect of the invention, an automated processor design tool uses a description of customized processor instruction set extensions in a standardized language to develop a configurable definition of a target instruction set, a hardware description language description of the circuitry necessary to implement the instruction set, and development tools that can be used to develop applications for the processor and to verify it. The standardized language is capable of handling instruction set extensions that modify processor state or use configurable processors. By providing a constrained domain of extensions and optimizations, the process can be automated to a high degree, thereby facilitating fast and reliable development.

Description

V. Description of the Invention

The present invention relates to microprocessor systems. More particularly, it relates to the design of application solutions containing one or more processors, where the processors in the system are configured and enhanced at design time to improve their suitability to a particular application. The invention further relates to a system in which application developers can rapidly develop instruction extensions, such as new instructions, to an existing instruction set architecture. These include new instructions that manipulate user-defined processor state. The developers can also immediately measure the impact of the extensions on application run time and on processor cycle time.

Traditionally, processors have been difficult to design and to modify. For this reason, most systems that contain processors use ones that were designed and verified once for general-purpose use and then used by multiple applications. As a result, their suitability for a particular application is not always ideal. It would often be appropriate to modify the processor to execute a particular application's code better (e.g., to run faster, consume less power, or cost less). However, even modifying an existing processor design is difficult, time-consuming, costly, and risky, so it is typically not done.

To better understand why prior-art processors are difficult to make configurable, consider how they are developed. First, the instruction set architecture (ISA) is developed. This step is essentially done only once, and the result is used by many systems for many years. For example, the Intel Pentium processor can trace the lineage of its instruction set back to the 8008 and 8080 microprocessors introduced in the mid-1970s. In this process, the ISA instructions, syntax, etc. are developed according to predetermined ISA design criteria. Software development tools for the ISA, such as assemblers, debuggers, and compilers, are developed at the same time. Next, a simulator for that particular ISA is developed, and various benchmarks are run to evaluate the effectiveness of the ISA, which is revised according to the evaluation results. At some point the ISA is considered satisfactory. The ISA process then ends with a fully developed ISA specification, an ISA simulator, an ISA verification suite, and a development suite including, e.g., an assembler, debugger, and compiler.

Next, processor design begins. Because processors have useful lifetimes of several years, this process is also performed fairly infrequently; typically, a processor is designed once and used by many systems for several years. Given the ISA, its verification suite and simulator, and various processor development goals, the processor's microarchitecture is designed, simulated, and revised. Once the microarchitecture is finalized, it is implemented in a hardware description language (HDL). A microarchitecture verification suite is then developed and used to verify the HDL implementation (more on this later). Then, in contrast to the manual processes described so far, automated design tools may synthesize a circuit from the HDL description and place and route its components. The layout may then be revised to optimize chip area usage and timing. Alternatively, additional manual processing may be used to create a floorplan from the HDL description and convert the HDL to circuitry. The circuitry is then verified and laid out both manually and automatically. Finally, the layout is verified with an automated tool to confirm that it matches the circuitry, and the circuitry is verified against layout parameters.

After processor development is complete, the overall system is designed. Unlike ISA and processor design, system design (which may include the design of chips that now contain the processor) is quite common, and systems are typically designed continuously. Each system is used by a particular application for a relatively short period (one or two years). The overall system architecture is designed based on three inputs:
-- predetermined system goals such as cost, performance, power, and functionality;
-- the specifications of pre-existing processors;
-- the specifications of chip foundries (usually closely tied to processor vendors).
A processor is then selected to match the design goals, and a chip foundry is selected (a choice closely tied to the processor selection).

Then an HDL implementation of the system is designed. Its inputs are the chosen processor, ISA, and foundry, together with the previously developed simulation, verification, and development tools (and a standard cell library for the chosen foundry). A verification suite is developed for the system HDL implementation, and the implementation is verified. Next, the system circuitry is synthesized, placed and routed on circuit boards, and its layout and timing are re-optimized. Finally, the boards are designed and laid out, the chips are fabricated, and the boards are assembled.

Prior-art processor design has a further difficulty. It is not appropriate simply to give traditional processors more features so that they cover all applications. Any given application needs only a particular set of features, and a processor with features the application does not need costs too much, consumes more power, and is harder to fabricate. In addition, it is not possible to know all of the application targets when a processor is first designed. If the processor modification process could be automated and made reliable, system designers' ability to create application solutions would be significantly enhanced.

As an example, consider a device designed to transmit and receive data over a channel using a complex protocol. Because the protocol is complex, the processing cannot reasonably be done entirely in hard-wired (e.g., combinational) logic. Instead, a programmable processor is introduced into the system for protocol processing. Programmability also allows bugs to be fixed and the protocol to be upgraded later by loading new software into the instruction memory. However, the traditional processor was probably not designed for this particular application; the application may not even have existed when the processor was designed. As a result, some operations may take the processor many instructions, when additional processor logic could do them in one or a few instructions.

Because the processor cannot easily be enhanced, many system designers do not attempt to do so. Instead, they run an inefficient pure-software solution on an available general-purpose processor. The resulting solution may be slower, need more power, or cost more (e.g., it may require a larger, more powerful processor to run the program fast enough). Other designers implement some of their application's processing in special-purpose hardware of their own design, such as a coprocessor. Programmers must then code accesses to that hardware at various points in the program. However, transferring data between the processor and the special-purpose hardware takes time, and this limits the approach's value for system optimization. Only fairly large units of work can be sped up enough for the time saved in the special-purpose hardware to exceed the extra time spent moving data to and from it.

In the communication channel example, the protocol might require encryption, error correction, or compression/decompression. Such processing often operates on individual bits rather than on the processor's larger words. The circuitry for the computation may be quite small, but having the processor extract each bit, process the bits sequentially, and then repack them adds considerable overhead.

As a very specific example, consider Huffman decoding using the rules shown in Table I (a similar encoding is used in the MPEG compression standard).

Pattern     Value  Length
00xxxxxx    0      2
01xxxxxx    1      2
10xxxxxx    2      2
110xxxxx    3      3
1110xxxx    4      4
11110xxx    5      5
111110xx    6      6
1111110x    7      7
11111110    8      8
11111111    9      8

Table I

Both the value and the length must be computed, so that the length bits can be shifted off to find the start of the next element to be decoded in the data stream.

There are many ways to code this for a conventional instruction set, but all of them need many instructions because many tests must be performed. Combinational logic needs only a single gate delay, whereas each software implementation needs multiple processor cycles. For example, an efficient prior-art implementation using the MIPS instruction set might require six logical operations, six conditional branches, an arithmetic operation, and associated register loads.
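To make the cost of the software approach concrete, here is a minimal Python sketch of the Table I decode, written two ways. The first is a sequence of tests on the leading bits. The second is the 256-entry lookup table mentioned in the following paragraph, built here from the first. The sketch assumes that `window` holds the next 8 undecoded bits of the stream, most significant bit first. The function names and the packed `(value << 8) | length` helper (which mirrors the huff8 result layout described further on) are hypothetical illustrations, not the patent's code.

```python
# Minimal sketch: software decode of the Table I prefix code.
# Assumption: `window` is the next 8 undecoded stream bits, MSB first, 0..255.
# All names here are hypothetical, chosen for illustration.


def decode_by_tests(window: int) -> tuple[int, int]:
    """Return (value, length) by testing leading bits, as a branchy routine would."""
    if window < 0x80:              # 0xxxxxxx: codes 00 and 01
        return window >> 6, 2      # 00 -> 0, 01 -> 1
    if window < 0xC0:              # 10xxxxxx
        return 2, 2
    # 11xxxxxx: scan the run of 1s; the first 0 ends the code.
    length = 3
    mask = 0x20                    # bit that must be 0 for a 3-bit code (110)
    while mask and window & mask:
        length += 1
        mask >>= 1
    if mask == 0:                  # 11111111 has no terminating 0
        return 9, 8
    return length, length          # 110 -> 3, 1110 -> 4, ..., 11111110 -> 8


# The common alternative: precompute a 256-entry lookup table indexed by the window.
LUT = [decode_by_tests(w) for w in range(256)]


def huff8_like(window: int) -> int:
    """Pack the result as (value << 8) | length, mirroring the huff8 layout in the text."""
    value, length = LUT[window]
    return (value << 8) | length


def decode_stream(bits: str, count: int) -> list[int]:
    """Decode `count` symbols from a '0'/'1' string, shifting off `length` bits each time."""
    values = []
    pos = 0
    for _ in range(count):
        window = int(bits[pos:pos + 8].ljust(8, "0"), 2)  # zero-pad near the end
        value, length = LUT[window]
        values.append(value)
        pos += length              # skip the consumed code bits
    return values


if __name__ == "__main__":
    stream = "00" + "01" + "10" + "110" + "11111111" + "11111110" + "1110"
    print(decode_stream(stream, 7))  # [0, 1, 2, 3, 9, 8, 4]
```

The lookup table replaces the chain of tests and branches with 256 entries of storage, which is the space cost discussed next. A dedicated instruction would compute the same `(value, length)` pair directly in combinational logic.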
With an advantageously designed instruction set the coding is better, but still expensive in terms of time: one logical operation, six conditional branches, an arithmetic operation, and associated register loads.

In terms of processor resources, this is so expensive that a 256-entry lookup table is typically used instead of coding the process as a sequence of bit-by-bit comparisons. However, a 256-entry lookup table takes up significant space and can also take many cycles to access. For longer Huffman codes, the table size would become prohibitive, leading to more complex and slower code.

One possible way to accommodate application-specific needs in a processor is a configurable processor. Its instruction set and architecture can easily be modified and extended to enhance its functionality and to customize that functionality. Configurability lets the designer specify whether, or how much, additional functionality the product needs. The simplest kind of configurability is a binary choice: a feature is either present or absent. For example, a processor might be offered with or without floating-point hardware.

Finer-grained configuration choices improve flexibility. For example, the processor might let the system designer specify the number of registers in the register file, the memory width, the cache size, the cache associativity, and so on. However, these options still fall short of the level of customization that system designers want. In the Huffman decoding example above, a system designer might want to include a specific instruction to perform the decode, although this is not known in the prior art. For example,


huff8 tl , tO 其中在結果裡最主要的八位元爲解碼値而最次要的八 位元爲長度。相對於先前說明之軟體製作,該Huffman解碼 之直接硬體製作是相當簡單-其解碼指令邏輯粗略地代表三 十組邏輯閘之組合邏輯功能,不包括指令解碼,等等’或 少於一般的處理器之邏輯閘計算之〇· 1 % ’並且可以藉由特 別目的處理器指令於單一週期中計算,因此代表比使用一 般目的指令有4-20倍的改進因素。 可組態處理器之先前技術的努力一般分爲兩組範疇:配 合參數化硬體說明被使用之邏輯合成;以及自摘要機器說 明之編輯器和組譯器的自動再目標化。在第一分類屬於可 合成的處理器硬體設計,例如:Synopsys DW805 1處理器, ARM/Synopsys ARM7-S,Lexra LX-4080 ,ARC 可組態 RISC核心;以及至某些程度Synopsys可合成/可組態PCI匯 流排界面。 上述中,該Synopsys DW805 1包含一種現存處理器結構~ 之二進位相容的製作;以及小數目之合成參數,例如,1 28 或256位元組之內部RAM、藉由參數rom_addr_size決定之 ROM位址範圍、選擇區間計時器,串列埠之可變化的數目 (0-2)、以及支援六組或十三組來源之中斷單元。雖然該 DW805 1結構可以有些變化,然而其指令集結構之改變是不 可能的。 該ARM/Synopsys ARM7-S處理器包含一種現存結構以 及微結構之二進位相容的製作。其具有兩組可組態參數··高 10 本紙張尺度適用中國國家標準(CNS〉A4規格(210X297公釐) (請先閲讀背面之注意事項再填寫本頁) •訂— ;·· 539965 A7 B7 五、發明説明(8 ) (請先閲讀背面之注意事項再填寫本頁) 性能或低性能乘法器之選擇,以及包括除錯和電路中估算 邏輯。雖然改變ARM7-S之指令集結構是可能的,但是它們 爲現存不可組態處理器製作之子集,因此無須新的軟體。 該Lexra LX-4080處理器具有一組標準MIPS結構之可組 態變體並且不具有對指令集延伸之軟體支援。其選擇包含^ 一組定製的引擎界面,其允許以應用特定操作MIPS ALU操 作碼之延伸;一組內部硬體界面,其包含暫存器來源以及 一組暫存器或1 6位元寬立即來源,以及目的地和停止信 號;一組簡單記憶體管理單元選擇;三組MIPS共同處理器 界面;一組界面至快取、試算RAM或ROM之彈性本地記憶 體;連接週邊功能以及記憶體至處理器之獨有的本地匯流 排之一組匯流排控制器;以及一組可組態深度之寫入緩衝 器。 該ARC可組態RISC核心具有快速邏輯閘計算評估之使 用者界面,其依據目標技術以及時脈速度、指令快取組 態、指令集延伸、計時器選擇、試算板記憶體選擇、和記· 憶體控制器選擇;一組指令集,其具有可選擇之選擇性, 例如以塊狀移動至記憶體之本地試算板RAM、特別的暫存^ 器、高至十六組額外狀況碼選擇、一組32x3 2位元計分板多 層區塊、一組單一週期32位元圓筒狀移位器/轉動區塊、一 組標準化(找尋第一位元)指令、直接地寫入結果至命令緩衝 器(並非至暫存器檔案)、一組16位元MUL/MAC區塊以及 3 6位元累積器,以及使用線性算術以滑動指標存取至本 地SRAM ;以及利用手動編輯VHDL原始碼所定義之使用 11 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐〉 539965 A7 B7 _ 五、發明説明(9 ) (請先閲讀背面之注意事項再填寫本頁) 者指令。該ARC設計不具有供製作指令集說明語言之設 備,其亦不產生對特定的被組態處理器之軟體工具。 該Synopsys可組態PCI界面包含一組GUI或界面至安 裝、組態以及合成活動之命令線;檢查在各步驟中預定必需 的使用者之動作;依據組態(例如,Venlog對VHDL)選擇設~ 計檔案之安裝;選擇性組態,例如參數設定以及提示使用者 有關組態値而檢查組合有效性,以及藉使用者更新1101^原始-碼以及HDL源檔案之不編輯之HDL產生;以及合成功能,例 如使用者界面,其分析技術檔案庫以選擇I/O板、技術無關 限制以及合成原本(script)、板***以及技術特定板之提示、 以及技術無關公式之編譯成爲技術相關原本。該可組態PCI 匯流排界面是値得注意的,因爲其製作參數之一致性檢查、 組態爲主之安裝、以及HDL檔案之自動修改。 另外,先前合成技術依據使用者目標格式選擇不同的 映射,允許該映射使速度、功率、面積、或目標構件最佳 化。在這要點上,先前技術中不可能獲得以這些方式再組 態處理器之效應的回饋而不採取經由整個映射程序之設 計。此回饋可以被使用以引導處理器之進一步地再組態直 至系統設計目標被達成爲止。 第二分類先前技術運作於可組態處理器產生之領域, 
(亦即,編輯器以及組譯器之自動再目標化)包括學術硏究 之廣大領域;參考,例如,Hanono等人之"Instruction Selection, Resource Allocation and Scheduling in the AVIV Retargetable Code Generator"(被使用以供編碼產生器之自 12 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 539965 A7 ___B7_ 五、發明説明(l〇 ) (請先閲讀背面之注意事項再填寫本頁) 動產生的機器指令表示);Fauth等人之"Describing Instruction Set Processors Usingn ML” ; Rarnsey等人之 ’’Machine Descriptions to Build Tools for Embedded Systems" ; Aho 等人之 ’’Code Genreation Using Tree Matching and Dynamic Programming”(演算法以匹配與各機 器指令相關的轉移,例如,添加、負載、儲存、分支、等 等,以由某些使用例如圖型匹配之方法的機器-無關中間型 式所表示程式操作之順序);以及Cattell之” Formalization and Automatic Derivation of Code Generators”(被使用以供 編輯器硏究之機器結構的摘要說明)。 一旦處理器被設計,其操作必須被確認。亦即,一般 情形下處理器使用管線方式以各步驟對應於一組指令執行 之相位執行來自被儲存程式的指令。因此,改變或添加一 組指令或改變組態可能需要處理器之邏輯中廣泛的改變, 如此各組多重管線步驟可以於各該指令下進行適當的動 作。處理器之組態需要其被再確認,並且這確認適應於改 變以及添加。這並非是項簡單工作。處理器爲具有外延的 內部資料以及控制狀態之複雜邏輯元件,而控制以及資料 和程式之組合使得處理器確認一組需求之技術。添加處理 器確認之不易處爲不易發展適當的確認工具。因爲確認在 先前的技術中不爲自動的,其彈性、速度以及可靠度並非 爲最佳的。 此外,一旦處理器被設計並且被確認,若其無法被容 易地規劃則並非特別地有用。一般,處理器之規劃是藉強 13 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 539965 A7 __B7_ 五、發明説明(11 ) (請先閲讀背面之注意事項再填寫本頁) 化軟體工具之輔助,包含編輯器、組譯器、鏈接器、除錯 器、模擬器以及造型器。當處理器改變時’軟體工具同時 也必須改變。添加一組指令並非有益的’如果指令無法被 編輯、被組合、被模擬或被除錯。軟體改變之成本與處理 器修改以及增強部相關之處已經是在先前的技術中對於彈 性處理器設計之一主要障礙。 因此,可以明白先前的技術處理器設計是具有某一程 度之不易性,因此一般情形下處理器一般不被設計或修改 以供特定應用。同時,可以明白的是,若處理器可以被組 態或延伸以供特定應用,則系統效率之相當改進是可能 的。更進一步地說,設計過程之效率以及效應可以被增 強,如果其能夠使用回饋於製作特性上,例如在改進處理 器設計中之功率消耗、速度、等等。此外,在先前技術 中,一旦處理器被修改後,則在修改之後需要相當大的努 力以確認處理器之更正操作。最後,雖然先前的技術提供 有限制之處理器組態性,但是它們無法提供被量裁以供該 被組態處理器使用之軟體發展工具的產生。 雖然達成上面準則之一組系統必定爲在技術之上的改 進,其改進可以被製作一例如,需要處理器系統具有存取或 修改儲存於特別暫存器之資訊的指令,亦即,處理器狀 態,其顯著地限制可得到之指令範圍,因而限制可達到性 能改進之數量。 同時,發明新的特定應用指令涵蓋複雜之週期計算減 少、另外的硬體資源以及CPU週期時間衝擊之間的協調。 14 本紙張尺度適用中國國家標準(®S) A4規格(210X297公釐) 539965 A7 _B7_ 五、發明説明(12 ) (請先閲讀背面之注意事項再填窝本頁) 其他挑戰爲得到新的指令之有效益硬體製作而不涵蓋應用 發展器於高性能微處理機製作之通常需高技巧的細節。 上述系統給予使用者彈性以設計完整合適於應用之處 理器,但是對於硬體和軟體之互動的發展爲不便的。爲了 更完全地瞭解這問題,則考慮被許多軟體設計者使用以調 整它們的軟體應用之性能的一般方法。它們將一般地考慮 一種可觀的改進,修改它們的軟體以使用該可觀的改進, 再編輯它們的軟體源以產生一組包含可觀的改進之可運作 之應用並接著估算該可觀的改進。依據該評估之結果,它 們可能會保留或捨棄其可觀的改進。一般而言,整個程序 
可以在幾分鐘內完成。這允許使用者自由地試驗、快速地 嘗試並且保留或拋棄構想。在某些情況中,僅評估一組可 行構想是複雜的。使用者可能欲於許多種情形中測試該構 想。在此情況,使用者時常保留被編輯應用之多重版本:一 組原始的版本以及包含可觀改進之其他版本。在某些情況 中,可觀的改進可能產生互動,而使用者可能保留多於兩 組應用之拷貝,各使用可觀改進之不同的子集。藉由保留 多重版本,使用者可以容易地在不同的環境之下重複地測 試不同的版本。 可組態處理器之使用者會想要以一種相似於軟體發展 者於傳統的處理器上發展軟體之方式互動地聯合發展硬體 以及軟體。考慮使用者添加定製的指令至可組處理益之 情況。使用者會想要互動地增加指令至它們的處理器並且 於它們特定的應用上測試與估算該指令。以先前的技術系 15 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 539965 A7 _B7__ 五、發明説明(13 ) 統而言,這情形之不易達成有三組理由。 (請先閲讀背面之注意事項再填寫本頁) 第一,在提出可行指令之後,在得到可以利用指令之 編輯器以及模擬器之前,使用者必須等待一小時或以上。 第二,當使用者希望以許多可行指令試驗時,該使用 者必須對各組產生並且保留一組軟體發展系統。該軟體發 展系統可以爲非常巨大。保留許多版本會導致無法管理。 最後,該軟體發展系統被組態以供用於整個處理器。 其使得不易分離開發程序於不同的工程師中。考慮一範 例,其中兩組發展器運作於特定的應用上。一組發展器可 能要負責決定處理器之快取特性而其他的則負責添加定製 化指令。當兩組發展器之工作是相關時,各組便足以分 離,如此各發展器可以隔離地運作其工作。該快取發展器 可能啓始地提出一種特定的組態。另一組發展器開始該組 態並且嘗試許多指令,建立一組各可行指令之軟體發展系 統。接著,該快取發展器修正其提出之快取組態。另一組 發展器則必須接著再建立其每一組組態,因爲各組組態假 設爲原始的快取組態。藉由許多發展器運作於一組計畫, 組織不同的組態會快速地成爲無法管理的。 本發明之簡略摘要 本發明克服先前技術之這些問題並且具有目的以提供 可自動地藉同時產生一組處理器之硬體製作說明以及一組 軟體發展工具以供自相同組態格式規劃處理器以組態處理 器的系統。 本發明之另一目的爲提供一種系統,其可以對各種性 16 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 539965 A7 B7 _ 五、發明説明(14 ) 能準則將硬體製作以及軟體工具最佳化° 此外,本發明之另一目的爲提供一種系統,其允許處 理器之各種型式的組態性,包含延伸性、二分法選擇以及 參數之修改。 本發明之另一目的爲提供一種系統’其可以用一種可 以容易地被製作於硬體中之語言說明處理器之指令集結 構。 本發明之進一步目的爲提供一種系統以及方法以發展 並且製作修改處理器狀態之指令集延伸。 本發明之另一目的爲提供一種系統以及方法以發展並 且製作修改可組態處理器暫存器之指令集延伸。 本發明之另一項目的爲允許使用者藉由添加新的指令 並且在幾分鐘之內定製處理器組態,且可以估算該特點。 上述目的之達成是藉由提供一種自動處理器生成系 統,其使用一種被標準化語言中自製處理器指令集選擇以 及延伸之一組說明,以發展目標指令集之被組態定義,電 路說明製作指令集必須的硬體說明語言(HDL),以及發展工 具例如:可以被使用以產生處理器軟體並且確認處理器之編 輯器、組譯器、除錯器和模擬器。處理器電路之製作可以 於各種準則下被最佳化,例如面積、功率消耗以及速度。 一旦處理器組態被產生,其可以被測試並且輸入至被修改 以反覆地將處理器製作最佳化系統。 爲了依據本發明發展一種自動處理器生成系統,一組 指令集結構說明語言被定義而可組態處理器/系統組態工具 17 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) f · (請先閱讀背面之注意事項再填寫本頁) 、可丨 539965 A7 ___B7_ 五、發明説明(l5 ) (請先閲讀背面之注意事項再填寫本頁) 以及發展工具例如:組譯器、鏈接器、編輯器和除錯器則被 產生。這是發展程序之部份,因爲雖然大部分工具是標準. 
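...they must be made so that they can be configured automatically from the ISA description. This part of the design process is generally carried out by the designer or manufacturer of the automated processor design tool itself.

The automated processor generation system according to the present invention operates as follows. A user, for example a system designer, develops a configured instruction set architecture. That is, the previously generated ISA definition and tools are used to produce a configurable instruction set architecture that follows certain ISA design goals. Next, the development tools and simulator are configured for this instruction set architecture. Benchmarks are run on the configured simulator to evaluate the effect of the configurable instruction set architecture, and the core is revised according to the results. Once the configurable instruction set architecture is in a satisfactory state, a verification suite is generated.

Along with these software aspects, the system handles the hardware aspects by developing a configurable processor. An overall system architecture is then designed from system goals such as cost, performance, power and functionality, together with information on the available processors. This architecture takes configurable ISA options, extensions and processor feature selections into account. Using this overall system architecture, the software, simulator, configurable instruction set architecture and processor HDL implementation are developed. The system configures the processor ISA, HDL implementation, software and simulator, and the system HDL is designed for a system-on-chip design. Meanwhile, a chip foundry is selected based on the system architecture and the foundries' specifications. The selection rests on an evaluation of each foundry's capabilities with respect to the system HDL, rather than being tied to the processor selection as in the prior art. Finally, using the foundry's standard cell library, the configured system synthesizes the circuit, places and routes it, and provides the ability to re-optimize layout and timing. Then, if the design is not a single-chip design, the circuit board layout is designed, the chips are fabricated and the boards are assembled.

As described above, several techniques are used to facilitate extensive automation of the processor design process. The first technique is to design and implement specific mechanisms that are less flexible than arbitrary modification or extension but that still allow significant functional improvements. Restricting arbitrary changes also restricts the problems associated with them.

The second technique is to provide a single description of a change and to generate the modifications or extensions to all affected components automatically. Processors designed with prior-art techniques do not do this. It is usually cheaper to make a change manually once than to write a tool that makes it automatically and then use that tool only once. The advantage of automation appears when a task must be repeated many times.

The third technique is to build a database that assists estimation and automatic configuration for subsequent user evaluation.

Finally, the fourth technique is to provide hardware and software in a form that lends itself to configuration. In one embodiment of the invention, some of the hardware and software is not written directly in standard hardware and software languages. Instead, it is written in languages enhanced with a preprocessor. The preprocessor can query the configuration database and generate standard hardware and software language code using substitution, conditionals, replication and other modifications. The core processor design is then completed with hooks that allow the enhancements to be linked in.

To illustrate these techniques, consider the addition of application-specific instructions. The method can be restricted to instructions that have register and constant operands and produce a register result. The operation of such an instruction can then be specified with combinational logic alone (no state, no feedback). This input specifies the opcode assignment, instruction name, assembler syntax and combinational logic of the instruction. From it, the tools generate:
--instruction decode logic so that the processor recognizes the new opcodes;
--an added functional unit that performs the combinational logic function on the register operands;
--inputs to the processor's instruction scheduling logic so that the instruction issues only when its operands are valid;
--assembler modifications to accept the new opcodes and their operands and generate correct machine code;
--compiler modifications to add new intrinsic functions for accessing the new instructions;
--disassembler/debugger modifications to decode machine code as the new instructions;
--simulator modifications to accept the new opcodes and perform the specified logic function;
--diagnostic generators that produce both directed and random code sequences containing the added instructions and checking their results.

All of the techniques above are employed to add application-specific instructions. The input is restricted to the input and output operands and the logic that computes them. The change is described in one place, and all hardware and software modifications are derived from that description. This facility shows how a single input can be used to enhance multiple components.

The result of this process is a system that meets the needs of its application better than existing techniques, because trade-offs between the processor and the rest of the system logic can be made later in the design process. It is also superior to many of the prior-art approaches discussed above because configuration can be applied to many more representations. A single source can be used for all ISA encoding, and software tools and high-level simulation can be included in a configurable package. The flow can also be designed to iterate toward the best combination of configuration values. The previous approaches focused only on hardware configuration or only on software configuration. They offered neither a single user interface for control nor a measurement system for user-directed redefinition. The present invention provides a complete flow for configuring processor hardware and software, including feedback from hardware design results and software performance to help select the best configuration.

According to one aspect of the present invention, these objects are achieved by providing an automated processor design tool that uses a description of custom processor instruction set extensions, written in a standardized language, to develop three things:
--a configurable definition of the target instruction set;
--a hardware description language description of the circuitry necessary to implement the instruction set;
--development tools such as a compiler, assembler, debugger and simulator, which can be used to develop applications for the processor and to verify it.
The standardized language can handle instruction set extensions that modify processor state or use the configurable processor. Providing a constrained domain of extensions and optimizations allows the process to be highly automated, which makes development fast and reliable.

According to another aspect of the invention, the above objects are further achieved by providing a system in which a user can keep multiple sets of candidate instructions or state and easily switch between them when evaluating applications. Hereinafter, a combination of candidate configurable instructions or state is collectively referred to as a "processor enhancement".

The user selects and builds a base processor configuration using the methods described here. The user creates a new set of user-defined processor enhancements and places them in a file directory. The user then invokes a tool that processes the user enhancements and converts them into a form usable by the base software development tools. This conversion is very fast because it involves only the user-defined enhancements and does not build an entire software system. The user then invokes the base software development tools and tells them to use, dynamically, the processor enhancements generated in the new directory. Preferably, the location of the directory is given to the tools either through a command-line option or through an environment variable. To simplify the process further, the user can use standard software makefiles. With a single make command, the user can then modify the processor instructions, process the enhancements, and rebuild and evaluate the application with the new processor enhancements using the base software development system.

The invention overcomes the three limitations of the prior-art approaches. Given a new set of candidate enhancements, the user can evaluate them within minutes. By creating a new directory for each set, the user can keep many versions of candidate enhancements. Because a directory contains only descriptions of the new enhancements rather than an entire software system, it needs very little storage. Finally, the new enhancements are decoupled from the rest of the configuration. Once the user has created a directory with a new set of candidate enhancements, that directory can be used with any base configuration.

The above and other objects of the present invention will become readily apparent when the following detailed description is read in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a processor implementing an instruction set according to a preferred embodiment of the present invention;
FIG. 2 is a block diagram of a pipeline used in the processor according to the embodiment;
FIG. 3 shows a configuration manager in a GUI according to the embodiment;
FIG. 4 shows a configuration editor in the GUI according to the embodiment;
FIG. 5 shows different types of configurability according to the embodiment;
FIG. 6 is a block diagram showing the flow of processor configuration in the embodiment;
FIG. 7 is a block diagram of an instruction set simulator according to the embodiment;
FIG. 8 is a block diagram of an emulation board for use with a processor configured according to the present invention;
FIG. 9 is a block diagram showing the logical architecture of a configurable processor according to the embodiment;
FIG. 10 is a block diagram showing the addition of a multiplier to the architecture of FIG. 9;
FIG. 11 is a block diagram showing the addition of a multiply-accumulate unit to the architecture of FIG. 9;
FIGS. 12 and 13 show memory configurations in the embodiment;
FIGS. 14 and 15 show the addition of user-defined functional units to the architecture of FIG. 8;
FIG. 16 is a block diagram showing the flow of information between system components in another preferred embodiment;
FIG. 17 is a block diagram showing how custom code for the software development tools is generated in the embodiment;
FIG. 18 is a block diagram showing the generation of various software modules used in another preferred embodiment of the invention;
FIG. 19 is a block diagram of a pipeline architecture in a configurable processor according to the embodiment;
FIG. 20 shows the implementation of a state register according to the embodiment;
FIG. 21 shows additional logic needed to implement the state register in the embodiment;
FIG. 22 shows how next-state outputs for a state from multiple semantic blocks are combined, and how one is selected as input to the state register, according to the embodiment;
FIG. 23 shows logic corresponding to the semantic logic according to the embodiment;
FIG. 24 shows the logic for one bit of state when it is mapped to one bit of a user register in the embodiment.

In general, the automated processor generation process begins with three inputs: a configurable processor definition, the user's modifications to it, and the user-specified application for which the processor is to be configured. This information is used to generate a configured processor that reflects the user modifications. It is also used to generate software development tools such as a compiler, simulator, assembler and disassembler. The application is then recompiled with the new software development tools. The recompiled application is run on the simulator to produce a software profile describing how well the configured processor runs the application. The configured processor is also evaluated for silicon area, power consumption, speed and so on, producing a hardware profile of the processor circuit implementation. The software and hardware profiles are fed back to the user to enable further iterations of configuration, so that the processor can be optimized for that particular application.

According to a preferred embodiment of the invention, an automated processor generation system 10 has four major components, as shown in FIG. 1:
--a user configuration interface 20, through which a user wishing to design a processor enters configurability and extensibility selections and other design constraints;
--a suite of software development tools 30 that can be customized for a processor designed to the criteria the user has chosen;
--a parameterized, extensible description of a hardware implementation of the processor;
--a build system 50 that receives input data from the user interface, generates a customized, synthesizable hardware description of the requested processor, and modifies the software development tools to accommodate the chosen design.
Preferably, the build system 50 also generates diagnostic tools to verify the hardware and software designs, and evaluators to estimate hardware and software characteristics.

As used here and in the appended claims, "hardware implementation description" means one or more descriptions that describe physical implementation aspects of a processor design. Alone or together with one or more other descriptions, they facilitate the manufacture of chips according to that design. The components of a hardware implementation description can therefore be at varying levels of abstraction, from relatively high-level hardware description languages, through netlists and microcode, to mask descriptions. In this embodiment, however, the primary components of the hardware implementation description are written in HDL, netlists and scripts.

Further, "HDL" as used here and in the appended claims refers to the general class of hardware description languages used to describe microarchitectures and the like. It is not intended to refer to any particular example of such a language.

In this embodiment, processor configuration is based on the architecture 60 shown in FIG. 2. Some elements of the architecture are basic features that the user cannot modify directly:
--the processor control section 62;
--the align and decode section 64 (although parts of this section depend on the user-specified configuration);
--the ALU and address generation section 66;
--the branch logic and instruction fetch 68;
--the processor interface 70.
Other units are part of the basic processor but are user-configurable:
--the interrupt control section 72;
--the data and instruction address watch sections 74 and 76;
--the windowed register file 78;
--the data and instruction cache and tag sections 80;
--the write buffers 82;
--the timers 84.
The user can optionally include the remaining sections shown in FIG. 2.

The central component of the processor configuration system 10 is the user configuration interface 20. This module preferably presents the user with a graphical user interface (GUI). Through the GUI, the user selects processor functionality, including reconfiguration of the compiler and regeneration of the assembler, disassembler and instruction set simulator (ISS). The user also prepares input for full processor synthesis, placement and routing. The GUI provides rapid estimates of processor area, power consumption, cycle time, application performance and code size, which the user can rely on to iterate on and enhance the processor configuration. Preferably, the GUI also accesses a configuration database to obtain default values and to check user input for errors.

To design a processor 60 using the automated processor generation system 10 of this embodiment, the user enters design parameters into the user configuration interface 20. The system 10 may run as a stand-alone system on a computer under the user's control. Preferably, however, it runs mainly on a system controlled by the manufacturer of the system 10, and users access it over a communications network. For example, the GUI can be provided through a web browser, with data-entry screens written in HTML and Java. This has several advantages, such as keeping any proprietary software confidential and simplifying its maintenance and updating. In this case, the user may first log in to the system 10 to verify his or her identity before accessing the GUI.

Once the user has gained access, the system displays the configuration manager screen 86 shown in FIG. 3. The configuration manager screen 86 is a directory listing all configurations accessible to the user. The screen 86 in FIG. 3 shows that the user has two configurations, "just intr" and "high prio". The first has already been built, i.e., finalized for production, and the second is yet to be built. From this screen the user can:
--build a selected configuration;
--delete it;
--edit it;
--generate a report specifying which configuration and extension options have been chosen for it;
--create a new configuration.
For configurations that have already been built, such as "just intr", a suite of customized software development tools 30 can be downloaded.

Creating a new configuration or editing an existing one brings up the configuration editor 88 shown in FIG. 4. The configuration editor 88 has an "Options" section menu on the left that lists the various general aspects of the processor 60 that can be configured and extended. When the user selects an option section, a screen with the configuration options for that section appears on the right. These options can be set with pull-down menus, text boxes, check boxes, radio buttons and similar controls known in the art. The user can select options and enter data in any order. Data is preferably entered sequentially, however, because there are logical dependencies between sections. For example, the options in the "Interrupts" section can be displayed properly only after the number of interrupts has been chosen in the "ISA Options" section.

In this embodiment, the following configuration options are available in each section:

Goals
    Technology for Estimation
        Target ASIC technology: .18, .25, .35 micron
        Target operating condition: typical, worst-case
    Implementation Goals
        Target speed: arbitrary
        Gate count: arbitrary
        Target power: arbitrary
        Goal prioritization: speed, area, power; speed, power, area
ISA Options
    Numeric Options
        MAC16 with 40-bit accumulator: yes, no
        16-bit multiplier: yes, no
    Exception Options
        Number of interrupts: 0-32
        High-priority interrupt levels: 0-14
        Enable debugging: yes, no
        Number of timers: 0-3
    Other
        Byte ordering: little endian, big endian
        Number of registers available for call windows: 32, 64
Processor Cache and Memory
    Processor interface read width (bits): 32, 64, 128
    Write-buffer entries (address/value pairs): 4, 8, 16, 32
    Processor Cache
        Instruction/data cache size (kB): 1, 2, 4, 8, 16
        Instruction/data cache line size (kB): 16, 32, 64
Peripheral Components
    Timers
        Timer interrupt numbers
        Timer interrupt levels
    Debugging Support
        Number of instruction address breakpoint registers: 0-2
        Number of data address breakpoint registers: 0-2
        Debug interrupt level
        Trace port: yes, no
        On-chip debug module: yes, no
        Full scan: yes, no
    Interrupts
        Source: external, software
        Priority level
System Memory Addresses
    Vector and address calculation method: XTOS, manual
    Configuration Parameters
        RAM size, start address: arbitrary
        ROM size, start address: arbitrary
        XTOS: arbitrary
    Configuration-Specific Addresses
        User exception vector: arbitrary
        Kernel exception vector: arbitrary
        Register window overflow/underflow vector base: arbitrary
        Reset vector: arbitrary
        XTOS start address: arbitrary
        Application start address: arbitrary
TIE Instructions
    (define ISA extensions)
Target CAD Environment
    Simulation

huff8 t1, t0, where the most significant octet of the result is the decoded value and the least significant octet is the length. Compared with the software implementation described earlier, a direct hardware implementation of Huffman decoding is quite simple. Excluding instruction decoding and the like, the logic to decode the instruction amounts to roughly thirty gates of combinational logic, which is less than 0.1% of the gate count of a typical processor. A special-purpose processor instruction can compute it in a single cycle, an improvement by a factor of 4 to 20 over using general-purpose instructions.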
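The huff8 operation can be illustrated with a behavioral model. The following is a minimal sketch under stated assumptions. The four-entry prefix code table and the 16-bit input window are hypothetical, since the actual code table is not given here. The only behavior taken from the text is the result layout: decoded value in the most significant octet, length in the least significant octet. The dictionary scan models what the instruction computes, not the gate-level structure of the combinational logic.

```python
from typing import List

# Hypothetical prefix-free code table (code bits -> 8-bit symbol).
# Illustrative only; not the code table used in the patent.
CODE_TABLE = {
    "0": 0x41,    # 'A'
    "10": 0x42,   # 'B'
    "110": 0x43,  # 'C'
    "111": 0x44,  # 'D'
}

WINDOW_BITS = 16  # assumed width of the bit window passed in t0


def huff8(t0: int) -> int:
    """Behavioral model of a huff8-style instruction.

    Examines the leading bits of the WINDOW_BITS-wide window in t0 and
    returns (decoded_symbol << 8) | code_length.
    """
    bits = format(t0 & ((1 << WINDOW_BITS) - 1), f"0{WINDOW_BITS}b")
    for code, symbol in CODE_TABLE.items():
        if bits.startswith(code):  # the table is prefix-free, so at most one match
            return (symbol << 8) | len(code)
    raise ValueError("no code matches the bit window")


def decode_stream(bitstring: str, count: int) -> List[int]:
    """Software loop that issues one huff8 per symbol and advances by its length."""
    symbols = []
    pos = 0
    for _ in range(count):
        window = bitstring[pos:pos + WINDOW_BITS].ljust(WINDOW_BITS, "0")
        result = huff8(int(window, 2))
        symbols.append(result >> 8)   # most significant octet: decoded value
        pos += result & 0xFF          # least significant octet: code length
    return symbols


print([chr(s) for s in decode_stream("110010111", 4)])  # ['C', 'A', 'B', 'D']
```

Each loop iteration replaces what would otherwise be a sequence of bit tests and branches with a single instruction. That replacement is where the improvement factor cited above comes from.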
Prior-art efforts on configurable processors generally fall into two categories: logic synthesis used with parameterized hardware descriptions, and automatic retargeting of compilers and assemblers from abstract machine descriptions. The first category covers synthesizable processor hardware designs. Examples are the Synopsys DW8051 processor, the ARM/Synopsys ARM7-S, the Lexra LX-4080 and the ARC configurable RISC core, and, to some extent, the Synopsys synthesizable/configurable PCI bus interface.

The Synopsys DW8051 is a binary-compatible implementation of an existing processor architecture with a small number of synthesis parameters:
--128 or 256 bytes of internal RAM;
--a ROM address range determined by the parameter rom_addr_size;
--an optional interval timer;
--a variable number (0-2) of serial ports;
--an interrupt unit supporting either six or thirteen sources.
Although the DW8051 implementation can be varied somewhat, its instruction set architecture cannot be changed.

The ARM/Synopsys ARM7-S processor is a binary-compatible implementation of an existing architecture and microarchitecture. It has two configurable parameters: the choice of a high-performance or a low-performance multiplier, and the inclusion of debug and in-circuit emulation logic. Changes to the ARM7-S instruction set architecture are possible, but they are subsets of existing non-configurable processor implementations, so no new software is required.

The Lexra LX-4080 processor is a configurable variant of the standard MIPS architecture and has no software support for instruction set extensions.
Its options include:
--a custom engine interface that allows MIPS ALU opcodes to be extended with application-specific operations;
--an internal hardware interface that includes a register source and a register or 16-bit-wide immediate source, as well as destination and stall signals;
--a simple memory management unit option;
--three MIPS coprocessor interfaces;
--a flexible local memory interface to cache, local RAM or scratchpad memory;
--a bus controller that connects peripheral functions and memory to the processor's own local bus;
--a write buffer of configurable depth.

The ARC configurable RISC core has a user interface with on-the-fly gate count estimation. The estimate is based on target technology and clock speed, instruction cache configuration, instruction set extensions, timer options, scratchpad memory options and memory controller options. Its instruction set has selectable options:
--local scratchpad RAM with block moves to memory;
--special registers;
--up to sixteen extra condition code choices;
--a 32x32-bit scoreboarded multiply block;
--a single-cycle 32-bit barrel shifter/rotate block;
--a normalize (find first bit) instruction;
--results written directly to a command buffer rather than to the register file;
--a 16-bit MUL/MAC block with a 36-bit accumulator;
--sliding-pointer access to local SRAM using linear arithmetic.
User extensions are defined by manually editing the VHDL source code. The ARC design has no language for describing instruction set implementations.
The Synopsys configurable PCI interface includes a set of GUIs or interfaces to command lines for installation, configuration, and synthesis activities; check in each step to order Required user actions; select installation of design files based on configuration (eg, Venlog vs. VHDL); optional configuration, such as parameter settings and prompt the user about the configuration, check the validity of the combination, and borrow Update 1101 ^ source-code and HDL source files without editing; and synthesis functions, such as user interface, which analyzes technology archives to select I / O boards, technology-independent restrictions, and synthesizes scripts, boards Tips for inserting and technology-specific boards, and compilation of technology-independent formulas become technology-related originals. The configurable PCI bus interface is worthy of attention because of consistency check of its production parameters, configuration-based installation, and automatic modification of HDL files. In addition, previous synthesis techniques select different mappings based on the user's target format, allowing the mapping to optimize speed, power, area, or target components. In this regard, it is not possible in the prior art to obtain feedback on the effects of reconfiguring the processor in these ways without adopting a design that goes through the entire mapping process. This feedback can be used to guide further reconfiguration of the processor until the system design goals are achieved. The second category of the prior art operates in fields generated by configurable processors (ie, automatic retargeting of editors and translators) including a wide range of academic research; reference, for example, Hanon et al. Instruction Selection, Resource Allocation and Scheduling in the AVIV Retargetable Code Generator " (from 12 paper sizes used for the code generator applies the Chinese National Standard (CNS) A4 specification (210X297 mm) 539965 A7 ___B7_ V. 
Description of the invention ( l〇) (Please read the notes on the back before filling out this page) Automatic machine instructions); Fauth et al. "Describing Instruction Set Processors Usingn ML"; Rarnsey et al. "Machine Descriptions to Build Tools for Embedded Systems " Aho et al.'S `` Code Genreation Using Tree Matching and Dynamic Programming '' (algorithms to match transitions related to various machine instructions, such as add, load, store, branch, etc., to be used by some Machine-independent intermediate pattern The sequential operation); Cattell and the "Formalization and Automatic Derivation of Code Generators" (editor is used for the study summary WH machine structure). Once the processor is designed, its operation must be confirmed. That is, in general, a processor uses a pipeline to execute instructions from a stored program in a phase where each step corresponds to the execution of a group of instructions. Therefore, changing or adding a group of instructions or changing the configuration may require extensive changes in the logic of the processor, so that multiple sets of multiple pipeline steps can perform appropriate actions under each of the instructions. The configuration of the processor needs to be revalidated, and this validation is adapted to changes and additions. This is not a simple task. A processor is a complex logic element with internal data and control states that are extended. The combination of control and data and programs enables the processor to identify a set of required technologies. Adding processor validation is not easy because it is not easy to develop a proper validation tool. Because confirmation is not automatic in the prior art, its flexibility, speed, and reliability are not optimal. Moreover, once a processor is designed and validated, it is not particularly useful if it cannot be easily planned. 
Processors are generally programmed with extensive software tools, including compilers, assemblers, linkers, debuggers, simulators and profilers. When the processor changes, the software tools must change as well. Adding an instruction does no good if that instruction cannot be compiled, assembled, simulated or debugged. In the prior art, the cost of the software changes that accompany processor modifications and enhancements has been a major obstacle to flexible processor design.

Prior-art processor design is therefore difficult enough that processors are generally not designed or modified for a specific application. Yet configuring or extending processors for particular applications could bring considerable improvements in system efficiency. The design process would also be more efficient and effective if feedback on implementation characteristics, such as power consumption and speed, could be used to refine the processor design. In addition, in the prior art, verifying the correct operation of a processor after a modification takes considerable effort. Finally, although the prior art allows limited processor configurability, it does not generate software development tools tailored to the configured processor.
A system meeting the above criteria would certainly improve on the art, but further improvements are possible. For example, a processor system needs instructions that access or modify information stored in special registers, i.e., processor state. Without such instructions, the range of obtainable instructions is significantly restricted, and so is the achievable performance improvement. Inventing new application-specific instructions also involves complex trade-offs among cycle count reduction, additional hardware resources and impact on CPU cycle time. A further challenge is to obtain efficient hardware implementations of new instructions without involving application developers in the often intricate details of high-performance microprocessor implementation.

The system described above gives users the flexibility to design a processor well suited to their application, but it is cumbersome for interactive development of hardware and software. To see why, consider how many software developers tune the performance of their applications. They typically think of a potential improvement and modify their software to use it. They then recompile the software source to produce a runnable application containing the improvement, and evaluate the result. Depending on the outcome, they keep or discard the improvement. The whole process typically takes a few minutes, which lets developers experiment freely: they try ideas quickly and keep or discard them. In some cases, evaluating even a single potential idea is complicated.
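The trade-off between cycle count reduction, added hardware and cycle time can be made concrete with the identity execution time = cycle count x cycle time. The following is a minimal sketch. The `Candidate` record, the gate budget and all numbers are hypothetical inputs chosen for illustration, not measurements. It checks whether an instruction that cuts cycles but lengthens the clock period is still a net win and still fits the hardware budget.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one design point (all numbers are inputs, not measurements)."""
    name: str
    cycles: int            # cycle count for the application
    cycle_time_ns: float   # clock period
    extra_gates: int       # hardware added relative to the base processor


def execution_time_ns(c: Candidate) -> float:
    # Execution time = cycle count x cycle time.
    return c.cycles * c.cycle_time_ns


def speedup(base: Candidate, cand: Candidate) -> float:
    return execution_time_ns(base) / execution_time_ns(cand)


def worthwhile(base: Candidate, cand: Candidate, gate_budget: int) -> bool:
    # Keep the instruction only if it is faster overall AND fits the gate budget.
    return speedup(base, cand) > 1.0 and cand.extra_gates <= gate_budget


base = Candidate("base", cycles=1_000_000, cycle_time_ns=5.0, extra_gates=0)
new = Candidate("with new instruction", cycles=250_000, cycle_time_ns=5.5, extra_gates=3_000)
print(round(speedup(base, new), 2))               # 3.64
print(worthwhile(base, new, gate_budget=2_000))   # False: faster, but over budget
```

In this example a 4x cycle-count reduction yields only about a 3.6x speedup, because the clock period grew by 10%. Whether the instruction is kept depends on a separate hardware constraint.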
A developer may want to test an idea in a variety of situations. In that case, developers often keep multiple compiled versions of the application: the original and one containing the potential improvement. When potential improvements interact, they may keep more than two versions, each using a different subset of the improvements. Keeping multiple versions lets them test the different versions repeatedly under different circumstances.

Users of configurable processors will want to co-develop hardware and software interactively, much as software developers develop software on conventional processors. Consider a user adding custom instructions to a processor. The user will want to add instructions interactively and test and evaluate them on the specific application. With prior-art systems this is difficult, for three reasons.

First, after proposing a candidate instruction, the user must wait an hour or more before getting a compiler and simulator that can use it.

Second, to experiment with many candidate instructions, the user must build and maintain a software development system for each one. Software development systems can be very large, and keeping many versions quickly becomes unmanageable.

Finally, the software development system is configured for the entire processor, which makes it difficult to divide development among different engineers. Consider an example in which two developers work on a specific application.
One developer might be responsible for determining the cache characteristics of the processor, and the other for adding custom instructions. The two developers' work is related, but each part is separable enough that each developer could work in isolation. The cache developer might first propose a particular configuration. The other developer starts from that configuration and tries many instructions, building a software development system for each candidate instruction. The cache developer then revises the proposed cache configuration. The other developer must now rebuild every one of those software development systems, because each assumed the original cache configuration. With many developers working on a project, organizing the different configurations quickly becomes unmanageable.

Brief Summary of the Invention

The present invention overcomes these problems of the prior art. An object of the invention is to provide a system that configures a processor automatically. From a single configuration specification, the system generates both a hardware implementation description of the processor and a set of software development tools for programming it.

Another object of the present invention is to provide a system that can optimize the hardware implementation and the software tools for various performance criteria.

Still another object of the present invention is to provide a system that supports various types of processor configurability, including extensibility, binary (yes/no) selection and parametric modification.

Another object of the present invention is to provide a system that can describe the instruction set architecture of a processor in a language that is readily implemented in hardware.
It is a further object of the present invention to provide a system and method for developing and implementing instruction set extensions that modify processor state.

Another object of the present invention is to provide a system and method for developing and implementing instruction set extensions that modify configurable processor registers.

Yet another object of the present invention is to let the user customize a processor configuration by adding new instructions and evaluate the resulting characteristics within minutes.

The above objects are achieved by providing an automated processor generation system. The system uses a description of customized processor instruction set options and extensions, written in a standardized language, to develop three things:
--a configured definition of the target instruction set;
--a hardware description language (HDL) description of the circuitry necessary to implement the instruction set;
--development tools such as a compiler, assembler, debugger and simulator, which can be used to generate software for the processor and to verify it.
The implementation of the processor circuitry can be optimized for various criteria such as area, power consumption and speed. Once a processor configuration has been developed, it can be tested, and the inputs to the system revised to optimize the design iteratively.

To develop an automated processor generation system according to the present invention, an instruction set architecture description language is defined. Configurable processor/system configuration tools are also developed, together with development tools such as assemblers, linkers, compilers and debuggers.
This is part of the development process because, although most of the tools are standard, they must be made so that they can be configured automatically from the ISA description. This part of the design process is generally carried out by the designer or manufacturer of the automated processor design tool itself.

The automated processor generation system according to the present invention operates as follows. A user, for example a system designer, develops a configured instruction set architecture. That is, the previously generated ISA definition and tools are used to produce a configurable instruction set architecture that follows certain ISA design goals. Next, the development tools and simulator are configured for this instruction set architecture. Benchmarks are run on the configured simulator to evaluate the effect of the configurable instruction set architecture, and the core is revised according to the results. Once the configurable instruction set architecture is in a satisfactory state, a verification suite is generated.

Along with these software aspects, the system handles the hardware aspects by developing a configurable processor. An overall system architecture is then designed from system goals such as cost, performance, power and functionality, together with information on the available processors. This architecture takes configurable ISA options, extensions and processor feature selections into account. Using this overall system architecture, the software, simulator, configurable instruction set architecture and processor HDL implementation are developed. The system configures the processor ISA, HDL implementation, software and simulator, and the system HDL is designed for a system-on-chip design. Meanwhile, a chip foundry is selected based on the system architecture and the foundries' specifications. The selection rests on an evaluation of each foundry's capabilities with respect to the system HDL, rather than being tied to the processor selection as in the prior art.
Finally, using the foundry's standard cell library, the configured system synthesizes the circuit, places and routes it, and provides the ability to re-optimize layout and timing. Then, if the design is not a single-chip design, the circuit board layout is designed, the chips are fabricated and the boards are assembled.

As described above, several techniques are used to facilitate extensive automation of the processor design process. The first technique is to design and implement specific mechanisms that are less flexible than arbitrary modification or extension but that still allow significant functional improvements. Restricting arbitrary changes also restricts the problems associated with them.

The second technique is to provide a single description of a change and to generate the modifications or extensions to all affected components automatically. Processors designed with prior-art techniques do not do this. It is usually cheaper to make a change manually once than to write a tool that makes it automatically and then use that tool only once. The advantage of automation appears when a task must be repeated many times.

The third technique is to build a database that assists estimation and automatic configuration for subsequent user evaluation.

Finally, the fourth technique is to provide hardware and software in a form that lends itself to configuration. In one embodiment of the invention, some of the hardware and software is not written directly in standard hardware and software languages. Instead, it is written in languages enhanced with a preprocessor. The preprocessor can query the configuration database and generate standard hardware and software language code using substitution, conditionals, replication and other modifications.
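The fourth technique can be illustrated with a toy preprocessor. The following is a minimal sketch, not the patent's preprocessor. The `@if`/`@repeat`/`@end` directives, the `${NAME}` syntax (borrowed from Python's `string.Template`), the configuration keys and the Verilog-like signal names are all hypothetical. The sketch shows the three modifications the text names: substitution, conditionals and replication, each driven by configuration values.

```python
import string
from typing import Dict, List


def expand(template: str, config: Dict[str, object]) -> str:
    """Expand a template against configuration values.

    Hypothetical directives (one per line, not nestable):
      ${NAME}                  substitute config["NAME"]
      @if NAME ... @end        keep the block only if config["NAME"] is truthy
      @repeat NAME ... @end    replicate the block config["NAME"] times,
                               with ${i} bound to the copy index
    """
    out: List[str] = []
    lines = template.splitlines()
    pos = 0
    while pos < len(lines):
        stripped = lines[pos].strip()
        if stripped.startswith(("@if ", "@repeat ")):
            directive, key = stripped.split(maxsplit=1)
            body = []
            pos += 1
            while lines[pos].strip() != "@end":  # raises IndexError if @end is missing
                body.append(lines[pos])
                pos += 1
            if directive == "@if":
                if config[key]:
                    out.extend(string.Template(b).substitute(config) for b in body)
            else:
                for n in range(int(config[key])):
                    out.extend(string.Template(b).substitute(config, i=n) for b in body)
        else:
            out.append(string.Template(lines[pos]).substitute(config))
        pos += 1  # skips the @end line after a block
    return "\n".join(out)


TEMPLATE = """module timers;
@repeat NUM_TIMERS
  reg [31:0] timer_compare${i};
@end
@if MUL16
  wire [31:0] mul16_result;
@end
endmodule"""

print(expand(TEMPLATE, {"NUM_TIMERS": 2, "MUL16": False}))
```

With two timers and no multiplier, the output keeps two `timer_compare` registers and drops the multiplier wire. A real HDL preprocessor would also need to escape characters such as `$` that the target language uses itself.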
The core processor design is then completed with hooks that allow the enhancements to be linked in.

To illustrate these techniques, consider the addition of application-specific instructions. The method can be restricted to instructions that have register and constant operands and produce a register result. The operation of such an instruction can then be specified with combinational logic alone (no state, no feedback). This input specifies the opcode assignment, instruction name, assembler syntax and combinational logic of the instruction. From it, the tools generate:
--instruction decode logic so that the processor recognizes the new opcodes;
--an added functional unit that performs the combinational logic function on the register operands;
--inputs to the processor's instruction scheduling logic so that the instruction issues only when its operands are valid;
--assembler modifications to accept the new opcodes and their operands and generate correct machine code;
--compiler modifications to add new intrinsic functions for accessing the new instructions;
--disassembler/debugger modifications to decode machine code as the new instructions;
--simulator modifications to accept the new opcodes and perform the specified logic function;
--diagnostic generators that produce both directed and random code sequences containing the added instructions and checking their results.

All of the techniques above are employed to add application-specific instructions. The input is restricted to the input and output operands and the logic that computes them. The change is described in one place, and all hardware and software modifications are derived from that description. This facility shows how a single input can be used to enhance multiple components.
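The single-description idea can be shown on a small scale. In the following minimal sketch, one record per instruction drives an assembler, a disassembler and a simulator step (decode plus execute). The `InstrSpec` record, the 24-bit field layout, the `aN` register names and the `avgu` instruction are all hypothetical, chosen for illustration rather than taken from the patent. As the text requires, the semantics are a pure combinational function of two register operands with no state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class InstrSpec:
    """Single description of a register-to-register instruction (hypothetical format)."""
    name: str
    opcode: int                            # 8-bit opcode field
    semantics: Callable[[int, int], int]   # combinational: no state, no feedback


# Hypothetical 24-bit encoding: opcode[23:16] rd[15:12] rs[11:8] rt[7:4], bits 3:0 unused.
def fields(word: int) -> Tuple[int, int, int, int]:
    return (word >> 16) & 0xFF, (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF


def make_tools(specs: List[InstrSpec]):
    """Derive an assembler, a disassembler and a simulator step from the same specs."""
    by_name: Dict[str, InstrSpec] = {s.name: s for s in specs}
    by_opcode: Dict[int, InstrSpec] = {s.opcode: s for s in specs}

    def assemble(line: str) -> int:
        mnemonic, operands = line.split(maxsplit=1)
        rd, rs, rt = (int(r.strip().lstrip("a")) for r in operands.split(","))
        return (by_name[mnemonic].opcode << 16) | (rd << 12) | (rs << 8) | (rt << 4)

    def disassemble(word: int) -> str:
        op, rd, rs, rt = fields(word)
        return f"{by_opcode[op].name} a{rd}, a{rs}, a{rt}"

    def simulate(word: int, regs: List[int]) -> None:
        op, rd, rs, rt = fields(word)                                     # decode
        regs[rd] = by_opcode[op].semantics(regs[rs], regs[rt]) & MASK32  # execute

    return assemble, disassemble, simulate


# A hypothetical application-specific instruction: unsigned average.
AVGU = InstrSpec("avgu", 0x5A, lambda a, b: (a + b) >> 1)
assemble, disassemble, simulate = make_tools([AVGU])

word = assemble("avgu a2, a3, a4")
regs = [0] * 16
regs[3], regs[4] = 10, 20
simulate(word, regs)
print(hex(word), disassemble(word), regs[2])  # 0x5a2340 avgu a2, a3, a4 15
```

Adding a second instruction means adding one more `InstrSpec`. All three tools pick it up without further edits, which is the point the text makes about deriving every modification from one description.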
The result of this process is a system that meets the needs of its application better than existing techniques, because trade-offs between the processor and the rest of the system logic can be made later in the design process. It is also superior to many of the prior-art approaches discussed above because configuration can be applied to many more representations. A single source can be used for all ISA encoding, and software tools and high-level simulation can be included in a configurable package. The flow can also be designed to iterate toward the best combination of configuration values. The previous approaches focused only on hardware configuration or only on software configuration. They offered neither a single user interface for control nor a measurement system for user-directed redefinition. The present invention provides a complete flow for configuring processor hardware and software, including feedback from hardware design results and software performance to help select the best configuration.

According to one aspect of the present invention, these objects are achieved by providing an automated processor design tool that uses a description of custom processor instruction set extensions, written in a standardized language, to develop three things:
--a configurable definition of the target instruction set;
--a hardware description language description of the circuitry necessary to implement the instruction set;
--development tools such as a compiler, assembler, debugger and simulator, which can be used to develop applications for the processor and to verify it.
The standardized language can handle instruction set extensions that modify processor state or use the configurable processor.
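The iterative search for the best combination of configuration values, guided by hardware and software feedback, can be caricatured as a search loop. In the following minimal sketch, an exhaustive search over a small option space stands in for the user-driven iteration the text describes. The option names are loosely based on the configuration editor's options. The two estimator functions are placeholders with made-up coefficients, standing in for the simulator's software profile and the hardware (area) estimate. They are not models from the patent.

```python
import itertools
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, object]


def best_configuration(
    space: Dict[str, Iterable[object]],
    estimate_cycles: Callable[[Config], float],
    estimate_area: Callable[[Config], float],
    area_limit: float,
) -> Optional[Tuple[Config, float]]:
    """Try every combination of option values; keep the fastest one that fits the area limit."""
    keys = list(space)
    best: Optional[Tuple[Config, float]] = None
    for values in itertools.product(*(list(space[k]) for k in keys)):
        cfg = dict(zip(keys, values))
        if estimate_area(cfg) > area_limit:   # hardware feedback
            continue
        cycles = estimate_cycles(cfg)         # software feedback
        if best is None or cycles < best[1]:
            best = (cfg, cycles)
    return best


# Placeholder estimators with made-up coefficients (illustration only).
def fake_cycles(cfg: Config) -> float:
    return 1_000_000 / cfg["dcache_kb"] ** 0.5 * (0.8 if cfg["mul16"] else 1.0)


def fake_area(cfg: Config) -> float:
    return 10_000 + 1_500 * cfg["dcache_kb"] + (2_000 if cfg["mul16"] else 0)


space = {"dcache_kb": [1, 2, 4, 8, 16], "mul16": [False, True]}
print(best_configuration(space, fake_cycles, fake_area, area_limit=25_000)[0])
# {'dcache_kb': 8, 'mul16': True}
```

Exhaustive enumeration is only practical for tiny option spaces. In the flow described here, the user inspects the profiles and chooses the next configuration to try.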
Providing a constrained domain of extensions and optimizations allows the process to be highly automated, which makes development fast and reliable.

According to another aspect of the invention, the above objects are further achieved by providing a system in which a user can keep multiple sets of candidate instructions or state and easily switch between them when evaluating applications. Hereinafter, a combination of candidate configurable instructions or state is collectively referred to as a "processor enhancement".

The user selects and builds a base processor configuration using the methods described here. The user creates a new set of user-defined processor enhancements and places them in a file directory. The user then invokes a tool that processes the user enhancements and converts them into a form usable by the base software development tools. This conversion is very fast because it involves only the user-defined enhancements and does not build an entire software system. The user then invokes the base software development tools and tells them to use, dynamically, the processor enhancements generated in the new directory. Preferably, the location of the directory is given to the tools either through a command-line option or through an environment variable. To simplify the process further, the user can use standard software makefiles. With a single make command, the user can then modify the processor instructions, process the enhancements, and rebuild and evaluate the application with the new processor enhancements using the base software development system.

The present invention overcomes the three limitations of the prior-art approaches.
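The tools learn the location of the enhancement directory either from a command-line option or from an environment variable. The following minimal sketch shows that lookup. The option name `--enhancements`, the variable name `PROC_ENHANCEMENTS_DIR` and the `*.tie` file suffix are hypothetical, not names defined by the patent. The command line takes precedence here, which is a design choice rather than something the text specifies.

```python
import argparse
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence

# Hypothetical environment variable name (not taken from the patent).
ENV_VAR = "PROC_ENHANCEMENTS_DIR"


def resolve_enhancement_dir(argv: Sequence[str],
                            environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Command-line option wins; otherwise fall back to the environment variable."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--enhancements", metavar="DIR")  # hypothetical option name
    args, _rest = parser.parse_known_args(list(argv))     # ignore the tool's other arguments
    if args.enhancements:
        return Path(args.enhancements)
    if environ.get(ENV_VAR):
        return Path(environ[ENV_VAR])
    return None  # no enhancements: run with the base configuration only


def enhancement_files(directory: Optional[Path]) -> List[Path]:
    """List enhancement descriptions in the directory (the *.tie suffix is an assumption)."""
    if directory is None:
        return []
    return sorted(directory.glob("*.tie"))


print(resolve_enhancement_dir(["--enhancements", "exp1", "app.c"], {ENV_VAR: "exp2"}))  # exp1
print(resolve_enhancement_dir(["app.c"], {ENV_VAR: "exp2"}))                            # exp2
```

Switching between candidate enhancement sets then means pointing the same base tools at a different directory. Nothing in the base software development system is rebuilt.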
Given a new set of candidate enhancements, the user can evaluate them within minutes. By creating a new directory for each set, the user can keep many versions of candidate enhancements. Because a directory contains only descriptions of the new enhancements rather than an entire software system, it needs very little storage. Finally, the new enhancements are decoupled from the rest of the configuration. Once the user has created a directory with a new set of candidate enhancements, that directory can be used with any base configuration.

The above and other objects of the present invention will become readily apparent when the following detailed description is read in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a processor implementing an instruction set according to a preferred embodiment of the present invention;
FIG. 2 is a block diagram of a pipeline used in the processor according to the embodiment;
FIG. 3 shows a configuration manager in a GUI according to the embodiment;
FIG. 4 shows a configuration editor in the GUI according to the embodiment;
FIG. 5 shows different types of configurability according to the embodiment;
FIG. 6 is a block diagram showing the flow of processor configuration in the embodiment;
FIG. 7 is a block diagram of an instruction set simulator according to the embodiment;
FIG. 8 is a block diagram of an emulation board for use with a processor configured according to the present invention;
FIG. 9 is a block diagram showing the logical architecture of a configurable processor according to the embodiment;
FIG. 10 is a block diagram showing the addition of a multiplier to the architecture of FIG. 9;
FIG. 11 is a block diagram showing the addition of a multiply-accumulate unit to the architecture of FIG. 9;
FIGS. 12 and 13 show memory configurations in the embodiment;
FIGS. 14 and 15 show the addition of user-defined functional units to the architecture of FIG. 8;
FIG. 16 is a block diagram showing the flow of information between system components in another preferred embodiment;
FIG. 17 is a block diagram showing how custom code for the software development tools is generated in the embodiment;
FIG. 18 is a block diagram showing the generation of various software modules used in another preferred embodiment of the invention;
FIG. 19 is a block diagram of a pipeline architecture in a configurable processor according to the embodiment;
FIG. 20 shows the implementation of a state register according to the embodiment;
FIG. 21 shows additional logic needed to implement the state register in the embodiment;
FIG. 22 shows how next-state outputs for a state from multiple semantic blocks are combined, and how one is selected as input to the state register, according to the embodiment;
FIG. 23 shows logic corresponding to the semantic logic according to the embodiment;
FIG. 24 shows the logic for one bit of state when it is mapped to one bit of a user register in the embodiment.

In general, the automated processor generation process begins with three inputs: a configurable processor definition, the user's modifications to it, and the user-specified application for which the processor is to be configured. This information is used to generate a configured processor that reflects the user modifications. It is also used to generate software development tools such as a compiler, simulator, assembler and disassembler.
The application is then recompiled using the new software development tools. The simulator runs the recompiled application to produce a software profile, which describes the application's performance on the configured processor. The configured processor is also evaluated for silicon area, power consumption, speed, and similar properties, producing a hardware profile that characterizes the processor circuit. Both profiles are fed back to the user to prompt further iterations, so that the processor can be optimized for that particular application.

According to a preferred embodiment of the present invention, an automated processor generation system 10 has four main components, as shown in FIG. 1:
- a user configuration interface 20, through which a user who wishes to design a processor enters configuration and extensibility options and other design constraints;
- a set of software development tools 30, which can be customized for the processor designed according to the user's selections;
- a parameterized, extensible description 40 of the processor's hardware implementation;
- a build system 50.

The build system 50 receives input data from the user interface, generates a customized, synthesizable hardware description of the requested processor, and modifies the software development tools to accommodate the selected design. Preferably, the build system 50 also generates diagnostic tools to verify the hardware and software designs, and a set of estimators to estimate hardware and software characteristics.
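The feedback loop just described can be sketched as a simple selection over candidate configurations. This is an illustration under stated assumptions, not part of the described system. The `Profile` fields, the `evaluate` callback, and the policy of minimizing cycles under an area budget are all hypothetical. The `evaluate` callback stands in for rebuilding the tools, recompiling and simulating the application, and running the hardware estimators. The text itself leaves the trade-off decision to the user.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Profile:
    """Combined software and hardware profile for one configuration (hypothetical)."""
    cycles: int        # software profile: simulated run time of the application
    area_mm2: float    # hardware profile: estimated silicon area
    power_mw: float    # hardware profile: estimated power (unused by this policy)


def pick_configuration(candidates: Iterable[str],
                       evaluate: Callable[[str], Profile],
                       max_area_mm2: float) -> Optional[str]:
    """Return the candidate with the fewest cycles that fits the area budget."""
    best_name: Optional[str] = None
    best_cycles: Optional[int] = None
    for name in candidates:
        profile = evaluate(name)           # build, recompile, simulate, estimate
        if profile.area_mm2 > max_area_mm2:
            continue                       # violates the hardware constraint
        if best_cycles is None or profile.cycles < best_cycles:
            best_name, best_cycles = name, profile.cycles
    return best_name
```

In the flow described in the text, a person reads both profiles and decides what to change next. An automatic rule like this one is only useful when the design goal can be stated as a single criterion.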
As used herein and in the appended claims, "hardware implementation description" means one or more descriptions of the physical implementation of a processor design. Such descriptions, alone or together with one or more other descriptions, facilitate the manufacture of chips based on that design. The components of a hardware implementation description may therefore be at varying levels of abstraction: from a high-level hardware description language, through netlists and microcode, down to mask descriptions. In this embodiment, however, the main components of the hardware implementation description are written in an HDL, as netlists, and as scripts. Further, "HDL" as used here and in the appended claims refers to the general class of hardware description languages used to describe microarchitectures and the like. It does not refer to any particular language.

In this embodiment, the basic configuration of the processor is the architecture 60 shown in FIG. 2. The architecture contains three kinds of elements.

Basic features that the user cannot modify directly:
- the processor control section 62;
- the align and decode section 64 (although parts of this section depend on the user-specified configuration);
- the ALU and address generation section 66;
- the branch logic and instruction fetch section 68;
- the processor interface 70.

Units that are part of the basic processor but remain user-configurable:
- the interrupt control section 72;
- the data and instruction address watch sections 74 and 76;
- the windowed register file 78;
- the data and instruction cache and tag sections 80;
- the write buffer 82;
- the timers 84.

The remaining sections shown in FIG. 2 are included at the user's option.
The central component of the processor configuration system 10 is the user configuration interface 20. This module preferably presents the user with a graphical user interface (GUI) for selecting processor functionality. Through the GUI, the user can:
- reconfigure the compiler;
- regenerate the assembler, disassembler, and instruction set simulator (ISS);
- prepare input for full processor synthesis, placement, and routing;
- obtain rapid estimates of processor area, power consumption, cycle time, application performance, and code size, for further iteration and enhancement of the configuration.

Preferably, the GUI also accesses a configuration database to obtain default values and to check user input for errors.

To design a processor 60 with the automated processor generation system 10 of this embodiment, the user enters design parameters into the user configuration interface 20. The system 10 can be implemented as a stand-alone system running on a computer under the user's control. Preferably, however, it runs mainly under the control of the system's manufacturer, with user access provided over a communication network. For example, the GUI may be delivered through a web browser, with data-entry screens written in HTML and Java. This arrangement has several advantages, such as keeping proprietary software confidential and simplifying its maintenance and updating. In this case, the user may first have to log in to the system 10 to verify his identity before accessing the GUI.

Once the user has access, the system displays the configuration manager screen 86 shown in FIG. 3. The configuration manager screen 86 is a directory listing all configurations accessible to the user.
The configuration manager screen 86 in FIG. 3 shows that the user has two configurations, "just intr" and "high prio". The first has already been built, i.e., finalized for production; the second has yet to be built. From this screen the user can build a selected configuration, delete it, or edit it. The user can also generate a report showing which configuration and extension options have been chosen for it, or create a new configuration. For configurations that have been built, such as "just intr", a customized set of software development tools 30 can be downloaded.

Creating a new configuration, or editing an existing one, opens the configuration editor 88 shown in FIG. 4. On the left, the configuration editor 88 has an "Options" section menu listing the general aspects of the processor 60 that can be configured and extended. When the user selects an option section, a screen with that section's configuration options appears on the right. These options are set with pull-down menus, memo boxes, check boxes, radio buttons, and similar controls, as is known in the art. The user can select options and enter data in any order. However, data is preferably entered in sequence, because some sections depend on choices made in others. For example, the options in the "Interrupts" section can be displayed properly only after the number of interrupts has been chosen in the "ISA Options" section.
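Because some sections depend on choices made in others, a valid entry order can be derived mechanically. The sketch below assumes a hypothetical dependency map: only the Interrupts-on-ISA-Options edge comes from the text, and the other edges are illustrative. It derives an order with Python's standard `graphlib` module (Python 3.9 and later).

```python
from graphlib import TopologicalSorter

# Hypothetical map: section -> sections whose choices it depends on.
# Only "Interrupts" -> "ISA Options" is taken from the description;
# the remaining edges are assumptions for illustration.
SECTION_DEPS = {
    "Target Technology": set(),
    "ISA Options": {"Target Technology"},
    "Interrupts": {"ISA Options"},
    "Timers": {"Interrupts"},
    "Debug": {"Interrupts"},
    "TIE Instructions": {"ISA Options"},
}


def entry_order(deps):
    """Return an order in which every section follows all of its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(entry_order(SECTION_DEPS))
```

A dependency cycle in the map would raise `graphlib.CycleError`. An editor built this way could report that as an error in its own option definitions.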
In this embodiment, the following configuration options are available in each section:

Target Technology
    Target ASIC technology: .18, .25, .35 micron
    Target operating conditions: normal, worst-case
Implementation Goals
    Target speed: arbitrary
    Gate count: arbitrary
    Target power: arbitrary
    Goal prioritization: speed, area, power; speed, power, area
ISA Options
    Numeric options: MAC16 with 40-bit accumulator: yes, no
    16-bit multiplier: yes, no
Exception Options
    Number of interrupts: 0-32
    High-priority interrupt levels: 0-14
    Enable debugging: yes, no
    Number of timers: 0-3
Other
    Byte order: little endian, big endian
    Number of registers available for call windows: 32, 64
Processor Cache and Memory
    Processor interface read width (bits): 32, 64, 128
    Write-buffer entries (address/value pairs): 4, 8, 16, 32
Processor Cache
    Instruction/data cache size (kB): 1, 2, 4, 8, 16
    Instruction/data cache line size (bytes): 16, 32, 64
Peripheral Components
    Timers
        Timer interrupt numbers
        Timer interrupt levels
Debugging Support
    Number of instruction address breakpoint registers: 0-2
    Number of data address breakpoint registers: 0-2
    Debug interrupt level
    Trace port: yes, no
    On-chip debug module: yes, no
    Full scan: yes, no
Interrupts
    Source: external, software
    Priority level
System Memory Addresses
    Vector and address calculation method: XTOS, manual
    Configuration Parameters
        RAM size, start address: arbitrary
        ROM size, start address: arbitrary
        XTOS: arbitrary
    Configuration-Specific Addresses
        User exception vector: arbitrary
        Kernel exception vector: arbitrary
        Register window overflow/underflow vector base: arbitrary
        Reset vector: arbitrary
        XTOS start address: arbitrary
        Application start address: arbitrary
TIE Instructions
    (define ISA extensions)
Target CAD Environment
    Simulation: Verilog™: yes, no
    Synthesis: Design Compiler™: yes, no

    Placement and routing: Apollo™: yes, no

In addition, the system 10 may offer options to add other functional units and features, such as:
- a 32-bit integer multiply/divide unit or a floating-point arithmetic unit;
- a memory management unit;
- on-chip RAM and ROM;
- cache associativity;
- enhanced DSP and coprocessor instruction sets;
- a write-back cache;
- multiprocessor synchronization;
- compiler-directed speculation;
- support for additional CAD packages.

Whatever configuration options a given configurable processor offers, they are preferably listed in a definition file (such as the one shown in Appendix A). Once the user has selected the appropriate options, the system 10 uses this file for syntax checking and similar purposes.
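A definition file of legal values makes that syntax check mechanical. The sketch below is hypothetical in format, and it does not reproduce Appendix A. The option keys are invented names for a handful of entries in the option list above, and their legal values are transcribed from that list.

```python
from typing import Dict, List, Union

Value = Union[int, str]

# A few options from the list above, encoded as sets or ranges of legal values.
# The key names are hypothetical; the values come from the option list.
OPTION_DEFS: Dict[str, object] = {
    "target_asic_micron": {".18", ".25", ".35"},
    "mac16": {"yes", "no"},
    "num_interrupts": range(0, 33),              # 0-32
    "high_prio_interrupt_levels": range(0, 15),  # 0-14
    "num_timers": range(0, 4),                   # 0-3
    "byte_order": {"little", "big"},
    "call_window_registers": {32, 64},
    "write_buffer_entries": {4, 8, 16, 32},
    "icache_kb": {1, 2, 4, 8, 16},
}


def check_config(cfg: Dict[str, Value]) -> List[str]:
    """Return error messages; an empty list means every selection is legal."""
    errors = []
    for name, value in cfg.items():
        legal = OPTION_DEFS.get(name)
        if legal is None:
            errors.append(f"unknown option: {name}")
        elif value not in legal:
            errors.append(f"illegal value for {name}: {value!r}")
    return errors
```

Keeping the legal values in data rather than code means a new processor option only requires a new entry in the table. That mirrors the role the text gives to the definition file.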
As FIG. 5 shows, the automated processor configuration system 10 offers the user two main types of configurability 300:
- Extensibility 302 allows the user to define arbitrary functions and structures from scratch.
- Modifiability 304 allows the user to choose from a predetermined, restricted set of options. Within modifiability, some features are binary selections 306 (e.g., whether a MAC16 or a DSP should be added to the processor 60). Others are parametric specifications 308, e.g., the number of interrupts and the cache sizes.

Many of the configuration options above are familiar to those skilled in the art, but some merit particular attention. For example, the RAM and ROM options allow the designer to include scratch-pad memory or firmware on the processor 60 itself. The processor 60 can fetch instructions from these memories and read and write data in them. The size and placement of the memories are configurable. In this embodiment, each of these memories is accessed as an additional way of a set-associative cache, so a hit in the memory is detected by comparing a single tag entry.

The system 10 provides separate configuration options for interrupts (implementing level-1 interrupts) and for high-priority interrupts (implementing level 2-15 interrupts and non-maskable interrupts). The options are separate because each high-priority interrupt level requires three special registers, which makes high-priority interrupts more expensive.
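The single-tag hit check for the on-chip RAM and ROM can be sketched as follows. This is a software model under stated assumptions, not the hardware design. Each local memory is assumed to be a power of two in size and aligned on its size, so the address bits above the in-memory offset form the single tag to compare. The class name, base addresses, and sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LocalMemory:
    """An on-chip RAM or ROM treated like one extra cache way (sketch)."""
    name: str
    base: int   # assumed aligned to size
    size: int   # assumed a power of two, in bytes

    def __post_init__(self):
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        if self.base % self.size:
            raise ValueError("base must be aligned to size")

    @property
    def offset_bits(self) -> int:
        return self.size.bit_length() - 1

    def hit(self, addr: int) -> bool:
        # Single tag compare: the address bits above the offset must match.
        return (addr >> self.offset_bits) == (self.base >> self.offset_bits)


def lookup(addr: int, ways: Iterable[LocalMemory]) -> Optional[str]:
    """Return the name of the local memory that hits, or None (normal cache path)."""
    for way in ways:
        if way.hit(addr):
            return way.name
    return None
```

In hardware, all ways are compared in parallel; the loop in `lookup` is only a sequential stand-in. An address that misses every local memory falls through to the normal cache path, represented here by `None`.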
The MAC16 with 40-bit accumulator option (shown at 90 in FIG. 2) adds:
- a 16-bit multiply/add function with a 40-bit accumulator;
- eight 16-bit operand registers;
- a set of compound instructions that combine multiply, accumulate, operand load, and address update operations.

The operand registers can be loaded from memory with pairs of 16-bit values in parallel with multiply/accumulate operations. The unit can therefore sustain algorithms that perform two loads and one multiply/accumulate per cycle.

The on-chip debug module 92 (shown in FIG. 2) is used to access the internal, software-visible state of the processor 60 through the JTAG port 94. The module supports:
- exception generation to put the processor 60 into debug mode;
- access to all program-visible registers and memory locations;
- execution of any instruction that the processor 60 is configured to execute;
- modification of the PC to jump to a desired location in the code;
- a utility for returning to normal operating mode, triggered from outside the processor 60 through the JTAG port 94.

Once the processor 60 enters debug mode, it waits for an indication from the outside world that a valid instruction has been scanned in through the JTAG port 94. The processor then executes this instruction and waits for the next valid instruction. Once the hardware implementation of the processor 60 has been manufactured, the on-chip debug module 92 can be used to debug the system. Execution of the processor 60 can be controlled by a debugger running on a remote host.
The debugger communicates with the processor through the JTAG port 94. It uses the capabilities of the on-chip debug module 92 to determine and control the state of the processor 60 and to control instruction execution.

Up to three 32-bit counters/timers 84 can be configured. They require a 32-bit register that increments every clock cycle. Each configured timer also requires a compare register and a comparator that compares the compare register's contents with the current clock-register count, for use with interrupts and similar features. The counters/timers can be configured as edge-triggered and can generate normal or high-priority internal interrupts.

The speculation option gives the compiler more scheduling flexibility by allowing loads to be moved speculatively into control paths where they would not always be executed. Because a load may cause an exception, such load motion could introduce exceptions into a valid program that would not have occurred in the original code. Speculative loads prevent these exceptions from occurring when the load executes, and instead raise an exception only if the loaded data is actually needed. On a load error, a speculative load does not cause an exception; it resets the valid bit of the destination register, which is new processor state associated with this option.

Although the core processor 60 preferably has some basic pipeline synchronization capability, a system with multiple processors requires some form of communication and synchronization between them. In some cases, self-synchronizing communication techniques such as input and output queues are used.
In other cases, communication uses a shared-memory model. The instruction set must then support synchronization, because shared memory does not itself provide the required semantics. For example, additional load and store instructions with acquire and release semantics can be added. These instructions control the ordering of memory references in multiprocessor systems, where some memory locations are used for synchronization and others for data, so the exact ordering between synchronization references must be preserved. Other instructions can be used to build semaphore systems known in the art. This instruction-set support is provided by the multiprocessor synchronization option.

Perhaps the most significant configuration option is the TIE instruction definition, from which the designer-defined instruction execution unit 96 is built. The TIE™ (Tensilica Instruction Extension) language, developed by Tensilica, Inc. of Santa Clara, California, lets the user describe custom functions for an application. These are expressed as extensions and new instructions that augment the base ISA. Because TIE is flexible, it can also describe the parts of the ISA that the user cannot change. The entire ISA can therefore be used to generate the software development tools 30 and the hardware implementation description 40 consistently. A TIE description uses the following building blocks to describe the attributes of new instructions:

-- instruction fields
-- instruction classes
-- instruction opcodes
-- instruction semantics
-- instruction operands
-- constant tables

The instruction field statement, field, improves the readability of TIE code. Fields are subsets or concatenations of other fields, grouped together and referenced by name. The complete set of bits in an instruction is the highest-level superset field, inst, which can be divided into smaller fields. For example,

field x inst[11:8]
field y inst[15:12]
field xy {x, y}

defines two 4-bit fields, x and y, as subfields of the highest-level field inst (bits 8-11 and 12-15, respectively). It also defines an 8-bit field xy as the concatenation of x and y.

The opcode statement defines opcodes in terms of the encodings of specific fields. Some instruction fields specify operands, such as registers or immediate constants, for the opcodes so defined. Each such field must first be defined with a field statement and then defined with an operand statement.

For example,

opcode acs op2=4'b0000 CUST0

opcode adsel op2=4'b0001 CUST0

defines two new opcodes, acs and adsel, in terms of the previously defined opcode CUST0 (4'b0000 denotes the four-bit binary constant 0000). The TIE description of the preferred core ISA includes the following statements

field op0 inst[3:0]
field op1 inst[19:16]
field op2 inst[23:20]
opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0110 QRST

as part of its base definition. The definitions of acs and adsel therefore cause the TIE compiler to generate instruction decode logic for the following patterns, respectively:

inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000
inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000

The instruction operand statement, operand, identifies registers and immediate constants. A field must already have been defined, as described above, before it can be defined as an operand. If the operand is an immediate constant, its value can be computed from the field, or it can be taken from a previously defined constant table (described below). For example, to encode an immediate operand, the TIE code

field offset inst[23:6]
operand offsets4 offset {
    assign offsets4 = {{14{offset[17]}}, offset} << 2;
}{
    wire [31:0] t;
    assign t = offsets4 >> 2;
    assign offset = t[17:0];
}

defines an 18-bit field named offset that holds a signed number. It also defines an operand offsets4 whose value is four times the number stored in the offset field.
As will be apparent to those skilled in the art, the final part of the operand statement describes the circuitry that performs the computation. It is written in a subset of the Verilog™ HDL used for describing combinational circuits.

Here, the wire statement defines a 32-bit-wide logical wire named t. The first assign statement after the wire statement drives t with offsets4 shifted right by two bits. The second assign statement places the lower eighteen bits of t in the offset field. The assign statement in the first block sets offsets4 directly: it concatenates fourteen copies of the sign bit of offset (bit 17) with offset itself, then shifts the result left by two bits.

For a constant table operand, the TIE code

table prime 16 {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
}
operand prime_s s {
    assign prime_s = prime[s];
}{
    assign s = prime_s == prime[0]  ? 4'b0000 :
               prime_s == prime[1]  ? 4'b0001 :
               prime_s == prime[2]  ? 4'b0010 :
               prime_s == prime[3]  ? 4'b0011 :
               prime_s == prime[4]  ? 4'b0100 :
               prime_s == prime[5]  ? 4'b0101 :
               prime_s == prime[6]  ? 4'b0110 :
               prime_s == prime[7]  ? 4'b0111 :
               prime_s == prime[8]  ? 4'b1000 :
               prime_s == prime[9]  ? 4'b1001 :
               prime_s == prime[10] ? 4'b1010 :
               prime_s == prime[11] ? 4'b1011 :
               prime_s == prime[12] ? 4'b1100 :
               prime_s == prime[13] ? 4'b1101 :
               prime_s == prime[14] ? 4'b1110 :
                                      4'b1111;
}

uses the table statement to define an array of constants named prime; the number after the table name is the number of elements in the table. It then uses the operand s as an index into the table prime to encode a value for the operand prime_s. Note the use of Verilog™ statements to define the indexing.
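Field, opcode, and operand statements all describe bit-level encodings, so they can be cross-checked with a small software model. The Python sketch below illustrates this under stated assumptions; it is not output of the TIE compiler. The function names are hypothetical, and instruction words are treated as plain unsigned integers. The value op1 = 0110 for CUST0 is read from the decode patterns inst[23:0] = 0000 0110 ... 0000 given above. The table is assumed to hold the sixteen primes from 2 to 53, matching its declared size of 16.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] (inclusive), like a Verilog/TIE part-select."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# field x inst[11:8]; field y inst[15:12]; field xy {x, y}
def field_xy(inst: int) -> int:
    x = bits(inst, 11, 8)
    y = bits(inst, 15, 12)
    return (x << 4) | y  # in {x, y} the first element is the most significant


# op0 = inst[3:0], op1 = inst[19:16], op2 = inst[23:20]
def decode(inst: int):
    """Match the acs/adsel decode patterns; return the opcode name or None."""
    if bits(inst, 3, 0) == 0b0000 and bits(inst, 19, 16) == 0b0110:  # QRST, CUST0
        return {0b0000: "acs", 0b0001: "adsel"}.get(bits(inst, 23, 20))
    return None


def offset_to_offsets4(offset: int) -> int:
    """offsets4 = {{14{offset[17]}}, offset} << 2, kept to 32 bits."""
    ext = offset | (0x3FFF << 18 if bits(offset, 17, 17) else 0)
    return (ext << 2) & 0xFFFFFFFF


def offsets4_to_offset(offsets4: int) -> int:
    """t = offsets4 >> 2; offset = t[17:0]."""
    t = offsets4 >> 2
    return bits(t, 17, 0)


# Assumed table contents: the sixteen primes from 2 to 53.
PRIME = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]


def prime_decode(s: int) -> int:
    """prime_s = prime[s]"""
    return PRIME[s]


def prime_encode(prime_s: int) -> int:
    """Mirror of the ?: chain: entries 0..14 are tested, anything else gives 4'b1111."""
    for i in range(15):
        if prime_s == PRIME[i]:
            return i
    return 0b1111
```

`prime_encode` maps any value missing from the table, such as 9, to 4'b1111, just as the final arm of the ?: chain does. A tool that accepted such values would therefore need a separate membership check.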
The instruction class statement, iclass, associates opcodes with operands in a common format. All instructions defined in an iclass statement share the same format and operand usage. Before an instruction class is defined, its components must be defined: first the fields, then the opcodes and operands. For example, building on the code from the preceding example that defines the opcodes acs and adsel, the additional statements

operand art t { assign art = AR[t]; } {}
operand ars s { assign ars = AR[s]; } {}
operand arr r { assign AR[r] = arr; } {}

use operand statements to define three register operands: art, ars, and arr. Again, note the use of Verilog™ statements in the definitions. Next, the iclass statement

iclass viterbi {adsel, acs} {out arr, in art, in ars}

specifies that the opcodes adsel and acs belong to a common instruction class, viterbi. Instructions in this class take two register operands, art and ars, as inputs and write their output to a register operand, arr.

The instruction semantic statement, semantic, describes the behavior of one or more instructions. It uses the same Verilog™ subset as operand encoding. Defining multiple instructions in a single semantic statement lets certain common expressions be shared, which makes the hardware implementation more efficient.
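A semantic block for several instructions computes every candidate result and then uses the one-bit opcode signals to select which result reaches the output. The Python sketch below models this for the ADD8_4 and MIN16_2 instructions in the example that follows; the function names are hypothetical. It assumes plain Verilog wire semantics: the 8-bit lane additions wrap modulo 256, and the 16-bit comparisons are unsigned.

```python
MASK32 = 0xFFFFFFFF


def add8_4(art: int, ars: int) -> int:
    """Four independent 8-bit lane additions; carries do not cross lanes."""
    out = 0
    for lane in range(4):
        sh = 8 * lane
        s = ((art >> sh) & 0xFF) + ((ars >> sh) & 0xFF)
        out |= (s & 0xFF) << sh  # an 8-bit wire keeps only the low 8 bits
    return out


def min16_2(art: int, ars: int) -> int:
    """Two independent unsigned 16-bit minimums."""
    out = 0
    for lane in range(2):
        sh = 16 * lane
        a, b = (art >> sh) & 0xFFFF, (ars >> sh) & 0xFFFF
        out |= (a if a < b else b) << sh
    return out


def add_min_semantic(opcode_add8_4: bool, opcode_min16_2: bool,
                     art: int, ars: int) -> int:
    """Mirror of: arr = ({32{ADD8_4}} & add) | ({32{MIN16_2}} & min)."""
    add = add8_4(art, ars)
    mn = min16_2(art, ars)
    sel_add = MASK32 if opcode_add8_4 else 0    # {32{ADD8_4}}
    sel_min = MASK32 if opcode_min16_2 else 0   # {32{MIN16_2}}
    return (sel_add & add) | (sel_min & mn)
```

For a given instruction, decoding asserts at most one of the two opcode signals. The AND-OR expression therefore behaves as a multiplexer between the two results.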
The variables allowed in a semantic statement are the operands of the opcodes in the statement's opcode list, plus a single-bit variable for each opcode in that list. Each such variable has the same name as its opcode and evaluates to 1 when that opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.

For example, TIE code might define two new instructions. ADD8_4 adds the four 8-bit operands in one 32-bit word to the respective 8-bit operands in another 32-bit word. MIN16_2 selects the minimum of each pair of corresponding 16-bit operands in two 32-bit words. Such code might read:

opcode ADD8_4 op2=4'b0000 CUST0

opcode MIN16_2 op2=4'b0001 CUST0
iclass add_min {ADD8_4, MIN16_2} {out arr, in ars, in art}
semantic add_min {ADD8_4, MIN16_2} {
  wire [31:0] add, min;
  wire [7:0] add3, add2, add1, add0;
  wire [15:0] min1, min0;
  assign add3 = art[31:24] + ars[31:24];
  assign add2 = art[23:16] + ars[23:16];
  assign add1 = art[15:8] + ars[15:8];
  assign add0 = art[7:0] + ars[7:0];
  assign add = {add3, add2, add1, add0};
  assign min1 = art[31:16] < ars[31:16] ? art[31:16] : ars[31:16];
  assign min0 = art[15:0] < ars[15:0] ? art[15:0] : ars[15:0];
  assign min = {min1, min0};
  assign arr = ({32{ADD8_4}} & add) | ({32{MIN16_2}} & min);
}

Here, op2, CUST0, arr, art and ars are predefined as noted above. The opcode and iclass statements function as previously described.

The semantic statement specifies the computation performed by the new instructions. The add3 through add assignments specify the computation performed by ADD8_4. The min1, min0 and min assignments specify the computation performed by MIN16_2. The last line specifies the result written to the arr register. Each opcode variable is 1 only when its instruction is decoded, so {32{ADD8_4}} and {32{MIN16_2}} act as masks that select which result reaches arr.

Returning to the user input interface: once the user has entered all of the desired configuration and extension options, the build system 50 takes over. As shown in FIG. 6, the build system 50 receives a configuration specification made up of the parameters set by the user and the extensible features the user has designed. It combines this specification with additional parameters that define the core processor architecture (e.g., features the user cannot modify) to produce a single configuration specification 100 describing the entire processor. For example, in addition to the configuration settings 102 chosen by the user, the build system 50 might add parameters such as the following:

- the number of physical address bits for the processor's physical address space;
- the location of the first instruction executed by the processor 60 after reset.
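The following is a minimal sketch of this merging step. It assumes that the user settings and the fixed core-architecture parameters are both held as simple name/value maps. The parameter names `PhysicalAddressBits` and `ResetVector`, and the policy of rejecting any user attempt to override a core parameter, are hypothetical. They are not the actual format used by the build system 50.

```python
def build_configuration(user_settings: dict, core_params: dict) -> dict:
    """Combine user-chosen settings with fixed core parameters into one spec.

    Hypothetical policy: core parameters are not user-modifiable, so a user
    setting that names a core parameter is rejected instead of merged.
    """
    overridden = sorted(user_settings.keys() & core_params.keys())
    if overridden:
        raise ValueError(f"user may not override core parameters: {overridden}")
    spec = dict(core_params)    # start from the fixed architecture description
    spec.update(user_settings)  # then add the user's configuration choices
    return spec


# Hypothetical core parameters and user choices.
core = {"PhysicalAddressBits": 32, "ResetVector": 0x40000000}
user = {"IsaUseDebug": 1, "IsaUseInterrupt": 1}
spec = build_configuration(user, core)
print(sorted(spec))
```

Rejecting overrides keeps what the user may change separate from what the architecture fixes. A real build system could instead validate ranges or dependencies at this point.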

The Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, by Tensilica, Inc., is incorporated herein by reference. It illustrates examples of instructions that the configurable processor can implement, both as core instructions and as instructions made available by selecting configuration options.

The configuration specification 100 also includes an ISA package. This package contains TIE language statements that specify the base ISA, any additional packages the user may have selected (such as a coprocessor package 98, see FIG. 2, or a DSP package), and any TIE extensions supplied by the user. In addition, the configuration specification 100 may contain a number of statements that set flags indicating whether certain structural features are to be included in the processor 60. For example,

IsaUseDebug 1
IsaUseInterrupt 1
IsaUseHighPriorityInterrupt 0
IsaUseException 1

indicates that the processor will include the on-chip debug module 92, the interrupt facilities 72 and exception handling, but not the high-priority interrupt facilities.
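As a small illustration, flag statements like these can be read into a table of enabled features. The sketch below assumes one `Name value` pair per line, with any nonzero value meaning "enabled". The mapping from flag names to the hardware blocks named above is a hypothetical example. It does not represent the actual contents of configuration specification 100.

```python
# Hypothetical mapping from configuration flags to the hardware they enable.
FEATURES = {
    "IsaUseDebug": "on-chip debug module 92",
    "IsaUseInterrupt": "interrupt facilities 72",
    "IsaUseHighPriorityInterrupt": "high-priority interrupt facilities",
    "IsaUseException": "exception handling",
}


def parse_flags(text: str) -> dict:
    """Parse 'Name value' lines; any nonzero integer value means enabled."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, value = line.split()
        flags[name] = int(value) != 0
    return flags


def included_features(flags: dict) -> list:
    """List the hardware blocks whose flags are set, in FEATURES order."""
    return [desc for name, desc in FEATURES.items() if flags.get(name, False)]


spec_text = """
IsaUseDebug 1
IsaUseInterrupt 1
IsaUseHighPriorityInterrupt 0
IsaUseException 1
"""
flags = parse_flags(spec_text)
print(included_features(flags))
```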
Using the configuration specification 100, the following can be generated automatically, as shown below:
- the instruction decode logic of the processor 60;
- the illegal instruction detection logic of the processor 60;
- the ISA-specific portion of the assembler 110;
- the ISA-specific support routines of the compiler 108;
- the ISA-specific portion of the disassembler 100 (used by the debugger); and
- the ISA-specific portion of the simulator 112.

Generating these items automatically is valuable because an important configuration capability is specifying which instruction packages to include. For some items, each tool could instead contain conditionalized code that handles an instruction only if it has been configured. However, this approach is awkward. More importantly, it does not let the system designer easily add instructions to the system.

Instead of taking a configuration specification 100 from the designer, the build system 50 can also accept goals and determine the configuration automatically. The designer can specify goals for the processor 60, such as clock rate, area, cost, typical power consumption and maximum power consumption. Some goals conflict; for example, performance often can be increased only by increasing area, power consumption, or both. For this reason the build system 50 also accepts a priority ordering for the goals. The build system 50 then consults a search engine 106 to determine the set of available configuration options. It decides how to set each option using an algorithm that attempts to achieve all of the input goals simultaneously.

The search engine 106 includes a database whose entries describe the impact of options on the various metrics.
An entry can specify that a particular configuration setting has an additive, multiplicative or limiting effect on a metric. An entry can also be marked as requiring other configuration options as prerequisites, or as being incompatible with other options. For example, the simple branch prediction option can specify:

- a multiplicative or additive effect on cycles per instruction (CPI, a determinant of performance);
- a limit on clock rate;
- an additive effect on area;
- an additive effect on power.

The same option can be marked as incompatible with a fancier branch predictor, and as dependent on an instruction fetch queue of at least two entries. The values of these effects may be functions of a parameter, such as the branch prediction table size. In general, database entries are represented by functions that can be evaluated.

Various algorithms can find the configuration settings that come closest to achieving the input goals. For example, a simple knapsack-packing algorithm considers each option in order of value divided by cost. It accepts any option that increases value while keeping cost below a specified limit. Suppose the aim is to maximize performance while keeping power below a specified value. The options would then be sorted by performance divided by power, and each option that increases performance without exceeding the power limit would be accepted. More sophisticated knapsack algorithms provide some amount of backtracking.

A very different kind of algorithm for determining the configuration from the goals and the design database is based on simulated annealing.
A random initial set of parameters is used as the starting point. Changes to individual parameters are then accepted or rejected by evaluating a global utility function. Improvements in the utility function are always accepted. Negative changes are accepted probabilistically, based on a threshold that declines as the optimization proceeds. In this system the utility function is constructed from the input goals. For example, consider the goals Performance > 200, Power < 100 and Area < 4, prioritized in the order Power, Area, Performance. For these goals the following utility function could be used:

max((1 - Power/100)*0.5, 0)
 + max((1 - Area/4)*0.3, 0) * (if Power < 100 then 1 else (1 - Power/100)**2)
 + max(Performance/200*0.2, 0) * (if Power < 100 then 1 else (1 - Power/100)**2) * (if Area < 4 then 1 else (1 - Area/4)**2)

This function behaves as follows:

- it rewards decreases in power consumption until power is below 100, and is neutral after that;
- it rewards decreases in area until area is below 4, and is neutral after that;
- it rewards increases in performance until performance is above 200, and is neutral after that.

It also contains components that reduce the reward for area when power is out of specification, and that reduce the reward for performance when power or area is out of specification.

These algorithms, and others, can be used to search for configurations that satisfy the specified goals.
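The search strategies described above can be sketched as follows:

- `greedy_knapsack` is a plain value/cost heuristic without backtracking.
- `utility` transcribes the example utility function term by term.
- `accept` uses the common Metropolis rule, exp(delta/T). This rule is only one possible way to realize "a threshold that declines as the optimization proceeds".

The `Option` records and their numbers are hypothetical. They are not entries from the search engine 106 database.

```python
import math
import random
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    value: float  # hypothetical estimated benefit, e.g. performance gain
    cost: float   # hypothetical estimated cost, e.g. added power (must be > 0)


def greedy_knapsack(options, cost_limit):
    """Accept options in decreasing value/cost order while within the limit."""
    chosen, total = [], 0.0
    for opt in sorted(options, key=lambda o: o.value / o.cost, reverse=True):
        if opt.value > 0 and total + opt.cost <= cost_limit:
            chosen.append(opt.name)
            total += opt.cost
    return chosen


def utility(performance, power, area):
    """Example utility for goals Performance > 200, Power < 100, Area < 4."""
    power_factor = 1.0 if power < 100 else (1 - power / 100) ** 2
    area_factor = 1.0 if area < 4 else (1 - area / 4) ** 2
    return (max((1 - power / 100) * 0.5, 0)
            + max((1 - area / 4) * 0.3, 0) * power_factor
            + max(performance / 200 * 0.2, 0) * power_factor * area_factor)


def accept(delta, temperature, rng):
    """Annealing acceptance: always take improvements; take a worse move
    with probability exp(delta / temperature), which shrinks as T falls."""
    if delta >= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(delta / temperature)


options = [Option("mac16", 3.0, 2.0), Option("mul32", 4.0, 4.0),
           Option("bigcache", 1.0, 5.0)]
print(greedy_knapsack(options, cost_limit=6.0))
print(round(utility(performance=200, power=50, area=2), 3))
print(accept(-0.2, temperature=1.0, rng=random.Random(0)))
```

In a full search, each candidate configuration would first be mapped to estimated performance, power and area through the database entries, and only then scored with `utility`.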
The key point is that the configurable processor design has been described in a design database. This database contains the prerequisite and incompatibility specifications for the options, together with the impact of each option on the various metrics.

The examples so far have used hardware goals that are general and do not depend on the particular algorithm running on the processor 60. The algorithms described can also select configurations well suited to specific user programs. For example, a user program can be run on a cache-accurate simulator to count cache misses for caches with different characteristics, such as different sizes, line sizes and set associativities. The simulation results can be added to the database used by the search algorithms of the search engine 106 described above. This helps in selecting the hardware implementation specification 40.

Similarly, the user algorithm can be profiled for the presence of certain instructions that can optionally be implemented in hardware. For example, if the user algorithm spends a significant share of its time on multiplications, the search engine 106 might automatically suggest including a hardware multiplier. These algorithms need not consider only a single user algorithm. The user can feed a set of algorithms into the system, and the search engine 106 can select a configuration that is useful, on average, for that set of user programs.

Besides selecting preconfigured characteristics of the processor 60, the search algorithms can also automatically select, or suggest to the user, possible TIE extensions. Given the input goals and examples of user programs (perhaps written in the C programming language), these algorithms would suggest potential TIE extensions.
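One possible way to mine such suggestions, in the spirit of the pattern matchers described in this section, is to canonicalize each expression tree bottom-up. The miner then counts how often the same multi-operation shape recurs. The sketch below makes the following assumptions:

- Python's `ast` module parses Python-syntax statements as a stand-in for a C front end, which keeps the example self-contained.
- Variables are abstracted to `v`, and integer constants are kept.
- Any recurring shape with at least two operators is a candidate for a single fused instruction.

```python
import ast
from collections import Counter

# Operators this sketch knows how to print; any other node ends a pattern.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.LShift: "<<",
        ast.RShift: ">>", ast.BitAnd: "&", ast.BitOr: "|", ast.BitXor: "^"}


def shape(node):
    """Return (pattern, operator_count) for an expression, built bottom-up.

    Variable names become 'v' and integer constants are kept, so
    (y + z) << 2 and (y2 + z2) << 2 produce the same pattern.
    Unsupported nodes yield (None, 0).
    """
    if isinstance(node, ast.Name):
        return "v", 0
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value), 0
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left, left_ops = shape(node.left)    # children first: bottom-up
        right, right_ops = shape(node.right)
        if left is None or right is None:
            return None, 0
        op = _OPS[type(node.op)]
        return f"({left} {op} {right})", left_ops + right_ops + 1
    return None, 0


def candidate_fusions(source, min_ops=2):
    """Count recurring expression shapes with at least `min_ops` operators."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp):
            pattern, n_ops = shape(node)
            if pattern is not None and n_ops >= min_ops:
                counts[pattern] += 1
    return counts


program = """
x = (y + z) << 2
x2 = (y2 + z2) << 2
"""
print(candidate_fusions(program))
```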
For TIE extensions without state, compiler-like tools can be implemented with pattern matchers. These pattern matchers walk expression nodes bottom-up, searching for multi-instruction patterns that a single instruction could replace. For example, suppose the user's C program contains the following statements:

x = (y + z) << 2;
x2 = (y2 + z2) << 2;

The pattern matcher would discover that, in two different places, the user adds two numbers and shifts the result two bits to the left. The system would then add to a database the possibility of generating a TIE instruction that performs this add-and-shift.

The build system 50 keeps track of many possible TIE instructions, together with a count of how many times each one appears. Using a profiling tool, the system 50 also tracks how often each instruction is executed over the total execution of the algorithm. Using a hardware estimator, the system 50 tracks how expensive each potential TIE instruction would be to implement in hardware. These numbers are fed into the search heuristic, which selects a set of potential TIE instructions that maximizes the input goals, such as performance, code size and hardware complexity.

Similar but more powerful algorithms are used to discover potential TIE instructions with state. Several different algorithms detect different types of opportunities.

One algorithm uses a compiler-like tool to scan the user program and detect whether it needs more registers than the hardware provides. As is known to practitioners in the art, this can be detected by counting the register spills and restores in the compiled version of the user code. The compiler-like tool then suggests to the search engine a coprocessor 98 with additional hardware registers. This coprocessor supports only the operations used in the portions of the user's code that have many spills and restores. The tool also informs the database used by the search engine 106 of two estimates: the coprocessor's hardware cost, and how much the performance of the user's algorithm improves. The search engine 106, as described above, then makes a global decision as to whether the suggested coprocessor 98 leads to a better configuration.

Alternatively, or in combination, a compiler-like tool checks whether the user program uses bit-mask operations to ensure that certain variables never exceed certain limits.
In this situation, the tool suggests to the search engine 106 a coprocessor 98 whose data types conform to the user's limits (for example, 12-bit, 20-bit or any other size of integer). A third algorithm, used in another embodiment for user programs written in C++, relies on a compiler-like tool that discovers when much of the execution time is spent operating on user-defined abstract data types. If all of the operations on such a data type are suitable for TIE, the algorithm proposes to the search engine 106 that all of them be implemented with a TIE coprocessor.

To generate the instruction decode logic of the processor 60, one signal is generated for each opcode defined in the configuration specification. The code is generated by simply rewriting the

opcode NAME FIELD=VALUE

declaration as the HDL statement

assign NAME = FIELD == VALUE;

and the

opcode NAME FIELD=VALUE PARENTNAME[FIELD2=VALUE2]

declaration as

assign NAME = PARENTNAME & (FIELD == VALUE);

The generation of register interlock and pipeline stall signals is also automated, and this logic is likewise generated from the information in the configuration specification. The generated logic uses the register usage information in the iclass statements and the latency of each instruction. It inserts a stall whenever a source operand of the current instruction depends on the destination operand of a previous instruction that has not yet completed.
The mechanism that implements this stall functionality is built as part of the core hardware.

The illegal instruction detection logic is generated by NOR-ing together the individually generated instruction signals, each AND-ed with its field restrictions:

assign illegalinst = !(INST1 | INST2 | ... | INSTn);

The instruction decode signals and the illegal instruction signal are available as outputs of the decode module and as inputs to the hand-written processor logic.

To generate other processor features, this embodiment uses a Verilog™ description of the configurable processor 60, enhanced with a Perl-based preprocessor language. Perl is a full-featured language that includes complex control structures, subroutines and I/O facilities. In an embodiment of the present invention the preprocessor is called TPP; as the source listing in Appendix B shows, TPP is itself a Perl program. TPP works as follows:

- It scans its input and identifies certain lines as preprocessor code written in the preprocessor language. For TPP, these are the lines prefixed with a semicolon, and the language is Perl.
- It constructs a program consisting of the extracted lines plus statements that generate the text of the other lines. The non-preprocessor lines may contain embedded expressions, which are replaced by the values computed for them during TPP processing.
- It executes the resulting program to produce the source code, i.e., Verilog™ code describing the detailed processor logic. (As will be seen below, TPP is also used to configure the software development tools 30.)

In this context, TPP is a powerful preprocessing language.
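The core TPP idea can be sketched in a few lines: turn a template into a program in which control lines are copied through and text lines become output statements, then run that program. This sketch is not TPP. It differs in three ways:

- Python statements follow the `;` prefix instead of Perl.
- A hypothetical `; end` line closes blocks, because Python has no braces.
- Backtick-delimited expressions are evaluated with `exec`, so templates must be trusted input.

```python
def tpp_expand(template: str, env: dict) -> str:
    """Expand a TPP-style template (sketch; Python syntax, not Perl)."""
    program, depth = [], 0
    for line in template.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(";"):
            code = stripped[1:].strip()
            if code == "end":                       # hypothetical block terminator
                depth -= 1
                continue
            if code == "else:" or code.startswith("elif "):
                depth -= 1                          # dedent to the matching 'if'
            program.append("    " * depth + code)   # control line: copied through
            if code.endswith(":"):
                depth += 1
        else:
            # Text line: even-indexed pieces are literal text; odd-indexed
            # pieces (between backticks) are expressions evaluated at run time.
            pieces = line.split("`")
            parts = [f"str({p})" if i % 2 else repr(p) for i, p in enumerate(pieces)]
            program.append("    " * depth + "_out.append(" + " + ".join(parts) + ")")
    scope = dict(env, _out=[])
    exec("\n".join(program), scope)  # run the generated program
    return "\n".join(scope["_out"])


template = """\
wire [`ninterrupts-1`:0] srInterruptEn;
; for i in range(timers):
wire [`width-1`:0] srCCompare`i`;
; end"""
print(tpp_expand(template, {"ninterrupts": 3, "timers": 2, "width": 32}))
```

Building a program first and running it afterwards, rather than interpreting the template line by line, lets loops and conditionals use the host language's own control flow. TPP follows this same approach with Perl.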
TPP permits the Verilog™ code to include constructs such as configuration specification queries, conditional expressions and iterative structures. It also permits embedded expressions in the Verilog™ code that depend on the configuration specification 100, as noted above. For example, a TPP assignment based on a database query might look like

; $endian = config_get_value("IsaMemoryOrder")

In this assignment:

- config_get_value is the TPP function used to query the configuration specification 100;
- IsaMemoryOrder is a flag set in the configuration specification 100;
- $endian is a TPP variable used later when generating the Verilog™ code.

A TPP conditional expression might be

; if (config_get_value("IsaMemoryOrder") eq "LittleEndian")

; {do Verilog™ code for little endian ordering}
; else
; {do Verilog™ code for big endian ordering}

Iterative loops can be implemented by TPP constructs such as

; for ($i = 0; $i < $ninterrupts; $i++)
; {do Verilog™ code for each of 1..N interrupts}

where $i is a TPP loop index variable and $ninterrupts is the number of interrupts specified for the processor 60 (obtained from the configuration specification 100 using config_get_value).

Finally, TPP code can be embedded in Verilog™ expressions such as

wire [`$ninterrupts-1`:0] srInterruptEn;
xtscenflop #(`$ninterrupts`) srintrenreg (srInterruptEn, srDataIn_W[`$ninterrupts-1`:0], srIntrEnWEn, !cReset, CLK);

where:
$ninterrupts defines the number of interrupts and determines the width, in bits, of the xtscenflop module (a flip-flop primitive module);
srInterruptEn is the output of the flip-flop, defined as a wire with the appropriate number of bits;

srDataIn_W is the input to the flip-flop, of which only the bits relevant to the number of interrupts are used;
srIntrEnWEn is the write enable of the flip-flop;
cReset is the clear input to the flip-flop; and
CLK is the clock input to the flip-flop.

For example, given the following input to TPP:

; # Timer interrupt
; if ($IsaUseTimer) {
wire [`$width-1`:0] srCCount;
wire ccountWEn;
//--------------------------------
// CCOUNT Register
//--------------------------------
assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);
xtflop #(`$width`) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);
; for ($i=0; $i<$TimerNumber; $i++) {
//--------------------------------
// CCOMPARE Register
//--------------------------------
wire [`$width-1`:0] srCCompare`$i`;
wire ccompWEn`$i`;
assign ccompWEn`$i` = srWEn_W && (srWrAdr_W == `SRCCOMPARE`$i`);
xtenflop #(`$width`) srccmp`$i`reg (srCCompare`$i`, srDataIn_W, ccompWEn`$i`, CLK);
assign setCCompIntr`$i` = (srCCompare`$i` == srCCount);
assign clrCCompIntr`$i` = ccompWEn`$i`;
; }
; } ## IsaUseTimer

and the declarations

$IsaUseTimer = 1
$TimerNumber = 2
$width = 32

TPP generates

wire [31:0] srCCount;
wire ccountWEn;
//--------------------------------
// CCOUNT Register
//--------------------------------
assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);
xtflop #(32) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);
//--------------------------------
// CCOMPARE Register
//--------------------------------
wire [31:0] srCCompare0;
wire ccompWEn0;
assign ccompWEn0 = srWEn_W && (srWrAdr_W == `SRCCOMPARE0);
xtenflop #(32) srccmp0reg (srCCompare0, srDataIn_W, ccompWEn0, CLK);
assign setCCompIntr0 = (srCCompare0 == srCCount);
assign clrCCompIntr0 = ccompWEn0;
//--------------------------------
// CCOMPARE Register
//--------------------------------
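wire [31:0] srCCompare1;
wire ccompWEn1;
assign ccompWEn1 = srWEn_W && (srWrAdr_W == `SRCCOMPARE1);
xtenflop #(32) srccmp1reg (srCCompare1, srDataIn_W, ccompWEn1, CLK);
assign setCCompIntr1 = (srCCompare1 == srCCount);
assign clrCCompIntr1 = ccompWEn1;

The HDL description 114 generated in this way is used to synthesize the hardware for the processor implementation in block 122, for example with Design Compiler™ from Synopsys, Inc. The result is then placed and routed in block 128, for example with Cadence's Silicon Ensemble™ or a place-and-route tool from Avant!. Once the components have been routed, the result can be used in block 132 for wire back-annotation and timing verification, for example with PrimeTime™ from Synopsys. The product of this process is a hardware profile 134. The user can use this profile to provide further input to the configuration capture routine for further configuration iterations.

As mentioned above in connection with the logic synthesis section 122, one outcome of configuring the processor 60 is a set of customized HDL files. A specific gate-level implementation can be obtained from these files using any of a number of commercial synthesis tools, one of which is Design Compiler™ from Synopsys. To ensure a correct, high-performance gate-level implementation, this embodiment provides the scripts needed to automate the synthesis process in the customer's environment. Providing such scripts poses two challenges: supporting a wide variety of synthesis methodologies, and supporting users' different implementation objectives.

To address the first challenge, this embodiment breaks the scripts into smaller, functionally complete scripts. For example, it provides:

- a read script that reads all of the HDL files relevant to the particular processor configuration 60;
- a timing-constraint script that sets the timing requirements unique to the processor 60;
- a script that writes out the synthesis results in a form usable for placing and routing the gate-level netlist.

To address the second challenge, this embodiment provides a script for each implementation objective: for example, one for the fastest cycle time, one for the minimum silicon area, and one for the minimum power consumption.

Scripts are also used in other steps of processor configuration. For example, once the HDL model of the processor 60 has been written, a simulator can verify the correct operation of the processor 60, as described above in connection with block 132. This is often done by running many test programs, or diagnostics, on the simulated processor 60. Running one test program on the simulated processor 60 can require many steps, such as:

- generating an executable image of the test program;
- generating a representation of this image that the simulator 112 can read;
- creating a temporary location where the simulation results can be gathered for later analysis;
- analyzing the simulation results.

In the prior art this was done with a number of throw-away scripts. These scripts had built-in knowledge of the simulation environment, such as which HDL files should be included, where those files were located in the directory structure, and which files the test bench required. In the current design, the preferred mechanism is to write a script template that is configured by parameter substitution. The configuration mechanism also uses TPP to generate the list of files required for simulation.

In addition, the verification process of block 132 often requires other scripts that let designers run a series of test programs. These scripts are often used to run regression suites, which give the designer confidence that a given change in the HDL model does not introduce new bugs. Regression scripts were also often thrown away, because they contained many built-in assumptions about file names, locations and the like. As with the run script for a single test program described above, the regression script is written as a template. This template is configured at configuration time by substituting actual values for its parameters.

The final step in converting the HDL description into a hardware implementation is to use place-and-route (P&R) software to convert the abstract netlist into a geometric representation. The P&R software analyzes the connectivity of the netlist and decides where to place the cells. It then tries to draw the connections between all the cells. The clock net usually receives special attention and is routed as a last step. The tools can be helped by providing them with additional information, such as which cells are expected to be close together (known as soft grouping), the relative placement of cells, and which nets are expected to have small propagation delays.

To make this process easier, and to ensure that the desired performance goals (cycle time, area, power dissipation) are met, the configuration mechanism produces a set of scripts or input files for the P&R software. These scripts contain information such as that described above, for example the relative placement of cells. They also specify how many power and ground connections are required and how these should be distributed along the boundary. The scripts are generated by querying a database with information such as how many soft groups to create, which cells each group should contain, and which nets are timing-critical. These parameters change depending on which options have been selected. The scripts must also be configurable to suit the tools used for placement and routing.

Optionally, the configuration mechanism can request more information from the user and pass it to the P&R scripts. For example, the interface can ask the user for:

- the desired aspect ratio of the final layout;
- how many levels of buffering to insert in the clock tree;
- which side the input and output pins should be placed on;
- the relative or absolute placement of these pins;
- the width and location of the power and ground straps.

These parameters are then passed to the P&R scripts to generate the desired layout.

Even more sophisticated scripts can be used, allowing, for example, a more sophisticated clock tree. One common optimization for reducing power dissipation is to gate the clock signal. However, gating makes clock tree synthesis much harder, because it becomes more difficult to balance the delays of all the branches. The configuration interface could ask the user for the correct cells to use in the clock tree and perform part or all of the clock tree synthesis. To do so, it would need to know where the gated clocks are located in the design. It would also estimate the delay from each qualifying gate to the clock inputs of the flip-flops. It would then give the clock tree synthesis tool a constraint to match the delay of the clock buffers to the delay of the gating cells. In the current implementation this is done by a general-purpose Perl script. The script reads gated-clock information that the configuration agent generates according to which options are selected. The Perl script is run after the design has been placed and routed, but before final clock tree synthesis.

The profiling process described above can be improved further. Specifically, we describe a process that lets the user obtain similar hardware profile information almost instantaneously, without spending hours running the CAD tools. This process has several steps.

The first step is to partition the set of all configuration options into groups of orthogonal options. The partition is chosen so that the effect of one group's options on the hardware profile is independent of the options in every other group. For example, the impact of the MAC16 unit on the hardware profile is independent of any other option, so an option group containing only the MAC16 option is formed. A more complicated example is an option group containing the interrupt options, high-level interrupt options and timer options, because their impact on the hardware profile depends on the particular combination of these options.

The second step is to characterize the hardware profile impact of each option group. This is done by obtaining the hardware profile impact of various combinations of options within the group. For each combination, the profile is obtained with the process described previously: an actual implementation is derived and its hardware profile is measured. This information is stored in an estimation database.

The last step is to derive specific formulas that compute the hardware profile impact of particular combinations of options in each group. These formulas are obtained with curve-fitting and interpolation techniques, and their form depends on the nature of the options. For example, each additional interrupt vector adds about the same amount of logic to the hardware, so a linear function is used to model its hardware impact. In another example, a timer unit requires the high-priority interrupt option. The formula for the hardware impact of the timer option is therefore a conditional formula involving several options.

Quick feedback on how architectural choices may affect the run-time performance and code size of applications is also useful. To provide it, several sets of benchmark programs from multiple application domains are chosen. For each domain, a database is prebuilt that estimates how different architectural design decisions will affect the run-time performance and code size of that domain's applications. As the user varies the architectural design, the database is queried for the application domain, or domains, that interest the user. The evaluation results are presented to the user, who can then assess the tradeoff between software benefits and hardware costs.

The quick evaluation system can easily be extended to suggest how the user might modify a configuration to further optimize the processor. For example, each configuration option can be associated with a set of numbers representing its incremental impact on various cost metrics, such as area, delay and power. The quick evaluation system makes this incremental cost easy to compute. It takes only two calls to the evaluation system, one with the option and one without, and the difference between the two costs is the incremental impact of the option. For instance, the incremental area impact of the MAC16 option is computed by evaluating the area cost of two configurations, one with the MAC16 option and one without. The difference is then displayed alongside the MAC16 option in the interactive configuration system. Such a system can guide the user toward an optimal solution through a sequence of single-step improvements.

Turning to the software side of the automated processor configuration process, this embodiment of the invention configures the software development tools 30 so that they are specific to the processor. The configuration process begins with software tools 30 that can be ported to many different systems and instruction set architectures. Such retargetable tools have been widely studied and are well known in the art. This embodiment uses the GNU family of tools, which is free software. The family includes the GNU C compiler, GNU assembler, GNU debugger, GNU linker, GNU profiler and various utility programs. These tools 30 are configured automatically in two ways: some portions of the software are generated directly from the ISA description, and TPP is used to modify the hand-written portions.

The GNU C compiler is configured in several different ways. Given the core ISA description, much of the machine-dependent logic in the compiler can be written by hand. This portion of the compiler is common to all configurations of the configurable processor instruction set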
態,並且以手動再目標化允許供最佳結果之精細調整。但^ 是,即使有這編輯器之手寫編碼部份,某些指令碼仍自動 地自ISA說明被產生。明確地說,該ISA說明定義可以被使、-用於各種指令之立即欄的常數値之設定。對於各立即欄而 56 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) .......... 、可 (請先閲讀背面之注意事項再填寫本頁) 539965 A7 _B7_ _ 五、發明説明(54 ) (請先閲讀背面之注意事項再填寫本頁) 言,一種述語函數被產生以測試是否一組特定的常數値可 以被編碼於欄當中。當產生處理器60之指令碼時,該編輯。 器使用這些述語函數。自動化編輯器組態之這論點消除在 ISA說明以及編輯器之間不一致性之機會,並且以最小的力 量引動改變ISA中之常數。 ^ 經由以TPP前處理,編輯器之許多方面被組態。對於由 參數選擇控制之組態選擇而言,編輯器中對應的參數經由 TPP被設定。例如,該編輯器具有一組旗標變數以指示是否 目標處理器60使用大邏輯排列或小邏輯排列位元組順序, 而這變數使用自組態格式100讀取邏輯排列參數之TPP命令 而自動地被設定。TPP同時也依據是否對應的封裝被引動於 組態格式1 〇〇而被使用於有條件地引動或不引動產生選擇性 ISA封裝指令碼的編輯器之手寫編碼部份。例如,如果組態 格式包含MAC 16選擇90,則用以產生相乘/累積指令之指令 碼僅被包含於編輯器。 該編輯器也被組態以支援經由TIE語言指定之設計者-· 定義指令。這支援具有兩組位準。在最低位準,設計者-定 義指令於被編輯之指令碼當中可用爲巨集,本質函數,或 線內(外質)函數。本發明中之這實施例產生一組C檔頭檔 案,其定義線內函數爲”線內組合’’碼(一組gnu C編輯器之 標準特點)。給予設計者-定義操作碼之TIE格式以及它們對 應的操作元,產生這檔頭檔案是一種轉譯至GNU C編輯器之 線內組合語法的直接程序。一組另外的製作產生一組包 含指定線內組合指令之C預處理器巨集之檔頭檔案。然 57 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 539965 A7 _B7_ 五、發明説明(55 ) 而其他的選擇則使用TPP以增加本質函數直接地進入編輯 (請先閲讀背面之注意事項再填寫本頁) 器。 支援設計者-定義指令之第二位準是藉由使編輯器自動 地確認使用指令之時機而提供。在組態程序時,這些TIE指 令可以直接地被使用者定義或自動地被產生。優先於編輯 使用者應用,該TIE碼自動地被檢驗並且被轉換成爲C等效d 函數。這是被使用以允許TIE指令之快速模擬的相同步驟。 該C等效函數部份地被編輯成爲一組被編輯器使用之樹狀基 礎中間表示。該對於各TIE指令之表示被儲存於資料庫當。 中。當使用者應用被編輯時,編輯處理程序之部份是圖型 匹配器。使用者應用被編輯成爲樹狀-基礎中間表示。該圖 型匹配器自下而上地行進使用者程式中之每一樹。在各行 進步驟中,圖型匹配器檢查是否長根於目前點之中間表示 有匹配資料庫中之任何TIE指令。如果有一組匹配,則該匹 配便被標明。在完成行進各樹之後,最大幅匹配之組被選 擇。在樹當中各最大之匹配被等效TIE指令所取代。 上述演算法將自動地確認使用無狀態TIE指令之時機。 另外的方法也可以被使用以自動地確認使用具備狀態之TIE 指令之時機。先前的部份說明供自動地選擇具備狀態之可 行TIE指令之演算法。相同演算法被使用以自動地使用TIE_ 指令於C或C + +應用當中。當一組TIE共同處理器已經被定 義爲具有更多暫存器但是受限制之操作時,指令碼之區域 被掃瞄以判別是否它們承受暫存器滿溢以及是否那些區域 僅使用可用的操作組。如果該區域被發現,則在那些區域 58 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐) 539965 A7 __B7_ 五、發明説明(56 ) 當中之指令碼自動地被改變以使用共同處理器98指令以及 暫存器。轉換操作被產生於區域邊界以移動資料進出共同-處理器98。相似地,如果一組TIE共同處理器已經被定義爲 運作於不同大小之整數,則指令碼之區域便被檢驗以判別 是否區域當中之全部資料如爲不同的大小被存取。爲了匹 配區域,該指令碼被改變並且接合碼被添加於邊界上。相 似地,如果一組TIE共同處理器98已經被定義爲製作一種 C + +摘要資料型式,則所有該資料型式之操作便被TIE共同 處理器指令所取代。 注意,自動地建議TIE指令以及自動地利用TIE指令爲 獨立有用的。建議之TIE指令也可以經由本質的機構手動地 被使用者使用而利用演算法可以被施加至TIE指令或被手動 設計之共同處理器9 8。 不論設計者-設計指令如何被產生,經由線內函數或藉 
srDataIn_W is the data input to the flip-flop, although only the relevant bits are input, according to the number of interrupts; srIntrEnWEn is the write enable of the flip-flop; cReset is the reset input to the flip-flop; and CLK is the input clock to the flip-flop.

For example, given the following input to TPP:

; # Timer interrupt
; if ($IsaUseTimer) {
wire [`$width-1`:0] srCCount;
wire ccountWEn;
//-------------------------------------------------------------
//  CCOUNT Register
//-------------------------------------------------------------
assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);
xtflop #(`$width`) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);
; for ($i = 0; $i < $TimerNumber; $i++) {
//-------------------------------------------------------------
//  CCOMPARE Register
//-------------------------------------------------------------
wire [`$width-1`:0] srCCompare`$i`;
wire ccompWEn`$i`;
assign ccompWEn`$i` = srWEn_W && (srWrAdr_W == `SRCCOMPARE`$i`);
xtenflop #(`$width`) srccmp`$i`reg (srCCompare`$i`, srDataIn_W, ccompWEn`$i`, CLK);
assign setCCompIntr`$i` = (srCCompare`$i` == srCCount);
assign clrCCompIntr`$i` = ccompWEn`$i`;
; }
; } ## IsaUseTimer

and the declarations

$IsaUseTimer = 1
$TimerNumber = 2
$width = 32

TPP produces:

wire [31:0] srCCount;
wire ccountWEn;
//-------------------------------------------------------------
//  CCOUNT Register
//-------------------------------------------------------------
assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);
xtflop #(32) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);
//-------------------------------------------------------------
//  CCOMPARE Register
//-------------------------------------------------------------
wire [31:0] srCCompare0;
wire ccompWEn0;
assign ccompWEn0 = srWEn_W && (srWrAdr_W == `SRCCOMPARE0);
xtenflop #(32) srccmp0reg (srCCompare0, srDataIn_W, ccompWEn0, CLK);
assign setCCompIntr0 = (srCCompare0 == srCCount);
assign clrCCompIntr0 = ccompWEn0;
//-------------------------------------------------------------
//  CCOMPARE Register
//-------------------------------------------------------------
wire [31:0] srCCompare1;
wire ccompWEn1;
assign ccompWEn1 = srWEn_W && (srWrAdr_W == `SRCCOMPARE1);
xtenflop #(32) srccmp1reg (srCCompare1, srDataIn_W, ccompWEn1, CLK);
assign setCCompIntr1 = (srCCompare1 == srCCount);
assign clrCCompIntr1 = ccompWEn1;

The HDL description 114 generated in this way is used to synthesize hardware for the processor implementation in block 122. A tool such as Synopsys' Design Compiler™ can be used for this step. The result is then placed and routed in block 128, using, for example, Avant! Corporation's Silicon Ensemble™. Once the components have been routed, the result can be used in block 132 for wire back-annotation and timing verification, using, for example, Synopsys' PrimeTime™. The product of this process is a hardware profile 134. The user can feed this profile back into the configuration capture routine as further input for additional configuration iterations.

As mentioned above in connection with the logic synthesis portion 122, one of the results of configuring the processor 60 is a set of customized HDL files. Any of a number of commercial synthesis tools can derive a specific gate-level implementation from these files. One such tool is Synopsys' Design Compiler™. To ensure a correct and high-performance gate-level implementation, this embodiment provides the scripts needed to automate the synthesis process in the user's environment. Providing such scripts involves two challenges: supporting a variety of synthesis methodologies, and supporting users' different implementation objectives. To meet the first challenge, this embodiment splits the scripts into smaller, functionally complete scripts.
For example, it provides three such scripts:
- a read script that reads all the HDL files relevant to the particular configuration of the processor 60;
- a timing-constraint script that sets the timing requirements unique to the processor 60;
- a script that writes out the synthesis results in a form usable for placing and routing the gate-level netlist.

To meet the second challenge, this embodiment provides a script for each implementation objective. For example, there is one script for achieving the fastest cycle time, one for achieving the minimum silicon area, and one for achieving the minimum power consumption.

Scripts are also used in other steps of processor configuration. For example, once the HDL model of the processor 60 has been written, a simulator can be used to verify the correct operation of the processor 60, as described above in connection with block 132. This is often done by running many test programs, or diagnostics, on the simulated processor 60. Running a test program on the simulated processor 60 can require many steps. These include generating an executable image of the test program, and generating a representation of that image that the simulator 112 can read. They also include creating a temporary location where the simulation results can be collected for later analysis, and then analyzing those results. In the prior art, these steps were performed with a number of throw-away scripts.
These scripts had built-in knowledge of the simulation environment. For example, they encoded which HDL files should be included, where those files are located in the directory structure, and which files the test bench requires. In the current design, the preferred mechanism is to write a script template that is configured by parameter substitution. The configuration mechanism also uses TPP to generate the list of files needed for the simulation.

Furthermore, the verification process of block 132 often requires additional scripts that let designers run a series of test programs. Such scripts are commonly used to run regression suites. These suites give a designer confidence that a given change to the HDL model does not introduce new bugs. Regression scripts of this kind were also often throw-away, because they contained many built-in assumptions about file names, locations, and so on. The regression script is therefore written as a template, in the same way as the run script for a single test program described above. At configuration time, the template is configured by replacing its parameters with actual values.

The final step in converting the HDL description into a hardware implementation is to run place-and-route (P&R) software. This software converts the abstract netlist into a geometric representation. The P&R software analyzes the connectivity of the netlist and decides where to place the cells. It then tries to route the connections between all the cells. The clock net usually receives special attention and is routed as the last step. The tools can perform this process better when they are given additional information. This includes which cells should be placed close together (known as soft groups), the relative placement of cells, and which nets should have small propagation delays.
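The parameter-substitution approach to run and regression scripts can be illustrated with a small sketch. It uses Python's standard `string.Template` as a stand-in for TPP. The template text, the `simulate` command, the option names, and the HDL file names are all hypothetical placeholders, not the actual scripts the text refers to.

```python
from string import Template

# Hypothetical run-script template. The ${...} placeholders replace the file
# names and locations that a throw-away script would have hard-coded.
RUN_TEMPLATE = Template(
    "mkdir -p ${results_dir}/${test_name}\n"
    "simulate ${hdl_files} ${testbench} +test=${test_name}"
    " > ${results_dir}/${test_name}/sim.log\n"
)


def hdl_file_list(options: dict) -> str:
    """Choose the HDL files to simulate from configuration options (hypothetical names)."""
    files = ["core.v"]
    if options.get("IsaUseTimer"):
        files.append("timer.v")
    if options.get("IsaUseMAC16"):
        files.append("mac16.v")
    return " ".join(files)


def configure_run_script(params: dict) -> str:
    """Fill in the template for one test program.

    Template.substitute raises KeyError if any placeholder is missing,
    so an incompletely configured script fails loudly.
    """
    return RUN_TEMPLATE.substitute(params)


def configure_regression(params: dict, tests: list) -> str:
    """A regression script is the run template instantiated once per test."""
    return "".join(configure_run_script({**params, "test_name": t}) for t in tests)
```

Calling `configure_regression` with a parameter dictionary and a list of test names yields one `mkdir`/`simulate` pair per test. Using `substitute` rather than `safe_substitute` is deliberate: a missing parameter raises an error instead of leaving a literal placeholder in the generated script.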
The configuration mechanism produces a set of scripts or input files for the P&R software. These files make the process easier and ensure that the desired performance goals (cycle time, area, power dissipation) are met. They contain information of the kind described above, such as the relative placement of cells. They also specify how many power and ground connections are needed and how those connections should be distributed along the boundary. The scripts are generated by querying a database. The database records how many soft groups to create, which cells belong in each group, and which nets are timing-critical. These parameters vary depending on which configuration options are selected. The scripts must also be configurable for the particular placement and routing tools in use.

The configuration mechanism can optionally request more information from the user and pass it to the P&R scripts. For example, the interface can ask the user for the following:
- the desired aspect ratio of the final layout;
- how many levels of buffering to insert in the clock tree;
- which side the input and output pins should be placed on;
- the relative or absolute placement of those pins;
- the width and location of the power and ground straps.

These parameters are then passed to the P&R scripts to generate the desired layout.

Even more sophisticated scripts can be used, for example to build a more sophisticated clock tree. One common optimization for reducing power dissipation is to gate the clock signal.
However, gating makes clock tree synthesis much harder, because the delays of all the branches become more difficult to balance. The configuration interface can ask the user which cells to use for the clock tree, and then perform part or all of the clock tree synthesis. To do this, it needs to know where the gated clocks are located in the design. It estimates the delay from each gating cell to the clock inputs of the flip-flops it drives. It then gives the clock tree synthesis tool a constraint that matches the delay of the clock buffers to the delay of the gating cells. In the current implementation, this is done by a general-purpose Perl script. The script reads gated-clock information that the configuration agent produces based on the selected options. It is run after the design has been placed and routed, but before final clock tree synthesis is performed.

The profiling process described above can be improved further. Specifically, we describe a process that gives the user similar hardware profile information almost instantly, instead of requiring hours of CAD tool runs. The process has several steps.

The first step is to partition the set of all configuration options into groups of orthogonal options. The effect on the hardware profile of the options in one group must be independent of the options chosen in any other group. For example, the impact of the MAC16 unit on the hardware profile does not depend on any other option, so it forms a group containing only the MAC16 option. A more complex example is a group containing the interrupt options, the high-level interrupt options, and the timer options. Their combined impact on the hardware profile depends on the particular combination of these options.
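One way to form the orthogonal option groups is to treat the options as nodes of a graph. Two options are connected if their hardware costs are known to interact, and the groups are then the connected components of the graph. The text does not say how the groups are actually formed, so this union-find sketch is only an illustration. The option names and interaction pairs are hypothetical, chosen to mirror the MAC16 and interrupt/timer examples.

```python
def option_groups(options, interactions):
    """Partition options into groups whose hardware impacts are independent.

    options: iterable of option names.
    interactions: (a, b) pairs whose hardware-profile impacts interact.
    Returns the groups (connected components), each sorted, in sorted order.
    """
    parent = {opt: opt for opt in options}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in interactions:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for opt in parent:
        groups.setdefault(find(opt), []).append(opt)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical options; only the interrupt-related ones interact.
OPTIONS = ["MAC16", "Interrupts", "HighPriorityInterrupts", "Timers", "Mul32"]
INTERACTIONS = [("Interrupts", "HighPriorityInterrupts"),
                ("HighPriorityInterrupts", "Timers")]
```

With this data, the three interrupt-related options form one group, and MAC16 and the hypothetical Mul32 option each form a group of their own. Each group can then be characterized separately. The number of implementations to measure therefore grows with the sizes of the groups rather than with the size of the whole option space.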
The second step is to characterize the hardware-profile impact of each option group. This is done by measuring the impact of various combinations of options within the group. For each combination, the profile is obtained with the process described earlier: an actual implementation is derived and its hardware profile is measured. The results are stored in an estimation database.

The final step is to derive formulas that compute the hardware-profile impact of particular combinations of options in each group. Curve-fitting and interpolation techniques are used for this. The form of the formula depends on the nature of the options. For example, each additional interrupt vector adds about the same amount of logic to the hardware, so a linear function is used to model its impact. In another case, a timer unit requires the high-priority interrupt option. The formula for the hardware impact of the timer option is therefore a conditional formula involving several options.

It is also useful to give quick feedback on how architectural choices may affect the run-time performance and code size of applications. To do this, sets of benchmark programs are chosen from multiple application domains. For each domain, a database is built in advance. It estimates how different architectural design decisions will affect the run-time performance and code size of applications in that domain. As the user varies the architectural design, the database is queried for the application domain or domains of interest. The estimates are presented to the user, who can then weigh software benefits against hardware costs.
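The fitted formulas, together with the with/without comparison discussed in the next paragraph, can be sketched as follows. Every coefficient is an invented placeholder for a value that would be fitted to the estimation database, and the option keys are hypothetical. Only the shapes of the formulas follow the text: a lookup for the single-option MAC16 group, a linear term per interrupt vector, and a timer term that applies only with high-priority interrupts.

```python
# Invented area coefficients (arbitrary units). In the described system these
# would come from curve fitting against measured hardware profiles.
MAC16_AREA = {False: 0.0, True: 4200.0}  # single-option group: a lookup table
INTR_BASE = 900.0        # fixed cost of the interrupt option
INTR_PER_VECTOR = 150.0  # each extra vector adds about the same logic
TIMER_AREA = 1100.0      # cost per timer unit


def interrupt_group_area(cfg):
    """Area of the interrupt / high-priority-interrupt / timer option group."""
    if not cfg.get("interrupts"):
        return 0.0
    # Linear model in the number of interrupt vectors.
    area = INTR_BASE + INTR_PER_VECTOR * cfg.get("num_interrupts", 0)
    # Conditional term: timers are only legal with high-priority interrupts.
    timers = cfg.get("timers", 0)
    if timers:
        if not cfg.get("high_priority_interrupts"):
            raise ValueError("timers require the high-priority interrupt option")
        area += TIMER_AREA * timers
    return area


def estimate_area(cfg):
    """Groups are orthogonal, so the total is the sum of per-group estimates."""
    return MAC16_AREA[bool(cfg.get("mac16"))] + interrupt_group_area(cfg)


def incremental_area(cfg, option, value):
    """Two estimator calls, with and without the option; report the difference."""
    with_option = {**cfg, option: value}
    without_option = {k: v for k, v in cfg.items() if k != option}
    return estimate_area(with_option) - estimate_area(without_option)
```

Because the estimates are closed-form, the with/without difference costs two function calls rather than two synthesis and place-and-route runs. This is what makes it practical to show an incremental cost next to every option.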
The quick estimation system can easily be extended to suggest how the user might modify the configuration to further optimize the processor. One approach is to associate each configuration option with a set of numbers. These numbers represent the option's incremental impact on cost metrics such as area, delay, and power. The quick estimation system makes this incremental impact easy to compute. It takes only two calls to the estimation system, one with the option and one without. The difference between the two cost estimates is the incremental impact of the option. For example, to compute the incremental area of the MAC16 option, the system estimates the area of two configurations, one with MAC16 and one without. The difference is then displayed next to the MAC16 option in the interactive configuration system. Such a system can guide the user toward an optimal solution through a sequence of single-step improvements.

On the software side of the automated processor configuration process, this embodiment of the invention configures the software development tools 30 so that they are specific to the processor. The process starts from software tools 30 that can be ported to many different systems and instruction set architectures. Such retargetable tools have been widely studied and are well known in the art. This embodiment uses the GNU family of free software tools. These include the GNU C compiler, the GNU assembler, the GNU debugger, the GNU linker, the GNU profiler, and various utility programs.
These tools 30 are then configured automatically in two ways. Some parts of the software are generated directly from the ISA description. Hand-written parts of the software are modified using TPP.

The GNU C compiler is configured in several different ways. Given the core ISA description, much of the machine-dependent logic in the compiler can be written by hand. This part of the compiler is common to all configurations of the configurable processor instruction set, and retargeting it by hand allows fine-tuning for the best results. Even so, some code in this hand-coded part of the compiler is generated automatically from the ISA description. Specifically, the ISA description defines the set of constant values that can be used in the immediate field of each instruction. For each immediate field, a predicate function is generated that tests whether a particular constant value can be encoded in that field. The compiler uses these predicate functions when generating code for the processor 60. Automating this aspect of the compiler configuration removes a source of inconsistency between the ISA description and the compiler. It also allows the constants in the ISA to be changed with minimal effort.

Many other aspects of the compiler are configured by preprocessing with TPP. For configuration choices controlled by parameter settings, TPP sets the corresponding parameters in the compiler.
For example, the compiler has a flag variable that indicates whether the target processor 60 uses big-endian or little-endian byte ordering. This variable is set automatically by a TPP command that reads the endianness parameter from the configuration specification 100. Some hand-coded portions of the compiler generate code for optional ISA packages. TPP enables or disables each of these portions, depending on whether the corresponding package is enabled in the configuration specification 100. For example, the code that generates multiply/accumulate instructions is included in the compiler only if the configuration specification includes the MAC16 option 90.

The compiler is also configured to support designer-defined instructions specified in the TIE language. This support has two levels. At the lowest level, designer-defined instructions are available in the code being compiled as macros, intrinsic functions, or inline (extrinsic) functions. This embodiment of the invention generates a C header file that defines the inline functions as "inline assembly" code, a standard feature of the GNU C compiler. Given the TIE specification of the designer-defined opcodes and their operands, generating this header file is a straightforward translation into the GNU C compiler's inline assembly syntax. An alternative implementation generates a header file of C preprocessor macros that specify the inline assembly instructions. Yet another alternative uses TPP to add intrinsic functions directly to the compiler.
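The header-generation step amounts to plain text generation, as the hypothetical Python generator below illustrates. It is not the patent's tool. Each instruction is described by an assumed tuple (name, number of integer inputs, whether it produces a result) rather than by real TIE syntax. The generic GCC `"r"` register constraint stands in for whatever operand constraints the target's compiler actually uses.

```python
def inline_wrapper(name, num_inputs, has_result=True):
    """Emit a C inline function wrapping one designer-defined instruction."""
    params = ", ".join(f"int s{i}" for i in range(num_inputs)) or "void"
    inputs = ", ".join(f'"r"(s{i})' for i in range(num_inputs))
    if has_result:
        # %0 is the result register; the inputs are %1, %2, ...
        operands = ", ".join(f"%{i}" for i in range(num_inputs + 1))
        return (
            f"static inline int {name}({params}) {{\n"
            f"    int r;\n"
            f'    __asm__ ("{name} {operands}" : "=r"(r) : {inputs});\n'
            f"    return r;\n"
            f"}}\n"
        )
    # No result: mark the asm volatile so the compiler does not delete it.
    operands = ", ".join(f"%{i}" for i in range(num_inputs))
    asm_text = f"{name} {operands}".rstrip()
    return (
        f"static inline void {name}({params}) {{\n"
        f'    __asm__ volatile ("{asm_text}" : : {inputs});\n'
        f"}}\n"
    )


def generate_header(instructions, guard="TIE_USER_H"):
    """Wrap all generated inline functions in an include guard."""
    body = "".join(inline_wrapper(*insn) for insn in instructions)
    return f"#ifndef {guard}\n#define {guard}\n\n{body}\n#endif /* {guard} */\n"
```

For example, `generate_header([("add4", 2, True)])` yields an `add4(int s0, int s1)` wrapper whose body is `__asm__ ("add4 %0, %1, %2" : "=r"(r) : "r"(s0), "r"(s1));`. A result-producing wrapper is left non-volatile, so the compiler may treat it as a pure operation. That is only safe if the instruction has no hidden state, which is exactly the side-effect question this section takes up later.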
The second level of support for designer-defined instructions is provided by having the compiler automatically recognize opportunities to use them. These TIE instructions can be defined directly by the user or generated automatically during the configuration process. Before the user application is compiled, the TIE code is automatically examined and converted into equivalent C functions. This is the same step used to enable fast simulation of TIE instructions. The equivalent C functions are then partially compiled into the tree-based intermediate representation used by the compiler. The representation of each TIE instruction is stored in a database.

Part of compiling the user application is a pattern-matching pass. The user application is compiled into a tree-based intermediate representation. The pattern matcher then walks each tree in the user program from the bottom up. At each step, it checks whether the intermediate representation rooted at the current node matches any TIE instruction in the database, and records any match it finds. After the walk over a tree is complete, the set of maximal matches is selected. Each maximal match in the tree is then replaced by the equivalent TIE instruction.

The algorithm above automatically recognizes opportunities to use stateless TIE instructions. Additional methods can automatically recognize opportunities to use TIE instructions that have state. A previous section described algorithms for automatically selecting candidate TIE instructions with state. The same algorithms are used to automatically apply TIE instructions in C or C++ applications. Suppose a TIE coprocessor has been defined with more registers but a restricted set of operations. In that case, regions of code are scanned to determine whether they suffer from register spilling and whether they use only the available operations.
If such regions are found, the code in them is automatically rewritten to use the coprocessor 98 instructions and registers. Conversion operations are generated at the region boundaries to move data into and out of the coprocessor 98. Similarly, a TIE coprocessor may be defined to operate on integers of a different size. In that case, regions of code are examined to determine whether all the data in a region is accessed as if it had that different size. In each matching region, the code is rewritten and glue code is added at the boundaries. Likewise, a TIE coprocessor 98 may be defined to implement a C++ abstract data type. All operations on that data type are then replaced with TIE coprocessor instructions.

Note that automatically suggesting TIE instructions and automatically using TIE instructions are each useful on their own. Users can also apply suggested TIE instructions manually through the intrinsic mechanism. The utilization algorithms can be applied both to TIE instructions and to hand-designed coprocessors 98.

The compiler must know the side effects of designer-defined instructions so that it can optimize and schedule them. This holds regardless of how those instructions enter the code, whether through inline functions or through automatic recognition. To improve performance, conventional compilers optimize user code to maximize desired properties such as run-time performance, code size, or power consumption. As is well known to those skilled in the art, such optimizations include rearranging instructions and replacing certain instructions with semantically equivalent ones. To optimize fully, the compiler must know how each instruction affects the different parts of the machine.
Two instructions that read and write different parts of the machine state can be freely reordered. Two instructions that access the same piece of machine state cannot, in general, be reordered. For conventional processors, the state read and/or written by each instruction is hard-wired into the compiler, sometimes in the form of lookup tables. In one embodiment of the invention, TIE instructions are conservatively assumed to read and write all of the state of the processor 60. This allows the compiler to generate correct code, but it limits the compiler's ability to optimize code that contains TIE instructions. In other embodiments of the invention, a tool automatically reads the TIE definitions and determines which state each TIE instruction reads or writes. The tool then updates the tables used by the compiler's optimizer so that they accurately model the effects of each TIE instruction.

As with the compiler, the machine-dependent part of the assembler 110 contains both automatically generated parts and hand-coded parts configured with TPP. Some features common to all configurations are supported by hand-written code. However, the main task of the assembler 110 is to encode machine instructions, and the instruction encoding and decoding software can be generated automatically from the ISA description.

Instruction encoding and decoding are useful in several different software tools. This embodiment of the invention therefore collects the software that performs those tasks into a separate software library.
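The generated-library idea can be made concrete with a toy Python model driven by a hypothetical ISA table. The 24-bit field layout and the three opcodes are invented for illustration. The helpers play the roles of the stringToOpcode, encodeOpcode, fieldSetFunction, fieldGetFunction, and decodeInstruction entries listed in the next paragraph, though they are written in Python style. The decoder walks a nested dictionary built from the table instead of generated nested switch statements.

```python
# Hypothetical 24-bit ISA description: field name -> (low bit, width).
FIELDS = {
    "op0": (0, 4), "t": (4, 4), "s": (8, 4), "r": (12, 4),
    "op1": (16, 4), "op2": (20, 4), "imm16": (8, 16),
}

# opcode -> (opcode-field values, top of the hierarchy first;
#            operand fields in assembly order; operand types)
OPCODES = {
    "add": ([("op0", 0), ("op1", 0), ("op2", 8)], ["r", "s", "t"], ["reg"] * 3),
    "sub": ([("op0", 0), ("op1", 0), ("op2", 12)], ["r", "s", "t"], ["reg"] * 3),
    "movi": ([("op0", 2)], ["t", "imm16"], ["reg", "imm"]),
}


def field_set(field):
    """Return a setter that writes a value into `field` of an instruction word."""
    lo, width = FIELDS[field]
    mask = (1 << width) - 1
    return lambda word, value: (word & ~(mask << lo)) | ((value & mask) << lo)


def field_get(field):
    """Return a getter that extracts `field` from an instruction word."""
    lo, width = FIELDS[field]
    return lambda word: (word >> lo) & ((1 << width) - 1)


def encode_opcode(opcode):
    """Instruction word with only the opcode fields filled in."""
    word = 0
    for field, value in OPCODES[opcode][0]:
        word = field_set(field)(word, value)
    return word


def build_decode_tree():
    """Nest the opcode-field tests the way generated nested switches would."""
    tree = {}
    for name, (path, _, _) in OPCODES.items():
        node = tree
        for depth, (field, value) in enumerate(path):
            node.setdefault("field", field)
            cases = node.setdefault("cases", {})
            if depth == len(path) - 1:
                cases[value] = name
            else:
                node = cases.setdefault(value, {})
    return tree


DECODE_TREE = build_decode_tree()


def decode_instruction(word):
    """Walk the tree from the top opcode field down; None means undefined."""
    node = DECODE_TREE
    while isinstance(node, dict):
        node = node["cases"].get(field_get(node["field"])(word))
    return node


def assemble(mnemonic, args):
    if mnemonic not in OPCODES:
        raise ValueError(f"Unknown opcode {mnemonic!r}")
    word = encode_opcode(mnemonic)
    for field, arg in zip(OPCODES[mnemonic][1], args):
        word = field_set(field)(word, arg)
    return word


def disassemble(word):
    opcode = decode_instruction(word)
    if opcode is None:
        raise ValueError(f"undefined instruction {word:#x}")
    _, fields, types = OPCODES[opcode]
    # "a" as the register prefix is an assumption for this toy ISA.
    parts = [f"a{field_get(f)(word)}" if kind == "reg" else str(field_get(f)(word))
             for f, kind in zip(fields, types)]
    return f"{opcode} " + ", ".join(parts)
```

The assembler, disassembler, and decoder are all derived from the same `OPCODES` and `FIELDS` tables. Adding an instruction therefore means adding one table entry rather than editing each tool separately.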
This library is generated automatically from the information in the ISA description. It defines an enumeration of the opcodes, and a function (stringToOpcode) that efficiently maps opcode mnemonic strings to members of that enumeration. It also defines tables that specify the following for each opcode:
- the instruction length (instructionLength);
- the number of operands (numberOfOperands);
- the operand fields;
- the operand types, that is, register or immediate (operandType);
- the binary encoding (encodeOpcode);
- the mnemonic string (opcodeName).

For each operand field, the library provides accessor functions to encode (fieldSetFunction) and decode (fieldGetFunction) the corresponding bits of the instruction word. All of this information is readily available in the ISA description, so generating the library software is simply a matter of translating that information into executable C code. For example, the instruction encodings are recorded in a C array variable. Each entry is the encoding of one instruction, produced by setting each opcode field to the value specified for that instruction in the ISA description. The encodeOpcode function simply returns the array value for the given opcode.

The library also provides a function (decodeInstruction) that decodes the opcode of a binary instruction. This function is generated as a sequence of nested switch statements. The outermost switch tests the subopcode field at the top of the opcode hierarchy, and each nested switch tests a subopcode field one level lower. The generated code for this function therefore has the same structure as the opcode hierarchy itself.

Given this library for encoding and decoding instructions, the assembler 110 is easy to implement. For example, the instruction encoding logic in the assembler is quite simple:

AssembleInstruction (String mnemonic, int arguments[])
begin
    opcode = stringToOpcode(mnemonic);
    if (opcode == UNDEFINED)
        Error("Unknown opcode");
    instruction = encodeOpcode(opcode);
    numArgs = numberOfOperands(opcode);
    for i = 0, numArgs-1 do
    begin
        setFun = fieldSetFunction(opcode, i);
        setFun(instruction, arguments[i]);
    end
    return instruction;
end

Implementing the disassembler 110 is equally straightforward. It translates binary instructions into a readable form resembling assembly code:

DisassembleInstruction (BinaryInstruction instruction)
begin
    opcode = decodeInstruction(instruction);
    instructionAddress += instructionLength(opcode);
    print opcodeName(opcode);
    // Loop through the operands, disassembling each
    numArgs = numberOfOperands(opcode);
    for i = 0, numArgs-1 do
    begin
        type = operandType(opcode, i);
        getFun = fieldGetFunction(opcode, i);
        value = getFun(opcode, i, instruction);
        if (i != 0) print ",";   // Comma-separate operands
        // Print based on the type of the operand
        switch (type)
            case register:
                print registerPrefix(type), value;
            case immediate:
                print value;
            case pc_relative_label:
                print instructionAddress + value;
            // etc. for more operand types
    end
end
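To make the table-driven approach concrete, the following C sketch implements a tiny encoding/decoding library for a made-up three-instruction ISA. It also provides C versions of the AssembleInstruction and DisassembleInstruction routines.

The following are all hypothetical and are not taken from any real ISA description:

- the field layout;
- the opcode values;
- the mnemonics;
- the "a" register prefix.

In the scheme described above, a generator would emit the opcode table and the nested-switch decoder from the ISA description rather than having them written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operand fields: FIELD(name, lsb, width) generates the
   fieldGetFunction/fieldSetFunction accessors for one field. */
#define FIELD(name, shift, width)                                  \
    static uint32_t get_##name(uint32_t insn) {                    \
        return (insn >> (shift)) & ((1u << (width)) - 1u);         \
    }                                                              \
    static void set_##name(uint32_t *insn, uint32_t v) {           \
        uint32_t mask = ((1u << (width)) - 1u) << (shift);         \
        *insn = (*insn & ~mask) | ((v << (shift)) & mask);         \
    }
FIELD(r, 4, 4)
FIELD(s, 8, 4)
FIELD(t, 12, 4)
FIELD(imm8, 8, 8)

typedef enum { OP_UNDEFINED = -1, OP_ADD, OP_SUB, OP_MOVI, NUM_OPCODES } Opcode;
typedef enum { OPERAND_REGISTER, OPERAND_IMMEDIATE } OperandType;
typedef uint32_t (*FieldGet)(uint32_t insn);
typedef void (*FieldSet)(uint32_t *insn, uint32_t value);

/* One row per opcode, as a generator might emit from an ISA description. */
typedef struct {
    const char *name;      /* opcodeName                         */
    uint32_t encoding;     /* encodeOpcode: opcode fields preset */
    int num_operands;      /* numberOfOperands                   */
    OperandType type[3];   /* operandType                        */
    FieldGet get[3];       /* fieldGetFunction                   */
    FieldSet set[3];       /* fieldSetFunction                   */
} OpcodeInfo;

static const OpcodeInfo opcodes[NUM_OPCODES] = {
    [OP_ADD]  = { "add", 0x80000u, 3,
                  { OPERAND_REGISTER, OPERAND_REGISTER, OPERAND_REGISTER },
                  { get_r, get_s, get_t }, { set_r, set_s, set_t } },
    [OP_SUB]  = { "sub", 0xC0000u, 3,
                  { OPERAND_REGISTER, OPERAND_REGISTER, OPERAND_REGISTER },
                  { get_r, get_s, get_t }, { set_r, set_s, set_t } },
    [OP_MOVI] = { "movi", 0x2u, 2, { OPERAND_REGISTER, OPERAND_IMMEDIATE },
                  { get_r, get_imm8 }, { set_r, set_imm8 } },
};

/* stringToOpcode (a linear search keeps the sketch short). */
Opcode string_to_opcode(const char *mnemonic) {
    for (int op = 0; op < NUM_OPCODES; op++)
        if (strcmp(opcodes[op].name, mnemonic) == 0)
            return (Opcode)op;
    return OP_UNDEFINED;
}

/* decodeInstruction: nested switches mirror the opcode hierarchy,
   op0 (bits 0-3) at the top and op1 (bits 16-19) one level down. */
Opcode decode_instruction(uint32_t insn) {
    switch (insn & 0xFu) {
    case 0:
        switch ((insn >> 16) & 0xFu) {
        case 8:  return OP_ADD;
        case 12: return OP_SUB;
        default: return OP_UNDEFINED;
        }
    case 2:  return OP_MOVI;
    default: return OP_UNDEFINED;
    }
}

/* AssembleInstruction: 0 on success, -1 for an unknown mnemonic. */
int assemble_instruction(const char *mnemonic, const uint32_t args[],
                         uint32_t *out) {
    Opcode op = string_to_opcode(mnemonic);
    if (op == OP_UNDEFINED)
        return -1;
    uint32_t insn = opcodes[op].encoding;
    for (int i = 0; i < opcodes[op].num_operands; i++)
        opcodes[op].set[i](&insn, args[i]);
    *out = insn;
    return 0;
}

/* DisassembleInstruction: writes readable text such as "add a1, a2, a3". */
void disassemble_instruction(uint32_t insn, char *buf, size_t size) {
    Opcode op = decode_instruction(insn);
    if (op == OP_UNDEFINED) {
        snprintf(buf, size, ".word 0x%x", (unsigned)insn);
        return;
    }
    const OpcodeInfo *info = &opcodes[op];
    int n = snprintf(buf, size, "%s", info->name);
    for (int i = 0; i < info->num_operands && n >= 0 && (size_t)n < size; i++)
        n += snprintf(buf + n, size - (size_t)n, "%s%s%u",
                      i == 0 ? " " : ", ",
                      info->type[i] == OPERAND_REGISTER ? "a" : "",
                      (unsigned)info->get[i](insn));
}
```

For example, assembling `add` with operands 1, 2 and 3 yields the word 0x83210, which disassembles back to `add a1, a2, a3`. The PC-relative operand case and the running instruction address from the pseudocode are omitted to keep the sketch short.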
The DisassembleInstruction algorithm is used both in a standalone disassembler tool and in the debugger 130 to support debugging of machine code.

The linker is less sensitive to configuration than the compiler and the assembler 110. Much of the linker is standard. Even its machine-dependent portions depend primarily on the core ISA and can be hand-coded for a specific core ISA. Parameters such as the logical structure are set from the configuration specification 100 using TPP.

The memory map of the target processor 60 is another aspect of the configuration needed by the linker. Previously, the parameters that specify the memory map were inserted into the linker using TPP. In this embodiment of the invention, the GNU linker is driven by a set of linker scripts, and it is these linker scripts that contain the memory map information. One advantage of this approach arises when the memory map of the target system differs from the memory map specified when the processor 60 was configured. In that case, additional linker scripts can be generated later without reconfiguring the processor 60 and without rebuilding the linker. Accordingly, this embodiment includes a tool for configuring new linker scripts with different memory map parameters.

The debugger 130 provides mechanisms for:

-- observing the state of a program as it executes;
-- single-stepping execution one instruction at a time;
-- inserting breakpoints;
-- performing other standard debugging tasks.
The program being debugged can run either on a hardware implementation of the configured processor or on the ISS 126. The debugger presents the same user interface in both cases. When the program runs on a hardware implementation, a small monitor program is included on the target system. The monitor controls execution of the user's program and communicates with the debugger over a serial port. When the program runs on the simulator 126, the simulator 126 itself performs these functions.

The debugger 130 depends on the configuration in several ways. It is linked with the instruction encoding/decoding library described above to support disassembly of machine code from within the debugger 130. Several pieces are generated by scanning the ISA description to find which registers exist in the processor 60:

-- the portion of the debugger 130 that displays the processor's register state;
-- the portions of the debug monitor program and of the ISS 126 that provide that information to the debugger 130.

Other software development tools 30 are standard and need not be changed for each processor configuration. The profile viewer and various utility programs fall into this category. These tools may need to be retargeted once to operate on the binary file format shared by all configurations of the processor 60. However, they do not depend on the ISA description or on other parameters in the configuration specification 100.

The configuration specification is also used to configure the simulator, ISS 126, shown in Figure 13. The ISS 126 is a software application that models the functional behavior of the configurable processor's instruction set. Its counterparts, processor hardware model simulators such as the Synopsys VCS and Cadence Verilog XL and NC simulators, model the hardware itself. The ISS model, by contrast, is an abstraction of the CPU during instruction execution. The ISS 126 can run much faster than hardware simulation because it does not need to model every signal transition of every gate and register in the complete processor design.

The ISS 126 allows programs generated for the configured processor 60 to be executed on a host computer. It accurately replicates the processor's reset and interrupt behavior, which allows low-level programs such as device drivers and initialization code to be developed. This is particularly useful when porting native code to an embedded application.

The ISS 126 can be used to identify potential problems, such as architectural assumptions and memory-ordering considerations, without downloading code to the actual embedded target.

In this embodiment, ISS semantics are expressed textually in a C-like language, using C operator building blocks that turn instructions into functions. For example, the basic functionality of interrupts, i.e., the interrupt register, bit setting, interrupt levels, vectors, etc., is modeled in this language.

The configurable ISS 126 serves the following four purposes or goals as part of the system design and verification process:
-- debugging software applications before hardware becomes available;
-- debugging system software (for example, compiler and operating system components);
-- comparing with HDL simulation for hardware design verification. The ISS serves as a reference implementation of the ISA. During processor design verification, the ISS and the processor HDL are both run on diagnostics and applications, and the traces from the two are compared;
-- analyzing software application performance. This can be part of the configuration process, or it can be used for further application tuning after a processor configuration has been selected.

All of these goals require the ISS 126 to be able to load and decode programs produced by the configurable assembler 110 and linker. They also require that ISS execution of instructions be semantically equivalent to the corresponding hardware execution and to the compiler's expectations. For these reasons, the ISS 126 derives its decoding and execution behavior from the same ISA files used to define the hardware and the system software.

For the first and last goals above, the ISS 126 must be as fast as possible while still providing the required accuracy. The ISS 126 therefore permits dynamic control of the level of simulation detail. For example, cache details are not modeled unless requested, and cache modeling can be turned off and on dynamically. In addition, parts of the ISS 126 (e.g., the cache and pipeline models) are configured before the ISS 126 is compiled. As a result, the ISS 126 makes very few configuration-dependent choices at run time. In this way, all configurable ISS behavior is derived from well-defined sources related to other parts of the system.

For the first and third goals above, the ISS 126 must provide operating system services to applications when those services are not yet available from the OS of the system under design (the target).
It is also important for these services to be provided by the target OS when that OS is a relevant part of the debugging process. In this way, the system's design allows these services to be moved flexibly between the ISS host and the simulation target. The current design relies on a combination of two mechanisms:

-- ISS dynamic control, under which trapping of SYSCALL instructions can be turned on and off;
-- a special SIMCALL instruction used to request host OS services.

The last goal requires the ISS 126 to model some aspects of processor and system behavior that lie below the level specified by the ISA. In particular, the ISS cache models are built from C code that Perl scripts generate, using parameters extracted from the configuration database 100. In addition, details of the pipeline behavior of instructions (for example, interlocks based on register usage and functional-unit availability requirements) are also derived from the configuration database 100. In the current implementation, a special pipeline description file specifies this information in LISP syntax.

The third goal requires precise control of interrupt behavior. For this purpose, a special non-architectural register in the ISS 126 is used to suppress interrupt enabling.

The ISS 126 provides several interfaces to support its different goals:
-- a batch or command-line mode, generally used in connection with the first and last goals;
-- a command-loop mode, which provides non-symbolic debugging capabilities such as breakpoints, watchpoints and stepping, and is usually used for all four goals;
-- a socket interface, which allows a software debugger to use the ISS 126 as an execution back end. This interface must be configured to read and write the register state of the particular configuration selected;
-- a scripting interface, which allows very detailed debugging and performance analysis. In particular, this interface can be used to compare application behavior on different configurations. For example, at any breakpoint, the state of a run on one configuration can be compared with, or transferred to, the state of a run on another configuration.

The simulator 126 also has both hand-coded and automatically generated portions. The hand-coded portions are conventional, except for instruction decoding and execution, which are driven by tables generated from the ISA description language. The tables decode an instruction as follows:

-- start from the primary opcode found in the instruction word to be executed;
-- index into a table with the value of that field;
-- continue until a leaf opcode, i.e., an opcode that is not defined in terms of other opcodes, is found.

The tables then give a pointer to code translated from the TIE code specified in the semantics declaration for the instruction. This code is executed to simulate the instruction.

The ISS 126 can optionally profile the execution of the program being simulated. This profiling uses a program counter sampling technique known in the art.
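As a rough illustration of this kind of PC-sampling profiling, the following C sketch keeps a histogram of sampled PCs per fixed-size code region and counts call-graph edges.

The following are assumptions made for illustration, and the simulator 126 is not claimed to be structured this way:

- the bucket size;
- the sampling interval;
- the edge-table capacity;
- all names (Profile, profile_tick, profile_call).

A simulator would call profile_tick once per simulated cycle and profile_call for each simulated call instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS  64    /* hypothetical histogram size        */
#define BUCKET_BYTES 16u   /* hypothetical code bytes per bucket */
#define MAX_EDGES    128   /* hypothetical edge-table capacity   */

typedef struct {
    uint32_t text_base;            /* lowest code address profiled      */
    uint32_t interval;             /* sample once every N cycles        */
    uint32_t countdown;            /* cycles left until the next sample */
    uint32_t histogram[NUM_BUCKETS];
    struct { uint32_t call_site, target, count; } edges[MAX_EDGES];
    int num_edges;
} Profile;

void profile_init(Profile *p, uint32_t text_base, uint32_t interval) {
    assert(interval > 0);
    memset(p, 0, sizeof *p);
    p->text_base = text_base;
    p->interval = interval;
    p->countdown = interval;
}

/* Called once per simulated cycle with the current PC. */
void profile_tick(Profile *p, uint32_t pc) {
    if (--p->countdown != 0)
        return;
    p->countdown = p->interval;
    if (pc >= p->text_base) {
        uint32_t bucket = (pc - p->text_base) / BUCKET_BYTES;
        if (bucket < NUM_BUCKETS)
            p->histogram[bucket]++;   /* samples outside the range are dropped */
    }
}

/* Called whenever a call instruction is simulated: one counter per
   (call site, target) edge of the dynamic call graph. */
void profile_call(Profile *p, uint32_t call_site, uint32_t target) {
    for (int i = 0; i < p->num_edges; i++) {
        if (p->edges[i].call_site == call_site && p->edges[i].target == target) {
            p->edges[i].count++;
            return;
        }
    }
    if (p->num_edges < MAX_EDGES) {   /* a full table drops new edges */
        p->edges[p->num_edges].call_site = call_site;
        p->edges[p->num_edges].target = target;
        p->edges[p->num_edges].count = 1;
        p->num_edges++;
    }
}
```

The program under simulation is never modified, which is why this style of profiling is non-invasive. Writing the histogram and edge counts out in a profile viewer's file format is omitted here.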
At regular intervals, the simulator 126 samples the PC (program counter) of the processor being simulated. From these samples it builds a histogram of the number of samples in each region of code. The simulator 126 also counts how many times each edge of the call graph is traversed, by incrementing a counter whenever a call instruction is simulated. When the simulation is complete, the simulator 126 writes an output file containing both the histogram and the call-graph edge counts, in a format that a standard profile viewer can read. Standard profiling techniques modify the program with instrumentation code, but the simulated program 118 needs no such modification. The profiling therefore does not affect the simulation results and is completely non-invasive.

It is preferable for the system to make available both hardware-based and software-based evaluation of the processor. For this purpose, this embodiment provides an evaluation board. As shown in FIG. 8, the evaluation board 200 uses a complex programmable logic device (CPLD) 202, such as the Altera Flex 10K 200E, to emulate a processor configuration 60 in hardware. Once programmed with the processor netlist generated by the system, the CPLD device 202 is functionally equivalent to the final ASIC product. The advantage is that a physical implementation of the processor 60 is available. It runs faster than other simulation methods (such as the ISS 126 or HDL simulation) and is cycle-accurate. However, it cannot reach the high frequency targets that the final ASIC device can achieve.

This evaluation board enables designers to evaluate various processor configuration options and to start software development and debugging early in the design cycle. It can also be used for functional verification of the processor configuration.

The evaluation board 200 itself has several resources to allow easy software development, debugging and verification:

-- the CPLD device 202 itself;
-- EPROM 204;
-- SRAM 206;
-- synchronous SRAM 208;
-- flash memory 210;
-- two RS232 serial channels 212.

The serial channels 212 provide a communication link to a UNIX or PC host for downloading and debugging user programs. A processor configuration 60, in the form of a CPLD netlist, is downloaded into the CPLD 202 in one of two ways: through a dedicated serial link to the device's configuration port 214, or through a dedicated configuration ROM 216.

The resources on the evaluation board 200 are also configurable to a degree. The memory map of the various memory devices on the board can be changed easily, because the mapping is done through a programmable logic device (PLD) 217 that can itself be reprogrammed easily. Also, the caches 218 and 228 used by the processor core are expandable. This is done by using larger memory devices and appropriately sizing the tag buses 222 and 224 that connect to the caches 218 and 228.

Using the evaluation board to emulate a particular processor configuration involves several steps:

-- Obtain a set of RTL files describing the particular configuration of the processor.
-- Synthesize a gate-level netlist from the RTL description using any of a number of commercial synthesis tools. One example is FPGA Express from Synopsys.
-- Use the gate-level netlist to obtain a CPLD implementation, using tools typically provided by the vendor. One such tool is Maxplus2 from Altera Corporation.
-- Download the implementation onto the CPLD chip on the evaluation board, again using a programmer provided by the CPLD vendor.

One purpose of the evaluation board is to support rapid prototyping for debugging, so it is important to automate the CPLD implementation process outlined above. To achieve this, the files delivered to the user are customized by collecting all relevant files into a single directory. Next, a fully customized synthesis script is provided that synthesizes the particular processor configuration for the particular FPGA device selected by the customer. A fully customized implementation script for use with the vendor's tools is also generated. These synthesis and implementation scripts guarantee a functionally correct implementation with optimal performance.

Functional correctness is achieved by including the following in the scripts:

-- the appropriate commands to read all RTL files relevant to the particular processor configuration;
-- the appropriate commands to assign chip pin locations based on the I/O signals in the processor configuration;
-- commands that obtain a specific logic implementation for certain critical portions of the processor logic under gated clocks.

The scripts also improve implementation performance in two ways: by assigning detailed timing constraints to all processor I/O signals, and by special handling of certain critical signals. One example of a timing constraint is assigning a specific input delay to a signal, taking into account that signal's delay on the board. One example of critical-signal handling is assigning the clock signal to dedicated global wiring in order to achieve low clock skew on the CPLD chip.
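The script generation described above can be sketched as a C function that turns a configuration's RTL file list and I/O signal table into script text.

The command names used are invented placeholders:

- read_rtl;
- assign_pin;
- set_input_delay;
- set_output_delay;
- use_global_net.

None of these is the syntax of FPGA Express, Maxplus2, or any other real tool, and a real generator would emit the vendor's own commands. The IoSignal structure and the idea of deriving the delay constraint directly from a board-delay value are likewise assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { SIG_INPUT, SIG_OUTPUT, SIG_CLOCK } SignalKind;

/* Hypothetical description of one processor I/O signal on the board. */
typedef struct {
    const char *name;
    const char *pin;         /* chip pin to assign                 */
    double board_delay_ns;   /* trace delay of the signal on board */
    SignalKind kind;
} IoSignal;

/* Appends one formatted line; makes emit_script return -1 on overflow. */
#define EMIT(...)                                                   \
    do {                                                            \
        int n_ = snprintf(buf + used, size - used, __VA_ARGS__);   \
        if (n_ < 0 || (size_t)n_ >= size - used) return -1;        \
        used += (size_t)n_;                                         \
    } while (0)

/* Writes a script in an invented command syntax; returns its length. */
int emit_script(const char *const rtl_files[], int num_files,
                const IoSignal sigs[], int num_sigs,
                char *buf, size_t size) {
    size_t used = 0;
    if (size == 0) return -1;
    buf[0] = '\0';
    for (int i = 0; i < num_files; i++)   /* all RTL for this configuration */
        EMIT("read_rtl %s\n", rtl_files[i]);
    for (int i = 0; i < num_sigs; i++) {
        EMIT("assign_pin %s %s\n", sigs[i].name, sigs[i].pin);
        if (sigs[i].kind == SIG_CLOCK)        /* low-skew global routing */
            EMIT("use_global_net %s\n", sigs[i].name);
        else if (sigs[i].kind == SIG_INPUT)   /* account for board delay */
            EMIT("set_input_delay %s %.1f\n", sigs[i].name, sigs[i].board_delay_ns);
        else
            EMIT("set_output_delay %s %.1f\n", sigs[i].name, sigs[i].board_delay_ns);
    }
    return (int)used;
}
#undef EMIT
```

Because the script text is a pure function of the configuration data, the flow can be rerun unattended for each new processor configuration. That is the point of automating it.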
Preferably, the system also configures a set of confirmation sets of the configured processors 60 at the same time. Most of the complex designs such as microprocessors include the following processes:-Establish a set of test platforms to motivate their design and compare the output within the test platform or using an external mode, such as ISS 126;… write diagnostics to generate Incentives; a using mechanism, for example, the line range of the finite state machine range HDL to measure and confirm the range, reduce the design error rate, the number of vectors;-if the range is not sufficient-write more diagnostics And tools can be used to generate diagnostics to further apply the design. 70 This paper size applies the Chinese National Standard (CNS) A4 specification (210X297 mm) 539965 A7 _B7_ V. Description of the invention (68) (Please read the notes on the back before filling this page) This invention uses a similar process. But all components of the process were modified to demonstrate the configuration capabilities of the design. This method consists of the following steps:… establishing a specific set of test benches. The configuration of the test platform uses a method similar to the HDL specification and supports all supported options and extensions, that is, cache scales, bus interfaces, clocks, interrupt generation, etc .; ... performs self-check diagnostics in HDL Specific configuration. Self-diagnostics are specific components that can be configured to fit the hardware. The choice of which group of diagnostics to perform depends on the configuration;… performs a diagnostic that is generated randomly and compares the processor status after the execution of each instruction relative to ISS 126;… confirms the range measurement and use measurement Functional and range tools for line range. At the same time, the monitor and the inspector are executed together with the diagnosis to search for illegal states and conditions. 
All of these components are configured for the particular configuration specification; every verification component is configurable, and the configurability is implemented using TPP. A testbench is a Verilog™ model of a system in which the configured processor 60 is placed. In the context of the present invention, the testbenches include:
- caches, the bus interface, and external memory;
- external interrupt and bus error generation; and
- clock generation.
Because almost all of these features are configurable, the testbench itself must support configurability. For example, the cache size and width and the number of external interrupts are adjusted automatically according to the configuration.
The testbench provides stimulus to the device under test, processor 60. It does this by supplying assembly-level instructions (from the diagnostics) that are preloaded into memory. It also generates signals that control the behavior of processor 60, for example interrupts. The frequency and timing of these external signals are controllable and are generated automatically by the testbench.
The diagnostics are configurable in two ways. First, a diagnostic uses TPP to decide what to test. For example, a diagnostic written to test software interrupts needs to know how many software interrupts there are in order to generate the correct assembly code. Second, the processor configuration system 10 must decide which diagnostics are appropriate for the configuration. For example, a diagnostic written to test the MAC unit is not applicable to a processor 60 that does not include that unit.
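The sketch below illustrates how a testbench might generate external interrupts. The number of interrupt lines follows the configuration, and the timing is controllable but reproducible. The configuration fields, the xorshift32 generator, and the gap-based timing model are assumptions made for illustration. They are not the testbench described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of the processor configuration seen by the testbench. */
typedef struct {
    unsigned num_ext_interrupts;  /* follows the configuration automatically */
    unsigned min_gap_cycles;      /* together these control interrupt frequency */
    unsigned max_gap_cycles;
} tb_config;

typedef struct {
    uint64_t cycle;  /* cycle at which the testbench asserts the interrupt */
    unsigned irq;    /* external interrupt line, 0 .. num_ext_interrupts-1 */
} irq_event;

/* xorshift32: tiny deterministic PRNG, so a failing run can be replayed
   from its seed. A nonzero state never becomes zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill ev[0..n-1] with a pseudo-random interrupt schedule. Returns the
   number of events written: 0 if the configuration has no external
   interrupts or the gap range is empty. */
static size_t make_irq_schedule(const tb_config *cfg, uint32_t seed,
                                irq_event *ev, size_t n)
{
    uint32_t state = seed ? seed : 1u;   /* xorshift state must be nonzero */
    uint64_t cycle = 0, span;

    assert(cfg != NULL && (ev != NULL || n == 0));
    if (cfg->num_ext_interrupts == 0 || cfg->max_gap_cycles < cfg->min_gap_cycles)
        return 0;
    span = (uint64_t)cfg->max_gap_cycles - cfg->min_gap_cycles + 1;
    for (size_t i = 0; i < n; i++) {
        cycle += cfg->min_gap_cycles + xorshift32(&state) % span;
        ev[i].cycle = cycle;
        ev[i].irq = xorshift32(&state) % cfg->num_ext_interrupts;
    }
    return n;
}
```

Seeding the generator makes a failing regression run replayable. Narrowing or widening the gap range changes how often interrupts arrive, and a configuration without external interrupts simply yields an empty schedule.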
In this embodiment, this is achieved by using a database that contains information about each diagnostic. For each diagnostic, the database may contain the following information:
- use the diagnostic if a certain option has been selected;
- whether the diagnostic cannot be run with interrupts;
- whether the diagnostic requires special files or handlers to run; and
- whether the diagnostic cannot be cosimulated with ISS 126.
Preferably, the processor hardware description includes three types of test tools: test generator tools, monitors and coverage tools (or checkers), and a cosimulation mechanism. Test generation tools intelligently create sequences of processor instructions; they are pseudo-random test generators. This embodiment uses two types internally: one is a specially developed generator called RTPG, and the other is based on an external tool called VERA (VSG). Both have configurability built around them: they generate instruction sequences from the instructions that are valid for a given configuration. These tools can also handle instructions newly defined in TIE, so that the new instructions are generated randomly for testing.
This embodiment includes monitors and checkers that measure the coverage of the design verification. Monitors and coverage tools run alongside the regression. Coverage tools monitor what the diagnostics are doing and which functions and logic of the HDL they exercise. All of this information is collected throughout the regression run and later analyzed for hints about which parts of the logic need further testing. This embodiment uses several configurable functional coverage tools.
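Such a database and its selection step can be sketched in a few lines of C. This is a minimal sketch. The record fields mirror the list above, while the option bits, field names, and run properties are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration options that diagnostics may depend on. */
enum {
    OPT_MAC        = 1u << 0,
    OPT_INTERRUPTS = 1u << 1,
    OPT_DATA_BREAK = 1u << 2
};

/* Properties of the planned regression run. */
typedef struct {
    unsigned options;        /* OPT_* bits selected in this configuration */
    int inject_interrupts;   /* testbench raises interrupts during the run */
    int cosim_with_iss;      /* run is cross-checked against the ISS */
    int special_support;     /* special files/handlers are available */
} run_config;

/* One database record per diagnostic, mirroring the fields listed above. */
typedef struct {
    const char *name;
    unsigned required_options;  /* use only if all these options are selected */
    int ok_with_interrupts;     /* 0: cannot run with interrupts */
    int needs_special_support;  /* 1: needs special files or handlers */
    int ok_with_cosim;          /* 0: cannot be cosimulated with the ISS */
} diag_record;

static int diag_applicable(const diag_record *d, const run_config *c)
{
    if ((c->options & d->required_options) != d->required_options)
        return 0;
    if (c->inject_interrupts && !d->ok_with_interrupts)
        return 0;
    if (d->needs_special_support && !c->special_support)
        return 0;
    if (c->cosim_with_iss && !d->ok_with_cosim)
        return 0;
    return 1;
}

/* Copy the names of applicable diagnostics into out[]; returns how many. */
static size_t select_diags(const diag_record *db, size_t n, const run_config *c,
                           const char **out, size_t max_out)
{
    size_t k = 0;

    assert(db != NULL && c != NULL && out != NULL);
    for (size_t i = 0; i < n && k < max_out; i++)
        if (diag_applicable(&db[i], c))
            out[k++] = db[i].name;
    return k;
}
```

For example, in a configuration without a MAC unit, a record requiring OPT_MAC is filtered out. A diagnostic marked as not cosimulatable is likewise dropped whenever the run is checked against the ISS.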
For example, depending on the configuration, a particular finite state machine may not contain all of its states. For such a configuration, the functional coverage tool must not try to check those states or transitions. This is achieved by making the tools configurable through TPP.
Similarly, there are monitors that check for illegal conditions occurring within the HDL simulation. Such illegal conditions can manifest as bugs. For example, on a tristate bus, two drivers should never be enabled simultaneously. These monitors are configurable: they are added or removed depending on whether the configuration contains the particular logic.
The cosimulation mechanism connects the HDL to ISS 126. It is used to check that the processor state at the end of each instruction is the same in the HDL and in ISS 126. It is also configurable, in that it knows which features are included in each configuration and which state needs to be compared. For example, the data breakpoint feature adds special registers, and the mechanism needs to know to compare these new special registers.
The semantics of instructions specified in TIE can be translated into functionally equivalent C functions for use in ISS 126 and for system designers to use in testing and verification. The semantics of an instruction in the configuration database 106 are translated into a C function by building a parse tree with standard parser tools, then walking the tree and outputting the corresponding expressions in C. The translation requires a preliminary pass that assigns bit widths to all expressions and rewrites the parse tree to simplify some translations.
Compared with other translators, such as HDL-to-C translators or C-to-assembly compilers, this translator is relatively simple and can be written by those skilled in the art from the TIE and C language formats.
Using the configuration files 100 and the compiler and assembler/disassembler configured by them, the evaluation application source code 118 is compiled and assembled, and the sample data set 124 is simulated to obtain a software profile, which is also provided to the user configuration routine as feedback to the user.
The ability to obtain both the hardware and the software cost/benefit characteristics of any selection of configuration parameters opens up new opportunities for designers to optimize the system further. Specifically, it allows a designer to choose the configuration parameters that optimize the overall system according to some figure of merit. One possible procedure is a greedy strategy that repeatedly selects or deselects a single configuration parameter. At each step, the parameter change with the best impact on overall system performance and cost is chosen. This step is repeated until no single parameter change can improve system performance and cost. Other extensions include examining a group of parameters at a time or using more sophisticated search algorithms.
In addition to finding the best selection of configuration parameters, this process can also be used to construct processor extensions. Because the number of possible processor extensions is large, it is important to limit the number of extension candidates. One technique is to analyze the application software and consider only those extensions that can improve system performance or cost.
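The greedy procedure can be sketched in C under two assumptions. First, each configuration parameter is a yes/no choice encoded as one bit. Second, a caller-supplied function returns a combined cost/performance figure of merit, where lower is better. The toy cost model is invented purely for illustration. A real flow would derive the figure from the hardware estimates and the software profile.

```c
#include <assert.h>
#include <stddef.h>

/* Figure of merit for a configuration (lower is better). Bit b of config
   set means configuration parameter b is selected. */
typedef double (*merit_fn)(unsigned config, void *ctx);

/* Greedy search: at each step flip (select or deselect) the single parameter
   that most improves the figure of merit; stop when no single flip helps. */
static unsigned greedy_configure(unsigned start, int nparams, merit_fn cost, void *ctx)
{
    unsigned cur = start;
    double cur_cost = cost(cur, ctx);

    assert(nparams > 0 && nparams < 32);
    for (;;) {
        int best_bit = -1;
        double best_cost = cur_cost;
        for (int b = 0; b < nparams; b++) {
            double c = cost(cur ^ (1u << b), ctx);
            if (c < best_cost) {
                best_cost = c;
                best_bit = b;
            }
        }
        if (best_bit < 0)
            return cur;           /* no single change improves the figure */
        cur ^= 1u << best_bit;
        cur_cost = best_cost;
    }
}

/* Toy cost model for illustration only: each parameter adds area and saves cycles. */
static const double toy_area[4]   = { 1.0, 2.0, 0.5, 3.0 };
static const double toy_saving[4] = { 3.0, 1.0, 2.0, 1.0 };

static double toy_cost(unsigned config, void *ctx)
{
    double c = 10.0;              /* cost of the base configuration */
    (void)ctx;
    for (int b = 0; b < 4; b++)
        if (config & (1u << b))
            c += toy_area[b] - toy_saving[b];
    return c;
}
```

With this toy model, greedy_configure(0u, 4, toy_cost, NULL) and greedy_configure(0xFu, 4, toy_cost, NULL) both settle on parameters 0 and 2 (mask 0x5). The search changes only one parameter per step, so in general it can stop at a local optimum. That limitation is why examining groups of parameters or using a more sophisticated search are natural extensions.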
Having described the operation of the automated processor configuration system according to this embodiment, examples of applying the system to processor microarchitecture configuration are given next. The first example shows the advantages of the invention applied to image compression.
Motion estimation is an important component of many image compression algorithms, including MPEG video and H.263 conferencing applications. Video image compression exploits the similarity from one frame to the next to reduce the amount of storage required for each frame. In the simplest case, each block of an image to be compressed can be compared with the corresponding block (same X, Y position) of a reference image, i.e., an image that closely precedes or follows the image being compressed. Compressing the differences between frames is generally more bit-efficient than compressing each image on its own. In video sequences, distinctive image features often move from frame to frame, so the closest correspondence between blocks in different frames is often not at exactly the same X, Y position but at some offset. If significant parts of the image move between frames, the motion must be identified and compensated for before the difference is computed. Consequently, the most compact representation is obtained by encoding the difference between successive images, including, for distinctive features, an X, Y offset into the sub-images used to compute the difference. The offset used to compute the image difference is called the motion vector.
In image compression, the most computationally intensive task is determining the most appropriate motion vector for each block.
The common criterion for selecting the motion vector is to find the vector with the lowest average pixel-to-pixel difference between each block of the image being compressed and a set of candidate blocks of the previous image. The candidate blocks are all the blocks in a neighborhood around the location of the block being compressed. The image size, block size, and neighborhood size all affect the running time of the motion estimation algorithm.
Simple block-based motion estimation compares each sub-image of the image to be compressed against the reference image. The reference image can precede or follow the subject image in the video sequence; in either case, the reference image is known to be available to the decompression system before the subject image is decompressed. The comparison of a block of the image being compressed with candidate blocks of the reference image proceeds as follows.
For each block in the subject image, a search is performed around the corresponding location in the reference image. Typically, each color component of the image (e.g., YUV) is analyzed separately; sometimes motion estimation is performed on only one component, especially luminance. The average pixel-to-pixel difference is computed between the subject block and every possible block in the search area of the reference image, where the difference is the absolute value of the difference in pixel magnitudes. The average is proportional to the sum over the N² pixels of the block pair (where N is the block dimension). The block of the reference image that produces the smallest average pixel difference defines the motion vector for the block of the subject image.
The following example shows a simple version of the motion estimation algorithm and then uses TIE to optimize the algorithm with a small application-specific functional unit.
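Written as a formula, the criterion reads as follows. The symbols are notation introduced here for illustration: S is the subject image, R is the reference image, (x_0, y_0) is the top-left corner of the subject block, and W is the set of candidate offsets in the search region. The factor 1/N² is the same for every candidate, so minimizing the average is equivalent to minimizing the plain sum. The sum is what the C code accumulates. That code also records the best candidate position (cx, cy) rather than the offset.

```latex
D(u,v) = \frac{1}{N^{2}} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
         \left| S(x_0+i,\, y_0+j) - R(x_0+u+i,\, y_0+v+j) \right|

(u^{*}, v^{*}) = \operatorname*{arg\,min}_{(u,v) \in W} D(u,v)
```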
This optimization yields a speedup of more than a factor of 10, which makes processor-based compression feasible in many video applications. It demonstrates how a configurable processor combines the ease of programming in a high-level language with the efficiency of special-purpose hardware.
This example uses two arrays, OldB and NewB, to represent the old and new images, respectively. The image size is given by NX and NY, and the block size by BLOCKX and BLOCKY, so the image consists of NX/BLOCKX by NY/BLOCKY blocks. The search region around a block is given by SEARCHX and SEARCHY. The best motion vectors and difference values are stored in VectX, VectY, and VectB. The best motion vectors and values computed by the base (reference) implementation are stored in BaseX, BaseY, and BaseB; these are used to check the vectors computed by the implementation that uses the instruction extension. These basic definitions are captured in the following C code fragment:

#include <limits.h>   /* UINT_MAX */
#include <stdio.h>    /* printf */

#define NX 64        /* image width */
#define NY 32        /* image height */
#define BLOCKX 16    /* block width */
#define BLOCKY 16    /* block height */
#define SEARCHX 4    /* search region width */
#define SEARCHY 4    /* search region height */

unsigned char OldB[NX][NY];                     /* old image */
unsigned char NewB[NX][NY];                     /* new image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];     /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];     /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY];     /* absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY];     /* Base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY];     /* Base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY];     /* Base absolute difference */

#define ABS(x)    (((x) < 0) ? (-(x)) : (x))
#define MIN(x,y)  (((x) < (y)) ? (x) : (y))
#define MAX(x,y)  (((x) > (y)) ? (x) : (y))
#define ABSD(x,y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))

The block motion estimation algorithm consists of three nested loops:
1. For each source block in the old image,
2. for each destination block in the new image within the region surrounding the source block,
3. compute the absolute difference between each pair of pixels.
The complete code for the algorithm is shown below.

/*********************************************************
 * Base version of motion estimation (reference implementation)
 *********************************************************/
void motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            /* clip the search region to the image */
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX-BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY-BLOCKY, by*BLOCKY + SEARCHY);
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    /* sum of absolute pixel differences for this candidate */
                    diff = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}

Although this base implementation is simple, it fails to exploit much of the intrinsic parallelism of the block-to-block comparison. The configurable processor architecture provides two key tools that allow this application to be sped up significantly.
First, the instruction set architecture includes efficient funnel shift instructions that allow fast extraction of unaligned fields from memory. This allows the inner loop of the pixel comparison to fetch groups of adjacent pixels from memory efficiently. The loop can then be rewritten to operate on four pixels (bytes) at a time. In particular, to achieve the goal of this example, a new instruction is defined that computes the absolute differences of four pixel pairs at once. Before defining this new instruction, however, the algorithm must be reworked to use it.
The existence of this instruction makes unrolling the inner pixel-difference loop attractive. The C code for the inner loop is rewritten to take advantage of the new sum-of-absolute-differences instruction and the efficient shifts. Four overlapping block portions of the reference image can then be compared in the same loop. SAD(x, y) is the new intrinsic function corresponding to the added instruction. SRC(x, y) performs a right shift of the concatenation of x and y by the shift amount stored in the SAR register.
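A host-side C emulation shows why SRC(B,A) with SAR set to 8, 16, or 24 yields the words that start 1, 2, or 3 bytes further along the row. The emulation rests on two assumptions made for illustration. It assumes little-endian byte order, so the first pixel lands in the low byte of a loaded word. It also models SAR as a plain variable. Neither is a description of the actual hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Emulated shift-amount register (hypothetical host-side model; on the
   target, SSAI loads SAR from an immediate). */
static unsigned sar;

static void SSAI(unsigned amount)
{
    assert(amount < 32);       /* immediate shift amounts 0..31 */
    sar = amount;
}

/* Funnel shift: the low 32 bits of the 64-bit concatenation hi:lo,
   shifted right by SAR. */
static uint32_t SRC(uint32_t hi, uint32_t lo)
{
    uint64_t cat = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(cat >> sar);
}

/* Assemble four bytes little-endian, so p[0] ends up in the low byte. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

For a row holding the bytes 0, 1, ..., 7, A = load_le32(row) is 0x03020100 and B = load_le32(row + 4) is 0x07060504. After SSAI(8), SRC(B, A) is 0x04030201, i.e. bytes 1 through 4. This unaligned word is obtained without any further memory access.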

/*********************************************************
 * Fast version of motion estimation which uses the SAD instruction.
 *********************************************************/
void motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned diff0, diff1, diff2, diff3, best, bestx, besty;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX-BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY-BLOCKY, by*BLOCKY + SEARCHY);
            /* sizeof(long) == 4 on the 32-bit target: four candidate rows per pass */
            for (cy = starty; cy < endy; cy += sizeof(long)) {
                for (cx = startx; cx < endx; cx++) {
                    diff0 = diff1 = diff2 = diff3 = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        /* four words (16 pixels) of the new-image block row */
                        N = (unsigned *) &(NewB[bx*BLOCKX+x][by*BLOCKY]);
                        N1 = N[0]; N2 = N[1]; N3 = N[2]; N4 = N[3];
                        /* five words of the old image, enough for byte offsets 0..3 */
                        O = (unsigned *) &(OldB[cx+x][cy]);
                        A = O[0]; B = O[1]; C = O[2]; D = O[3]; E = O[4];
                        /* candidate at cy */
                        diff0 += SAD(A, N1) + SAD(B, N2) + SAD(C, N3) + SAD(D, N4);
                        /* candidate at cy + 1: funnel-shift by one byte */
                        SSAI(8);
                        diff1 += SAD(SRC(B,A), N1) + SAD(SRC(C,B), N2) +
                                 SAD(SRC(D,C), N3) + SAD(SRC(E,D), N4);
                        /* candidate at cy + 2 */
                        SSAI(16);
                        diff2 += SAD(SRC(B,A), N1) + SAD(SRC(C,B), N2) +
                                 SAD(SRC(D,C), N3) + SAD(SRC(E,D), N4);
                        /* candidate at cy + 3 */
                        SSAI(24);
                        diff3 += SAD(SRC(B,A), N1) + SAD(SRC(C,B), N2) +
                                 SAD(SRC(D,C), N3) + SAD(SRC(E,D), N4);
                        O += NY/4;
                        N += NY/4;
                    }
                    if (diff0 < best) {
                        best = diff0;
                        bestx = cx;
                        besty = cy;
                    }
                    if (diff1 < best) {
                        best = diff1;
                        bestx = cx;
                        besty = cy + 1;
                    }
                    if (diff2 < best) {
                        best = diff2;
                        bestx = cx;
                        besty = cy + 2;
                    }
                    if (diff3 < best) {
                        best = diff3;
                        bestx = cx;
                        besty = cy + 3;
                    }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}

This implementation uses the following SAD function to simulate the final new instruction:

/*********************************************************
 * Sum of absolute difference of four bytes
 *********************************************************/
static inline unsigned SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}

To debug this new implementation, the following test program is used to compare the motion vectors and values computed by the new implementation and the base implementation:

/*********************************************************
 * Main test
 *********************************************************/
int main(int argc, char **argv)
{
    int passwd;

#ifndef NOPRINTF
    printf("Block=(%d,%d), Search=(%d,%d), size=(%d,%d)\n",
           BLOCKX, BLOCKY, SEARCHX, SEARCHY, NX, NY);
#endif
    motion_estimate_base();
    motion_estimate_tie();
    passwd = check();   /* compares Vect* against Base*; not shown here */

#ifndef NOPRINTF
    printf(passwd ? "TIE version passed\n" : "TIE version failed\n");
#endif
    return passwd;
}

This simple test program is used throughout the development process. One important convention that must be followed here is that the main program must return 0 when an error is detected and 1 otherwise.
Using TIE allows new instructions to be specified quickly. The configurable processor generator can fully implement these instructions in both the hardware and the software development tools. Hardware synthesis produces an optimal integration of the new functionality into the hardware datapath. The configurable processor software environment fully supports the new instructions in the C and C++ compilers, the assembler, the symbolic debugger, the profiler, and the cycle-accurate instruction set simulator. Because hardware and software are regenerated quickly, application-specific instructions become a fast and dependable tool for application acceleration.
This example uses TIE to implement a simple instruction that performs the pixel difference, absolute value, and accumulation for four pixels in parallel. This single instruction performs eleven basic operations (four subtractions, four absolute values, and three additions) as one atomic operation; on a conventional processor these might require separate instructions. The complete description is as follows:

// define a new opcode for Sum of Absolute Difference (SAD)
// from which instruction decoding logic is derived
opcode SAD op2=4'b0000 CUST0

// define a new instruction class
// from which compiler, assembler, disassembler
// routines are derived
iclass sad {SAD} {out arr, in ars, in art}

// semantic definition from which instruction-set
// simulation and RTL descriptions are derived
semantic sad_logic {SAD} {
    wire [8:0] diff0l, diff1l, diff2l, diff3l;
    wire [7:0] diff0r, diff1r, diff2r, diff3r;
    assign diff0l = art[7:0] - ars[7:0];
    assign diff1l = art[15:8] - ars[15:8];
    assign diff2l = art[23:16] - ars[23:16];
    assign diff3l = art[31:24] - ars[31:24];
    assign diff0r = ars[7:0] - art[7:0];
    assign diff1r = ars[15:8] - art[15:8];
    assign diff2r = ars[23:16] - art[23:16];
    assign diff3r = ars[31:24] - art[31:24];
    assign arr = (diff0l[8] ? diff0r : diff0l) +
                 (diff1l[8] ? diff1r : diff1l) +
                 (diff2l[8] ? diff2r : diff2l) +
                 (diff3l[8] ? diff3r : diff3l);
}
This description represents the minimum steps required to define a new instruction. First, a new opcode must be defined for the instruction. In this case, the new opcode SAD is defined as a sub-opcode of CUST0. As described above, CUST0 is predefined as:

opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST

Clearly, QRST is the top-level opcode. CUST0 is a sub-opcode of QRST, and SAD in turn is a sub-opcode of CUST0. This hierarchical organization of opcodes allows logical grouping and management of the opcode space. One important point to remember is that CUST0 (and CUST1) are defined as opcode space reserved for users to add new instructions. Preferably, users stay within this allocated opcode space to ensure that their TIE descriptions can be reused in the future.
The second step in this TIE description is to define a new instruction class containing the new instruction SAD. This is where the operands of the SAD instruction are defined. In this case, SAD has three register operands: the destination register arr and the source registers ars and art. As explained previously, arr is defined as the register specified by the r field of the instruction, and ars and art are defined as the registers specified by the s and t fields of the instruction.
The last block in this description gives the formal semantic definition of the SAD instruction. It uses a subset of the Verilog HDL language for describing combinational logic. It is this block that defines precisely how the ISS will simulate the SAD instruction and how the additional circuitry is synthesized and added to the configurable processor hardware to support the new instruction.
Next, the TIE description is debugged and verified using the tools described previously. After verifying the correctness of the TIE description, the next step is to estimate the impact of the new instruction on hardware size and performance. As described above, this can be done using, for example, Design Compiler™. When Design Compiler™ finishes, the user can examine the output for detailed area and speed reports.
After confirming that the TIE description is correct and efficient, it is time to configure and build a configurable processor that also supports the new SAD instruction. As described above, this step is done using the GUI.
Next, the motion estimation code is compiled into code for the configurable processor, and the instruction set simulator is used to verify the correctness of the program and, more importantly, to measure its performance. This is done in three steps: run the test program using the simulator; run only the base implementation to obtain its instruction count; and run only the new implementation to obtain its instruction count.
The following is the simulation output of the second step:

Block=(16,16), Search=(4,4), size=(32,32) TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.98 seconds

Events                          Number   Number per 100 instrs
Instructions                    226005   (100.00)
Unconditional taken branches       454   (0.20)
Conditional branches             37149   (16.44)
  Taken                          26947   (11.92)
  Not taken                      10202   (4.51)
Window Overflows                    20   (0.01)
Window Underflows                   19   (0.01)

The following is the simulation output of the final step:

Block=(16,16), Search=(4,4), size=(32,32) TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.36 seconds

Events                          Number   Number per 100 instrs
Instructions                     51743   (100.00)
Unconditional taken branches       706   (1.36)
Conditional branches              3541   (6.84)
  Taken                           2759   (5.33)
  Not taken                        782   (1.51)
Window Overflows                    20   (0.04)
Window Underflows                   19   (0.04)

From these two reports, we can see a speedup of roughly four times. Note that the configurable processor's instruction set simulator can provide a great deal of other useful information.

After confirming the correctness and performance of the program, the next step is to run the test program on the Verilog simulator as described above.
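The roughly fourfold figure can be checked from the instruction counts in the two reports alone. The short sketch below does that arithmetic and recomputes two entries of the "Number per 100 instrs" column. It treats the instruction count as the performance measure, which is an assumption: the reports shown do not give cycle counts, and "Time for Simulation" is simulator run time, not target execution time.

```python
def per_100(events: int, instructions: int) -> float:
    """Events per 100 executed instructions, as in the ISS report column."""
    return round(100.0 * events / instructions, 2)


# Instruction counts copied from the two ISS reports above.
base_instructions = 226005  # second step: base implementation only
tie_instructions = 51743    # final step: new (SAD/TIE) implementation only

speedup = base_instructions / tie_instructions
print(f"instruction-count speedup: {speedup:.2f}x")        # 4.37x
print(per_100(37149, base_instructions),                    # 16.44
      per_100(3541, tie_instructions))                      # 6.84
```

The conditional-branch rate drops from about 16 to about 7 per 100 instructions, consistent with the per-pixel comparison loop being replaced by the new instruction.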

Those skilled in the art can find the details of this Verilog simulation step in the makefiles of Appendix C (the related files are also shown in Appendix C). The purpose of this simulation is to further confirm the correctness of the new implementation and, more importantly, to make this test program part of the regression tests for the configured processor.

Finally, the processor logic can be synthesized using, for example, Design Compiler™, and placed and routed using, for example, Apollo™.

To keep the explanation clear and brief, this example takes a simplified view of video compression and motion estimation.
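The simplified view can itself be written down in a few lines. The Python sketch below is illustrative only. `sad4` is a hypothetical model that assumes the SAD instruction sums absolute differences over four 8-bit pixels packed into each 32-bit source register, pixel 0 in the low byte; the actual behavior is whatever the TIE semantic block discussed above specifies. The 16x16 block, the ±4 search window, and the 32x32 frame are read, as an assumption, from the `Block`, `Search`, and `size` line of the ISS output. Every `sad4` call in the inner loop is the work the new instruction collapses into a single instruction.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x] holds an 8-bit pixel


def pack4(pixels: List[int]) -> int:
    """Pack four 8-bit pixels into a 32-bit word, pixel 0 in the low byte (assumed layout)."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def sad4(ars: int, art: int) -> int:
    """Hypothetical SAD model: sum of |a - b| over the four byte lanes of two words."""
    total = 0
    for lane in range(4):
        a = (ars >> (8 * lane)) & 0xFF
        b = (art >> (8 * lane)) & 0xFF
        total += abs(a - b)
    return total


def block_sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block of cur at (bx, by) and the block of ref displaced
    by (dx, dy), four pixels per sad4 call (n must be a multiple of 4)."""
    total = 0
    for y in range(n):
        for x in range(0, n, 4):
            a = pack4(cur[by + y][bx + x:bx + x + 4])
            b = pack4(ref[by + dy + y][bx + dx + x:bx + dx + x + 4])
            total += sad4(a, b)
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search: int) -> Optional[Tuple[int, int, int]]:
    """Try every displacement in [-search, search]^2 that keeps the reference block
    inside the frame; return (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy <= height - n and 0 <= bx + dx <= width - n):
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 32x32 frames: cur is ref shifted so that the true motion is (dx, dy) = (2, -1).
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[y - 1][x + 2] if 0 <= y - 1 < 32 and x + 2 < 32 else 0
        for x in range(32)] for y in range(32)]
result = full_search(cur, ref, bx=8, by=8, n=16, search=4)
print(result)  # (0, 2, -1)
```

With a ±4 window, each 16x16 block needs 81 candidate positions of 64 `sad4` evaluations each, which is why a single-instruction SAD pays off so strongly.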
In fact, standard compression algorithms involve many additional nuances. For example, MPEG-2 typically performs motion estimation and compensation at sub-pixel resolution. Two adjacent rows or columns of pixels can be averaged to produce a set of pixels interpolated at the imaginary positions halfway between them. The user-defined instructions of the configurable processor are again useful here, because a parallel pixel-averaging instruction is easily implemented in just three or four lines of TIE code. Averaging between pixels within a row again makes use of the efficient alignment operations of the processor's standard instruction set.

Thus, incorporating a simple sum-of-absolute-differences instruction adds only a few hundred logic gates, yet improves motion estimation performance tenfold. This acceleration represents a major improvement in the cost and power efficiency of the final system. Furthermore, the seamless extension of the software development tools to include the new motion estimation instruction allows rapid prototyping, performance analysis, and release of the complete software application solution. The solution of the present invention makes application-specific processor configuration simple, reliable, and complete, and provides significant enhancements in the cost, performance, functionality, and power efficiency of the final system product.

As an example focusing on the addition of functional hardware units, consider the base configuration shown in Figure 9. It includes the processor control functions, program counter (PC), branch selection, instruction memory or cache, and instruction decoder, as well as the basic integer datapath, comprising the main register file, bypass multiplexers, pipeline registers, ALU, address generator, and data cache memory. The HDL is written so that the multiplier logic is conditionally present when the "multiplier" parameter is set, and a multiplier unit is added as a new pipeline stage, as shown in Figure 7 (changes to exception handling may be required if precise exceptions are to be supported). Of course, instructions that use the multiplier are preferably added along with the new unit.

As a second example, a complete coprocessor can be added to the base configuration, as shown in Figure 8, for digital signal processing, e.g., a multiply/accumulate unit. This requires changes in processor control, such as: adding decode control signals for multiply-accumulate operations, including decoding of the register sources and destinations of the extended instructions; adding appropriate pipeline delays for the control signals; extending the register destination logic; adding control of a register bypass multiplexer for moves from the accumulator registers; and including the multiply-accumulate unit as a possible source of instruction results. In addition, the multiply-accumulate unit itself must be added, with its accumulator registers, a multiply-accumulate array, and source-select multiplexers for the main register sources. The addition of the coprocessor also requires extending the register bypass multiplexers to take a source from the accumulator registers, and extending the load/align multiplexer to take a source from the multiplier result.
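A behavioral sketch of the operation such a multiply/accumulate coprocessor adds can make the list of control changes above more concrete. Everything in it is an assumption for illustration, not the Figure 8 design: 16-bit signed operands taken from two main source registers, a single 40-bit accumulator register, and the class and method names.

```python
ACC_BITS = 40      # assumed accumulator width
OPERAND_BITS = 16  # assumed width of each multiplier operand


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MacCoprocessor:
    """Hypothetical behavioral model of a multiply/accumulate coprocessor."""

    def __init__(self) -> None:
        self.acc = 0  # accumulator register added with the coprocessor

    def multiply_accumulate(self, ars: int, art: int) -> int:
        """acc <- acc + ars * art, with operands read from two main registers."""
        product = to_signed(ars, OPERAND_BITS) * to_signed(art, OPERAND_BITS)
        self.acc = to_signed(self.acc + product, ACC_BITS)  # wraps like a 40-bit register
        return self.acc

    def move_from_accumulator(self) -> int:
        """Low 32 bits of acc, returned as an instruction result for a main register."""
        return self.acc & 0xFFFFFFFF


mac = MacCoprocessor()
for a, b in [(3, 4), (0xFFFF, 5), (0xFFFE, 0xFFFD)]:  # 3*4, (-1)*5, (-2)*(-3)
    mac.multiply_accumulate(a, b)
print(mac.acc)  # 13
```

The accumulator is state that lives outside the main register file, which is why the text calls for extra bypass and result-source paths when values move between it and the main registers.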
Again, the system preferably adds instructions for using the new functional units along with the actual hardware.

Another option that is particularly useful in connection with digital signal processing is a floating-point unit. Such a functional unit, implementing, for example, the IEEE 754 single-precision floating-point standard, can be added along with instructions for accessing it. The floating-point unit can be used, for example, in digital signal processing applications such as audio compression and decompression.

As another example of system variation, consider the 4 kB memory interface shown in Figure 9. Using the configurability of the present invention, the coprocessor registers and datapath can be wider or narrower than the main integer register file and datapath, and the local memory width can be varied so that it equals the widest processor or coprocessor width (memory addressing for reads and writes is adjusted accordingly). For example, Figure 10 shows the local memory system of a processor that supports 32-bit loads and stores, in which the processor and coprocessor address the same array but the coprocessor supports 128-bit loads and stores.
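To see how the 32-bit and 128-bit views of Figure 10 can share one array, here is a behavioral Python sketch of the banking idea. `BankedMemory` is a hypothetical name, and the sketch assumes byte addresses, naturally aligned accesses, and that bank 0 holds the lowest-addressed bytes of each row; none of these details is specified above. It models the behavior only, not the generated hardware.

```python
class BankedMemory:
    """Hypothetical model of one local-memory array shared by a narrow port and a
    wide port, built from banks whose width equals the narrow access width."""

    def __init__(self, size_bytes: int, narrow_bytes: int = 4, wide_bytes: int = 16) -> None:
        self.narrow = narrow_bytes
        self.wide = wide_bytes
        self.banks = wide_bytes // narrow_bytes   # ratio of max to min access width
        self.depth = size_bytes // wide_bytes     # rows in every bank
        self.mask = (1 << (8 * narrow_bytes)) - 1
        self.mem = [[0] * self.depth for _ in range(self.banks)]

    def _locate(self, addr: int):
        """Split a byte address into (row, lane); the lane selects the bank."""
        return addr // self.wide, (addr % self.wide) // self.narrow

    def write(self, addr: int, data: int, nbytes: int) -> None:
        row, lane = self._locate(addr)
        if nbytes == self.wide:                   # wide store: every bank write-enabled
            for i in range(self.banks):
                self.mem[i][row] = (data >> (8 * self.narrow * i)) & self.mask
        else:                                     # narrow store: only the addressed bank
            self.mem[lane][row] = data & self.mask

    def read(self, addr: int, nbytes: int) -> int:
        row, lane = self._locate(addr)
        if nbytes == self.wide:                   # gather all banks, bank 0 in the low bits
            return sum(self.mem[i][row] << (8 * self.narrow * i) for i in range(self.banks))
        return self.mem[lane][row]                # narrow load: select one lane


mem = BankedMemory(4096)                          # 4 kB with 32-bit and 128-bit ports
mem.write(0x20, 0x44444444_33333333_22222222_11111111, 16)
mem.write(0x28, 0xDEADBEEF, 4)
print(hex(mem.read(0x24, 4)), hex(mem.read(0x20, 16)))
# 0x22222222 0x44444444deadbeef2222222211111111
```

A narrow store touches one bank, and a wide store or load touches all of them in the same row, which is the write-enable and read-gathering behavior the hardware must provide.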
The memory system of Figure 10 can be generated using the following TPP code:

function memory(Select, A1, A2, DI1, DI2, W1, W2, DO1, DO2)
; $B1 = config_get_value("width_of_port_1");
; $B2 = config_get_value("width_of_port_2");
; $Bytes = config_get_value("size_of_memory");
; $Max = max($B1, $B2);
; $Min = min($B1, $B2);
; $Banks = $Max / $Min;
; $Wide1 = ($Max == $B1);
; $Wide2 = ($Max == $B2);
; $Depth = $Bytes / $Max;
wire ['$Max'*8-1:0] Data1 = '$Wide1' ? DI1 : {'$Banks'{DI1}};
wire ['$Max'*8-1:0] Data2 = '$Wide2' ? DI2 : {'$Banks'{DI2}};
wire ['$Max'*8-1:0] D = Select ? Data1 : Data2;
wire Wide = Select ? '$Wide1' : '$Wide2';
wire W = Select ? W1 : W2;
wire [log2('$Bytes')-1:0] A = Select ? A1 : A2;
wire [log2('$Depth')-1:0] Address = A[log2('$Bytes')-1:log2('$Max')];
wire [log2('$Banks')-1:0] Lane = A[log2('$Max')-1:log2('$Min')];
; for ($i = 0; $i < $Banks; $i++) {
wire WrEnable'$i' = W & (Wide | (Lane == '$i'));
wire ['$Min'*8-1:0] WrData'$i' = D[('$i'+1)*'$Min'*8-1:'$i'*'$Min'*8];
ram(RdData'$i', '$Depth', Address, WrData'$i', WrEnable'$i');
; }
wire ['$Max'*8-1:0] RdData = {
; for ($i = $Banks - 1; $i > 0; $i--) {
  RdData'$i',
; }
  RdData0
};
wire ['$B1'*8-1:0] DO1 = '$Wide1' ? RdData : RdData[Lane*'$B1'*8 +: '$B1'*8];
wire ['$B2'*8-1:0] DO2 = '$Wide2' ? RdData : RdData[Lane*'$B2'*8 +: '$B2'*8];

Here, $Bytes is the total memory size, accessed either with a width of B1 bytes at byte address A1 over the data buses DI1 and DO1 under control of the write signal W1, or with the corresponding parameters B2, A2, DI2/DO2, and W2. Only one set of signals, chosen by Select, is active in a given cycle. The TPP code implements the memory as a collection of memory banks. The width of each bank is given by the minimum access width, and the number of banks is given by the ratio of the maximum to the minimum access width. A first loop instantiates each memory bank together with its associated write signals, i.e., write enable and write data. A second loop gathers the data read from all banks onto a single bus.

Figure 11 shows an example of including user-defined instructions in the base configuration. As shown in the figure, simple instructions can be added to the processor pipeline with timing and an interface similar to those of the ALU. Instructions added in this way must not generate stalls or exceptions, must not contain state, must use only the two normal source register values and the instruction word as inputs, and must produce a single output value. However, these restrictions are unnecessary if the TIE language provides for specifying processor state.

Figure 12 shows another example of implementing a user-defined unit under this system. The functional unit shown in the figure, an 8/16-bit parallel-data extension of the ALU, is generated from the following ISA code:

Instruction {
  Opcode ADD8_4    CUSTOM op2=0000
  Opcode MIN16_2   CUSTOM op2=0001
  Opcode SHIFT16_2 CUSTOM op2=0002
  iclass MY {ADD8_4, MIN16_2, SHIFT16_2} {arr, ars, art}
}

Implementation {
  input [31:0] art, ars;
  input [23:0] inst;
  input ADD8_4, MIN16_2, SHIFT16_2;
  output [31:0] arr;
  wire [31:0] add, min, shift;
  assign add = {art[31:24] + ars[31:24], art[23:16] + ars[23:16],
                art[15:8] + ars[15:8], art[7:0] + ars[7:0]};
  assign min[31:16] = art[31:16] < ars[31:16] ? art[31:16] : ars[31:16];
  assign min[15:0] = art[15:0] < ars[15:0] ? art[15:0] : ars[15:0];
  assign shift[31:16] = art[31:16] << ars[31:16];
  assign shift[15:0] = art[15:0] << ars[15:0];
  assign arr = {32{ADD8_4}} & add | {32{MIN16_2}} & min | {32{SHIFT16_2}} & shift;
}
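The Implementation block above can be read as the following behavioral model. The function names are hypothetical; the model assumes, as the block's structure indicates, that every byte lane of ADD8_4 adds the corresponding bytes of art and ars, and that Verilog's truncation to the lane width applies to each lane result.

```python
MASK32 = 0xFFFFFFFF


def lanes(word: int, width: int) -> list:
    """Split a 32-bit word into unsigned lanes of `width` bits, lane 0 = low bits."""
    return [(word >> lo) & ((1 << width) - 1) for lo in range(0, 32, width)]


def join(values: list, width: int) -> int:
    """Inverse of lanes(); each value is truncated to `width` bits like a part-select."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & ((1 << width) - 1)) << (i * width)
    return word


def add8_4(art: int, ars: int) -> int:
    # four independent 8-bit adds; carries do not cross lane boundaries
    return join([a + b for a, b in zip(lanes(art, 8), lanes(ars, 8))], 8)


def min16_2(art: int, ars: int) -> int:
    # two 16-bit minimums; Verilog part-selects compare as unsigned
    return join([min(a, b) for a, b in zip(lanes(art, 16), lanes(ars, 16))], 16)


def shift16_2(art: int, ars: int) -> int:
    # two 16-bit left shifts; bits shifted past bit 15 of a lane are dropped
    return join([a << b for a, b in zip(lanes(art, 16), lanes(ars, 16))], 16)


def execute(art: int, ars: int, ADD8_4: int = 0, MIN16_2: int = 0, SHIFT16_2: int = 0) -> int:
    """Mirror of the final assign: each one-hot opcode bit is replicated to 32 bits,
    ANDed with its unit's result, and the three results are ORed together."""
    def rep(bit: int) -> int:
        return MASK32 if bit else 0
    return ((rep(ADD8_4) & add8_4(art, ars))
            | (rep(MIN16_2) & min16_2(art, ars))
            | (rep(SHIFT16_2) & shift16_2(art, ars)))


print(hex(execute(0x01FF7F10, 0x01018102, ADD8_4=1)))     # 0x2000012
print(hex(execute(0x00050010, 0x00030020, MIN16_2=1)))    # 0x30010
print(hex(execute(0x00010003, 0x00040002, SHIFT16_2=1)))  # 0x10000c
```

The AND-OR selection works because the decoder asserts at most one of the opcode signals, so all three units can compute in parallel and only the selected result reaches arr.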
Of particular interest in another aspect of the present invention is the designer-defined instruction execution unit 96, because it is there that TIE-defined instructions, including those that modify processor state, are decoded and executed. In this aspect of the invention, several constructs have been added to the language to make it possible to declare additional processor state that can be read and written by new instructions. These "state" statements are used to declare the added processor state. A declaration begins with the keyword state. The next part of the state statement describes the number of state bits and how the bits of the state are indexed. The part after that is the name of the state, used to identify it in other parts of the description. The last part of the state statement is a list of attributes associated with the state.
For example,

state [63:0] DATA cpn=0 autopack
state [27:0] KEYC cpn=1 nopack
state [27:0] KEYD cpn=1

defines three new processor states, DATA, KEYC, and KEYD. The state DATA is 64 bits wide, with its bits indexed from 63 to 0. KEYC and KEYD are both 28-bit states. DATA has a coprocessor-number attribute, cpn, indicating which coprocessor DATA belongs to.

The attribute "autopack" indicates that the state DATA will be automatically mapped to registers in the user register file so that the value of DATA can be read and written by software tools.

The user_register section is defined to indicate the mapping of state to registers in the user register file. A user_register section begins with the keyword user_register, followed by a number indicating the register number, and ends with an expression indicating the state bits to be mapped onto that register. For example,

user_register 0 DATA[31:0]
user_register 1 DATA[63:32]
user_register 2 KEYC
user_register 3 KEYD
user_register 4 {X, Y, Z}

specifies that the low-order word of DATA is mapped to the first user register file entry and the high-order word to the second. The next two user register file entries hold the values of KEYC and KEYD. Clearly, the state information used in this section must be consistent with the state information in the state section. This consistency can be checked automatically by a computer program.

In other embodiments of the present invention, such an assignment of state bits to user register file entries is derived automatically using bin-packing algorithms.
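The patent does not say which bin-packing algorithm is used, so the sketch below swaps in first-fit decreasing as one plausible choice. It assumes 32-bit user register entries, cuts states wider than a register into register-sized slices, and, for simplicity, packs every state it is given, ignoring the autopack and nopack attributes. Applied to DATA, KEYC, and KEYD, it happens to reproduce the manual mapping shown above.

```python
from typing import Dict, List, Tuple

Slice = Tuple[str, int, int]  # (state name, msb, lsb)


def pack_states(states: Dict[str, int], reg_width: int = 32) -> List[List[Slice]]:
    """Assign state bits to user registers with first-fit decreasing.

    States wider than a register are first cut into register-sized slices.
    Returns one list of slices per user register, in register-number order.
    """
    slices: List[Slice] = []
    for name, width in states.items():
        for lsb in range(0, width, reg_width):
            slices.append((name, min(lsb + reg_width, width) - 1, lsb))
    # Widest slices first; Python's sort is stable, so ties keep declaration order.
    slices.sort(key=lambda s: s[1] - s[2] + 1, reverse=True)

    registers: List[List[Slice]] = []
    free_bits: List[int] = []
    for s in slices:
        size = s[1] - s[2] + 1
        for i, room in enumerate(free_bits):
            if size <= room:          # first register with enough space left
                registers[i].append(s)
                free_bits[i] -= size
                break
        else:                         # nothing fits: open a new user register
            registers.append([s])
            free_bits.append(reg_width - size)
    return registers


for reg, contents in enumerate(pack_states({"DATA": 64, "KEYC": 28, "KEYD": 28})):
    print("user_register", reg, contents)
# user_register 0 [('DATA', 31, 0)]
# user_register 1 [('DATA', 63, 32)]
# user_register 2 [('KEYC', 27, 0)]
# user_register 3 [('KEYD', 27, 0)]
```

Small states share a register when they fit, which is the case the `{X, Y, Z}` entry above illustrates by hand.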
In still other embodiments, a combination of manual and automatic assignment can be used, for example, to ensure upward compatibility.

The instruction field statement field is used to improve the readability of TIE code. Fields are subsets or concatenations of other fields that are grouped together and referenced by name. The complete set of bits in an instruction is the highest-level superset field, inst, and this field can be divided into smaller fields. For example,

field x inst[11:8]
field y inst[15:12]
field xy {x, y}

defines two 4-bit fields, x and y, as subfields of the top-level field inst (bits 8-11 and 12-15, respectively), and an 8-bit field xy as the concatenation of the x and y fields.

The opcode statement defines opcodes that encode specific fields. To specify operands, such as registers or immediate constants, for use by the opcodes so defined, the instruction fields must first be defined with field statements and then defined with operand statements.

For example,

opcode acs op2=4'b0000 CUST0

opcode adsel op2=4'b0001 CUST0

define two new opcodes, acs and adsel, based on the previously defined opcode CUST0 (4'b0000 denotes the four-bit binary constant 0000). The TIE description of the preferred core ISA contains the statements

field op0 inst[3:0]
field op1 inst[19:16]
field op2 inst[23:20]
opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0110 QRST

as part of its base definition. Therefore, the definitions of acs and adsel cause the TIE compiler to generate instruction decode logic represented, respectively, by:

inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000
inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000

The instruction operand statement operand identifies registers and immediate constants. However, before a field can be defined as an operand, it must previously have been defined as a field as described above. If the operand is an immediate constant, its value can be generated from the operand, or it can be taken from a previously defined constant table, defined as described below. For example, to encode an immediate operand, the TIE code

field offset inst[23:6]
operand offsets4 offset {
  assign offsets4 = {{14{offset[17]}}, offset} << 2;
} {
  wire [31:0] t;
  assign t = offsets4 >> 2;
  assign offset = t[17:0];
}

defines an 18-bit field named offset and an operand offsets4, whose value is a signed number equal to four times the number stored in the offset field. The last part of the operand statement describes the circuitry used to perform the computation, written in the subset of Verilog™ HDL used for describing combinational circuits, as will be apparent to those skilled in the art.
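Both the decode patterns and the offsets4 operand reduce to simple bit manipulation. The Python sketch below models them under the field positions given above; the function names are hypothetical, and the model returns offsets4 as a signed value rather than as a 32-bit wire.

```python
def field(inst: int, hi: int, lo: int) -> int:
    """Extract inst[hi:lo] as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(inst: int):
    """Return 'acs', 'adsel', or None by walking the op0/op1/op2 opcode hierarchy."""
    if field(inst, 3, 0) != 0b0000:      # op0 must select QRST
        return None
    if field(inst, 19, 16) != 0b0110:    # op1 must select CUST0 within QRST
        return None
    return {0b0000: "acs", 0b0001: "adsel"}.get(field(inst, 23, 20))  # op2


def offsets4_from_field(offset: int) -> int:
    """First operand block: {{14{offset[17]}}, offset} << 2, read as a signed value."""
    value = offset & 0x3FFFF
    if value & (1 << 17):                # sign bit offset[17]
        value -= 1 << 18
    return value * 4


def field_from_offsets4(offsets4: int) -> int:
    """Second operand block: t = offsets4 >> 2; offset = t[17:0]."""
    return (offsets4 >> 2) & 0x3FFFF


print(decode(0x060000), decode(0x160000), offsets4_from_field(0x3FFFF))  # acs adsel -4
```

The two operand functions are inverses for any multiple of four that fits in the 18-bit field, which is exactly the property the pair of blocks in the operand statement is meant to guarantee.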
In the offsets4 operand statement, the wire statement defines a 32-bit-wide logic wire named t. The first assign statement after the wire statement specifies that the signal driving t is the offsets4 constant shifted right by two bits, and the second assign statement specifies that the lower 18 bits of t are placed into the offset field. The assign statement in the first block directly specifies the value of the offsets4 operand as fourteen copies of the sign bit of offset (bit 17) concatenated with offset, shifted left by two bits.

For a constant-table operand, the TIE code

table prime 16 {
  2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
}
operand prime_s s {
  assign prime_s = prime[s];
} {
  assign s = prime_s == prime[0]  ? 4'b0000 :
             prime_s == prime[1]  ? 4'b0001 :
             prime_s == prime[2]  ? 4'b0010 :
             prime_s == prime[3]  ? 4'b0011 :
             prime_s == prime[4]  ? 4'b0100 :
             prime_s == prime[5]  ? 4'b0101 :
             prime_s == prime[6]  ? 4'b0110 :
             prime_s == prime[7]  ? 4'b0111 :
             prime_s == prime[8]  ? 4'b1000 :
             prime_s == prime[9]  ? 4'b1001 :
             prime_s == prime[10] ? 4'b1010 :
             prime_s == prime[11] ? 4'b1011 :
             prime_s == prime[12] ? 4'b1100 :
             prime_s == prime[13] ? 4'b1101 :
             prime_s == prime[14] ? 4'b1110 :
                                    4'b1111;
}

uses the table statement to define an array of constants, prime (the number following the table name is the number of elements in the table), and uses the operand s as an index into the table prime to encode the value of the operand prime_s (note the use of Verilog™ statements in defining the index).

The instruction class statement iclass associates opcodes with operands in a common format.
All instructions defined in an iclass statement share the same format and operand usage. Before an instruction class can be defined, its components must be defined: first the fields, and then the opcodes and operands. For example, building on the code used in the previous example to define the opcodes acs and adsel, the additional statements

operand art t { assign art = AR[t]; } {}
operand ars s { assign ars = AR[s]; } {}
operand arr r { assign AR[r] = arr; } {}

use the operand statement to define three register operands, art, ars, and arr (again, note the use of Verilog™ statements in the definitions). Then the iclass statement

iclass viterbi {adsel, acs} {out arr, in art, in ars}

specifies that the opcodes adsel and acs belong to a common instruction class, viterbi, whose instructions take the two register operands art and ars as inputs and write their output to the register operand arr.

In the present invention, the instruction class statement iclass is modified to allow the state-access information of instructions to be specified. It begins with the keyword iclass, followed by the name of the instruction class, the list of opcodes belonging to the class, and the list of operand-access information, and ends with a newly defined list of state-access information. For example,

iclass lddata {LDDATA} {out arr, in imm4} {in DATA}
iclass stdata {STDATA} {in ars, in art} {out DATA}
iclass stkey {STKEY} {in ars, in art} {out KEYC, out KEYD}
iclass des {DES} {out arr, in imm4} {inout KEYC, inout DATA, inout KEYD}

defines several instruction classes and how the new instructions access state. The keywords in, out, and inout indicate that the state is read, written, or modified (read and written), respectively, by the instructions in the iclass.
In this example, the state DATA is read by the instruction LDDATA, the states KEYC and KEYD are written by the instruction STKEY, and KEYC, KEYD, and DATA are modified by the instruction DES.

The instruction semantic statement semantic describes the behavior of one or more instructions using the same subset of Verilog™ used for encoding operands. By defining multiple instructions in a single semantic statement, common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes in the statement's opcode list, plus a single-bit variable for each opcode in that list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation part (the Verilog™ subset) to indicate the presence of the corresponding instruction.
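The BYTESWAP listing that follows puts these statements together: a new opcode, two states (COUNT and SWAP) mapped to user registers, an iclass that reads ars, writes arr, modifies COUNT, and reads SWAP, and a semantic block. Read as ordinary code, its semantic block behaves like the Python sketch below. The class and function names are hypothetical, and wrapping COUNT at 32 bits is an assumption based on its declared width.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class ByteSwapState:
    """The two states declared for the example: COUNT (32 bits) and SWAP (1 bit)."""
    COUNT: int = 0
    SWAP: int = 0


def byteswap(state: ByteSwapState, ars: int) -> int:
    """Behavioral model of BYTESWAP: returns arr and updates COUNT in place."""
    ars &= MASK32
    # {ars[7:0], ars[15:8], ars[23:16], ars[31:24]}: reverse the four bytes
    ars_swapped = int.from_bytes(ars.to_bytes(4, "little"), "big")
    arr = ars_swapped if state.SWAP else ars
    # inout COUNT, in SWAP: COUNT is read, incremented by SWAP, and written back
    state.COUNT = (state.COUNT + state.SWAP) & MASK32
    return arr


st = ByteSwapState(SWAP=1)
print(hex(byteswap(st, 0x11223344)), st.COUNT)  # 0x44332211 1
st.SWAP = 0
print(hex(byteswap(st, 0x11223344)), st.COUNT)  # 0x11223344 1
```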

For example, the following TIE description adds a BYTESWAP instruction and the state it uses:

    // define a new opcode for BYTESWAP based on
    // - a predefined instruction field op2
    // - a predefined opcode CUST0
    // refer to Xtensa ISA manual for descriptions of op2 and CUST0
    opcode BYTESWAP op2=4'b0000 CUST0

    // declare state SWAP and COUNT
    state COUNT 32
    state SWAP 1

    // map COUNT and SWAP to user register file entries
    user_register 0 COUNT
    user_register 1 SWAP

    // define a new instruction class that
    // - reads data from ars (predefined to be AR[s])
    // - uses and writes state COUNT
    // - uses state SWAP
    iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}

    // semantic definition of byteswap
    // COUNT the number of byte-swapped words
    // Return the swapped or un-swapped data depending on SWAP
    semantic bs {BYTESWAP} {
        wire [31:0] ars_swapped = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
        assign arr = SWAP ? ars_swapped : ars;
        assign COUNT = COUNT + SWAP;
    }

The first part of the above code defines a new instruction opcode, called BYTESWAP:

    // define a new opcode for BYTESWAP based on
    // - a predefined instruction field op2
    // - a predefined opcode CUST0

    // refer to Xtensa ISA manual for descriptions of op2 and CUST0
    opcode BYTESWAP op2=4'b0000 CUST0
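The opcode declaration places BYTESWAP inside a hierarchy of opcodes, which the following paragraph spells out: QRST is selected by op0 = 4'b0000, CUST0 by op1 = 4'b0100 within QRST, and BYTESWAP by op2 = 4'b0000 within CUST0. To illustrate the kind of check the generated decode logic performs, the C sketch below walks that hierarchy one field at a time. It assumes the 24-bit Xtensa RRR instruction layout (op2 in bits 23:20, op1 in bits 19:16, r, s, and t in bits 15:4, op0 in bits 3:0); all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field extractors for a 24-bit RRR-format word (assumed layout:
   op2[23:20] op1[19:16] r[15:12] s[11:8] t[7:4] op0[3:0]). */
static unsigned op0(uint32_t insn) { return insn & 0xFu; }
static unsigned op1(uint32_t insn) { return (insn >> 16) & 0xFu; }
static unsigned op2(uint32_t insn) { return (insn >> 20) & 0xFu; }

/* Each level of the hierarchy tests one field inside its parent opcode. */
static bool is_qrst(uint32_t insn)  { return op0(insn) == 0x0u; }
static bool is_cust0(uint32_t insn) { return is_qrst(insn) && op1(insn) == 0x4u; }

bool is_byteswap(uint32_t insn) {
    assert(insn <= 0xFFFFFFu); /* instructions are 24 bits wide */
    return is_cust0(insn) && op2(insn) == 0x0u;
}

/* Hypothetical helper that assembles an RRR word from its fields. */
uint32_t make_rrr(unsigned o2, unsigned o1, unsigned r, unsigned s,
                  unsigned t, unsigned o0) {
    return ((uint32_t)o2 << 20) | ((uint32_t)o1 << 16) | ((uint32_t)r << 12) |
           ((uint32_t)s << 8) | ((uint32_t)t << 4) | (uint32_t)o0;
}
```

Checking the parent opcode first mirrors the grouping of the opcode space: anything that is not CUST0 can be rejected before op2 is examined.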

In the opcode statement above, the new opcode BYTESWAP is defined as a sub-opcode of CUST0. From the Xtensa™ Instruction Set Architecture Reference Manual, described in more detail below, CUST0 is defined as

    opcode QRST op0=4'b0000
    opcode CUST0 op1=4'b0100 QRST

where op0 and op1 are fields in the instruction. Opcodes are generally organized hierarchically. Here, QRST is the top-level opcode, CUST0 is a sub-opcode of QRST, and BYTESWAP is in turn a sub-opcode of CUST0. This hierarchical organization allows the opcode space to be grouped and managed logically.

The second declaration declares the additional processor state required by the BYTESWAP instruction:

    // declare state SWAP and COUNT
    state COUNT 32
    state SWAP 1

Here, COUNT is declared as a 32-bit state and SWAP as a 1-bit state. The TIE language specifies that the bits in COUNT are indexed from 31 down to 0, with bit 0 being the least significant bit.

The Xtensa™ ISA provides two instructions, RSR and WSR, for saving and restoring special system registers.
Similarly, it provides two other instructions, RUR and WUR (described in more detail below), for saving and restoring the state declared in TIE. To save and restore state declared in TIE, the designer must specify how that state maps onto entries of the user register file that the RUR and WUR instructions can access. The following part of the code above specifies this mapping:

    // map COUNT and SWAP to user register file entries
    user_register 0 COUNT
    user_register 1 SWAP

so that the following instructions store the value of COUNT into a2 and the value of SWAP into a5:

    RUR a2, 0
    RUR a5, 1

This mechanism is in fact used in test programs to verify the contents of the state. In C, the two instructions above would look like:

    x = RUR(0);
    y = RUR(1);

The next part of the TIE description is the definition of a new instruction class that contains the new instruction BYTESWAP:

    // define a new instruction class that
    // - reads data from ars (predefined to be AR[s])

    // - uses and writes state COUNT

    // - uses state SWAP
    iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}

Here, iclass is the keyword and bs is the name of the iclass. The next clause lists the instructions in this instruction class (BYTESWAP). The clause after that specifies the operands used by the instructions in this class (in this case, the input operand ars and the output operand arr). The last clause of the iclass specifies the state accessed by the instructions in this class (in this case, the instruction reads the state SWAP and both reads and writes the state COUNT).

The last block in the above code gives the formal semantic definition of the BYTESWAP instruction:

    // semantic definition of byteswap
    // COUNT the number of byte-swapped words

    // Return the swapped or un-swapped data depending on SWAP
    semantic bs {BYTESWAP} {
        wire [31:0] ars_swapped = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
        assign arr = SWAP ? ars_swapped : ars;
        assign COUNT = COUNT + SWAP;
    }

The description uses a subset of Verilog HDL that describes combinational logic. It is this block that defines precisely how the instruction set simulator will simulate the BYTESWAP instruction, and how the additional circuitry will be synthesized and added to the processor hardware to support the new instruction.

In the present invention's implementation of user-defined state, a declared state can be used like any other variable to access the information stored in that state. A state identifier appearing on the right-hand side of an expression indicates a read from the state. Writing to a state is accomplished by assigning a value or expression to the state identifier. For example, the following semantic code fragment shows how states are read and written by an instruction:

    assign KEYC = sr == 8'd2 ? art[27:0] : KEYC;
    assign KEYD = sr == 8'd3 ?
                  art[27:0] : KEYD;
    assign DATA = sr == 8'd0 ? {DATA[63:32], art} : {art, DATA[31:0]};
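To make the read/write convention concrete, here is a C rendering of the BYTESWAP semantic block from the example above, written the way an instruction set simulator might evaluate it: the state identifiers on the right-hand side (SWAP, COUNT) are read from the current state, and the assignment to COUNT produces its new value. This is a hypothetical sketch; TieState and byteswap_semantic are illustrative names, not the code generated by tie2iss.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulator-side copy of the TIE state declared above. */
typedef struct {
    uint32_t COUNT; /* state COUNT 32 */
    uint32_t SWAP;  /* state SWAP 1: only 0 or 1 is legal */
} TieState;

/* Evaluates semantic bs {BYTESWAP} for one instruction: reads ars,
   SWAP and COUNT, writes COUNT, and returns the value for arr. */
uint32_t byteswap_semantic(TieState *st, uint32_t ars) {
    assert(st->SWAP <= 1u);
    /* wire [31:0] ars_swapped = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]}; */
    uint32_t ars_swapped = ((ars & 0x000000FFu) << 24) |
                           ((ars & 0x0000FF00u) << 8)  |
                           ((ars & 0x00FF0000u) >> 8)  |
                           ((ars & 0xFF000000u) >> 24);
    uint32_t arr = st->SWAP ? ars_swapped : ars; /* assign arr = SWAP ? ars_swapped : ars; */
    st->COUNT = st->COUNT + st->SWAP;            /* assign COUNT = COUNT + SWAP; */
    return arr;
}
```

Because the 1-bit SWAP value is added directly to COUNT, COUNT advances only for words that were actually byte-swapped, which matches the comment in the TIE source.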

Tensilica's Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Version 1.0, is incorporated herein by reference to show examples of instructions that can be implemented in the configurable processor as core instructions, and of instructions that are available through configuration options. Further, Tensilica's Instruction Extension Language (TIE) Reference Manual, Version 1.3, is also incorporated by reference to show examples of TIE language instructions that can be used to create such user-defined instructions.

From the TIE description, new hardware implementing the instructions can be generated using, for example, a program similar to the one shown in Appendix D. Appendix E shows the header file code for the intrinsic functions needed to support the new instructions.

Using the configuration specification, the following can be generated automatically:

- the instruction decode logic of the processor 60;
- the illegal-instruction detection logic for the processor 60;
- the ISA-specific portion of the assembler;
- the ISA-specific support routines for the compiler;
- the ISA-specific portion of the disassembler (used by the debugger); and
- the ISA-specific portion of the simulator.

Figure 16 shows how the ISA-specific portions of these software tools are generated. From a user-created TIE description file 400, a TIE parser program 410 generates C code for several programs, each of which produces a file that one or more software development tools access to obtain information about the user-defined instructions and state.
For example, the program tie2gcc 420 generates a C header file 470 called xtensa-tie.h, which contains the intrinsic function definitions for the new instructions. The program tie2isa 430 generates a dynamic link library (DLL) 480 containing information about the user-defined instruction formats (this is in effect the combination of the encoding and decoding DLLs discussed in the Wilson et al. application, described below). The program tie2iss 440 generates performance-modeling routines and produces a DLL 490 containing the instruction semantics which, as discussed in the Wilson et al. application, is used by a host compiler to produce a simulator DLL used by the simulator. The program tie2ver 450 generates the necessary descriptions 500 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtcs 460 generates save and restore code 510 for use with the RUR and WUR instructions.

The precise description of the instructions, and of how they access state, makes it possible to generate efficient logic that can be plugged into an existing high-performance microprocessor design. The method described in connection with this embodiment of the present invention deals specifically with new instructions that read from or write to one or more state registers. In particular, this embodiment shows how to derive the hardware logic for state registers in the context of a class of microprocessor implementations that all use pipelining to achieve high performance.

In a pipelined implementation, such as the one shown in Figure 17, a state register is typically replicated several times, each instance representing the value of the state at a particular pipeline stage.
In this embodiment, a state is translated into multiple copies of the register, consistent with the underlying core processor implementation. Additional bypass and forwarding logic is also generated, again in a manner consistent with the underlying core processor implementation. For example, for a core processor implementation with three execution stages, this embodiment translates the state into three connected registers, as shown in Figure 18. In this implementation, each of the registers 610-630 represents the value of the state at one of the three pipeline stages. ctrl-1, ctrl-2, and ctrl-3 are the control signals used to enable latching of the data into the corresponding flip-flops 610-630.

For the multiple copies of the state register to operate consistently with the underlying processor implementation, additional logic and control signals are needed. "Consistently" means that the state should behave in the same way as the rest of the processor state under interrupts, exceptions, and pipeline stalls. In general, a given processor implementation defines certain signals that represent various pipeline conditions, and such signals are needed for the pipelined state registers to operate properly.

In a typical pipelined implementation, the execution unit comprises multiple pipeline stages. The computation of an instruction is carried out over multiple stages of the pipeline, and instructions flow through the pipeline in the order directed by the control logic. At any given time, up to n instructions may be executing in the pipeline, where n is the number of stages. A superscalar processor can also be implemented using the present invention.
In such a processor, the number of instructions in the pipeline can be n*w, where w is the issue width of the processor.

The task of the control logic is to ensure that dependences between instructions are honored and that any interference between instructions is resolved. If an instruction uses data computed by an earlier instruction, special hardware is needed to forward that data to the later instruction without stalling the pipeline. If an interrupt occurs, all instructions in the pipeline must be killed and re-executed later. When an instruction cannot be executed because its input data or the computation hardware it needs is unavailable, the instruction must be stalled. A cost-effective way to stall an instruction is to kill it in its first execution stage and re-execute it in the next cycle. One consequence of this technique is that an invalid stage (a bubble) is created in the pipeline. The bubble flows through the pipeline along with the other instructions and is discarded at the end of the pipeline, where instructions are committed.

Using the three-stage pipeline example above, a typical implementation of such processor state requires the additional logic and connections shown in Figure 19.

Normally, a value computed in one stage is forwarded immediately to subsequent instructions, without waiting for it to reach the end of the pipeline, in order to reduce the number of pipeline stalls caused by data dependences. This is achieved by sending the output of the first flip-flop 610 directly to the semantic block so that the value can be used immediately by the next instruction. To handle abnormal conditions such as interrupts and exceptions, the implementation requires the following control signals: Kill_1, Kill_all, and Valid_3.
The signal Kill_1 indicates that the instruction currently in the first pipeline stage 110 must be killed for some reason, for example because it does not have the data it needs to proceed. Once the instruction is killed, it is retried in the next cycle. The signal Kill_all indicates that all instructions currently in the pipeline must be killed for some reason, for example because an instruction ahead of them has raised an exception or an interrupt has occurred. The signal Valid_3 indicates whether the instruction currently in the last stage 630 is valid. An invalid instruction there is often the result of killing an instruction in the first pipeline stage 610, which creates a bubble (an invalid instruction) in the pipeline. Valid_3 simply indicates whether the instruction in the third pipeline stage is a valid instruction or a bubble. Clearly, only valid instructions should be latched.

Figure 20 shows the additional logic and connections needed to implement the state register. It also shows how to construct the control logic that drives the signals ctrl-1, ctrl-2, and ctrl-3 so that the state register implementation meets the requirements above. The following is sample HDL code, generated automatically, that implements the state register shown in Figure 19.

    module tie_enflop(tie_out, tie_in, en, clk);
        parameter size = 32;
        output [size-1:0] tie_out;
        input  [size-1:0] tie_in;
        input en;
        input clk;
        reg [size-1:0] tmp;
        assign tie_out = tmp;
        always @(posedge clk) begin
            if (en)
                tmp <= #1 tie_in;
        end
    endmodule

    module tie_athens_state(ns, we, ke, kp, vw, clk, ps);
        parameter size = 32;
        input  [size-1:0] ns;  // next state
        input  we;             // write enable
        input  ke;             // Kill E state
        input  kp;             // Kill Pipeline
        input  vw;             // Valid W state
        input  clk;            // clock
        output [size-1:0] ps;  // present state

        wire [size-1:0] se;    // state at E stage
        wire [size-1:0] sm;    // state at M stage
        wire [size-1:0] sw;    // state at W stage
        wire [size-1:0] sx;    // state at X stage
        wire ee;               // write enable for EM register
        wire ew;               // write enable for WX register

        assign se = kp ? sx : ns;
        assign ee = kp | we & ~ke;
        assign ew = vw & ~kp;
        assign ps = sm;

        tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));
        tie_enflop #(size) state_MW(.tie_out(sw), .tie_in(sm), .en(1'b1), .clk(clk));
        tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));
    endmodule

With the pipelined state register model above, if a semantic block specifies the state as an input, the current value of the state is passed to the semantic block as an input variable. If the semantic block has logic that produces a new value for the state, an output signal is generated; this output signal is used as the next-state input to the pipelined state register.

This embodiment allows multiple semantic description blocks, each describing the behavior of multiple instructions. With this unrestricted style of description, it is possible that only a subset of the semantic blocks produce a next-state output for a given state. Furthermore, a given semantic block may produce its next-state output conditionally, depending on which instruction it is executing at a given time.
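The register behavior of tie_athens_state can be checked with a small cycle-level software model. The C sketch below is illustrative only (PipeState, PipeInputs, pipe_clock, and pipe_present are hypothetical names, not generated code): each call to pipe_clock models one rising clock edge and updates all three registers from their old values, as the non-blocking assignments in tie_enflop do. It shows a value written in the E stage becoming visible at once through ps (forwarding), being committed to the X-stage copy only on an edge where vw is asserted, and being discarded when kp reloads the committed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three pipelined copies of one 32-bit TIE state, as in tie_athens_state. */
typedef struct {
    uint32_t sm; /* output of state_EM: the value later instructions see */
    uint32_t sw; /* output of state_MW */
    uint32_t sx; /* output of state_WX: the committed value */
} PipeState;

/* Inputs sampled at one clock edge (names follow the Verilog ports). */
typedef struct {
    uint32_t ns; /* next state from the semantic blocks */
    int we;      /* write enable */
    int ke;      /* kill the instruction in the E stage */
    int kp;      /* kill the whole pipeline */
    int vw;      /* the instruction in the W stage is valid */
} PipeInputs;

/* ps = sm: the present state fed to the semantic blocks. */
uint32_t pipe_present(const PipeState *s) {
    assert(s != NULL);
    return s->sm;
}

/* One rising clock edge; every register loads from the old values. */
void pipe_clock(PipeState *s, PipeInputs in) {
    assert(s != NULL);
    uint32_t se = in.kp ? s->sx : in.ns;  /* assign se = kp ? sx : ns;  */
    int ee = in.kp || (in.we && !in.ke);  /* assign ee = kp | we & ~ke; */
    int ew = in.vw && !in.kp;             /* assign ew = vw & ~kp;      */
    uint32_t old_sm = s->sm, old_sw = s->sw;
    if (ee) s->sm = se;                   /* state_EM, enabled by ee    */
    s->sw = old_sm;                       /* state_MW, always enabled   */
    if (ew) s->sx = old_sw;               /* state_WX, enabled by ew    */
}
```

For example, writing 5 (we = 1), clocking twice more with vw = 1 on the second of those edges so that 5 is committed to sx, then writing 9 and asserting kp on the following edge leaves ps at 5: the speculative 9 is dropped and the committed copy is reloaded into the M-stage register.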
Because only some semantic blocks, and only for some instructions, produce a new value for a given state, additional hardware logic is required to combine the next-state outputs of all the semantic blocks into the input of the pipelined state register. In this embodiment of the present invention, a signal is derived automatically for each semantic block to indicate whether that block has produced a new value for the state. In other embodiments, this signal may be left to the designer to specify.

Figure 20 shows how the next-state outputs for a state from several semantic blocks s1-sn are combined, and how one of them is selected as the input to the state register. In this figure, op1_1 and op1_2 are opcode signals of the first semantic block, op2_1 and op2_2 are opcode signals of the second semantic block, and so on. The next-state output of semantic block i is si (if there are multiple state registers, the block has multiple next-state outputs). The signal indicating that semantic block i has produced a new value for the state is si_we. The signal s_we indicates whether any semantic block has produced a new value for the state, and it is used as the write-enable input of the pipelined state register.

Although multiple semantic blocks are no more expressive than a single semantic block, they do provide a way to write more structured descriptions, typically by grouping related instructions into a single block. Because each instruction's implementation is confined to a narrower scope, multiple semantic blocks can also make the effects of instructions simpler to analyze. On the other hand, there are often reasons to describe the behavior of multiple instructions in a single semantic block. Usually, this is because the hardware implementations of those instructions share common logic.
Describing multiple instructions in a single semantic block usually leads to a more efficient hardware design.

Because of interrupts and exceptions, software must save state values to data memory and restore them from it. From the formal description of the new state and the new instructions, these save and restore instructions can be generated automatically. In one embodiment of the present invention, the logic for the save and restore instructions is generated automatically as two semantic blocks, which can then be translated into actual hardware like any other block. For example, from the following state declarations:

    state [63:0] DATA cpn=0 autopack
    state [27:0] KEYC cpn=1 nopack
    state [27:0] KEYD cpn=1
    user_register 0 = DATA[31:0];
    user_register 1 = DATA[63:32];
    user_register 2 = KEYC;
    user_register 3 = KEYD;

the following semantic block can be generated to read the values of DATA, KEYC, and KEYD into general-purpose registers:

    iclass rur {RUR} {out arr, in st} {in DATA, in KEYC, in KEYD}
    semantic rur {RUR} {
        wire sel_0 = (st == 8'd0);
        wire sel_1 = (st == 8'd1);
        wire sel_2 = (st == 8'd2);
        wire sel_3 = (st == 8'd3);

        assign arr = {32{sel_0}} & DATA[31:0] |
                     {32{sel_1}} & DATA[63:32] |
                     {32{sel_2}} & KEYC |
                     {32{sel_3}} & KEYD;
    }

Figure 21 shows a logic block diagram corresponding to this kind of semantic logic.
The input signal "st" is compared with various constants to form the select signals. These signals pick the appropriate bits out of the state registers in a manner consistent with the user_register declarations. Under the earlier state declarations, bit 32 of DATA maps to bit 0 of the second user register. The second input of the MUX in this figure should therefore be connected to bit 32 of the DATA state.

The following semantic block can be generated to write a value from a general-purpose register into the states DATA, KEYC, and KEYD:

iclass wur {WUR} {in art, in sr} {out DATA, out KEYC, out KEYD}
semantic wur (WUR) {
    wire sel_0 = (sr == 8'd0);
    wire sel_1 = (sr == 8'd1);
    wire sel_2 = (sr == 8'd2);
    wire sel_3 = (sr == 8'd3);
    assign DATA = {sel_1 ? art : DATA[63:32], sel_0 ? art : DATA[31:0]};
    assign KEYC = art;
    assign KEYD = art;
    assign DATA_we = WUR;
    assign KEYC_we = WUR & sel_2;
    assign KEYD_we = WUR & sel_3;
}

Figure 22 shows the logic for bit j of state S when that bit is mapped to bit k of user register i. If the user_register number "sr" in the WUR instruction equals i, bit k of "art" is loaded into the S[j] register. Otherwise, the original value of S[j] is recirculated. If any bit of state S is reloaded, the signal S_we is asserted.

The TIE user_register declaration specifies a mapping from the additional processor state defined by state declarations to a set of identifiers. The RUR and WUR instructions use these identifiers to read and write that state independently of any TIE instruction. Appendix F shows the code that generates the RUR and WUR instructions.

The main purpose of RUR and WUR is task switching.
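The AND-OR read multiplexer and the write-enable logic can be made concrete with a small software model. The C sketch below rests on several assumptions that are not part of the generated code. DATA is taken to be 64 bits, and KEYC and KEYD 32 bits each. User register numbers 0 through 3 map to DATA[31:0], DATA[63:32], KEYC, and KEYD. All type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical software model of the generated RUR/WUR logic.
 * Assumed mapping: user registers 0 and 1 are DATA[31:0] and
 * DATA[63:32]; user registers 2 and 3 are KEYC and KEYD. */
typedef struct {
    uint64_t data;  /* 64-bit state DATA */
    uint32_t keyc;  /* 32-bit state KEYC */
    uint32_t keyd;  /* 32-bit state KEYD */
} tie_state;

/* {32{sel}}: replicate a 1-bit select into a 32-bit mask. */
static uint32_t rep32(int sel) { return sel ? 0xFFFFFFFFu : 0u; }

/* RUR: AND-OR multiplexer selecting a 32-bit slice by number st.
 * At most one select is 1, so the OR yields the selected slice
 * (or 0 for an unmapped number). */
uint32_t rur(const tie_state *s, uint8_t st)
{
    return (rep32(st == 0) & (uint32_t)s->data)           /* DATA[31:0]  */
         | (rep32(st == 1) & (uint32_t)(s->data >> 32))   /* DATA[63:32] */
         | (rep32(st == 2) & s->keyc)
         | (rep32(st == 3) & s->keyd);
}

/* WUR: compute next-state values, then commit each state element only
 * if its write enable is set; otherwise it keeps (recirculates) its value. */
void wur(tie_state *s, uint8_t sr, uint32_t art)
{
    int sel_0 = (sr == 0), sel_1 = (sr == 1);
    int sel_2 = (sr == 2), sel_3 = (sr == 3);

    uint32_t lo = sel_0 ? art : (uint32_t)s->data;
    uint32_t hi = sel_1 ? art : (uint32_t)(s->data >> 32);

    int data_we = 1;      /* DATA_we = WUR: this call models an executing WUR */
    int keyc_we = sel_2;  /* KEYC_we = WUR & sel_2 */
    int keyd_we = sel_3;  /* KEYD_we = WUR & sel_3 */

    if (data_we) s->data = ((uint64_t)hi << 32) | lo;
    if (keyc_we) s->keyc = art;
    if (keyd_we) s->keyd = art;
}
```

The model always writes DATA, as the semantic block does. When neither half is selected, the next-state value equals the old value, so the write is harmless.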
In a multitasking environment, multiple software tasks share the processor according to some scheduling algorithm. While a task is active, its state resides in the processor registers. When the scheduling algorithm decides to switch to another task, the state held in the processor registers is saved to memory, and the other task's state is loaded from memory into the processor registers. The Xtensa™ instruction set architecture (ISA) includes the RSR and WSR instructions, which read and write the state defined by the ISA. For example, the following code is part of the "save to memory" task:

// save special registers
    rsr  a0, SAR
    rsr  a1, LCOUNT
    s32i a0, a3, UEXCSAVE + 0
    s32i a1, a3, UEXCSAVE + 4
    rsr  a0, LBEG
    rsr  a1, LEND
    s32i a0, a3, UEXCSAVE + 8
    s32i a1, a3, UEXCSAVE + 12
;if (config_get_value("IsaUseMAC16")) {
    rsr  a0, ACCLO
    rsr  a1, ACCHI
    s32i a0, a3, UEXCSAVE + 16
    s32i a1, a3, UEXCSAVE + 20
    rsr  a0, MR_0
    rsr  a1, MR_1
    s32i a0, a3, UEXCSAVE + 24
    s32i a1, a3, UEXCSAVE + 28
    rsr  a0, MR_2
    rsr  a1, MR_3
    s32i a0, a3, UEXCSAVE + 32
    s32i a1, a3, UEXCSAVE + 36
;}

The following code is part of the "restore from memory" task:

// restore special registers
    l32i a2, a1, UEXCSAVE + 0
    l32i a3, a1, UEXCSAVE + 4
    wsr  a2, SAR
    wsr  a3, LCOUNT
    l32i a2, a1, UEXCSAVE + 8
    l32i a3, a1, UEXCSAVE + 12
    wsr  a2, LBEG
    wsr  a3, LEND
;if (config_get_value("IsaUseMAC16")) {
    l32i a2, a1, UEXCSAVE + 16
    l32i a3, a1, UEXCSAVE + 20
    wsr  a2, ACCLO
    wsr  a3, ACCHI
    l32i a2, a1, UEXCSAVE + 24
    l32i a3, a1, UEXCSAVE + 28
    wsr  a2, MR_0
    wsr  a3, MR_1
    l32i a2, a1, UEXCSAVE + 32
    l32i a3, a1, UEXCSAVE + 36
    wsr  a2, MR_2
    wsr  a3, MR_3
;}

Here SAR, LCOUNT, LBEG, and LEND are processor state registers that belong to the core Xtensa™ ISA. ACCLO, ACCHI, MR_0, MR_1, MR_2, and MR_3 belong to the MAC16 Xtensa™ ISA option. (The registers are saved and restored in pairs to avoid pipeline interlocks.)

When the designer defines new state with TIE, that state must also be switched on a task switch, just like the state above. One option is for the designer to edit the task-switch code by hand (part of it is given above) and add RUR/S32I and L32I/WUR instructions similar to the code above. However, configurable processors are most effective when their software is generated automatically and is correct by construction. The present invention therefore includes a facility that automatically augments the task-switch code. The following tpp lines are added to the save task above:

;my $off = 0;
;my $i;
;for ($i = 0; $i < $#user_registers; $i += 2) {
    rur  a2, '$user_registers[$i+0]'
    rur  a3, '$user_registers[$i+1]'
    s32i a2, a3, UEXCUREG + '$off + 0'
    s32i a3, a3, UEXCUREG + '$off + 4'
;$off += 8;
;}
;if (@user_registers & 1) {
;  # odd number of user_registers
    rur  a2, '$user_registers[$#user_registers]'
    s32i a2, a3, UEXCUREG + '$off + 0'
;$off += 4;
;}

The following lines are added to the restore task above:

;my $off = 0;
;my $i;
;for ($i = 0; $i < $#user_registers; $i += 2) {
    l32i a2, a1, UEXCUREG + '$off + 0'
    l32i a3, a1, UEXCUREG + '$off + 4'
    wur  a2, '$user_registers[$i+0]'
    wur  a3, '$user_registers[$i+1]'
;$off += 8;
;}
;if (@user_registers & 1) {
;  # odd number of user_registers
    l32i a2, a1, UEXCUREG + '$off + 0'
    wur  a2, '$user_registers[$#user_registers]'
;$off += 4;
;}

Finally, the task state area in memory needs additional space for saving the user registers. The offset of this space from the base of the task save pointer is defined as the assembler constant UEXCUREG.
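The save and restore loops above walk @user_registers two entries at a time, advancing the offset by 8 bytes per pair. When the count is odd, they use one final 4-byte slot. The following C sketch reproduces that offset arithmetic so it can be checked against the size of the UEXCUREG area. It models the loop, not the tpp tool itself, and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical C rendering of the tpp save/restore loops: given the
 * list of user_register numbers, record each register and its byte
 * offset relative to UEXCUREG in the order the stores occur.
 * Returns the final value of $off, i.e. the bytes used. */
size_t layout_user_regs(const int *user_registers, size_t n,
                        int *reg_out, size_t *off_out)
{
    size_t off = 0, k = 0, i;

    /* Perl's $#user_registers is n - 1, so "$i < $#user_registers"
     * is written as i + 1 < n (this also avoids underflow for n == 0). */
    for (i = 0; i + 1 < n; i += 2) {
        reg_out[k] = user_registers[i];     off_out[k++] = off + 0;
        reg_out[k] = user_registers[i + 1]; off_out[k++] = off + 4;
        off += 8;
    }
    if (n & 1) {                            /* odd number of user_registers */
        reg_out[k] = user_registers[n - 1]; off_out[k++] = off + 0;
        off += 4;
    }
    return off;
}
```

For any count n, the value returned is n * 4, which matches the UEXCUREGSIZE of '@user_registers * 4'. Pairing the registers changes only the order of the stores, not the total space.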
This storage area was previously defined by the following code:

#define UEXCREGSIZE   (16*4)
#define UEXCPARMSIZE  (4*4)
;if (&config_get_value("IsaUseMAC16")) {
#define UEXCSAVESIZE  (10*4)
;} else {
#define UEXCSAVESIZE  (4*4)
;}
#define UEXCMISCSIZE  (2*4)
#define UEXCPARM      0
#define UEXCREG       (UEXCPARM+UEXCPARMSIZE)
#define UEXCSAVE      (UEXCREG+UEXCREGSIZE)
#define UEXCMISC      (UEXCSAVE+UEXCSAVESIZE)
#define UEXCWIN       (UEXCMISC+0)
#define UEXCFRAME     (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE)

It is changed to:

#define UEXCREGSIZE   (16*4)
#define UEXCPARMSIZE  (4*4)
;if (&config_get_value("IsaUseMAC16")) {
#define UEXCSAVESIZE  (10*4)
;} else {
#define UEXCSAVESIZE  (4*4)
;}
#define UEXCMISCSIZE  (2*4)
#define UEXCUREGSIZE  '@user_registers * 4'
#define UEXCPARM      0
#define UEXCREG       (UEXCPARM+UEXCPARMSIZE)
#define UEXCSAVE      (UEXCREG+UEXCREGSIZE)
#define UEXCMISC      (UEXCSAVE+UEXCSAVESIZE)
#define UEXCUREG      (UEXCMISC+UEXCMISCSIZE)
#define UEXCWIN       (UEXCUREG+0)
#define UEXCFRAME \
    (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE+UEXCUREGSIZE)

This code relies on a tpp variable, @user_registers, that holds the list of user register numbers. That list is generated from the first argument of each user_register statement.

In some more complex microprocessor implementations, a state may be computed in different pipeline stages. Handling this requires several extensions to the procedures described here, although each extension is simple. First, the specification language must be extended so that a semantic block can be associated with a pipeline stage. This can be accomplished in one of several ways.
In one embodiment, the associated pipeline stage can be specified explicitly for each semantic block. In another embodiment, a range of pipeline stages can be specified for each semantic block. In yet other embodiments, the pipeline stage of a given semantic block can be derived automatically from the required computational delay.

The second task in supporting state produced in different pipeline stages is handling interrupts, exceptions, and stalls. This usually involves adding the appropriate bypass and forwarding logic under the control of pipeline control signals. In one embodiment, a producer-use graph can be generated to show the relationship between when a state is produced and when it is used. Based on this analysis, forwarding logic can be implemented for the common cases. Interlock logic can then be generated to stall the pipeline in the cases the forwarding logic does not handle.

The method for modifying the instruction issue logic of the base processor depends on the algorithm the processor uses. In general, however, the instruction issue logic of most processors depends only on the following signals for the instruction being tested for issue. This holds whether the processor is single-issue or superscalar, and whether it handles single-cycle or multi-cycle instructions:

1. signals indicating, for each processor state element, whether the instruction uses that state as a source;
2. signals indicating, for each processor state element, whether the instruction uses that state as a destination; and
3. signals indicating, for each functional unit, whether the instruction uses that functional unit.

These signals are used to perform the issue-to-pipeline and cross-issue checks and to update the pipeline state in the pipeline-dependent issue logic. TIE contains all the information needed to augment these signals and their equations for the new instructions.

First, each TIE state declaration causes a new signal to be generated for the instruction issue logic. Consider each in or inout operand or state listed in the third or fourth argument of an iclass declaration. It adds the instruction decode signals of the instructions listed in the second argument to the first set of equations for that processor state element.

Second, consider each out or inout operand or state listed in the third or fourth argument of an iclass declaration. It adds the instruction decode signals of the instructions listed in the second argument to the second set of equations for that processor state element.

Third, the logic generated from each TIE semantic block represents a new functional unit, so a new unit signal is generated. The decode signals of the TIE instructions specified for that semantic block are ORed together to form the third set of equations.

When an instruction is issued, the pipeline state must be updated for future issue decisions. The method for modifying the instruction issue logic of the base processor again depends on the algorithm the processor uses. Again, however, some general observations are possible. The pipeline state must provide the following signals back to the issue logic:
4. signals indicating, for the destination of each issued instruction, when its result is available for bypass; and
5. a signal for each functional unit indicating that the unit is available for another instruction.

The embodiment described here is a single-issue processor in which designer-defined instructions are limited to a single cycle of logic computation. In this case the above simplifies considerably. No functional-unit checks or cross-issue checks are needed, and no single-cycle instruction can make a processor state element unavailable to the next instruction. The issue equation therefore becomes simply

    issue = (~src1use | src1pipeready) & (~src2use | src2pipeready) & ... & (~srcNuse | srcNpipeready);

Here the src[i]use signals are the first set of equations, described and modified as above, and the src[i]pipeready signals are not affected by the additional instructions. In this embodiment, the fourth and fifth sets of signals are not needed. An alternative embodiment may issue multiple multi-cycle instructions. In that case, the TIE specification would be augmented with a latency specification for each instruction, giving the number of cycles over which its computation is pipelined.

The fourth set of signals would be generated for each semantic block pipeline stage. Each such signal ORs together the instruction decode signals of the instructions that complete in that stage according to the specification.

By default, the generated logic is assumed to be fully pipelined. A TIE-generated functional unit is therefore always ready one cycle after accepting an instruction, and the fifth set of signals for a TIE semantic block is always asserted. When the logic in a semantic block must be reused over multiple cycles, a further specification would state for how many cycles the instruction occupies the functional unit.
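In the single-issue, single-cycle embodiment described above, both the first set of equations and the issue equation reduce to simple OR/AND reductions. The C sketch below assumes that each instruction's source usage, taken from its iclass declaration, is available as a bitmask over at most 32 state elements. The decode signals are assumed to be available as a boolean vector. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction usage derived from iclass declarations:
 * bit k of `reads` is set if the instruction lists state element k as
 * an in or inout operand/state (at most 32 elements in this sketch). */
typedef struct {
    uint32_t reads;
} iclass_info;

/* First set of equations: srcKuse is the OR of the decode signals of
 * every instruction that reads state element K. */
uint32_t src_use(const iclass_info *tab, size_t n_insn, const bool *decode)
{
    uint32_t use = 0;
    for (size_t i = 0; i < n_insn; i++)
        if (decode[i])
            use |= tab[i].reads;
    return use;
}

/* issue = AND over K of (~srcKuse | srcKpipeready), evaluated bitwise. */
bool issue(uint32_t src_use_mask, uint32_t pipeready_mask)
{
    return (uint32_t)(~src_use_mask | pipeready_mask) == UINT32_MAX;
}
```

In hardware these reductions are OR and AND trees over the decode and ready wires. The loop form is only a convenient way to express them in software.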
When a semantic block is reused over multiple cycles in this way, the fifth set of signals would be generated by ORing together the instruction decode signals of the instructions that specify that cycle count for the stage.

Alternatively, in a different embodiment, the designer could be left to specify the result-available and functional-unit-available signals as an extension of TIE.

Examples of code processed according to this embodiment are shown in the attached appendices. For brevity they are not described in detail here, but readers skilled in the art will readily understand them after reviewing the reference manuals mentioned above. Appendix G is an example implementation of a set of instructions in the TIE language. The remaining appendices show what the TIE compiler generates from this code:

- Appendix H: output for the compiler.
- Appendix I: output for the simulator.
- Appendix J: macros for expanding TIE instructions in user applications.
- Appendix K: output for simulating TIE instructions in native mode.
- Appendix L: the Verilog HDL description of the additional hardware.
- Appendix M: a Design Compiler script that optimizes the Verilog HDL described above, used to evaluate the impact of the TIE instructions on the area and speed of the overall CPU.

As described above, to begin the processor configuration step, the user first selects a base processor configuration via the GUI described above. As part of this process, a software development system 30 is built and delivered to the user, as shown in Figure 1. The software development system 30 includes four key components relevant to another aspect of the present invention, shown in more detail in Figure 6: a compiler 108, an assembler 110, an instruction set simulator 112, and a debugger 130.

As is well known to those skilled in the art, a compiler translates a user application written in a high-level programming language, such as C or C++, into processor-specific assembly language. High-level programming languages such as C or C++ let application writers describe their applications in a form that is easy and precise to express. The processor does not understand these languages directly. Application writers need not worry about the specific characteristics of the processor that will be used, and the same C or C++ program can generally run on many different types of processors with little or no modification.

The compiler translates the C or C++ program into assembly language, which is much closer to the machine language the processor supports directly. Different types of processors have their own assembly languages. Each assembly instruction often corresponds directly to one machine instruction, but the two are not necessarily identical. Assembly instructions are designed as human-readable strings. Each instruction and operand is given a meaningful name or mnemonic, so a person can read the assembly instructions and easily understand what operations the machine will perform. The assembler converts assembly language into machine language. It encodes each assembly instruction string into one or more machine instructions that the processor can execute directly and efficiently.
Machine code can be executed directly on the processor, but a physical processor is not always available. Building a physical processor is a time-consuming and expensive process, and a user choosing among candidate processor configurations cannot build a physical processor for each candidate. Instead, the user is provided with a software program called a simulator. The simulator runs on a general-purpose computer and simulates the effect of running the user application on the user-configured processor. It mimics the semantics of the simulated processor and can tell the user how quickly the physical processor would execute the application.

A debugger is a tool that lets users find problems in their software interactively by running their programs under its control. The user can stop the program at any time and view its C source code and the resulting assembly or machine code. At a breakpoint, the user can examine or modify the values of any or all variables or hardware registers. The user can then continue execution: perhaps one statement at a time, perhaps one machine instruction at a time, or perhaps up to a new user-selected breakpoint.

All four components 108, 110, 112, and 130 must understand user-defined instructions 750 (see Figure 3). The simulator 112 and the debugger 130 must additionally understand user-defined state 752. The system lets the user access user-defined instructions 750 through intrinsics added to the user's C and C++ applications.
The compiler 108 must translate the intrinsic calls into assembly language instructions 738 for the user-defined instructions 750. The assembler 110 must take the new assembly language instructions 738, whether written directly by the user or produced by the compiler 108. It must encode them into machine instructions 740 corresponding to the user-defined instructions 750. The simulator 112 must decode the user-defined machine instructions 740, model their semantics, and model their performance on the configured processor. It must also model the values and performance implications of user-defined state. The debugger 130 must let the user print assembly language instructions 738, including user-defined instructions 750, and must let the user examine and modify the values of user-defined state.

In this aspect of the invention, the user invokes a tool, the TIE compiler 702, to process the current candidate user-defined enhancements 736. The TIE compiler 702 is distinct from the compiler 708 that translates the user application into assembly language 738. It builds components that enable the prebuilt base software system 30 to use the new user-defined enhancements 736. That system consists of the compiler 708, assembler 710, simulator 712, and debugger 730. Each element of the software system 30 uses a slightly different set of these components.

Figure 24 shows how the TIE-specific parts of these software tools are generated. From the user-defined extension file 736, the TIE compiler 702 generates C code for several programs. Each program produces one or more files that one or more software development tools read for information about the user-defined instructions and state.
For example, the program tie2gcc 800 generates a C header file 842 called xtensa-tie.h (described in more detail below), which contains the intrinsic function definitions for the new instructions. The program tie2isa 810 generates a dynamic link library (DLL) 844/848 containing information about the user-defined instruction formats. This library combines the encode DLL 844 and the decode DLL 848, described in more detail below. The program tie2iss 840 generates C code 870 for performance modeling and instruction semantics (discussed below). The host compiler 846 compiles this code into the simulator DLL 849 used by the simulator 712, as described in more detail below. The program tie2ver 850 generates the required description 850 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtos 860 generates save and restore code 810 that saves and restores the user-defined state on context switches.

Additional information about implementing user-defined state can be found in the Wang et al. application cited above.

Compiler 708

In this embodiment, the compiler 708 translates intrinsic calls in the user's application into assembly language instructions 738 for the user-defined enhancements 736. The compiler 708 implements this mechanism on top of the macro and inline assembly mechanisms found in standard compilers such as the GNU compiler. For more information on these mechanisms, see, for example, the GNU C and C++ Compiler User's Guide, EGCS version 1.0.3.

Consider a user who wishes to create a new instruction foo that operates on two registers and returns its result in a third register. The user places the instruction description in a user-defined instruction file 750 in a particular directory and invokes the TIE compiler 702. The TIE compiler 702 generates a file 742 with a standard name such as xtensa-tie.h. This file contains the following definition of foo:
    #define foo(ars, art) \
        ({ int arr; asm volatile ("foo %0,%1,%2" : "=a" (arr) : \
                                  "a" (ars), "a" (art)); arr; })

When users invoke the compiler 708 on an application, they give the compiler 708 the name of the directory containing the user-defined enhancements 736. They do this through either a command-line option or an environment variable. That directory also contains the xtensa-tie.h file 742. The compiler 708 automatically includes xtensa-tie.h in the user's C or C++ application being compiled, just as if users had written the definition of foo themselves.

The user's application contains intrinsic calls to the instruction foo. Because of the included definition, the compiler 708 treats those intrinsic calls as calls to that definition. Under its standard macro mechanism, the compiler 708 treats the call to the macro foo as if the user had written the assembly language statement 738 directly, rather than a macro call. That is, under the standard inline assembly mechanism, the compiler 708 translates the call into the single assembly instruction foo. For example, the user might have a function that calls the intrinsic foo:

    int fred(int a, int b)
    {
        return foo(a, b);
    }

The compiler translates this function into the following assembly language subroutine, which uses the user-defined instruction foo:

    fred:
        .frame  sp, 32
        entry   sp, 32
    #APP
        foo     a2,a2,a3
    #NO_APP
        retw.n

When the user creates a new set of user-defined enhancements 736, there is no need to rebuild the compiler. The TIE compiler 702 simply generates the file xtensa-tie.h 742, which the pre-built compiler 708 automatically includes in the user's application.

Assembler 710

In this embodiment, the assembler 710 uses an encoding library 744 to encode assembly instructions 750.
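As a concrete picture of what encoding involves, here is a minimal sketch of how an encoding library could turn "foo a2,a2,a3" into the word 0x62230. It is not the generated library; 0x62230 is the word used in the worked example later in this section. Beyond what the text states, the sketch assumes two things:

- The two source registers occupy bits 8..11 and 4..7, which is what the digits of 0x62230 imply.
- The mnemonic table is a sorted array searched with the standard C bsearch function.

The names opcode_entry, reg_is_legal, set_field and assemble_rrr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opcode table: each mnemonic maps to an instruction template
   whose opcode fields are already filled in. Kept sorted by mnemonic so
   that bsearch() can be used. */
typedef struct {
    const char *mnemonic;
    uint32_t    template_bits;
} opcode_entry;

static const opcode_entry opcodes[] = {
    { "foo",  0x060000u },  /* opcode 6 in bits 16..23, 0 in bits 0..3 */
    { "foo2", 0x160000u },  /* illustrative extra entries */
    { "foo3", 0x260000u },
};

static int cmp_mnemonic(const void *key, const void *elem)
{
    const char *name = key;
    const opcode_entry *entry = elem;
    return strcmp(name, entry->mnemonic);
}

/* Translate a mnemonic to its table entry; NULL if unknown. */
static const opcode_entry *lookup_opcode(const char *name)
{
    return bsearch(name, opcodes, sizeof opcodes / sizeof opcodes[0],
                   sizeof opcodes[0], cmp_mnemonic);
}

/* Like encode_r: a register operand is its number and must fit in 4 bits. */
static int reg_is_legal(uint32_t reg)
{
    return (reg >> 4) == 0;
}

/* Insert an already-validated 4-bit value at bit position 'shift'. */
static uint32_t set_field(uint32_t insn, unsigned shift, uint32_t val)
{
    uint32_t mask = 0xfu << shift;
    assert((val >> 4) == 0);
    return (insn & ~mask) | (val << shift);
}

/* Assemble "mnemonic ar,as,at". Returns 0 on success, -1 for an unknown
   mnemonic or an out-of-range register. */
static int assemble_rrr(const char *mnemonic, uint32_t r, uint32_t s,
                        uint32_t t, uint32_t *out)
{
    const opcode_entry *entry = lookup_opcode(mnemonic);
    if (entry == NULL || !reg_is_legal(r) || !reg_is_legal(s) ||
        !reg_is_legal(t))
        return -1;

    uint32_t insn = entry->template_bits;
    insn = set_field(insn, 12, r);  /* result register: bits 12..15 */
    insn = set_field(insn, 8, s);   /* assumed: first source, bits 8..11 */
    insn = set_field(insn, 4, t);   /* assumed: second source, bits 4..7 */
    *out = insn;
    return 0;
}
```

For instance, assemble_rrr("foo", 2, 2, 3, &w) leaves 0x62230 in w. A sorted array is only one of the lookup structures the text allows; a hash table would serve equally well.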
The interface to this library 744 includes functions that:
- translate an opcode mnemonic string into an internal opcode representation;
- provide, for each opcode, the bit patterns to be generated in the opcode fields of a machine instruction 740; and
- encode the value of each instruction operand and insert the encoded bit pattern into the operand fields of a machine instruction 740.

As an example, consider the earlier user function that calls the intrinsic foo. The assembler might take the instruction "foo a2,a2,a3" and convert it into the machine instruction represented by the hexadecimal number 0x62230. Here the high-order 6 and the low-order 0 together represent the opcode for foo. The digits 2, 2 and 3 represent the three registers a2, a2 and a3, respectively.

These functions are implemented internally by a combination of tables and internal functions. Tables are easy for the TIE compiler 702 to generate, but they are limited in what they can express. When more flexibility is needed, for example to express operand encoding functions, the TIE compiler 702 can generate arbitrary C code to be included in the library 744.

Consider again the example "foo a2,a2,a3". Each register field is simply encoded as the number of the register. The TIE compiler 702 generates the following function. It checks whether the register value is legal and, if so, returns the register number:

    xtensa_encode_result encode_r (valp)
         u_int32_t *valp;
    {
        u_int32_t val = *valp;
        if ((val >> 4) != 0)
            return xtensa_encode_result_too_high;
        *valp = val;
        return xtensa_encode_result_ok;
    }

If all encodings were this simple, no encoding functions would be needed; a table would suffice. However, users are allowed to choose more complicated encodings. The following encoding, written in the TIE language, encodes each operand as the operand's value divided by 1024. Such an encoding is useful for densely encoding values that are required to be multiples of 1024.

    operand tx10 t {t << 10} {tx10 >> 10}

The TIE compiler converts this operand encoding description into the following C function:
    xtensa_encode_result encode_tx10 (valp)
         u_int32_t *valp;
    {
        u_int32_t t, tx10;
        tx10 = *valp;
        t = (tx10 >> 10) & 0xf;
        tx10 = decode_tx10(t);
        if (tx10 != *valp) {
            return xtensa_encode_result_not_ok;
        } else {
            *valp = t;
        }
        return xtensa_encode_result_ok;
    }

A table cannot be used for this encoding, because the domain of possible operand values is very large. Such a table would have to be enormous.

In one embodiment of the encoding library 744, one table maps opcode mnemonic strings to the internal opcode representation. For efficiency, this table may be sorted, or it may be a hash table or some other data structure that allows efficient searching. Another table maps each opcode to a machine-instruction template whose opcode fields are initialized to the appropriate bit patterns for that opcode.

Opcodes with the same operand fields and operand encodings are grouped together. For each operand in one of these groups, the library contains two functions. One encodes the operand value into a bit pattern; the other inserts those bits into the appropriate fields of a machine instruction. A separate internal table maps each instruction operand to these functions.

Consider an example in which the result register number is encoded in bits 12..15 of the instruction. The TIE compiler 702 generates the following function, which sets bits 12..15 of the instruction to the value (number) of the result register:

    void set_r_field (insn, val)
         xtensa_insnbuf insn;
         u_int32_t val;
    {
        insn[0] = (insn[0] & 0xffff0fff) | ((val << 12) & 0xf000);
    }

To allow user-defined instructions to be changed without rebuilding the assembler 710, the encoding library 744 is implemented as a dynamically linked library (DLL). DLLs are a standard way for a program to extend its functionality dynamically. The details of handling DLLs vary between host operating systems, but the basic concept is the same. A DLL is loaded dynamically into a running program as an extension of the program's code. A run-time linker resolves symbol references between the DLL and the main program, and between the DLL and any other DLLs already loaded.

In the case of the encoding library or DLL 744, a small portion of the code is statically linked into the assembler 710. This code has three responsibilities:
- loading the DLL;
- combining the information in the DLL with the existing encoding information for the pre-built instruction set 746, which may itself have been loaded from a separate DLL; and
- making the information accessible through the interface functions described above.

When users create new enhancements 736, they invoke the TIE compiler 702 on a description of the enhancements 736. The TIE compiler 702 generates C code defining the internal tables and functions that implement the encoding DLL 744. It then invokes the host system's native compiler 746 to create the encoding DLL 744 for the user-defined instructions 750. The native compiler produces code that runs on the host rather than on the processor being configured.

The user invokes the pre-built assembler 710 on an application with a flag or environment variable pointing to the directory containing the user-defined enhancements 736. The pre-built assembler 710 dynamically opens the DLL 744 in that directory. For each assembly instruction, it uses the encoding DLL 744 to look up the opcode mnemonic, find the bit patterns for the opcode fields of the machine instruction, and encode each instruction operand.

For example, when the assembler 710 sees the TIE instruction "foo a2,a2,a3", it proceeds as follows:
- From a table, it finds that the "foo" opcode translates into the number 6 in bit positions 16 to 23.
- From a table, it finds the encoding function for each register. These functions encode the first a2 as the number 2, the second a2 as the number 2, and a3 as the number 3.
- From a table, it finds the appropriate set functions. set_r_field places the result value 2 into bit positions 12..15 of the instruction, and similar set functions place the other 2 and the 3 in their fields.

Simulator 712

The simulator 712 interacts with the user-defined enhancements 736 in several ways. Given a machine instruction 740, the simulator 712 must decode the instruction, that is, break it up into its opcode and operands. User-defined enhancements 736 are decoded through a function in a decoding DLL 748. The encoding DLL 744 and the decoding DLL 748 may in fact be a single DLL. For example, consider a case in which the user defines three opcodes, foo1, foo2 and foo3. Their encodings are 0x6, 0x16 and 0x26, respectively, in bits 16 to 23 of the instruction, with 0 in bits 0 to 3.
The TIE compiler 702 generates the following decoding function, which compares the opcode against the opcodes of all the user-defined instructions 750:

    int decode_insn (const xtensa_insnbuf insn)
    {
        if ((insn[0] & 0xff000f) == 0x60000)
            return xtensa_foo1_op;
        if ((insn[0] & 0xff000f) == 0x160000)
            return xtensa_foo2_op;
        if ((insn[0] & 0xff000f) == 0x260000)
            return xtensa_foo3_op;
        return XTENSA_UNDEFINED;
    }

With a large number of user-defined instructions, comparing an opcode against every possible user-defined instruction 750 is expensive. The TIE compiler can instead use a hierarchical set of switch statements:

    switch (get_op0_field(insn)) {
    case 0x0:
        switch (get_op1_field(insn)) {
        case 0x6:
            switch (get_op2_field(insn)) {
            case 0x0: return xtensa_foo1_op;
            case 0x1: return xtensa_foo2_op;
            case 0x2: return xtensa_foo3_op;
            default:  return XTENSA_UNDEFINED;
            }
        default: return XTENSA_UNDEFINED;
        }
    default: return XTENSA_UNDEFINED;
    }

In addition to decoding instruction opcodes, the decoding DLL 748 contains functions for decoding instruction operands. This mirrors the way operands are encoded in the encoding DLL 744. First, the decoding DLL 748 provides functions that extract operand fields from machine instructions. Continuing the previous example, the TIE compiler 702 generates the following function to extract the value in bits 12 to 15 of an instruction:

    u_int32_t get_r_field (insn)
         xtensa_insnbuf insn;
    {
        return ((insn[0] & 0xf000) >> 12);
    }

The TIE description of an operand includes both an encoding and a decoding specification. The encoding DLL 744 uses the encoding specification, and the decoding DLL 748 uses the decoding specification.
For example, the TIE operand specification

    operand tx10 t {t << 10} {tx10 >> 10}

produces the following operand decoding function:

    u_int32_t decode_tx10 (val)
         u_int32_t val;
    {
        u_int32_t t, tx10;
        t = val;
        tx10 = t << 10;
        return tx10;
    }

When users invoke the simulator 712, they give it the directory containing the decoding DLL 748 for the user-defined enhancements 736. The simulator 712 opens the appropriate DLL. Whenever the decoding function for the pre-built instruction set fails to decode an instruction, the simulator 712 invokes the decoding function in the DLL 748.

Given a decoded instruction 750, the simulator 712 must interpret and model the semantics of the instruction 750. This is done functionally: every instruction 750 has a corresponding function that lets the simulator 712 model its semantics. The simulator 712 internally tracks the entire state of the simulated processor, and it provides a fixed interface for updating or querying that state.

As noted above, user-defined enhancements 736 are written in the TIE hardware description language, which is a subset of Verilog. The TIE compiler 702 converts the hardware description into C functions that the simulator 712 uses to model the new enhancements 736. Operators in the hardware description language are translated directly into the corresponding C operators. Operations that read or write state are translated into calls to the simulator's interface for updating or querying the processor state.

As an example in this embodiment, consider a user creating an instruction 750 that adds two registers. This example is chosen for simplicity. In the hardware description language, the user might describe the semantics of the add as follows:

    semantic add {add} {assign arr = ars + art;}

The output register, denoted by the built-in name arr, is assigned the sum of the two input registers, denoted by the built-in names ars and art. The TIE compiler 702 takes this description and generates a semantic function used by the simulator 712:

    void add_func (u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
    {
        set_ar( _OPND0_, ar( _OPND1_ ) + ar( _OPND2_ ) );
        pc_incr( 3 );
    }

The hardware operator "+" is translated directly into the C operator "+". The reads of the hardware registers ars and art are translated into calls to the simulator 712 function "ar". The write of the hardware register arr is translated into a call to the simulator 712 function "set_ar". Every instruction implicitly increments the program counter (pc) by the size of the instruction. The TIE compiler 702 therefore also generates a call to a simulator 712 function that increments the simulated pc by 3, the size of the add instruction.

When invoked, the TIE compiler 702 creates a semantic function, as described above, for every user-defined instruction. It also creates a table that maps every opcode name to the associated semantic function. The table and functions are compiled with the standard compiler 746 into the simulator DLL 749.

When users invoke the simulator 712, they give it the directory containing the user-defined enhancements 736, and the simulator 712 opens the appropriate DLL. At start-up, the simulator 712 decodes all the instructions in the program and creates a table that maps each instruction to its semantic function. To build this mapping, it opens the DLL and looks up the appropriate semantic functions. When simulating the semantics of a user-defined instruction 736, the simulator 712 invokes the function in the DLL directly.
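The functional semantics model above can be sketched as follows. The sketch assumes a toy simulator with sixteen 32-bit AR registers. The accessor names ar, set_ar and pc_incr follow the generated add_func. Everything else is an illustrative assumption rather than the simulator's actual interface: the accessor bodies, the sim_state structure, the lookup helper find_semantic, and the plain parameter names op0..op3 (the generated code uses _OPND0_ and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Hypothetical simulated processor state. */
static struct {
    u32 regs[16];  /* AR registers a0..a15 */
    u32 pc;        /* program counter */
} sim_state;

/* Fixed interface through which semantic functions query and update state. */
static u32 ar(u32 r)
{
    assert(r < 16);
    return sim_state.regs[r];
}

static void set_ar(u32 r, u32 v)
{
    assert(r < 16);
    sim_state.regs[r] = v;
}

static void pc_incr(u32 n)
{
    sim_state.pc += n;
}

/* What the TIE compiler might emit for "assign arr = ars + art;":
   op0 is arr, op1 is ars, op2 is art; op3 is unused. */
static void add_func(u32 op0, u32 op1, u32 op2, u32 op3)
{
    (void)op3;
    set_ar(op0, ar(op1) + ar(op2));
    pc_incr(3);  /* add is a 3-byte instruction */
}

typedef void (*semantic_fn)(u32, u32, u32, u32);

/* Opcode-name -> semantic-function table, as compiled into the DLL. */
static const struct {
    const char *name;
    semantic_fn fn;
} semantic_table[] = {
    { "add", add_func },
};

/* Return the semantic function for an opcode name, or NULL if unknown. */
static semantic_fn find_semantic(const char *name)
{
    size_t n = sizeof semantic_table / sizeof semantic_table[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(semantic_table[i].name, name) == 0)
            return semantic_table[i].fn;
    return NULL;
}
```

A simulator built this way would call find_semantic("add") once while building its instruction-to-function map. It would then invoke the returned pointer on every execution. For an add with operands (3, 1, 2), a1 = 40 and a2 = 2, this leaves 42 in a3 and advances pc by 3.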
To tell the user how long an application would take to run on the simulated hardware, the simulator 712 needs to simulate the performance effects of instructions 750. For this purpose, the simulator 712 uses a pipeline model. Every instruction executes over several cycles, and in each cycle it uses different machine resources. The simulator 712 starts by trying to execute all instructions in parallel, subject to two stall rules:
- If multiple instructions try to use the same resource in the same cycle, the later instruction is stalled until the resource is free.
- If a later instruction reads some state that an earlier instruction writes, but in a later cycle, the later instruction is stalled until the value has been written.

The simulator 712 models the performance of each instruction through a functional interface. A function is created for every type of instruction, and that function contains calls to the simulator interface that models the processor's performance.

For example, consider the simple three-register instruction foo. The TIE compiler might create the following simulator function:

    void foo_sched (u32 op0, u32 op1, u32 op2, u32 op3)
    {
        pipe_use_ifetch (3);
        pipe_use (REGF32_AR, op1, 1);
        pipe_use (REGF32_AR, op2, 1);
        pipe_def (REGF32_AR, op0, 2);
        pipe_def_ifetch (-1);
    }

These calls tell the simulator 712 the following:
- pipe_use_ifetch: the instruction requires fetching 3 bytes.
- The two pipe_use calls: the two input registers are read in cycle 1.
- pipe_def: the output register is written in cycle 2.
- pipe_def_ifetch: this instruction is not a branch, so the next instruction can be fetched in the next cycle.
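The stall rules above can be sketched as a small scoreboard. For each AR register, it records the first cycle in which a read sees the register's newest value. This is a minimal sketch under assumptions not taken from the text:
- One instruction issues per cycle.
- Only register read-after-write dependences are modeled. There are no structural or fetch hazards, so pipe_use_ifetch and pipe_def_ifetch are omitted.
- Each instruction declares its uses before its definitions.

The helpers sched_model, insn_begin, insn_end and load_sched are hypothetical, as are the simplified pipe_use/pipe_def signatures.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

#define NUM_AR 16

/* Hypothetical scoreboard for a single-issue pipeline. */
typedef struct {
    long ready[NUM_AR];  /* first cycle in which a read sees the newest value */
    long next_issue;     /* earliest cycle the next instruction may issue */
    long issue_cycle;    /* issue cycle chosen for the current instruction */
    long stalls;         /* stall cycles accumulated so far */
} sched_model;

static void sched_init(sched_model *m)
{
    for (int r = 0; r < NUM_AR; r++)
        m->ready[r] = 0;
    m->next_issue = 0;
    m->issue_cycle = 0;
    m->stalls = 0;
}

static void insn_begin(sched_model *m)
{
    m->issue_cycle = m->next_issue;
}

/* Register 'reg' is read in pipeline stage 'stage'. If the value would not
   yet be written in that cycle, delay issue until it is. */
static void pipe_use(sched_model *m, u32 reg, long stage)
{
    assert(reg < NUM_AR);
    long earliest = m->ready[reg] - stage;
    if (earliest > m->issue_cycle)
        m->issue_cycle = earliest;
}

/* Register 'reg' is written in pipeline stage 'stage'. */
static void pipe_def(sched_model *m, u32 reg, long stage)
{
    assert(reg < NUM_AR);
    m->ready[reg] = m->issue_cycle + stage;
}

static void insn_end(sched_model *m)
{
    m->stalls += m->issue_cycle - m->next_issue;
    m->next_issue = m->issue_cycle + 1;
}

/* Timing model for foo, mirroring foo_sched in the text. */
static void foo_sched(sched_model *m, u32 op0, u32 op1, u32 op2)
{
    insn_begin(m);
    pipe_use(m, op1, 1);
    pipe_use(m, op2, 1);
    pipe_def(m, op0, 2);
    insn_end(m);
}

/* Hypothetical load-like instruction whose result is written in stage 3. */
static void load_sched(sched_model *m, u32 dst, u32 base)
{
    insn_begin(m);
    pipe_use(m, base, 1);
    pipe_def(m, dst, 3);
    insn_end(m);
}
```

Under these rules, two back-to-back dependent foo instructions issue without a stall. The consumer reads in stage 1 of cycle 2, which is the same cycle in which the producer's stage-2 result becomes available. A load-like producer that writes in stage 3 forces one stall cycle.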
Pointers to these functions are placed in the same table as the semantic functions, and the functions themselves are compiled into the same DLL 749. When invoked, the simulator 712 creates a mapping between instructions and performance functions. To build this mapping, it opens the DLL 749 and looks up the appropriate performance functions.

When simulating the performance of a user-defined instruction 736, the simulator 712 invokes the function in the DLL 749 directly.

Debugger 730

The debugger interacts with the user-defined enhancements 750 in two ways. First, the user can print the assembly instructions 738 for user-defined instructions 736. To do this, the debugger 730 must decode machine instructions 740 into assembly instructions 738. The simulator 712 uses the same mechanism to decode instructions, and the debugger 730 preferably uses the same DLL for decoding. In addition to decoding instructions, the debugger must convert the decoded instructions into strings. For this purpose, the decoding DLL 748 includes a function that maps each internal opcode representation to the corresponding mnemonic string. This can be implemented with a simple table.

The user can invoke the pre-built debugger with a flag or environment variable pointing to the directory containing the user-defined enhancements 750. The pre-built debugger dynamically opens the appropriate DLL 748.

Second, the debugger 730 interacts with the user-defined state 752, which it must be able to read and modify. To do so, the debugger 730 communicates with the simulator 712 and asks it for the size of the state and the names of the state variables. When asked to print the value of some user state, the debugger 730 queries the simulator 712 for that value in the same way it queries predefined state. Similarly, to modify user state, the debugger 730 tells the simulator 712 to set the state to the given value.

Thus, support for user-defined instruction sets and state according to the present invention can be implemented with modules that define the user functionality and are plugged into core software development tools. A system can therefore be built in which the plug-in modules for a particular set of user-defined enhancements are kept together as a group within the system, for ease of organization and manipulation.

Further, the core software development tools can be specific to a particular core instruction set and processor state. A single set of plug-in modules for user-defined enhancements can then be evaluated with multiple sets of core software development tools resident in the system.
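The plug-in arrangement and the debugger's disassembly path described above can be sketched as follows. The user-defined decoder and the opcode-to-mnemonic table sit behind a small structure of function pointers. A core tool consults this structure only when its own pre-built decoder fails.

This is a minimal sketch with hypothetical names (user_plugin, core_decode, disassemble). It reuses the foo1/foo2/foo3 encodings from the simulator example, so 0x62230 disassembles as foo1. It also assumes the s and t register fields sit in bits 8..11 and 4..7. In the described system, the structure would be filled from the decoding DLL 748 at run time through the host's dynamic loader, rather than linked statically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t insn_word;

enum { OP_UNDEFINED = -1, OP_FOO1, OP_FOO2, OP_FOO3 };

/* Extract the 4-bit field starting at bit 'shift' (cf. get_r_field). */
static unsigned get_field(insn_word w, unsigned shift)
{
    assert(shift <= 28);
    return (unsigned)((w >> shift) & 0xf);
}

/* User-defined decoder: hierarchical switch from the text, with
   op0 = bits 0..3, op1 = bits 16..19, op2 = bits 20..23. */
static int user_decode(insn_word w)
{
    switch (get_field(w, 0)) {
    case 0x0:
        switch (get_field(w, 16)) {
        case 0x6:
            switch (get_field(w, 20)) {
            case 0x0: return OP_FOO1;
            case 0x1: return OP_FOO2;
            case 0x2: return OP_FOO3;
            default:  return OP_UNDEFINED;
            }
        default: return OP_UNDEFINED;
        }
    default: return OP_UNDEFINED;
    }
}

/* Simple table mapping internal opcodes to mnemonic strings. */
static const char *user_mnemonic(int op)
{
    static const char *const names[] = { "foo1", "foo2", "foo3" };
    return (op >= 0 && op < 3) ? names[op] : NULL;
}

/* The interface a core tool sees for one set of user-defined enhancements. */
typedef struct {
    int (*decode)(insn_word);
    const char *(*mnemonic)(int);
} user_plugin;

static const user_plugin foo_plugin = { user_decode, user_mnemonic };

/* Stand-in for the pre-built core decoder; it recognizes nothing here,
   so every word falls through to the plug-in. */
static int core_decode(insn_word w, const char **name)
{
    (void)w;
    (void)name;
    return 0;
}

/* Disassemble a three-register instruction into buf; -1 if undecodable. */
static int disassemble(insn_word w, const user_plugin *p, char *buf, size_t len)
{
    const char *name = NULL;
    if (!core_decode(w, &name) && p != NULL) {
        int op = p->decode(w);
        if (op != OP_UNDEFINED)
            name = p->mnemonic(op);
    }
    if (name == NULL)
        return -1;
    snprintf(buf, len, "%s a%u,a%u,a%u", name,
             get_field(w, 12), get_field(w, 8), get_field(w, 4));
    return 0;
}
```

Because disassemble depends only on the user_plugin structure, the same plug-in could be handed to a simulator, an assembler front end or a debugger. That reuse is the reason for keeping the per-enhancement modules together.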

List of reference numerals

10 …… processor configuration system
20 …… user configuration interface
30 …… software development tools
40 …… hardware implementation description
50 …… build system
60 …… processor
62 …… processor control section
64 …… decode section
66 …… ALU and address generation section
68 …… branch logic and instruction fetch
70 …… processor interface
72 …… interrupt control section
74 …… data and instruction address watch section
76 …… data and instruction address watch section
78 …… windowed register file
80 …… data and instruction cache and tag section
82 …… write buffer
84 …… timer
86 …… configuration manager screen
90 …… options
92 …… on-chip debug module
94 …… JTAG port
96 …… designer-defined instruction execution unit
98 …… coprocessor
100 …… configuration format
102 …… user-selected configuration settings
106 …… search engine
108 …… compiler
110 …… assembler
112 …… simulator
114 …… HDL description
118 …… simulated program
122 …… Design Compiler™
124 …… sample data set
126 …… ISS
128 …… Apollo™
130 …… software profile
132 …… block
134 …… hardware profile
200 …… evaluation board
202 …… CPLD device
204 …… EPROM
206 …… SRAM
208 …… synchronous SRAM
210 …… flash memory
212 …… RS232 serial channel
214 …… configuration port
216 …… configuration-specific ROM
217 …… programmable logic device (PLD)
218 …… cache
222 …… tag bus
224 …… tag bus
228 …… cache
300 …… major types of configurability
302 …… extensibility
304 …… modifiability
306 …… binary selection
308 …… parametric format
400 …… TIE description file
410 …… TIE parser program
420 …… tie2gcc
430 …… tie2isa
440 …… tie2iss
450 …… tie2ver
460 …… tie2xtos
470 …… C header file
480 …… dynamically linked library
490 …… DLL
500 …… description
510 …… instruction code
610 …… flip-flop
620 …… flip-flop
630 …… flip-flop
702 …… TIE compiler
708 …… compiler
710 …… assembler
712 …… simulator
730 …… debugger
736 …… user-defined enhancements
738 …… assembly language
740 …… machine instructions
742 …… file
744 …… encoding library
746 …… instruction set
748 ……
749 ……
750 ……
752 ……
800 ……
810 ……
840 ……
842 ……
844 ……
846 ……
848 ……

解碼DLLDecode DLL

模擬器DLL 使用者-定義指令檔案 使用者-定義狀態 tie2gcc tie2isa t i e 2 i s s C檔頭檔案Emulator DLL user-defined command file user-defined status tie2gcc tie2isa t i e 2 i s s C header file

DLLDLL

主編輯器 解碼DLL (請先閲讀背面之注意事項再填寫本頁) •訂| 143 本紙張尺度適用中國國家標準(CNS) Α4規格(210X297公釐) 539965 A7 B7 五、發明説明(⑷)Main editor Decoding DLL (Please read the notes on the back before filling this page) • Order | 143 This paper size applies to China National Standard (CNS) A4 specification (210X297 mm) 539965 A7 B7 V. Description of invention (⑷)

8 4 9 ......模擬器DLL 8 5 0 ......說明 8 6 0 ......tie2 xto s 870……C指令碼 (請先閲讀背面之注意事項再填寫本頁) 、? 144 本紙張尺度適用中國國家標準(CNS) A4規格(210X297公釐)8 4 9 ...... Simulator DLL 8 5 0 ...... Description 8 6 0 ...... tie2 xto s 870 ...... C instruction code (Please read the precautions on the back before filling (This page),? 144 This paper size applies to China National Standard (CNS) A4 (210X297 mm)
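Numerals 300 to 308 name the main types of configurability: extensibility, modifiability, binary selections and parametric specification. Numerals 100 and 102 name the configuration specification and the user-selected settings. As a rough sketch only, a configuration specification could group three kinds of entry: binary on/off options, numeric parameters, and user-defined instruction extensions. An evaluation step could then run sanity checks on those entries. Everything in the Python below is hypothetical: the class and function names, the `icache` and `icache_bytes` keys, the instruction names `ADD4` and `MAC16`, and both checks. The sketch also treats modifiability as covered by the binary and parametric fields, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserInstruction:
    """Hypothetical record for one designer-defined instruction (an extension)."""
    name: str
    adds_state: bool = False  # an extension may or may not add processor state


@dataclass
class ConfigSpec:
    """Illustrative grouping of the configurability types (assumed layout)."""
    binary: Dict[str, bool] = field(default_factory=dict)  # binary selections: feature on/off
    params: Dict[str, int] = field(default_factory=dict)   # parametric choices: sizes, counts
    extensions: List[UserInstruction] = field(default_factory=list)  # extensibility


def check_spec(spec: ConfigSpec) -> List[str]:
    """Return problems found by a few example checks; an empty list means none found.

    The rules are assumptions chosen for illustration, not rules taken from the patent.
    """
    problems: List[str] = []
    # A parameter only matters when the binary option that enables it is selected.
    if spec.binary.get("icache", False):
        size = spec.params.get("icache_bytes", 0)
        # For n > 0, n & (n - 1) is zero only when n is a power of two.
        if size <= 0 or (size & (size - 1)) != 0:
            problems.append("icache_bytes must be a positive power of two")
    names = [ins.name for ins in spec.extensions]
    if len(names) != len(set(names)):
        problems.append("duplicate user-defined instruction names")
    return problems


if __name__ == "__main__":
    spec = ConfigSpec(
        binary={"icache": True},
        params={"icache_bytes": 3000, "num_registers": 32},
        extensions=[UserInstruction("ADD4"), UserInstruction("MAC16", adds_state=True)],
    )
    print(check_spec(spec))  # prints ['icache_bytes must be a positive power of two']
```

Keeping binary options separate from numeric parameters lets a check skip any parameter whose enabling option is switched off. This mirrors the split between binary selections (306) and parametric specification (308).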

Claims (1)

539965 A8 B8 C8 D8 六、申請專利範圍 經濟部智慧財產局員工消費合作社印製 1 · 一種用以設計可組態處理器之系統,該系統包含: 一組裝置,依據一種組態格式,用以產生該處理器之 硬體製作之說明;以及 一組裝置,依據該組態格式,用以產生¥定於該硬體 製作之軟體發展工具。 2. 如申請專利範圍第1項之系統,其中用以產生軟 體發展工具之裝置包含用以產生能夠產生在處理器上面執 行之程式碼之軟體發展工具之裝置。 3· 如申請專利範圍第1項之系統,其中該軟體發展 工具包含一組編輯器,其依組態格式被_裁,用以將一組 應用編輯成爲可被處理器執行之程式碼。 4 · 如申請專利範圍第1項之系統,其中該軟體發展 工具包含一組組譯器,其依組態格式被量裁,用以將一組 應用組譯成爲可被處理器執行之程式碼。 5 · 如申請專利範圍第1項之系統,其中該軟體發展 工具包含一組鏈接器,其依組態格式被量裁,用以鏈接可 被處理器執行之程式碼。 6. 如申請專利範圍第1項之系統,其中該軟體發展 工具包含一組反組譯器,其依組態格式被量裁,用以反組 譯可被處理器執行之程式碼。 7· 如申請專利範圍第1項之系統,其中該軟體發展 工具包含一組除錯器,其依組態格式被量裁,用以將可被 處理器執行之程式碼除錯。 8· 如申請專利範圍第7項之系統,其中該除錯器具 145 (請先聞讀背面之注意事項再本頁) 訂 平 I T N y, - A4 %—/ S N * 公 7 29 X 539965 經濟部智慧財產局員工消費合作社印製 A8 B8 C8 D8六、申請專利範圍 有供用於指令集模擬器和硬體製作之一組共同界面和組 態。 9. 如申請專利範圍第1項之系統,其中該軟體發展 工具包含一組指令集模擬器,其依組態格式被量裁,用以 模擬可被處理器執行之程式碼。 I 0.如申請專利範圍第9項之系統,其中該指令集模 擬器能夠模式化被模擬程式碼之執行以量測包含執行週期 之主要性能準則。 II .如申請專利範圍第1 0項之系統,其中該性能準則 是依據特定可組態微結構特點。 12·如申請專利範圍第10項之系統,其中該指令集模 擬器能夠簡介被模擬程式之執行以記錄標準簡介統計,包 含一些執行於各被模擬功能中之週期。 1 3 .如申請專利範圍第1項之系統,其中該硬體製作 說明包含詳細HDL硬體製作說明;合成原本;置放和引導 原本;可程式邏輯元件原本;測試平台;供確認之診斷測 試;用以在一組模擬器上面執行診斷測試之原本;以及測 試工具中之至少一組。 1 4.如申請專利範圍第1項之系統,其中用以產生該 硬體製作說明之裝置包含: 用以從組態格式產生該硬體製作說明之硬體說明語言 說明之裝置; 依據該硬體說明語言說明用以合成硬體製作邏輯之裝 置;以及 (請先閱讀背面之注意事項再本頁)539965 A8 B8 C8 D8 6. Application for Patent Scope Printed by the Consumer Cooperatives of the Intellectual Property Bureau of the Ministry of Economic Affairs1. A system for designing a configurable processor, the system includes: a set of devices, according to a configuration format, for Generating a description of the hardware production of the processor; and a set of devices for generating a software development tool scheduled for the hardware production according to the configuration format. 2. The system of item 1 of the patent application, wherein the device for generating software development tools includes a device for generating software development tools capable of generating code for execution on a processor. 3. 
If the system of the first scope of the patent application, the software development tool includes a set of editors, which are tailored according to the configuration format to edit a set of applications into code that can be executed by the processor. 4 · If the system of the first scope of the patent application, the software development tool includes a set of translators, which are tailored according to the configuration format, and are used to translate a set of applications into code that can be executed by the processor . 5 · If the system of item 1 of the patent application scope, the software development tool includes a set of linkers, which are tailored according to the configuration format, and are used to link code that can be executed by the processor. 6. If the system of the first patent application scope, the software development tool includes a set of anti-compiler, which is tailored according to the configuration format, and used to de-compose the code that can be executed by the processor. 7. If the system of item 1 of the patent application scope, the software development tool includes a set of debuggers, which are tailored according to the configuration format to debug the code that can be executed by the processor. 8 · If you apply for the system of item 7 of the patent scope, the debugging device 145 (please read the notes on the back first and then this page) to level ITN y,-A4% — / SN * Male 7 29 X 539965 Ministry of Economic Affairs Printed by A8, B8, C8, D8, Consumer Cooperatives of the Intellectual Property Bureau. 6. The scope of patent application is for a set of common interfaces and configurations for instruction set simulators and hardware production. 9. If the system of the first patent application scope, the software development tool includes a set of instruction set simulators, which are tailored according to the configuration format to simulate code that can be executed by the processor. I 0. 
The system according to item 9 of the patent application scope, wherein the instruction set simulator can model the execution of the simulated code to measure the main performance criteria including the execution cycle. II. The system according to item 10 of the patent application scope, wherein the performance criterion is based on specific configurable microstructure characteristics. 12. The system of item 10 in the scope of patent application, wherein the instruction set simulator can profile the execution of the simulated program to record standard profile statistics, including some cycles executed in each simulated function. 1 3. If the system of item 1 of the patent application scope, wherein the hardware production instructions include detailed HDL hardware production instructions; synthesis original; placement and guidance original; programmable logic component original; test platform; diagnostic testing for confirmation An original for performing a diagnostic test on a set of simulators; and at least one of the test tools. 1 4. The system according to item 1 of the scope of patent application, wherein the device for generating the hardware production description includes: a device for generating a hardware description language description of the hardware production description from a configuration format; according to the hardware The description language describes the device used to synthesize the hardware production logic; and (Please read the precautions on the back before this page) 、言 146 本紙張尺度適用中國國家標準(CNS ) A4規格(210X29*7公釐) 539965 A8 B8 C8 D8 &、申請專利範圍 用以依據該被合成邏輯而置放及引導晶片上面構件以 形成一組電路之裝置。 1 5 ·如申請專利範圍第1 4項之系統,用以產生該硬體 製作說明之裝置進一步地包含: 用以證實該電路之時序的裝置;以及 用以決定該電路之面積、週期時間以及功率消耗之裝 置。' 16. 如申請專利範圍第1項之系統,進一步地包含用 以產生該組態格式之裝置。 17. 如申請專利範圍第16項之系統,其中用以產生該 組態格式之裝置是反應於被使用者選擇之組態參數。 18. 如申請專利範圍第16項之系統,其中用以產生該 組態格式之裝置是用以依據該處理器之設計目標而產生該 格式。 1 9.如申請專利範圍第1項之系統,其中該組態格式 至少包含處理器之可修改特性之一組參數格式。 20. 如申請專利範圍第1 9項之系統,其中該至少一組 參數格式指定包含一組功能性單元,以及操作該功能性單 元之至少一組處理器指令。 21. 
如申請專利範圍第19項之系統,其中該至少一組 參數格式指定影響處理器狀態之結構之包含、排除以及特 點之一。 22. 如申請專利範圍第21項之系統,其中該結構是一 組暫存器檔案並且該參數格式指定該暫存器檔案中暫存器 147 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) m an* m (請先閱讀背面之注意事項再本頁) 、言· 經濟部智慧財產局員工消費合作社印製 539965 A8 B8 C8 D8 六、申請專利範圍 經濟部智慧財產局員工消費合作社印製 數目。 、 23·如申請專利範圍第21項之系統,其中該結構是 組指令快取。 24.如申請專利範圍第21項之系統 組資料快取。 2 5 ·如申請專利範圍第2 1項之系統 組寫入緩衝器。 26. 如申請專利範圍第21項之系統 片上面之ROM以及晶片上面之RAM之一種。 27. 如申請專利範圍第1 9項之系統,其中該至少一組 參數格式指定控制該處理器中資料和指令之至少一組的轉 g睪之一^組語意特性。 28. 如申請專利範圍第19項之系統,其中該至少一組 參數格式指定控制該處理器中指令之執行的執行特性。 29·如申請專利範圍第19項之系統,其中該至少一組 參數格式指定該處理器之除錯特性。 30·如申請專利範圍第19項之系統,其中該組態格式 包含指明一組預定特.點之選擇;一組處理器元件之尺寸或 數目;以及數値之指定的至少一種之一組參數格式。 3 1 ·如申請專利範圍第1項之系統,進一步地包含用 以評估組態格式適當性之裝置。 32·如申請專利範圍第31項之系統,其中該用以評估 之裝置包含一組互動評估工具。 3 3 .如申請專利範圍第31項之系統,其中該用以評估 其中該結構是一 其中該結構是一 其中該結構是晶 C請先閱讀背面之注意事項再本頁j -訂 148 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) 539965 A8 B8 C8 D8 六、申請專利範圍 ~ 之裝置是用以評估組態格式所說明之處理器之硬體特性。 34. 如申請專利範圍第31項之系統,其中該用以評估 之裝置是依據處理器之被評估性能特性而用以評估組態格 式之適當性。 35. 如申請專利範圍第34項之系統,進一步地包含有 依據被評估性能特性而甩以提供引動組態格式之修改的資 訊之裝置。 3 6·如申請專利範圍第34項之系統,其中該等性能特 性包含製作處理器於一組晶片上面所需的面積、被處理器 所消耗之功率以及處理器之時脈速率之至少一種。 3 7.如申請專利範圍第31項之系統,其中該用以評估 之裝置是依據處理器之被評估軟體特性而用以評估組態格 式之適當性、。 38.如申請專利範圍第37項之系統,其中該用以評估 之裝置是利用評估於處理器上面執行組態格式所說明之一 套組評鑑程式所需的至少一組程式碼尺寸和週期而用以互 動地呈現一種適當性評估至使用者。 經濟部智慧財產局員工消費合作社印製 3 9·如申請專利範圍第3 1項之系統,其中該用以評估 之裝置是用以評估組態格式說明之處理器之硬體特性和軟 體特性。 40·如申請專利範圍第1項之系統,其中該用以產生 之裝置進一步地用以一起提供硬體性能和成本以及軟體應 用性能之特徵以便利修改組態格式。 4 1.如申請專利範圍第1項之系統,其中該用以產生 149 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) 539965 A8 B8 C8 D8 六、申請專利範圍 經濟部智慧財產局員工消費合作社印製 之裝置進一步地用以一起提供硬體性能和成本以及軟.體應 用性能之特徵以便利該組態格式之延伸。 42·如申請專利範圍第1項之系統,其中該用以產生 之裝置進一步地用以一起提供硬體性能和成本以及軟體應 用性能之特徵以便利該組態格式之修改,並且用以一起提 供硬體性能和成本以及軟體應用性能之特徵以便利該組態 格式之延伸說明。 43 _如申請專利範圍第1項之系統,進一步地包含利 用延伸而甩以產生該處理器之組態的裝置。 44·如申請專利範圍第1項之系統,其中該組態格式 包含該處理费之可延伸特性之至少一組延伸格式。 4 5.如申請專利範圍第44項之系統,其中該延伸格式 指定一組另外的指令。 46.如申請專利範圍第44項之系統,其中該延伸格式 指定包含一組使用者-定義指令以及該指令之一種製作。 、 v 47·如申請專利範圍第46項之系統,其中用以產生該 軟體發展工具之裝置包含用以建議特別地適合於至少一組 應用之可能使用者-定義指令給予使用者之裝置。 48·如申請專利範圍第46項之系統,其中該軟體發展 工具包含能夠產生使用者-定義指令之一組編輯器。 49.如申請專利範圍第48項之系統,其中該編輯器能 夠將包含使用者-定義指令之程式碼最佳化。 5 
0.如申請專利範圍第46項之系統,其中該軟體發展 工具包含能夠荸生使用者-定義指令之一組組譯器;能夠使 150 本紙張尺度適用中國國家標準(CNS ) Α4規格(210Χ297公釐) I 背 項 Η 訂 539965 A8 B8 C8 D8 穴、申請專利範圍 經濟部智慧財產局員工消費合作社印製 57·如申請專利範圍第52項之系統,其中該用以產生 軟體發展工具之裝置包含用以產生被使用於依組態格式被 量裁之一組組譯器中一組編碼表之裝置。 58.如申請專利範圍第52項之系統,其中該用以產生 硬體製作說明之裝置進一步地用以產生新特點之資料通道 硬體的說明,該資料通道硬體相容於該處理器之一組特定 管線結構。 5 9·如申請專利範圍第44項之名統,其中該另外的指 令不添加新的狀態至該處理器。 60.如申請專利範圍第44、項之系統,其中該另外的指 令添加狀態至該處理器。 61_如申請專利範圍第1項之系統,其中該組態格式 包含利用一組指令集結構說明語言說明之至少一部份指 定。 1 62.如申請專利範圍第61項之系統,其中該用以產生 硬體製作說明之裝置包含用以從該指令集結構語言說明自 動地產生指令解碼邏輯之裝覃。 6 3.如申請專利範圍第61項之系統,其中該用以產生 軟體發展工具之裝置包含用以從該指令集結構語言說明自 動地產生一組組譯器核心之裝置。 64. 如申請專利範圍第61項之系統,其中該用以產生 軟體發展工具之裝置包含用以從該指令集結構語言說明自 動地產生一組編輯器之裝置。 65. 如申請專利範圍第61項之系統,其中該用以產生 152 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) (請先閲讀背面之注意事項再本頁) 、言 539965 A8 B8 C8 D8 六、申請專利範圍 軟體發展工具之裝置包含用以從該指令集結構語言說明自 動地產生一組反組譯器之裝置。 66. 如申請專利範圍第61項之系統,其中該用以產生 軟體發展工具之裝置包含用以從該指令集結構語言說明自 動地產生一組指令集模擬器之裝置。 67. 如申請專利範圍第1項之系統,其中該用以產生 硬體製作說明之裝置包含用以將該硬體製作說明和該軟體 發展工具之至少一種的一部份前處理以依據組態格式分別 地修改該硬體製作說明和該軟體工具之裝置。 6 8.如申請專利範圍第67項之系統,其中該前處理裝 置是用以評估該硬體製作說明和該軟體發展工具之一種中 的一組表示並且依據該組態格式以一組數値取代該表示。 69. 如申請專利範圍第68項之系統,其中該表示包含 一組疊代構造、一組條件構造和一組資料庫詢問之至少一 種。 70. 如申請專利範圍第1項之系統,其中該組態格式 包含至少一組指明該處理器之可修改特性之參數格式以及 至少一組指明該處理器之可延伸特性之延伸格式。 71·如申請專利範圍第70項之系統,其中該可修改特 性是對於該核心格式之修改以及未指定於該核心格式中之 一組選擇特點之一種。 72.如申請專利範圍第丨項之系統,其中該組態格式 包含至少一組指明該處理器之二分法可選擇特性之參數格 式、至少一組該處理器之參數式可指定特性、以及至少一 153 本紙張尺度適用中國國家襟準(CNS ) A4規格(210X297公嫠) (請先閱讀背面之注意事項再 本頁 經濟部智慧財產局員工消費合作社印製 539965 A8 B8 C8 D8 六、申請專利範圍 組指明該處理器之可延伸特性之延伸格式。 73. —種設計可組態處理器之方法,該方法包含: 依據一種組態格式產生該處理器之硬體製作的說明; 以及 依據該組態格式產生特定於該硬體製作之軟體發展工 具。 74. —種用以設計可組態處理器之系統,該系統包含: 用以產生具有使用者-可定義部份之一種組態格式之裝 置,該組態格式之該使用者-可定義部份包含 使用者-定義處理器狀態之一種格式,以及 ' 相關的至少一組使用者-定義指令以及一組使用者-定義 功能,該功能包含讀取自以及寫入至使用者-定義處理器狀 態之至少一種;以及 依據一種組態格式而用以產生該處理器之硬體製作說 明之裝置。 75. 
如申請專利範圍第74項之系統,其中該處理器之 硬體製作說明包含用以執行至少一組使用者-定義指令以及 用以製作使用者-定義處理器狀態所必須的控制邏輯之說 經 濟 部 智 慧 財 產 局 貝 X 消 費 合 作 社 印 製 明。 76·如申請專利範圍第75項之系統,其中: 該處理器之硬體製作說明一組指令執行管線;以及 該控制邏輯包含相關於該指令執行管線之各步驟的部 份。 77.如申請專利範圍第76項之系統,其中·· L _ 154 用中國國家標準(CNS ) A4· ( 210X297公釐) A8 B8 C8 D8 539965 六、申請專利範圍 該硬體製作說明包含用以放棄指令執行之一組電路說 明;並且 該控制邏輯包含用以防止使用者·定義狀態被所放棄指 令修改之電路。 78. 如申請專利範圍第77項之系統,其中該控制邏輯 包含用以對於至少一組使用者-定義指令進行指令發出、操 作元旁_和操作元寫入引動中之至少一種的電路。 79. 如申請專利範圍第76項之系統,其中該硬體製作 說明包含用以製作指令執行管線之多數個步驟中之使用者_ 定義狀態之暫存器。 8〇·如申請專利範圍第76項之系統,其中: 該硬體製作說明包含在不同於其中輸出操作元被產生 之一組的管線步驟中被寫入之狀態暫存器;並且 該硬體製作說明指定此等寫入被旁通進入依序的指 令,其在至該狀態的寫入被託付之前參考使用者-定義處理 器狀態。 8 1 ·如申請專利範圍第74項之系統,其中; 該組態格式除該使用者-定義部份之外包含一組預定部 份;並且 該格式之預定部份包含用以便利儲存使用者-定義狀態 至記憶體之一組指令以及用以便利從記憶體重存使用者-定 義狀態之一組指令。 82.如申請專利範圍第81項之系統,進一步地包含使 用該指令用以產生以切換使用者-定義狀態之軟體的裝置。 155 本紙張尺度適用中國國家標準(CNS ) A4規格(210 X 297公釐) (請先閲讀背面之注意事項再本頁) IJ.H IJ 經濟部智慧財產局員工消費合作社印製 539965 經濟部智慧財產局員工消費合作社印製 A8 B8 C8 D8六、申請專利範圍 ' 83. 如申請專利範圍第74項之系統,進一步地包含裝 置,用以產生,用以組譯使用者-定義處理器狀態以及至少 一組使用者-定義指令之一組組譯器;用以編輯使用者·定義 處理器狀態以及至少一組使用者-定義指令之一組編輯器; 用以模擬使用者-定義處理器狀態以及至少一組使用者-定義 指令之一組模擬器;以及用以將使用者-定義處理器狀態以 及至少一組使用者-定義指令除錯之一組除錯器中之至少一 種。 84. 如申請專利範圍第74項之系統,進一步地包含裝 置,供產生用以組譯使用者-定義處理器狀態以及至少一組 使用者-定義指令之一組組譯器、用以編輯使用者-定義處理 器狀態以及至少一組使用者-定義指令之一組編輯器、用以 模擬使用者-定義處理器狀態以及至少一組使用者-定義指令 之一組模擬器、以及用以將使用者-定義處理器狀態以及至 少一組使用者-定義指令除錯之一組除錯器。 85. 如申請專利範圍第74項之系統,其中該格式之使 用者-定義部份包含指明使用者-定義狀態之尺寸和指標之至 少一組陳述。 86·如申請專利範圍第85項之系統,其中該格式之使 用者-定義部份包含與使用者-定義狀態相關以及指明一組處 理器暫存器中使用者-定義狀態之封裝的至少一組屬性。 87·如申請專利範圍第74項之系統,其中該格$之使 用者-定義部份包含指明使用者-定義狀態至處理器暫存器之 一種映射的至少一組陳述。 (請先閱讀背面之注意事項再本頁) - I —V . 、τ 156 本紙張尺度適用中國國家標準(CNS ) Α4規格(210 X 297公釐) 539965 A8 B8 C8 D8 六、申請專利範圍 ^~" 88. 如申請專利範圍第74項之系統,其中該用以產生 硬體製作說明之裝置包含用以自動地映射該使用者-定義狀 態至處理器暫存器之裝置。 89. 
如申請專利範圍第74項之系統,其中該格式之使 用者-定義部份包含指明使用者-定義指令之類別以及其於使 用者-定義狀態之影響的至少一組陳述。 90·如申請專利範圍第74項之系統,其中該格式之使 用者-定義部份包含排定一組數値至該使用者-定義狀態之至 少一組指定陳述。 9 1 · 一種用以設計可組態處理器之系統,該系統包含·· 一組核心軟體工具,用以依據一種指令集結構格式而 產生特定於該格式之軟體發展工具;以及 一組使用者-定義指令模組,用以依據一種使用者-定義 指令格式而產生製作該使用者-定義指令時該核心軟體工具 使用之至少一組模組。 92.如申請專利範圍第91項之系統,其中該核心軟體 工具包含能夠產生程式碼以執行於處理器上面之軟體工 具。 經濟部智慧財產局員工消費合作社印製 93·如申請專利範圍第91項之系統,其中該至少一組 模組被製作爲一組動態鏈接檔案庫。 94.如申請專利範圍第9 1項之系統,其中該至少一組 模組被製作爲一組列表。 9 5.如,申請專利範圍第91項之系統,其中該核心軟體 工具包含一組編輯器,使用該使用者-定義指令模組,用以 157 本紙張尺度適用中國國家標準(CNS ) M規格(210X297公釐) 539965 A8 B8 C8 D8 六、申請專利範圍 編輯一組應用成爲使用該使用者-定義指令並且可被該處理 器執行之程式碼。 96·如申請專利範圍第95項之系統,其中該至少一組 模組包含在編輯該使用者-定義指令中供編輯器使用之模 組。 97.如申請專利範圍第91項之系統,其中該核心軟體 工具包含一組譯器,用以使用該使用者-定義模組而組譯一 組應用成爲使用該使用者-定義指令並且可被處理器執行之 程式碼。 9 8.如申請專利範圍第97項之系統,其中該至少一組 模組包含供組譯器使用而映射組合語言指令至使用者-定義 指令之模組。 99. 如申請專利範圍第98項之系統,其中: 該系統進一步地包含指明非使用者定義指令之一組核 心指令集格式;並且 該核心指令集格式被'組譯器使用而將該應用組譯成爲 可被處理器執行之程式碼。 經濟部智慧財產局員工消費合作社印製 100. 如申請專利範圍第91項之系統,其中該核心軟體 工具包含用以模擬可被該處理器執行之程式碼之一組指令 集模擬器。 101·如申請專利範圍第100項之系統,其中該至少一 .組模組包含供模擬器使用而模擬該使用者-定義指令之執行 的一組模擬器模組。 102.如申請專利範圍第1〇1項之系統,其中供模擬器 158 本紙張尺度適用中國國家榇準(CNS ) A4規格(210 X 297公釐) 539965 A8 B8 C8 D8 六、申請專利範圍 使用之該模組包含用以將該使用者-定義指令解碼之資料。 1〇3.如申請專利範圍第102項之系統,其中當它們無 法被解碼爲預先定義指令時該模擬器使用一組模組將使用 該模擬器模組之指令解碼。 104.如申請專利範圍第91項之系統,其中該核心軟體 工具包含使用該使用者-定義模辑將使用該使用者-定義指令 並且可被處理器執行之程式碼除錯之一組除錯器。 1 05 .如申請專利範圍第1 〇4項之系統,其中該至少一 組模組包含可被該除錯器使用而將機器指令解碼成爲組合 指令之一組模組。 106·如申請專利範圍第104項之系統,其中該至少一 組模組包含可被該鼠錯器使用而轉換組合指令成爲串列之 一組模組。 107. 如申請專利範圍^第104項之系統,其中: 該核心軟體工具包含用以模擬可被處理器執行之程式 碼的一組指令集模擬器;並且 該除錯器與該模擬器通訊以得到使用者-定義狀態上面 供除錯之資訊。 108. 如申請專利範圍第91項之系統,其中一組單一使 用者-定義指令可依據不同的核心指令集格式而無修改被多 重核心軟體工具所使用。 109. 
—種用以設計可組態處理器之系統,該系統包含: 核心軟體工具,用以依據一組指令集結構格式而產生 特定於該格式之軟體發展工具; 159 本紙張尺度適用中國國家標準(CNS ) A4規格(210 X 297公釐) (請先閲讀背面之注意事項再填寫本頁) 、v" 經濟部智慧財產局員工消費合作社印製 539965 A8 B8 C8 ____ D8 々、申請專利範圍 一組使用者-定義指令模組,用以依據一組使用者-定義 指令格式而產生一群製作該使用者-定義指令中供該核心軟 體工具使用之至少一組模組;以及 儲存裝置,供同時地儲存被使用者-定義指令模組產生 之族群’各族群對應至一組不同的使用者·定義指令。 110·如申請專利範圍第109項之系統,其中該至少一 組模組被製作爲動態鏈接檔案庫。 111·如申請專利範圍第、109項之系統,其中該至少一 組模組被製作爲一組列表。 112.如申請專利範圍第1〇9項之系統,其中該核心軟 體工具包含一組編輯器,使用該使用者-定義指令模組,用 以編輯一組應用成爲使用該使用者-定義指令並且可被該處 理器執行之程式碼。 I 1 3 .如申請專利範圍第1 1 2項之系統,其中該至少一 組模組包含在編輯該使用者-定義指令中供編輯器使用之模 組。 經濟部智慧財產局員工消費合作社印製 114.如申請專利範圍第109項之系統,其中該核心軟 體工具包含一組譯器,用以使用該使用者-定義模組而組譯 一組應用成爲使用該使用者·定義指令並且可被處理器執行 之程式碼。 II 5.如申請專利範圍第114項之系統,其中該至少一 組模組包含供組譯器使用而映射組合語言指令至使用者-定 義指令之模組。 II6·如申請專利範圍第109項之系統,其中該核心軟 160 本紙張尺度逋用中國國家擦準(CNS ) A4規格(210X297公釐) 體工具包貪用以模擬可被該處理器執行之程式碼之一組指 令集模擬器。 II7•如申請專利範圍第116項之系統,其中該至少一 組模組包含供摔擬器使用而模擬該使用者-定義指令之執行 的一組模擬器模組。 118. 如申請專利範圍第117項之系統,其中供模擬器 使用之該模組包含用以將該使用者-定義指令解碼之資料。 119. 如申請專利範圍第118項之系統,其中當它們無 法被解碼爲預先定義指令時該模擬器使用一組模組將使用 該模擬器模組之指令解碼。 . 120. 如申請專利範圍第109項之系統,其中該核心軟 體工具包含使用該使用者-定義模組將使用該使用者-定義指 令並且可被處理器執行之程式碼除錯之一組除錯器。 121·如申請專利範圍第120項之系統,其中該至少一 組模組包含可被該除錯器使用而將機器指令解碼成爲組合 指令之一組模組。 122. 如申請專利範圍第120項之系統,其中該至少一 組模組包含可被該除錯器使用而轉換組合指令成爲串列之 一組模組。 經濟部智慧財產局員工消費合作社印製 123. —種用以設計可組態處理器之系統,該系統包含: 多數個族群之核心軟體工具,各族群用以依據一組指 令集結構格式而產生特定於該格式之軟體發展工具;以及 一組使用者-定義指令模組,用以依據一組使用者-定義 指令格式而產生製作該使用者-定義指令中被一組族群之核 161 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) 539965 A8 B8 C8 __ —_ D8 六、申請專利範圍 心軟體工具使用之至少一組模組。 124. 如申請專利範圍第123項之系統,其中該至少一 組模組被製作爲一種動態鏈接檔案庫。 125. 如申請專利範圍第123項之系統,其中該至少一 組模組被製作爲一組列表。 12 6.如申請專利範圍第123項之系統,其中至少一組 族群之核心軟體工具包含一組編輯器,使用該使用者-定義 指令模組,用以編輯一組應用成爲使用該使用者-定義指令 並且可被該處理器執行之程式碼。 127.如申請專利範圍第126項之系統,其中該至少一 組模組包含在編輯該使用者-定義指令中供編輯器使用之模 組。 12 8.如申請專利範圍第123項之系統,其中至少一組 \ 族群之核心軟體工具包含一組譯器,用以使用該使用者-定 義模組而組譯一組應用成爲使用該使用者·定義指令並且可 被處理器執行之程式碼。 129. 如申請專利範圍第128項之系統,其中該至少一 組模組包含供組譯器使用而映射組合語言指令至使用者·定 義指令之模組。 經濟部智慧財產局員工消費合作社印製 130. 
如申請專利範圍第123項之系統,其中至少一組 族群之核心軟體工具包含用以模擬可被處理器執行之程式 碼的一組指令集模擬器。 13 1.如申請專利範圍第130項之系統,其中該至少一 組模組包含供模擬器使用而模擬該使用者-定義指令之執行 162 本紙張尺度適用中國國家標準(CNS ) A4規格(210 X 297公釐) 539965 A8 B8 C8 D8 經濟部智慧財產局員工消費合作社印製 申請專利範圍 的一組模擬器模組。 132. 如申請專利範圍第131項之系統,其中供模擬器 使用之該模組包含用以將該使用者-定義指令解碼之資料。 133. 如申請專利範圍第132項之系統,其中當它們無 法被解碼爲預先定義指令時該模擬器使用一組模組將使用 該模擬器模組之指令解碼。 * 4 134. 如申請專利範圍第123項之系統,其中至少一組 族群之核心軟體工具包含使用該使用者-定義模組將使用該 使用者-定羲指令並且可被處理器執行之程式碼除錯之一組 除錯器。 135. 如申請專利範圍第134項之系統,其中該至少一 組模組包含可被該除錯器使用而將機器指令解碼成爲組合 指令之一組模組。 136·如申請專利範圍第134項之系統,其中該至少一 組模組包含可被該除錯器使用而轉換組合指令成爲串列之 一組模組。 3 6 11 本紙張尺度適用中國國家標準(CNS ) A4規格(210 X 297公釐)146, 146 This paper size applies to Chinese National Standard (CNS) A4 specification (210X29 * 7 mm) 539965 A8 B8 C8 D8 &, the scope of patent application is used to place and guide the upper components of the wafer to form according to the synthesized logic A set of circuits. 1 5 · If the system of item 14 of the scope of patent application, the device for generating the hardware production instructions further comprises: a device for verifying the timing of the circuit; and a device for determining the area, cycle time, and Power consumption device. '16. The system according to item 1 of the patent application scope further includes a device for generating the configuration format. 17. The system of claim 16 in which the scope of patent application is applied, wherein the device used to generate the configuration format is responsive to the configuration parameters selected by the user. 18. If the system of claim 16 is applied for, the device for generating the configuration format is used to generate the format according to the design goals of the processor. 19. The system according to item 1 of the scope of patent application, wherein the configuration format includes at least one set of parameter formats of the modifiable characteristics of the processor. 20. 
The system of claim 19, wherein the at least one parameter format designation includes a set of functional units and at least one set of processor instructions for operating the functional units. 21. The system of claim 19, wherein the at least one set of parameter formats specifies one of inclusions, exclusions, and features that affect the state of the processor. 22. If the system of the scope of application for patent No. 21, wherein the structure is a set of register files and the parameter format specifies the register 147 in the register file, this paper size is applicable to the Chinese National Standard (CNS) A4 specification ( 210X297 mm) m an * m (please read the notes on the back first and then this page), printed by the Consumer Cooperatives of the Intellectual Property Bureau of the Ministry of Economic Affairs, printed 539965 A8 B8 C8 D8 VI. Patent application scope Employees of the Intellectual Property Bureau of the Ministry of Economic Affairs Cooperatives printed. 23. The system according to item 21 of the scope of patent application, wherein the structure is a group instruction cache. 24. System group data cache such as the scope of patent application No. 21. 2 5 · The system group write buffer such as the 21st in the scope of patent application. 26. For example, one of the system on-chip ROM and the on-chip RAM. 27. The system of claim 19, wherein the at least one set of parameter formats specifies controlling one of a set of semantic characteristics of at least one set of data and instructions in the processor. 28. The system of claim 19, wherein the at least one set of parameter formats specifies execution characteristics that control the execution of instructions in the processor. 29. The system of claim 19, wherein the at least one set of parameter formats specifies debug characteristics of the processor. 30. 
The system of claim 19, wherein the configuration format includes a selection indicating a predetermined set of features; the size or number of a set of processor elements; and at least one of a set of parameters specified by the number format. 3 1 · The system according to item 1 of the patent application scope further includes a device for evaluating the appropriateness of the configuration format. 32. The system of claim 31, wherein the device used for evaluation includes a set of interactive evaluation tools. 3 3. If the system of the 31st scope of the patent application, which is used to evaluate where the structure is one, where the structure is one, where the structure is crystal C, please read the precautions on the back before ordering 148 pages of this paper The standard applies to the Chinese National Standard (CNS) A4 specification (210X297 mm) 539965 A8 B8 C8 D8 6. The scope of the patent application ~ is used to evaluate the hardware characteristics of the processor described in the configuration format. 34. If the system of the scope of patent application No. 31, the device used for evaluation is used to evaluate the appropriateness of the configuration format according to the evaluated performance characteristics of the processor. 35. The system under item 34 of the scope of patent application further includes a device that provides information to modify the configuration of the trigger configuration based on the performance characteristics being evaluated. 36. The system according to item 34 of the patent application, wherein the performance characteristics include at least one of the area required to make the processor on a group of chips, the power consumed by the processor, and the clock rate of the processor. 3 7. 
The system according to item 31 of the scope of patent application, wherein the device used for evaluation is used to evaluate the appropriateness of the configuration format according to the characteristics of the evaluated software of the processor. 38. The system of claim 37, wherein the device used for evaluation uses at least one set of code size and period required to execute a set of evaluation procedures described in the configuration format on the processor It is used to interactively present an appropriateness assessment to the user. Printed by the Consumer Cooperatives of the Intellectual Property Bureau of the Ministry of Economic Affairs 39. If the system of item 31 in the scope of patent application is applied, the device used for evaluation is used to evaluate the hardware and software characteristics of the processor in the configuration format description. 40. The system of item 1 in the scope of patent application, wherein the means for generating is further used to provide features of hardware performance and cost and software application performance together to facilitate modification of the configuration format. 4 1. If the system of item 1 of the scope of patent application, which is used to generate 149 paper standards, the Chinese National Standard (CNS) A4 specification (210X297 mm) 539965 A8 B8 C8 D8 The devices printed by the bureau's consumer cooperatives are further used to provide features of hardware performance and cost and software application performance together to facilitate the extension of the configuration format. 42. 
The system according to item 1 of the scope of patent application, wherein the means for generating is further used to provide features of hardware performance and cost and software application performance together to facilitate modification of the configuration format, and to provide together Features of hardware performance and cost and software application performance to facilitate extended description of the configuration format. 43 _ The system according to item 1 of the scope of patent application, further comprising means for generating the configuration of the processor by using extensions. 44. The system of claim 1 in which the configuration format includes at least one set of extended formats including the extensible characteristics of the processing fee. 4 5. The system according to item 44 of the patent application, wherein the extended format specifies a further set of instructions. 46. The system of claim 44 wherein the extended format designation includes a set of user-defined instructions and a production of the instructions. V 47. The system according to item 46 of the patent application, wherein the means for generating the software development tool includes means for suggesting possible user-defined instructions specifically adapted to at least one set of applications to the user. 48. The system of claim 46, wherein the software development tool includes a set of editors capable of generating user-defined instructions. 49. The system of claim 48, wherein the editor is capable of optimizing code containing user-defined instructions. 50. 
The system according to item 46 of the scope of patent application, wherein the software development tool includes a translator capable of generating user-definition instructions; capable of making 150 paper sizes applicable to Chinese National Standard (CNS) Α4 specifications ( (210 × 297 mm) I Back Item Order 539965 A8 B8 C8 D8 Hole, Patent Application Scope Printed by Intellectual Property Bureau Employee Consumer Cooperatives 57. If the system applies for Item 52 of the Patent Scope, which is used to generate software development tools The device includes a device for generating a set of codelists used in a set of translators tailored according to a configured format. 58. The system of claim 52 in which the scope of patent application is applied, wherein the device for generating hardware production instructions is further used for generating a description of a new feature of data channel hardware, the data channel hardware is compatible with the processor A specific set of pipeline structures. 5 9. If the name of the scope of patent application is 44, the additional instruction does not add a new state to the processor. 60. The system of claim 44 in the scope of patent application, wherein the additional instruction adds status to the processor. 61_ The system according to item 1 of the patent application scope, wherein the configuration format includes at least a part of the specification using a set of instruction set structure description language descriptions. 1 62. The system according to item 61 of the scope of patent application, wherein the device for generating hardware production instructions comprises a device for automatically generating instruction decoding logic from the instruction set structure language description. 6 3. The system of claim 61, wherein the device for generating a software development tool includes a device for automatically generating a set of translator cores from the instruction set structure language description. 64. 
The system of claim 61, wherein the means for generating software development tools includes means for automatically generating a compiler from the instruction set architecture language description.
65. The system of claim 61, wherein the means for generating software development tools includes means for automatically generating a disassembler from the instruction set architecture language description.
66. The system of claim 61, wherein the means for generating software development tools includes means for automatically generating an instruction set simulator from the instruction set architecture language description.
67. The system of claim 1, wherein the means for generating the hardware implementation description includes means for preprocessing at least a portion of the hardware implementation description and of the software development tools, so as to modify the hardware implementation description and the software tools, respectively, based on the configuration specification.
68. The system of claim 67, wherein the preprocessing means evaluates an expression in the hardware implementation description and in the software development tools and replaces the expression with a value based on the configuration specification.
69. The system of claim 68, wherein the expression includes at least one of an iterative construct, a conditional construct, and a database query.
70. The system of claim 1, wherein the configuration specification includes at least one parameter specification specifying modifiable characteristics of the processor and at least one extension specification specifying extensible characteristics of the processor.
71. The system of claim 70, wherein the modifiable characteristic is one of a modification to the core specification and a specification of an optional feature not specified in the core specification.
72. The system of claim …, wherein the configuration specification includes at least one parameter specification specifying binary-selectable characteristics of the processor, at least one parameter specification specifying parametrically configurable characteristics of the processor, and at least one extension specification specifying extensible characteristics of the processor.
73. A method of designing a configurable processor, the method comprising: generating a description of a hardware implementation of the processor based on a configuration specification; and generating, based on the configuration specification, software development tools specific to the hardware implementation.
74. A system for designing a configurable processor, the system comprising: means for generating a user-defined portion of a configuration specification, the user-defined portion including a specification of user-defined processor state, at least one user-defined instruction, and a user-defined function associated therewith, the function including at least one of reading from and writing to the user-defined processor state; and means for generating a description of a hardware implementation of the processor based on the configuration specification.
75. The system of claim 74, wherein the hardware implementation description of the processor includes a description of the control logic necessary for executing the at least one user-defined instruction and for implementing the user-defined processor state.
76. The system of claim 75, wherein: the hardware implementation description of the processor describes an instruction execution pipeline; and the control logic includes portions associated with each stage of the instruction execution pipeline.
77. The system of claim 76, wherein: the hardware implementation description includes a description of circuitry for aborting instruction execution; and the control logic includes circuitry for preventing modification of the user-defined state by aborted instructions.
78. The system of claim 77, wherein the control logic includes circuitry for performing at least one of instruction issue, operand bypass, and operand write enable for the at least one user-defined instruction.
79. The system of claim 76, wherein the hardware implementation description includes registers for implementing the user-defined state in a plurality of stages of the instruction execution pipeline.
80. The system of claim 76, wherein: the hardware implementation description includes state registers that are written in a pipeline stage different from the one in which output operands are produced; and the hardware implementation description specifies that such writes are bypassed to subsequent instructions that reference the user-defined processor state before the writes to that state are committed.
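The abort and bypass behavior recited in claims 77 and 80 can be illustrated with a small behavioral model. This is a minimal Python sketch under stated assumptions, not the patent's generated hardware: the class and method names (`UserStateFile`, `write`, `read`, `abort_from`, `commit_through`) are hypothetical, and writes are assumed to be recorded in program order. A pending write is visible to later readers before it commits (bypass), and a write belonging to an aborted instruction never reaches committed state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PendingWrite:
    """A computed but not yet committed write to user-defined state."""
    reg: str
    value: int
    tag: int             # program-order sequence number of the producer
    killed: bool = False


@dataclass
class UserStateFile:
    """Behavioral model of bypassed, abortable user-defined state.

    Assumes writes are recorded in program order (oldest first).
    """
    committed: Dict[str, int]
    pending: List[PendingWrite] = field(default_factory=list)

    def write(self, reg: str, value: int, tag: int) -> None:
        # A late pipeline stage produces the value; hold it as pending.
        self.pending.append(PendingWrite(reg, value, tag))

    def read(self, reg: str) -> int:
        # Bypass: the youngest live pending write to `reg` wins, so a
        # subsequent instruction sees the value before it is committed.
        for w in reversed(self.pending):
            if w.reg == reg and not w.killed:
                return w.value
        return self.committed[reg]

    def abort_from(self, tag: int) -> None:
        # Aborting instruction `tag` also aborts every younger one;
        # their writes are killed and can never reach committed state.
        for w in self.pending:
            if w.tag >= tag:
                w.killed = True

    def commit_through(self, tag: int) -> None:
        # Retire, in program order, every write from instructions <= `tag`.
        still_pending = []
        for w in self.pending:
            if w.tag > tag:
                still_pending.append(w)
            elif not w.killed:
                self.committed[w.reg] = w.value
        self.pending = still_pending
```

For example, after `write("ACC", 5, tag=1)`, `read("ACC")` already returns 5 while `committed["ACC"]` is unchanged; if a later write with `tag=2` is aborted, both `read` and `commit_through` skip it.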
81. The system of claim 74, wherein: the configuration specification includes a predetermined portion in addition to the user-defined portion; and the predetermined portion of the specification includes an instruction for facilitating storing the user-defined state to memory and an instruction for facilitating loading the user-defined state from memory.
82. The system of claim 81, further comprising means for generating software that uses the instructions to context-switch the user-defined state.
83. The system of claim 74, further comprising means for generating at least one of: an assembler for assembling the user-defined processor state and the at least one user-defined instruction; a compiler for compiling the user-defined processor state and the at least one user-defined instruction; a simulator for simulating the user-defined processor state and the at least one user-defined instruction; and a debugger for debugging the user-defined processor state and the at least one user-defined instruction.
84. The system of claim 74, further comprising means for generating an assembler for assembling the user-defined processor state and the at least one user-defined instruction, a compiler for compiling the user-defined processor state and the at least one user-defined instruction, a simulator for simulating the user-defined processor state and the at least one user-defined instruction, and a debugger for debugging the user-defined processor state and the at least one user-defined instruction.
85. The system of claim 74, wherein the user-defined portion of the specification includes at least one statement specifying the size and indexing of the user-defined state.
86. The system of claim 85, wherein the user-defined portion of the specification includes at least one attribute associated with the user-defined state and specifying packing of the user-defined state in a processor register.
87. The system of claim 74, wherein the user-defined portion of the specification includes at least one statement specifying a mapping of the user-defined state to processor registers.
88. The system of claim 74, wherein the means for generating the hardware implementation description includes means for automatically mapping the user-defined state to processor registers.
89. The system of claim 74, wherein the user-defined portion of the specification includes at least one statement specifying a class of user-defined instructions and their effect on the user-defined state.
90. The system of claim 74, wherein the user-defined portion of the specification includes at least one assignment statement that assigns a value to the user-defined state.
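Claims 86 to 88 recite packing user-defined state into processor registers and mapping it automatically, and claims 81 and 82 rely on such a mapping to save and restore that state on a context switch. The following is a minimal Python sketch of one possible mapping, assuming 32-bit registers and a placement policy chosen here purely for illustration (declaration order; a field that fits in one register is never split; a wider field spans consecutive registers). The function names `map_state_to_registers`, `pack`, and `unpack` are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

REG_BITS = 32  # assumed width of one processor register

# One slice of a field placed in a register:
# (field name, lowest field bit in this slice, slice width, lowest register bit)
Slice = Tuple[str, int, int, int]


def map_state_to_registers(fields: List[Tuple[str, int]]) -> List[List[Slice]]:
    """Assign (name, width) state fields to REG_BITS-wide registers."""
    regs: List[List[Slice]] = [[]]
    used = 0  # bits occupied in the current (last) register
    for name, width in fields:
        if width <= REG_BITS and used + width > REG_BITS:
            regs.append([])  # start a fresh register instead of splitting
            used = 0
        field_lsb = 0
        while width > 0:
            if used == REG_BITS:
                regs.append([])
                used = 0
            chunk = min(width, REG_BITS - used)
            regs[-1].append((name, field_lsb, chunk, used))
            used += chunk
            field_lsb += chunk
            width -= chunk
    return regs


def pack(regs: List[List[Slice]], values: Dict[str, int]) -> List[int]:
    """Build the register words a save instruction would store to memory."""
    words = []
    for slices in regs:
        word = 0
        for name, field_lsb, width, reg_lsb in slices:
            bits = (values[name] >> field_lsb) & ((1 << width) - 1)
            word |= bits << reg_lsb
        words.append(word)
    return words


def unpack(regs: List[List[Slice]], words: List[int]) -> Dict[str, int]:
    """Recover field values from saved words, as a restore would."""
    values: Dict[str, int] = {}
    for slices, word in zip(regs, words):
        for name, field_lsb, width, reg_lsb in slices:
            bits = (word >> reg_lsb) & ((1 << width) - 1)
            values[name] = values.get(name, 0) | (bits << field_lsb)
    return values
```

With fields `[("FLAG", 1), ("MODE", 3), ("ACC", 40)]`, the 40-bit `ACC` fills the remaining 28 bits of the first register and its upper 12 bits land in a second one; `unpack(regs, pack(regs, values))` then returns the original values, which is the round trip a context switch needs.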
91. A system for designing a configurable processor, the system comprising: core software tools for generating, based on an instruction set architecture specification, software development tools specific to that specification; and a user-defined instruction module for generating, based on a user-defined instruction specification, at least one module for use by the core software tools in implementing user-defined instructions.
92. The system of claim 91, wherein the core software tools include software tools capable of generating code to run on the processor.
93. The system of claim 91, wherein the at least one module is implemented as a dynamically linked library.
94. The system of claim 91, wherein the at least one module is implemented as a table.
95. The system of claim 91, wherein the core software tools include a compiler for using the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.
96. The system of claim 95, wherein the at least one module includes a module for use by the compiler in compiling the user-defined instructions.
97. The system of claim 91, wherein the core software tools include an assembler for using the user-defined module to assemble an application into code that uses the user-defined instructions and is executable by the processor.
98. The system of claim 97, wherein the at least one module includes a module for use by the assembler in mapping assembly language statements to user-defined instructions.
99. The system of claim 98, wherein: the system further comprises a core instruction set specification specifying non-user-defined instructions; and the core instruction set specification is used by the assembler to assemble the application into code executable by the processor.
100. The system of claim 91, wherein the core software tools include an instruction set simulator for simulating code executable by the processor.
101. The system of claim 100, wherein the at least one module includes a simulator module for use by the simulator in simulating execution of the user-defined instructions.
102. The system of claim 101, wherein the module for use by the simulator includes data for decoding the user-defined instructions.
103. The system of claim 102, wherein, when instructions cannot be decoded as predefined instructions, the simulator decodes them using the simulator module.
104. The system of claim 91, wherein the core software tools include a debugger for using the user-defined module to debug code that uses the user-defined instructions and is executable by the processor.
105. The system of claim 104, wherein the at least one module includes a module usable by the debugger to decode machine instructions into assembly instructions.
106. The system of claim 104, wherein the at least one module includes a module usable by the debugger to convert assembly instructions into strings.
107. The system of claim 104, wherein: the core software tools include an instruction set simulator for simulating code executable by the processor; and the debugger communicates with the simulator to obtain information on the user-defined state for debugging.
108. The system of claim 91, wherein a single user-defined instruction module is usable, without modification, by multiple sets of core software tools for different core instruction set specifications.
109. A system for designing a configurable processor, the system comprising: core software tools for generating, based on an instruction set architecture specification, software development tools specific to that specification; a user-defined instruction module for generating, based on a user-defined instruction specification, at least one module for use by the core software tools in implementing user-defined instructions; and a storage device for concurrently storing groups of the at least one module generated by the user-defined instruction module, each group corresponding to a different set of user-defined instructions.
110. The system of claim 109, wherein the at least one module is implemented as a dynamically linked library.
111. The system of claim 109, wherein the at least one module is implemented as a table.
112. The system of claim 10, wherein the core software tools include a compiler for using the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.
113. The system of claim 112, wherein the at least one module includes a module for use by the compiler in compiling the user-defined instructions.
… code that uses the user-defined instructions and is executable by the processor.
115. The system of claim 114, wherein the at least one module includes a module for use by the assembler in mapping assembly language statements to user-defined instructions.
116. The system of claim 109, wherein the core software tools include an instruction set simulator for simulating code executable by the processor.
117. The system of claim 116, wherein the at least one module includes a simulator module for use by the simulator in simulating execution of the user-defined instructions.
118. The system of claim 117, wherein the module for use by the simulator includes data for decoding the user-defined instructions.
119. The system of claim 118, wherein, when instructions cannot be decoded as predefined instructions, the simulator decodes them using the simulator module.
120. The system of claim 109, wherein the core software tools include a debugger for using the user-defined module to debug code that uses the user-defined instructions and is executable by the processor.
121. The system of claim 120, wherein the at least one module includes a module usable by the debugger to decode machine instructions into assembly instructions.
122. The system of claim 120, wherein the at least one module includes a module usable by the debugger to convert assembly instructions into strings.
123. A system for designing a configurable processor, the system comprising: a plurality of groups of core software tools, each group for generating software development tools specific to an instruction set architecture specification; and a user-defined instruction module for generating, based on a user-defined instruction specification, at least one module for use by at least one group of the core software tools in implementing user-defined instructions.
124. The system of claim 123, wherein the at least one module is implemented as a dynamically linked library.
125. The system of claim 123, wherein the at least one module is implemented as a table.
126. The system of claim 123, wherein the core software tools of at least one group include a compiler for using the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.
127. The system of claim 126, wherein the at least one module includes a module for use by the compiler in compiling the user-defined instructions.
128. The system of claim 123, wherein the core software tools of at least one group include an assembler for using the user-defined module to assemble an application into code that uses the user-defined instructions and is executable by the processor.
129. The system of claim 128, wherein the at least one module includes a module for use by the assembler in mapping assembly language statements to user-defined instructions.
130. The system of claim 123, wherein the core software tools of at least one group include an instruction set simulator for simulating code executable by the processor.
131. The system of claim 130, wherein the at least one module includes a simulator module for use by the simulator in simulating execution of the user-defined instructions.
132. The system of claim 131, wherein the module for use by the simulator includes data for decoding the user-defined instructions.
133. The system of claim 132, wherein, when instructions cannot be decoded as predefined instructions, the simulator decodes them using the simulator module.
134. The system of claim 123, wherein the core software tools of at least one group include a debugger for using the user-defined module to debug code that uses the user-defined instructions and is executable by the processor.
135. The system of claim 134, wherein the at least one module includes a module usable by the debugger to decode machine instructions into assembly instructions.
136. The system of claim 134, wherein the at least one module includes a module usable by the debugger to convert assembly instructions into strings.
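The decode order recited in claims 103, 119, and 133 (predefined instructions first, the user-defined simulator module only as a fallback) can be sketched as follows. This is a minimal Python illustration under stated assumptions: an 8-bit opcode field selected by a mask, a plain Python object (`TableModule`, hypothetical) standing in for a module implemented as a table or dynamically linked library, and hypothetical names `Simulator`, `decode`, and `step`. It does not describe the actual tools generated by the patented system.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Handler = Callable[[State, int], None]  # executes one instruction word


class TableModule:
    """A user-defined instruction module implemented as a lookup table."""

    def __init__(self, table: Dict[int, Handler], opcode_mask: int) -> None:
        self.table = table
        self.opcode_mask = opcode_mask

    def decode(self, word: int) -> Optional[Handler]:
        return self.table.get(word & self.opcode_mask)


class Simulator:
    """Instruction set simulator with a core decoder plus extension modules."""

    def __init__(self, core_table: Dict[int, Handler], opcode_mask: int,
                 modules: List[TableModule]) -> None:
        self.core_table = core_table
        self.opcode_mask = opcode_mask
        self.modules = modules

    def decode(self, word: int) -> Handler:
        # Try the predefined (core) instructions first.
        handler = self.core_table.get(word & self.opcode_mask)
        if handler is not None:
            return handler
        # Consult the user-defined modules only if core decoding fails.
        for module in self.modules:
            handler = module.decode(word)
            if handler is not None:
                return handler
        raise ValueError(f"illegal instruction {word:#010x}")

    def step(self, state: State, word: int) -> None:
        self.decode(word)(state, word)
```

Because the core table is consulted first, a user-defined module cannot shadow a predefined opcode in this sketch, and the same `Simulator` class can be paired with different module lists, loosely mirroring the idea of one core tool set serving several sets of user-defined instructions.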

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/246,047 US6477683B1 (en) 1999-02-05 1999-02-05 Automated processor generation system for designing a configurable processor and method for the same
US09/323,161 US6701515B1 (en) 1999-05-27 1999-05-27 System and method for dynamically designing and evaluating configurable processor instructions
US09/322,735 US6477697B1 (en) 1999-02-05 1999-05-28 Adding complex instruction extensions defined in a standardized language to a microprocessor design to produce a configurable definition of a target instruction set, and hdl description of circuitry necessary to implement the instruction set, and development and verification tools for the instruction set

Publications (1)

Publication Number Publication Date
TW539965B true TW539965B (en) 2003-07-01

Family

ID=27399897

Family Applications (1)

Application Number Title Priority Date Filing Date
TW089102150A TW539965B (en) 1999-02-05 2000-03-10 Automated processor generation system for designing a configurable processor and method for the same

Country Status (6)

Country Link
EP (1) EP1159693A2 (en)
JP (2) JP2003518280A (en)
KR (2) KR100775547B1 (en)
AU (1) AU3484100A (en)
TW (1) TW539965B (en)
WO (1) WO2000046704A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7784024B2 (en) 2003-08-20 2010-08-24 Japan Tobacco Inc. Program creating system, program creating program, and program creating module
TWI381309B (en) * 2006-11-21 2013-01-01 Nec Corp Instruction operation code generation system
TWI416302B (en) * 2009-11-20 2013-11-21 Ind Tech Res Inst Power-mode-aware clock tree and synthesis method thereof
TWI514266B (en) * 2011-04-07 2015-12-21 Via Tech Inc Microprocessor that performs x86 isa and arm isa machine language program instructions and the operating method thereof, and computer program product encoded in at least one non-transitory computer usable medium for use with a computing device

Families Citing this family (37)

Publication number Priority date Publication date Assignee Title
GB0028079D0 (en) * 2000-11-17 2001-01-03 Imperial College System and method
JP2002230065A (en) 2001-02-02 2002-08-16 Toshiba Corp System lsi developing device and method
EP1379977A1 (en) 2001-04-11 2004-01-14 Mentor Graphics Corporation Hdl preprocessor
DE10128339A1 (en) * 2001-06-12 2003-01-02 Systemonic Ag Development of the circuit arrangements used in digital signal processing technology especially using the HDL development language to generate a development circuit arrangement for comparison with a reference model
US6941548B2 (en) * 2001-10-16 2005-09-06 Tensilica, Inc. Automatic instruction set architecture generation
DE10205523A1 (en) * 2002-02-08 2003-08-28 Systemonic Ag Method for providing a design, test and development environment and a system for executing the method
US7200735B2 (en) 2002-04-10 2007-04-03 Tensilica, Inc. High-performance hybrid processor with configurable execution units
JP2003316838A (en) 2002-04-19 2003-11-07 Nec Electronics Corp Design method for system lsi and storage medium with the method stored therein
JP4202673B2 (en) * 2002-04-26 2008-12-24 株式会社東芝 System LSI development environment generation method and program thereof
US7937559B1 (en) 2002-05-13 2011-05-03 Tensilica, Inc. System and method for generating a configurable processor supporting a user-defined plurality of instruction sizes
US7346881B2 (en) 2002-05-13 2008-03-18 Tensilica, Inc. Method and apparatus for adding advanced instructions in an extensible processor architecture
US7376812B1 (en) 2002-05-13 2008-05-20 Tensilica, Inc. Vector co-processor for configurable and extensible processor architecture
US7278122B2 (en) 2004-06-24 2007-10-02 Ftl Systems, Inc. Hardware/software design tool and language specification mechanism enabling efficient technology retargeting and optimization
KR100722428B1 (en) * 2005-02-07 2007-05-29 재단법인서울대학교산학협력재단 Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture
US7757224B2 (en) * 2006-02-02 2010-07-13 Microsoft Corporation Software support for dynamically extensible processors
KR100793210B1 (en) * 2006-06-01 2008-01-10 조용범 Decoder obtaining method reduced approaching number to memory in Advanced RISC Machines
KR100813662B1 (en) 2006-11-17 2008-03-14 삼성전자주식회사 Profiler for optimizing processor architecture and application
WO2009084570A1 (en) * 2007-12-28 2009-07-09 Nec Corporation Compiler embedded function adding device
JP5217431B2 (en) 2007-12-28 2013-06-19 富士通株式会社 Arithmetic processing device and control method of arithmetic processing device
JP2010181942A (en) * 2009-02-03 2010-08-19 Renesas Electronics Corp System and method for providing information on estimation of replacement from pld/cpld to microcomputer
US8775125B1 (en) 2009-09-10 2014-07-08 Jpmorgan Chase Bank, N.A. System and method for improved processing performance
KR101635397B1 (en) * 2010-03-03 2016-07-04 삼성전자주식회사 Method and apparatus for simulation of multi core system using reconfigurable processor core
US8989242B2 (en) 2011-02-10 2015-03-24 Nec Corporation Encoding/decoding processor and wireless communication apparatus
KR20130088285A (en) * 2012-01-31 2013-08-08 삼성전자주식회사 Data processing system and method of data simulation
KR102025694B1 (en) * 2012-09-07 2019-09-27 삼성전자 주식회사 Method for verification of reconfigurable processor
US10558437B1 (en) * 2013-01-22 2020-02-11 Altera Corporation Method and apparatus for performing profile guided optimization for high-level synthesis
KR102122455B1 (en) * 2013-10-08 2020-06-12 삼성전자주식회사 Method and apparatus for generating test bench for verification of a processor decoder
US10084456B2 (en) 2016-06-18 2018-09-25 Mohsen Tanzify Foomany Plurality voter circuit
RU2631989C1 (en) * 2016-09-22 2017-09-29 ФЕДЕРАЛЬНОЕ ГОСУДАРСТВЕННОЕ КАЗЕННОЕ ВОЕННОЕ ОБРАЗОВАТЕЛЬНОЕ УЧРЕЖДЕНИЕ ВЫСШЕГО ОБРАЗОВАНИЯ "Военная академия Ракетных войск стратегического назначения имени Петра Великого" МИНИСТЕРСТВА ОБОРОНЫ РОССИЙСКОЙ ФЕДЕРАЦИИ Device for diagnostic control of verification
US10426424B2 (en) 2017-11-21 2019-10-01 General Electric Company System and method for generating and performing imaging protocol simulations
KR102104198B1 (en) * 2019-01-10 2020-05-29 한국과학기술원 Technology and system for improving the accuracy of binary reassembly system with lazy symbolization
CN110096257B (en) * 2019-04-10 2023-04-07 沈阳哲航信息科技有限公司 Design graph automatic evaluation system and method based on intelligent recognition
CN111832739B (en) * 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related product
CN111400986B (en) * 2020-02-19 2024-03-19 西安智多晶微电子有限公司 Integrated circuit computing equipment and computing processing system
JP7461181B2 (en) * 2020-03-16 2024-04-03 本田技研工業株式会社 CONTROL DEVICE, SYSTEM, PROGRAM, AND CONTROL METHOD
CN114721982A (en) * 2022-03-22 2022-07-08 潍柴动力股份有限公司 Read-write processing method and system capable of configuring storage data types
CN114492264B (en) * 2022-03-31 2022-06-24 南昌大学 Gate-level circuit translation method, system, storage medium and equipment

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
SE505783C2 (en) * 1995-10-03 1997-10-06 Ericsson Telefon Ab L M Method of manufacturing a digital signal processor
GB2308470B (en) * 1995-12-22 2000-02-16 Nokia Mobile Phones Ltd Program memory scheme for processors
JP2869379B2 (en) * 1996-03-15 1999-03-10 三菱電機株式会社 Processor synthesis system and processor synthesis method

Cited By (5)

Publication number Priority date Publication date Assignee Title
US7784024B2 (en) 2003-08-20 2010-08-24 Japan Tobacco Inc. Program creating system, program creating program, and program creating module
TWI381309B (en) * 2006-11-21 2013-01-01 Nec Corp Instruction operation code generation system
US8935512B2 (en) 2006-11-21 2015-01-13 Nec Corporation Instruction operation code generation system
TWI416302B (en) * 2009-11-20 2013-11-21 Ind Tech Res Inst Power-mode-aware clock tree and synthesis method thereof
TWI514266B (en) * 2011-04-07 2015-12-21 Via Tech Inc Microprocessor that performs x86 isa and arm isa machine language program instructions and the operating method thereof, and computer program product encoded in at least one non-transitory computer usable medium for use with a computing device

Also Published As

Publication number Publication date
KR20020021081A (en) 2002-03-18
KR100775547B1 (en) 2007-11-09
JP2003518280A (en) 2003-06-03
KR100874738B1 (en) 2008-12-22
AU3484100A (en) 2000-08-25
KR20070088818A (en) 2007-08-29
EP1159693A2 (en) 2001-12-05
WO2000046704A3 (en) 2000-12-14
WO2000046704A2 (en) 2000-08-10
JP2007250010A (en) 2007-09-27
CN1382280A (en) 2002-11-27

Similar Documents

Publication Publication Date Title
TW539965B (en) Automated processor generation system for designing a configurable processor and method for the same
US8875068B2 (en) System and method of customizing an existing processor design having an existing processor instruction set architecture with instruction extensions
Gries Methods for evaluating and covering the design space during early design development
US20070277130A1 (en) System and method for architecture verification
Chattopadhyay et al. LISA: A uniform ADL for embedded processor modeling, implementation, and software toolsuite generation
JP4801210B2 (en) System for designing expansion processors
August et al. A disciplined approach to the development of platform architectures
Brandolese A codesign approach to software power estimation for embedded systems
Huang et al. Profiling and annotation combined method for multimedia application specific MPSoC performance estimation
Mathaikutty et al. MMV: A metamodeling based microprocessor validation environment
de Sousa Specializing RISC-V Cores for Performance and Power
Pimentel et al. Tool integration and interoperability challenges of a system-level design flow: A case study
CN1382280B (en) For designing automatic processor generation system and the method thereof of configurable processor
Sierra et al. Witelo: Automated generation and timing characterization of distributed-control macroblocks for high-performance FPGA designs
Himmelbauer et al. The Vienna Architecture Description Language
Wagstaff From high level architecture descriptions to fast instruction set simulators
Chattopadhyay et al. Processor Modeling and Design Tools
Dingankar et al. MMV: Metamodeling Based Microprocessor Valiation Environment
Weber et al. Efficiently Describing and Evaluating the ASIPs
Moreira Simulador para Processadores de Sinal Digital de Arquitectura VLIW (Simulator for VLIW-Architecture Digital Signal Processors)
Meyr et al. Designing and modeling MPSoC processors and communication architectures
Al Rayahi A CAD Tool for Synthesizing Optimized Variants of Altera's Nios II Soft-Core Processor
Bertels et al. The hArtes Tool Chain
Abrar et al. Performance analysis of cosimulating processor core in VHDL and SystemC
Huang et al. Automatic Platform Synthesis and Application Mapping for Multiprocessor Systems On-Chip

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees