TW200905474A - Cache metadata for implementing bounded transactional memory

Info

Publication number: TW200905474A
Application number: TW097117848A
Authority: TW
Inventors: Jan Gray; Timothy L Harris; James Larus; Burton Smith
Original assignee: Microsoft Corp
Priority date: 2007-06-08
Filing date: 2008-05-15
Publication date: 2009-02-01
Also published as: WO2008154191A2; WO2008154191A3; US8813052B2; EP2174223A4; EP2174223B1; TWI498733B; EP2174223A2; US20070245099A1

Abstract

Various technologies and techniques are disclosed for providing a bounded transactional memory application that accesses cache metadata in a cache of a central processing unit. When performing a transactional read from the bounded transactional memory application, a cache line metadata transaction-read bit is set. When performing a transactional write from the bounded transactional memory application, a cache line metadata transaction-write bit is set and a conditional store is performed. At commit time, if any lines marked with the transaction-read bit or the transaction-write bit were evicted or invalidated, all speculatively written lines are discarded. The application can also interrogate a cache line metadata eviction summary to determine whether a transaction is doomed and then take an appropriate action.

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to cache metadata for implementing bounded transactional memory.

[Prior Art]

A CPU cache is a computer hardware mechanism used by the central processing unit of a computer to reduce the average time to access memory. A cache is a small, fast memory that retains copies of the data from the most recently used main memory locations. If a subsequent memory access is to a memory address that is already retained in the cache, the access is satisfied from the cache memory. Thus, the more accesses that are served from cached memory locations, the lower the average memory access time and the faster the application runs.

Cache memory is subdivided into cache lines. Each cache line holds a copy of some fixed-size, contiguous range of bytes of main memory. Each cache line also has an address tag and other state that identifies whether the cache line is presently valid and, if so, which addressed range of data it retains. Cache lines are of a fixed size, typically 32 to 256 bytes, depending on the hardware design. When a CPU performs a read or write memory access to data at a given address in main memory, it also checks whether that address is contained in its cache, in particular whether the cache contains a cache line that is valid and whose address tag matches the memory address of the access. If so, a cache hit occurs and the CPU accesses the data in the cache line. Otherwise, a cache miss occurs and the CPU takes the slower path of accessing the data elsewhere and recording a copy of the data in a cache line of the CPU cache. Since a cache is of fixed size, retaining new data in a cache line may require evicting (invalidating) data that was previously held in that cache line.

Software applications execute a sequence of hardware instructions to carry out a computation. Such instructions may perform arithmetic operations, may alter the program control flow sequence of subsequent instructions, may read or write (collectively, access) data at specific memory addresses, or may perform other operations.
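As a non-limiting illustration of the hit/miss check described above, the following C++ sketch models a direct-mapped cache. It is not part of the disclosure: the 64-byte line size, the 1024-line capacity, and the names `ToyLine`, `ToyCache`, and `access` are assumptions chosen only for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical direct-mapped cache model: 1024 lines of 64 bytes each.
constexpr std::uint64_t kLineSize = 64;
constexpr std::uint64_t kNumLines = 1024;

struct ToyLine {
    bool valid = false;      // does the line currently hold data?
    std::uint64_t tag = 0;   // identifies which block of main memory it holds
};

struct ToyCache {
    std::array<ToyLine, kNumLines> lines{};
    int hits = 0;
    int misses = 0;
    int evictions = 0;

    // Returns true on a cache hit. On a miss, fills the slot, evicting any
    // valid line that previously occupied it.
    bool access(std::uint64_t addr) {
        std::uint64_t block = addr / kLineSize;     // block number in memory
        ToyLine& line = lines[block % kNumLines];   // the only slot it may use
        std::uint64_t tag = block / kNumLines;      // remaining address bits
        if (line.valid && line.tag == tag) {
            ++hits;
            return true;
        }
        ++misses;
        if (line.valid) ++evictions;                // old contents are lost
        line.valid = true;
        line.tag = tag;
        return false;
    }
};
```

In this sketch, two addresses that are exactly 64 KB apart map to the same slot, so accessing one evicts the other. That is the kind of eviction event the cache metadata described later is designed to record.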
When a CPU cache is used with a CPU, its presence (and any information about which addresses are currently cached, and any hardware processes of checking, validating, and invalidating cache lines) is typically invisible and inaccessible to software programs, except that programs usually execute faster.

Modern computers may have multiple levels of cache. For example, a small, fast level one cache (L1$) may quickly service most memory accesses; but on an L1$ miss, a larger, slower level two cache (L2$) may be accessed. Only when a memory access misses in both the L1$ and the L2$ is the access performed by the relatively very slow main memory.

Modern computers may also be multiprocessors, which have multiple CPUs. In a shared memory multiprocessor, each CPU may access the same shared memory, so one CPU may write to shared memory and another CPU may later read the data written by the first. Each CPU may have one or more levels of cache for its exclusive use (private cache) as well as one or more levels of cache that it shares with other CPUs (shared cache). In the presence of multiple CPUs with caches, multiprocessors implement cache coherence to transparently give the multiple threads of execution in a software program the illusion that all memory accesses are to a single common shared main memory. Here the simple notion that a given cache line is valid is replaced by more elaborate cache line validity states, state machines, and signaling protocols called cache coherence protocols. Sometimes an access on one CPU (such as a write) must invalidate a cache line in other CPUs.

It is also possible to factor and share hardware resources in a multiprocessor so that some, or nearly all, of the duplicated hardware resources are shared between multiple CPUs. In an extreme case, a logical plurality of CPUs can be implemented in hardware in a time-multiplexed fashion on a single CPU core, by providing a plurality of copies of all the processor state and registers (called hardware thread contexts) in a single CPU. This is known as a multithreaded CPU core. For example, a single CPU core with four distinct thread contexts (e.g. four copies of its program counter, general purpose registers, and special purpose registers) nonetheless appears to application software and operating system software as four logical processors (LPs), indistinguishable in behavior, if not in performance, from a multiprocessor comprising four separate CPU cores.

Over time, computer hardware has become faster and more powerful. Today's multiprocessors provide multiple CPU cores that can operate in parallel. Programmers would like different pieces of a program to execute in parallel on these multiple cores, to take advantage of the performance improvements that can be achieved.

However, parallel programming is quite difficult for the average programmer using the software development techniques of today, and system implementers are therefore developing new programming models that can be used to write parallel programs more effectively. Some of these new programming models follow a transactional memory approach, which uses a transaction abstraction to help coordinate parallel threads' access to shared memory. Transactions do not automatically provide parallelism per se, but they do shift some of the burden of coordinating parallel tasks to other parts of the system, such as the compiler or runtime.

[Summary of the Invention]

Various technologies and techniques are disclosed for providing software accessible metadata on a cache of a central processing unit. The metadata state can include at least some bits of state for each quantum of addressed data, at least some state for each cache line, and at least some state for the cache overall. Additional instructions in the central processing unit are provided for interacting with this metadata. New side effects are introduced into the operations of the central processing unit and cache by the presence of the metadata and the additional instructions. The metadata can be accessed by at least one software program to facilitate an operation of the software program.

In one implementation, a bounded transactional memory application is provided that accesses cache metadata in a cache of a central processing unit. When performing a transactional read from the bounded transactional memory application, a cache line metadata transaction-read bit is set. When performing a transactional write from the bounded transactional memory application, a cache line metadata transaction-write bit is set and a conditional store is performed. At commit time, if any lines marked with the transaction-read bit or the transaction-write bit were evicted or invalidated, all speculatively written lines are discarded. The application can also interrogate a cache line metadata eviction summary to determine whether a transaction is doomed and then take an appropriate action.
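The bounded transactional memory behavior summarized above can be sketched as a small software model. This is a hedged illustration only: the bit positions `kTxWrite` and `kTxRead`, the eight-line cache, and the names `TmLine`, `BoundedTm`, `lose_line`, `doomed`, and `commit` are assumptions made for the example, and real hardware would perform these steps inside the cache rather than in software.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments within an 8-bit cache line metadata word.
constexpr std::uint8_t kTxWrite = 1u << 0;  // line was speculatively written
constexpr std::uint8_t kTxRead  = 1u << 1;  // line was transactionally read

struct TmLine {
    int committed = 0;       // value visible outside the transaction
    int speculative = 0;     // value written inside the transaction
    std::uint8_t clmd = 0;   // per-line metadata bits
};

struct BoundedTm {
    std::array<TmLine, 8> lines{};
    std::uint8_t eviction_summary = 0;  // OR of the metadata of every lost line

    // Transactional read: mark the line, then return the newest value.
    int tx_read(int i) {
        TmLine& l = lines[i];
        l.clmd |= kTxRead;
        return (l.clmd & kTxWrite) ? l.speculative : l.committed;
    }
    // Transactional write: mark the line and store speculatively.
    void tx_write(int i, int v) {
        TmLine& l = lines[i];
        l.clmd |= kTxWrite;
        l.speculative = v;
    }
    // Models an eviction or a remote invalidation of line i.
    void lose_line(int i) {
        eviction_summary |= lines[i].clmd;  // remember what was lost
        lines[i].clmd = 0;                  // metadata goes away with the line
    }
    // A transaction is doomed once any marked line has been lost.
    bool doomed() const {
        return (eviction_summary & (kTxRead | kTxWrite)) != 0;
    }
    // Returns true if the transaction committed, false if it was aborted.
    // On abort, speculative values are never published.
    bool commit() {
        bool ok = !doomed();
        for (TmLine& l : lines) {
            if (ok && (l.clmd & kTxWrite)) l.committed = l.speculative;
            l.clmd = 0;
        }
        eviction_summary = 0;
        return ok;
    }
};
```

The transaction is called bounded because it can only succeed while every line it touched stays in the cache. The eviction summary lets software detect the loss of a marked line with a single test instead of revalidating each access.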

In another implementation, a hardware accelerated software transactional memory (HASTM) application is provided. The software transactional memory application has access to metadata in a cache of a central processing unit that can be used to improve the operation of the STM system, in particular to accelerate some of the most time-consuming operations in software transactional memory. For example, open read barrier filtering is provided that uses an opened-for-read bit contained in the cache metadata to quickly filter (test and set) whether a given transactional memory datum has already received the expensive software bookkeeping it requires. If so, the redundant "open for read" software bookkeeping is safely skipped. Read log validation is also accelerated using metadata. As a transaction runs, the HASTM software sets a read-set bit in the cache line metadata for each accessed datum, accumulating in the cache a "read set" of metadata state that represents the set of data the transaction has read. This cache metadata is retained alongside its cached data unless it is evicted, or unless it is invalidated when some other CPU writes to data in this CPU's read set. At transaction commit time, or earlier, the application may interrogate the cache line metadata eviction summary; if no lines were evicted, the CPU's read set is intact, so software read set validation is unnecessary and can be skipped.

This same read-set cache line metadata also accelerates the retry facility of a software transactional memory system. If an application uses a transaction retry statement to roll back its transaction and then await a change in its read set, the HASTM software need only establish a software handler vector to await invalidation of a read-set bit of cache line metadata. Then, when another CPU writes some data that is in the application's read set, the corresponding cache line (and hence its read-set bit in its cache line metadata) will be invalidated, triggering a jump to the software handler, which concludes the "retry" action and resumes (restarts) the transaction.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
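The open-for-read filtering described in the Summary can be illustrated with a short software model. This is a sketch under stated assumptions rather than the disclosed hardware: one metadata bit per 8-byte quadword is modeled with a hash map, the "expensive bookkeeping" is modeled as an append to a read log, and the names `ReadFilter`, `test_and_set`, `open_for_read`, and `lose_line` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model: one "opened for read" metadata bit per 8-byte quadword,
// kept only while the enclosing 64-byte line stays cached.
struct ReadFilter {
    std::unordered_map<std::uint64_t, bool> opened;  // quadword index -> bit
    std::vector<std::uint64_t> read_log;             // software bookkeeping

    // Test-and-set of the metadata bit; returns the previous value.
    bool test_and_set(std::uint64_t addr) {
        bool& bit = opened[addr / 8];
        bool was_set = bit;
        bit = true;
        return was_set;
    }
    // Read barrier: log the datum only the first time it is opened.
    void open_for_read(std::uint64_t addr) {
        if (test_and_set(addr)) return;     // already logged, skip bookkeeping
        read_log.push_back(addr / 8 * 8);   // slow path, modeled as a log append
    }
    // Eviction or invalidation of a 64-byte line drops its metadata bits.
    void lose_line(std::uint64_t addr) {
        std::uint64_t first = addr / 64 * 8;  // first quadword index in the line
        for (std::uint64_t q = first; q < first + 8; ++q) opened.erase(q);
    }
};
```

Losing the metadata is safe in this model: the next read barrier for that datum simply takes the slow path again, so the filter can cause extra log entries but never a missing one.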
[Detailed Description]

For the purposes of promoting an understanding of the principles of the invention, reference will now be made to the embodiments illustrated in the drawings, and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Any alterations and further modifications in the described embodiments, and any further applications of the principles described herein, are contemplated as would normally occur to one skilled in the art.

The system may be described in the general context as a system that has a central processing unit that uses cache metadata on the CPU cache to improve the operation of one or more software programs. As shown in FIG. 1, an exemplary computer system to use for implementing one or more parts of the system includes a computing device, such as computing device 100. In its most basic configuration, computing device 100 typically includes at least one processing unit 102 and memory 104. In one implementation, central processing unit 102 includes a cache 103 with software accessible metadata 105. These metadata are described in further detail in several other figures herein. One or more hardware assisted software applications 150 can access the metadata 105 of the CPU cache 103 to facilitate an operation of the respective software application. A few non-limiting examples of hardware assisted software applications include, but are not limited to, transactional memory systems, garbage collection systems, systems for analyzing the performance or run-time behavior of programs, systems for finding defects in programs, systems for enforcing security constraints on programs, and/or any other types of software applications that can be improved (in performance or in some other way) by use of the metadata 105 on the CPU cache 103. These software applications may directly read and write the metadata as appropriate for the given scenario. Alternatively or additionally, the hardware itself may read and/or modify the metadata as appropriate. Depending on the exact configuration and type of computing device, memory 104 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two.

This most basic configuration is illustrated in FIG. 1 by dashed line 106. Additionally, device 100 may also have additional features/functionality. For example, device 100 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 1 by removable storage 109 and non-removable storage 110. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Memory 104, removable storage 109, and non-removable storage 110 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by device 100.

Any such computer storage media may be part of device 100.

Computing device 100 includes one or more communication connections 114 that allow computing device 100 to communicate with other computers/applications 115. Device 100 may also have input device(s) 112 such as a keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 111 such as a display, speakers, printer, etc. may also be included. These devices are well known in the art and need not be discussed at length here.

FIG. 2 is a more detailed diagrammatic view of a central processing unit 102 of one implementation operating on the computer system of FIG. 1. Cache 103 of CPU 102 includes software accessible cache metadata 105. This metadata (for short) is additional, software accessible state associated with cached data. With the exception of the cache metadata control registers described below, the cache metadata state is retained only for the time interval during which its corresponding data is cached. In the example shown, the software accessible metadata 105 include per virtual address metadata (called VAMD herein) 106, per cache line metadata (called CLMD herein) 107, and/or per cache metadata control registers (called CMD herein) 108. The VAMD 106, CLMD 107, and/or CMD 108 metadata in the CPU cache 103 can be accessed by one or more software applications to improve an operation of the respective application. It will be appreciated that in other implementations, some, additional, and/or other types of metadata could be used in the cache than those shown in FIG. 2, or in a hardware location other than the cache. For example, the VAMD state to be described could reside in a separate cache-like structure disjoint from the CPU cache(s), and the CLMD state could reside in a separate centralized or distributed structure that interacts with the cache coherence system but is nevertheless disjoint from the CPU cache(s). For the sake of illustration, however, these metadata 105 will be discussed in further detail in the various figures herein to illustrate some techniques for improving software application operation.

FIG. 3 is a diagram illustrating exemplary hardware instructions 170 that are used to implement additional metadata per logical processor per cache line for the system of FIG. 1. The term logical processor as used herein is meant to include each of the one or more actual CPU cores, and/or the hardware thread contexts of multithreaded CPU cores ([007]), that share a single cache. In the exemplary instructions 170 shown, each virtual address metadata (VAMD) of each quadword of the cache line of each logical processor is allocated four bits of metadata, and each cache line metadata (CLMD) of each cache line of each logical processor is allocated eight bits of metadata. These bit allocations are exemplary in nature, and other allocations could be used in other implementations. Furthermore, in the example shown in FIG. 3 and discussed herein, a VAMD is assigned to each 64-bit quadword of data in the cache line. One reason for using a quadword in this non-limiting example is that it is currently the smallest power-of-two block size for which there will never be two separate transacted objects in the same block. However, it will be appreciated that in other implementations, something smaller or larger than a quadword could be used for each VAMD and still take advantage of some or all of the various techniques discussed herein. It is important to note that there is only one CLMD per cache line per logical processor that shares the cache, but there are potentially many VAMDs (one per quadword) per cache line per logical processor.

Turning now to FIG. 4, a diagram is shown that illustrates exemplary hardware state 190 that is used to implement cache metadata control registers per logical processor per cache. In the example shown, these registers 190 control and track various cache metadata behaviors, including CLMD eviction summaries, CLMD speculative writes, a CLMD default value, a VAMD default value, a transaction handler vector address, and a CLMD eviction mask that triggers a transaction handler invocation. Some, all, and/or additional cache-level details can be tracked as part of the CMD. Some uses of these metadata are described in later examples.
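To make the relationship between an address, its cache line, and its quadword metadata concrete, the following sketch computes which line (and so which CLMD) and which quadword (and so which VAMD) cover a given byte address. It is illustrative only. It assumes the 64-byte line, 8-byte quadword, and direct-mapped 1024-line organization used in this document's example, and the names `MetadataIndex` and `locate` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Constants from the example: 64-byte lines, 1024 lines, 8-byte quadwords.
constexpr std::uint64_t kLineSize = 64;
constexpr std::uint64_t kNumLines = 1024;
constexpr std::uint64_t kVamdStride = 8;
constexpr std::uint64_t kVamdsPerLine = kLineSize / kVamdStride;

struct MetadataIndex {
    std::uint64_t line;  // which cache line slot (one CLMD per logical processor)
    std::uint64_t vamd;  // which quadword within the line (one VAMD per logical processor)
};

// Hypothetical helper: locate the metadata that covers a byte address,
// assuming a direct-mapped cache.
constexpr MetadataIndex locate(std::uint64_t addr) {
    return MetadataIndex{(addr / kLineSize) % kNumLines,
                         (addr % kLineSize) / kVamdStride};
}
```

With these assumed parameters each line carries eight VAMDs per logical processor but only one CLMD per logical processor, which matches the one-to-many relationship noted above.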
Shown below is "C-language-esque" hardware definition pseudocode of the baseline cache state and the new cache metadata hardware state for a four logical processor system that includes the new VAMD, CLMD, and CMD instructions shown in FIGS. 3 and 4. The new state is highlighted in bold. In one implementation, these instructions are private per core, but shared between logical processors.

    // Simple exemplary 64 KB direct mapped L1 d-cache
    const int NLPS = 4;            // no. of logical processors
    const int NLINES = 1024;
    const int LINESIZE = 64;       // line size (bytes)
    typedef void* VA;              // virtual address
    typedef void* PA;              // physical address
    typedef void* VALUE;           // arbitrary integer or FP data
    typedef int LP;                // logical processor no.
    typedef void (*HANDLER)();     // function pointer e.g. address in instruction stream

    const int VAMDBITS = 4;        // no. of virtual address metadata bits
    typedef bit VAMD[VAMDBITS];    // virtual address metadata "word"
    const int VAMDSTRIDE = 8;      // no. of bytes covered per VAMD (e.g. quadword)
    const int NVAMDS = LINESIZE/VAMDSTRIDE;

    const int CLMDBITS = 8;        // no. of line metadata bits
    typedef bit CLMD[CLMDBITS];    // line metadata "word"
    const int CLMD_SPECWRITE = 0;  // bit position of special CLMD bit to
                                   // track and enforce speculative writes

    struct LINE {
        PA tag;
        enum MESI { M, E, S, I } mesi;
        byte data[LINESIZE];
        VAMD vamds[NVAMDS][NLPS];  // separate VAMD per quadword per log. proc.
        CLMD clmds[NLPS];          // separate CLMD per logical processor

    };

    struct CMD {                          // cache metadata
        CLMD clmd_evictions;              // or'd line evictions+invals summary
        CLMD clmd_specwritesmask;         // subset of CLMD bits that indicate
                                          // speculative writes
        CLMD clmd_default;                // default line load CLMD value
        VAMD vamd_default;                // default line load VAMD value
                                          // (copied to every quadword's VAMDs)
        HANDLER clmd_eviction_handler;    // eviction handler address
        CLMD clmd_eviction_handler_mask;  // eviction handler event mask
    };

    struct CACHE {
        LINE lines[NLINES];
        ...
        CMD cmds[NLPS];
    };

    struct CORE {
        CACHE dcache;
        LP lp;                            // current logical processor no.
    };
    CORE core;

It should be emphasized that the abstract hardware definition pseudocode used throughout this detailed description is not an executable software program per se. Rather, it denotes in a relatively compact notation the novel cache metadata state and behaviors that a skilled computer designer must then recode into a circuit description or a concrete hardware definition language (such as Verilog or VHDL).

In the non-limiting example shown above, for a 4-logical processor example sharing a 64 KB L1 cache (with 1024 64-byte lines), the additional cache metadata storage overhead is: 4 threads * 8 bits * 1K (lines) + 4 threads * 4 bits * 8K (quadwords) = 32 Kbit + 128 Kbit = 20 KB, or about 31% of the size of the entire cache. As noted previously, numerous other allocations and/or arrangements of CPU cache metadata could be used than those shown in this hypothetical example.
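The storage overhead estimate can be checked with a short calculation. The sketch below is not part of the disclosure; it uses the constants from the pseudocode (4 logical processors, 1024 lines of 64 bytes, 8 CLMD bits per line, 4 VAMD bits per quadword), and the helper names `metadata_bits` and `metadata_bytes` are hypothetical.

```cpp
#include <cassert>

// Parameters of the example cache, taken from the pseudocode constants.
constexpr long kLogicalProcessors = 4;
constexpr long kLines = 1024;
constexpr long kLineBytes = 64;
constexpr long kQuadwords = kLines * kLineBytes / 8;  // 8-byte quadwords
constexpr long kClmdBits = 8;
constexpr long kVamdBits = 4;

// Hypothetical helper: total metadata bits for a given number of lines.
// The quadword count is held at the 64 KB cache's value.
constexpr long metadata_bits(long lines) {
    return kLogicalProcessors * kClmdBits * lines
         + kLogicalProcessors * kVamdBits * kQuadwords;
}

// Hypothetical helper: the same total expressed in bytes.
constexpr long metadata_bytes(long lines) {
    return metadata_bits(lines) / 8;
}
```

With 1024 lines the CLMD term is 32 Kbit and the VAMD term is 128 Kbit, for a total of 20 KB, or about 31% of the 64 KB cache. A count of 2K lines would instead give 64 Kbit + 128 Kbit = 24 KB, or about 37%. In both cases the per-quadword VAMD bits dominate the overhead.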
Turning now to FIG. 5, a diagrammatic view of a central processing unit 200 is shown that illustrates an exemplary cache metadata enhanced hardware instruction set architecture 202 provided by central processing unit 200, and its interaction with cache metadata 224 of the CPU cache 222. In one implementation, instruction set architecture 202 includes automatic cache and processor operation behaviors 204, CMD instructions 206, VAMD instructions 208, CLMD instructions 214, and thread context switch save/restore extensions 220. The automatic cache and processor operation behaviors 204 can interact with various metadata 224, as described in further detail in FIGS. 6 to 8. The CMD instructions 206 can interact with various metadata 226, as described in further detail in FIG. 9. VAMD instructions 208 include both individual instructions 210 and flash instructions 212. The VAMD instructions 208 can interact with the VAMD metadata 228 and/or other metadata, as described in further detail in FIG. 10. CLMD instructions 214 can interact with the CLMD metadata 230 and/or other metadata, as described in further detail in FIG. 11. CLMD instructions 214 include both individual instructions 216 and flash instructions 218. The thread context switch save/restore extensions 220 can interact with various metadata 224, as described in further detail in FIG. 12.

FIG. 6 is a diagrammatic view of exemplary automatic cache and processor operation behaviors 250 for the central processing unit 200 of FIG. 5. These behaviors extend prior art CPU + cache operations so as to account for the new cache metadata state. This affects initialization (validation) 252, eviction/invalidation 258, and core reset operations 270. Cache line initialization 252 occurs implicitly on a cache line during the hardware processing of a cache miss. Cache line initialization sets the line's CLMDs to the default value 254 and its VAMDs to the default value 256. Below is exemplary hardware definition pseudocode that illustrates the additional initialization behaviors 252 for setting the default CLMDs 254 and the default VAMDs 256 for each logical processor. These stages are described in more detail in FIG. 7.

    // load and validate the designated cache line
    void load_line(LINE& line) += {
        // here += denotes extension of baseline load_line behavior

        // initialize each logical processor's clmd
        for (int lp = 0; lp < NLPS; lp++)
            line.clmds[lp] = dcache.cmds[lp].clmd_default;

        // initialize each logical processor's line's vamds.
        for (int lp = 0; lp < NLPS; lp++)
            for (int i = 0; i < NVAMDS; i++)
                line.vamds[i][lp] = dcache.cmds[lp].vamd_default;
    }

The eviction/invalidation operations 258 are designed to run when a cache line is evicted or invalidated. Eviction occurs when a memory access from some logical processor on this cache forces some valid cache line to be repurposed to retain the newly accessed data. In that case, the data contents of the cache line are discarded or written back to memory, and the metadata contents are completely discarded. Invalidation occurs when a coherent memory access from another core forces a line to transition from a valid state to an invalid state in this particular level of the cache in this core. There are instructions for propagating the cache line's CLMDs to the cache eviction summary 260 when the CPU determines that it is time to evict or invalidate a cache line. There are also instructions for discarding the CLMD and VAMD bits 262. Exemplary hardware definition pseudocode for implementing these eviction/invalidation behaviors 258 is shown below. These behaviors are described in further detail in FIG. 8.

    // extend the baseline eviction behavior to also discard the line's cache metadata
    void evict_line(LINE& line) += {
        discard_line(line);
    }

    // extend the baseline invalidation behavior to also discard the line's cache metadata
    void invalidate_line(LINE& line) += {
        discard_line(line);
    }

    // the cache line is being repurposed; discard the line's cache metadata
    void discard_line(LINE& line) {
        for (int lp = 0; lp < NLPS; lp++) {
            // Accumulate an eviction summary:
            // Propagate the line's CLMD metadata to its eviction summary via
            // a bitwise-or logical operation.
            dcache.cmds[lp].clmd_evictions |= line.clmds[lp];

            // Invalidate line (don't write back the line) if it is modified but was
            // speculatively written by *any* logical processor on this core.
            //
            if (line.mesi == MESI.M /* modified */
                && (line.clmds[lp] & dcache.cmds[lp].clmd_specwritesmask) != 0)
                line.mesi = MESI.I;
        }
    }

Core reset instructions 270 are also included to zero out all of the metadata 272. An exemplary hardware instruction that zeros out all of the metadata 272 is shown below. In the example shown, all cache metadata mode state bits are zeroed and all metadata is zeroed. For example, the CLMD evictions, the CLMD speculative writes, the default CLMD value, and the default VAMD value are all set to zero.

    // extend the baseline CPU core reset behavior to also reset the cache metadata
    void core_reset() += {
        vamd_and_all((VAMD)0);   // hardware definition pseudocode follows below
        clmd_and_all((CLMD)0);
        for (LP lp = 0; lp < NLPS; lp++) {
// extend the baseline eviction behavior to also discard the line's cache metadata void evict"ine(LINE& line) += { discard_line(line); } // extend the baseline invalidation Behavior to also discard the line's cache metadata void invalidate_line(LINE& line) += { discard__line(Iine); // the cache line is being repurposed; discard the line's cache metadata void discard_line(LINE& line) { for (int lp = 0; lp <NLPS; lp++) { // Accumulate an eviction summary: // Propagate the line's CLMD metadata to its eviction summary via // a bitwise-or logical operation, dcache. cmds[lp].clmd—evictions |= Line.clmds[ip]; // Invalidate line (don't write back the line) if it is modified but was // speculatively written by *any* logical processor on this core. 18 200905474 // if (line.mesi = = MESI.M /* modified */ && (Iine.clmds[lp] & dcache.crrjds[lp].clmd_specwritesmask) != 0) line.mesi = MESI.I; Core Reset Command 270 also contains All metadata 272 to zero. An exemplary hardware command system that returns all metadata 272 to zero is shown below. In the example shown, all cache metadata mode status bits are zeroed and all metadata is zeroed. For example, CLMD retracts 'CLMD speculative writes, preset CLMD values, and preset VAMD values are set to zero. // extend the baseline CPU core reset behavior to also reset the cache metadata void core_reset() += { vamd_and_all((VAMD)0); // hardware definition pseudocode follows below clmd_and_all((CLMD)0); for (LP lp = 0; ]p <NLPS; Ip++) {

CMD& cmd = dcache.cmdsflp]; cmd.clmd_evictions = 0; cmd.clmd_specwrites = 0; crnd.clmd_default = 0; cmd.vamd_default = 0; cmd_clmd_eviction一handler = 0; cmd.clmd_eviction_handler_mask = 0; 請參考第7圖，其係詳細說明涉及載入一快取列及初始 19 200905474 化一些快取元資料為預設值之階段的一個實作（相較於最初說明為第6圖之初始化指令2 5 2的一部份）。於一形式’第 7圖之處理係被實作於計算裝置1〇〇之硬體。該處理開始於開始點290，該CPU載入一快取列（階段292)。對於每一邏輯處理器，CLMD元資料係被初始化為其邏輯處理器特定的預設值（階段294)。對於每一邏輯處理器，VAMD元資料係被類似地初始化為其LP特定的預設值（階段296)。該處理結束於結束點3 0 0。於第8圖中，其係更為詳細顯示涉及收回一快取列或使一快取列無效之階段的一個實作（相較於最初說明為第6圖之收回/無效2 5 8的一部份）。於一形式，第8圖之處理係被實作於計算裝置1 〇〇之硬體。該處理開始於開始點3丨〇，該 C P U判定其係收回一快取列或使一快取列無效之時候（階段3 12)。邏輯處理器之CLME>係被傳播至該快取的收回概要 (階段314)。該列的CLMD及VAMD位元係接著被廢除（階段 3 1 6)。若該快取列係被隱含地推測地寫入，則其首先被無效化（不寫回主記憶體階段3丨8)。該處理结束於結束點 320 ° 第9圖係針對第5圖之中央處理單元2〇〇的範例性cmD 指令之圖解視圖圖。關於此處所介紹的所有新的指令，這些指令延伸並補充由習知CPU提供之硬體指令的一基本集 °換句話說，其延伸基本指令集架構。這些硬體指令係藉由权體來使用，以與各種快取元資料狀態互動，並控制快取兀資料行為。舉例來說有複數個指彳，其係用以： 20 200905474 設定及得到V A M D預設值控制暫存器（3 3 2及3 3 4);用以及得到CLMD預設值控制暫存器（336及338)。可被使用供此功能性以供設定及得到這些預設值的硬體指令定擬碼之範例係顯示於後文。（此處，於—功能之硬體定擬瑪屬性指令’指定硬體操作被（或可被）做成可用的作為新的C P U指令以供軟體隱含地使用。於此情形，能名稱暗示CPU指令名稱且其功能參數及傳回值暗示的指令之輸入及輸出參數’一般會表示為程式設計師暫存器或隱含的條件碼）。 II SET_VAMD_DEFAULT; // Set the current default VAMD control register for this logical processor, instruction void cache_set_vamd一default(VAMD vamd) { dcache.cmds[lp].vamd_default = vamd; // GET_VAMD_DEFAULT: // Get the current default VAMD control register for this logical processor, instruction VAMD cache_get_vamd_default() { return dcache.cmds[lp].vamd_default; 設定以提義虛義虛，以該功對應特定CMD& cmd = dcache.cmdsflp]; cmd.clmd_evictions = 0; cmd.clmd_specwrites = 0; crnd.clmd_default = 0; cmd.vamd_default = 0; cmd_clmd_eviction_handler = 0; cmd.clmd_eviction_handler_mask = 0; , which is a detailed description of the stage involved in loading a cache line and the initial 19 200905474 some cache metadata as a preset value (compared to the initial description of the initialization instruction 2 5 2 of Figure 6 Part). The processing in a form 'Fig. 7' is implemented in the hardware of the computing device. 
The process begins at start point 290, which loads a cache column (stage 292). For each logical processor, the CLMD metadata is initialized to its logical processor specific preset (stage 294). For each logical processor, the VAMD metadata is similarly initialized to its LP-specific preset (stage 296). The process ends at the end point 300. In Figure 8, it shows in more detail an implementation involving the phase of retrieving a cached column or invalidating a cached column (compared to the retracted/invalid 2 5 8 of Figure 6 originally described) Part). In one form, the processing of Fig. 8 is implemented in the hardware of the computing device 1 . The process begins at start point 3, which determines whether it is reclaiming a cached column or invalidating a cached column (stage 3 12). The CLME> of the logical processor is propagated to the reclaimed summary of the cache (stage 314). The CLMD and VAMD bits of this column are then abolished (stage 3 16). If the cache line is implicitly speculatively written, it is first invalidated (not written back to main memory stage 3丨8). The process ends at an end point 320 °. Figure 9 is a diagrammatic view of an exemplary cmD instruction for the central processing unit 2A of Figure 5. With respect to all of the new instructions presented herein, these instructions extend and complement a basic set of hardware instructions provided by conventional CPUs. In other words, they extend the basic instruction set architecture. These hardware commands are used by the right to interact with various cache metadata states and control the behavior of the cache. For example, there are a plurality of fingerprints, which are used to: 20 200905474 Set and get the VAMD preset value control register (3 3 2 and 3 3 4); use and get the CLMD preset value control register (336) And 338). Examples of hardware instructional codes that can be used for this functionality for setting and obtaining these preset values are shown below. 
(Here, the function-defined hardware-like attribute instruction' specifies that the hardware operation is (or can be made) available as a new CPU instruction for implicit use by the software. In this case, the name can be implied. The input and output parameters of the CPU instruction name and its function parameters and return values implied are generally expressed as a programmer register or an implied condition code. II SET_VAMD_DEFAULT; // Set the current default VAMD control register for this logical processor, instruction void cache_set_vamd-default(VAMD vamd) { dcache.cmds[lp].vamd_default = vamd; // GET_VAMD_DEFAULT: // Get the current default VAMD control Register for this logical processor, instruction VAMD cache_get_vamd_default() { return dcache.cmds[lp].vamd_default; Set to clarify the virtual meaning of the virtual, to correspond to the specific
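By way of a non-limiting illustration, the automatic load and eviction behaviors of FIGS. 7 and 8 can be modeled in ordinary software. The following C++ sketch is not the hardware definition itself. The sizes NLPS and NVAMDS, the 8-bit metadata types, the choice to fill a line in the Exclusive state, and the boolean return value of discard_line (whether a write-back of the data is still needed) are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical sizes; a real implementation defines its own.
constexpr int NLPS = 2;    // logical processors sharing the cache
constexpr int NVAMDS = 8;  // VAMDs per line (one per quadword of a 64-byte line)

using CLMD = std::uint8_t;  // assumed 8-bit cache line metadata
using VAMD = std::uint8_t;  // assumed width; only the low bits would be used
enum class Mesi { M, E, S, I };

struct CMD {
    CLMD clmd_evictions = 0;       // or'ed summary of evicted/invalidated CLMDs
    CLMD clmd_specwritesmask = 0;  // CLMD bits that mean "speculatively written"
    CLMD clmd_default = 0;
    VAMD vamd_default = 0;
};

struct Line {
    Mesi mesi = Mesi::I;
    std::array<CLMD, NLPS> clmds{};
    std::array<std::array<VAMD, NLPS>, NVAMDS> vamds{};
};

struct Cache {
    std::array<CMD, NLPS> cmds{};
};

// Metadata side of a line fill: each logical processor's CLMD and VAMDs
// start from that processor's default values.
void load_line(const Cache& cache, Line& line) {
    line.mesi = Mesi::E;  // assumption: the fill leaves the line Exclusive
    for (int lp = 0; lp < NLPS; ++lp) {
        line.clmds[lp] = cache.cmds[lp].clmd_default;
        for (int i = 0; i < NVAMDS; ++i)
            line.vamds[i][lp] = cache.cmds[lp].vamd_default;
    }
}

// Metadata side of an eviction or invalidation. Returns true when the data
// would still have to be written back, i.e. the line was Modified and no
// logical processor had written it speculatively.
bool discard_line(Cache& cache, Line& line) {
    bool speculative = false;
    for (int lp = 0; lp < NLPS; ++lp) {
        cache.cmds[lp].clmd_evictions |= line.clmds[lp];  // accumulate summary
        if ((line.clmds[lp] & cache.cmds[lp].clmd_specwritesmask) != 0)
            speculative = true;
    }
    const bool needs_writeback = (line.mesi == Mesi::M) && !speculative;
    line = Line{};  // line becomes Invalid; all CLMD and VAMD bits are dropped
    return needs_writeback;
}
```

In this model a speculatively written Modified line is dropped without write-back, while its CLMD bits survive in the per-processor eviction summary, where software can later observe them.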

    // SET_CLMD_DEFAULT:
    // Set the current default CLMD control register for this logical processor.
    instruction void cache_set_clmd_default(CLMD clmd) {
        dcache.cmds[lp].clmd_default = clmd;
    }

    // GET_CLMD_DEFAULT:
    // Get the current default CLMD control register for this logical processor.
    instruction CLMD cache_get_clmd_default() {
        return dcache.cmds[lp].clmd_default;
    }

In one implementation, the CMD instructions 330 also include instructions for setting and getting the CLMD speculative writes control register (342 and 344), that is, the control register that determines which CLMD bits indicate that the line has been speculatively written. Examples of hardware instruction definition pseudocode that can be used to set and get the speculative writes are shown below.

    // SET_CLMD_SPECWRITES:
    // Set the current speculative writes CLMD mask control register for this logical processor.
    instruction void cache_set_clmd_specwrites(CLMD mask) {
        dcache.cmds[lp].clmd_specwritesmask = mask;
    }

    // GET_CLMD_SPECWRITES:
    // Get the current speculative writes CLMD mask control register for this logical processor.
    instruction CLMD cache_get_clmd_specwrites() {
        return dcache.cmds[lp].clmd_specwritesmask;
    }

In one implementation, the CMD instructions 330 include instructions for setting and getting the CLMD evictions summary control register (344 and 346). Examples of hardware instruction definition pseudocode that can be used to set and get the CLMD evictions are shown below.

    // SET_CLMD_EVICTIONS:
    // Set the current CLMD evictions summary control register for this logical processor.
    instruction void cache_set_clmd_evictions(CLMD clmd) {
        dcache.cmds[lp].clmd_evictions = clmd;
    }

    // GET_CLMD_EVICTIONS:
    // Get the current CLMD evictions summary control register for this logical processor.
    instruction CLMD cache_get_clmd_evictions() {
        return dcache.cmds[lp].clmd_evictions;
    }

In one implementation, the CMD instructions 330 include instructions for setting and getting the CLMD eviction handler address and handler mask control registers (190). Examples of hardware instruction definition pseudocode that can be used to set and get the CLMD eviction handler are shown below.

    // SET_CLMD_EVICTION_HANDLER:
    // Set the current CLMD eviction handler register for this logical processor.
    instruction void cache_set_clmd_eviction_handler(HANDLER handler) {
        dcache.cmds[lp].clmd_eviction_handler = handler;
    }

    // GET_CLMD_EVICTION_HANDLER:
    // Get the current CLMD eviction handler register for this logical processor.
    instruction HANDLER cache_get_clmd_eviction_handler() {
        return dcache.cmds[lp].clmd_eviction_handler;
    }

    // SET_CLMD_EVICTION_HANDLER_MASK:
    // Set the current CLMD eviction handler mask control register for this logical processor.
    instruction void cache_set_clmd_eviction_handler_mask(CLMD clmd) {
        dcache.cmds[lp].clmd_eviction_handler_mask = clmd;
    }

    // GET_CLMD_EVICTION_HANDLER_MASK:
    // Get the current CLMD eviction handler mask control register for this logical processor.
    instruction CLMD cache_get_clmd_eviction_handler_mask() {
        return dcache.cmds[lp].clmd_eviction_handler_mask;
    }

In one implementation, the CMD instructions 330 include hardware instructions for conditionally testing evictions with flash clearing and setting of CLMD values on all cache lines 348. This can be used in a bounded transactional memory system, or for other purposes, as described in further detail herein. An example of hardware instruction definition pseudocode that can be used to conditionally test evictions with flash clear/set is shown below.

    // COND_TEST_EVICTIONS_AND_OR_ALL:
    // Atomically test whether any specific CLMD bits' evictions or invalidations
    // have occurred;
    // and if not, flash clear (via AND) and flash set (via OR) specific CLMD bit positions.
    instruction bool cache_cond_test_evictions_and_or_all(
        CLMD clmd,      // mask, specifies noteworthy CLMD eviction bits
        CLMD and_mask,  // mask, specifies CLMD bit positions to retain (AND)
        CLMD or_mask)   // mask, specifies CLMD bit positions to set (OR)
    {
        // 'atomic' means the inner block happens instantaneously, without
        // intervening interference from nor impact upon other CPUs or agents
        // in the system
        atomic {
            // Determine if there were any evictions of interest
            CLMD evictions = cache_get_clmd_evictions();
            if ((evictions & clmd) == 0) {
                // If not, AND and then OR the bit masks over all CLMD
                // metadata in the cache.
                clmd_and_all(and_mask);
                clmd_or_all(or_mask);
                return true;
            }
            else {
                return false;
            }
        }
    }

In one implementation, the CMD instructions 330 include hardware instructions for conditionally discarding cache lines 349 based on the CLMD. An example of hardware instruction definition pseudocode that can be used to conditionally discard cache lines is shown below.

    // COND_DISCARD
    // Conditionally flash clear all cache lines of this logical processor with CLMDs
    // with specific CLMD bit positions set.
    instruction void cache_cond_discard(CLMD clmd) {
        for (int i = 0; i < NLINES; i++) {
            if ((dcache.lines[i].clmds[lp] & clmd) != 0) {
                discard_line(dcache.lines[i]);
            }
        }
    }

The CMD instructions can also include instructions for getting the implemented size of certain data (for example, get cache line size 350, get VAMD bits 352, get VAMD stride 354, or get CLMD size 356). Example hardware instruction definition pseudocode that can be used to get these basic metadata value sizes is shown below.

    // GET_CACHE_LINE_SIZE
    instruction unsigned get_cache_line_size() {
        return LINESIZE;
    }

    // GET_VAMD_BITS:
    // Return implemented no. of VAMD_BITS (no. of bits in a VAMD).
    instruction unsigned get_vamd_bits() {
        return VAMD_BITS;
    }

    // GET_VAMD_STRIDE:
    // Return implemented VAMD_STRIDE bytes (no. of data bytes per VAMD).
    instruction unsigned get_vamd_stride() {
        return VAMD_STRIDE;
    }

    // GET_CLMD_BITS:
    // Return implemented no. of CLMD_BITS (no. of bits in a CLMD).
    instruction unsigned get_clmd_bits() {
        return CLMD_BITS;
    }

An alternative implementation can provide this implementation-specific parameter data through an alternative mechanism, such as a general-purpose CPUID instruction.

Referring to FIG. 10, a diagrammatic view of exemplary VAMD instructions for the central processing unit 200 of FIG. 5 is shown. The VAMD instructions 370 include individual instructions 372, which access specific VAMDs one at a time, and flash instructions 388, which apply to all VAMDs in the cache. The individual instructions 372 can include instructions for implementing a VAMD get 376, a VAMD set 378, a VAMD test 380, a VAMD test and set 382, a VAMD selective clear (AND) 384, and a VAMD selective set (OR) 386.

A private VAMD helper pseudocode function is introduced here that (like a read-data instruction) takes an address 'va' and ensures that its data is cached. It then returns a reference (which in hardware might be a control signal bit pattern) to the specific VAMD corresponding to the quadword of bytes at address va.

    private VAMD& vamd_va(VA va) {
        validate_line(va);
        return dcache.lines[line_va(va)].vamds[offset_va(va)][lp];
    }

The VAMD get instruction 376 selects and returns the current value of the particular VAMD that is appropriate for the particular address. The VAMD set instruction 378 stores a VAMD for the particular address. Example hardware definition pseudocode for the get and set instructions is shown below.

    // VAMD_GET
    // Return the current VAMD for the datum at address 'va'.
    // If the datum wasn't already in cache, it is now!
    instruction VAMD vamd_get(VA va) {
        return vamd_va(va);
    }
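The helper vamd_va relies on two mappings, line_va and offset_va, which the pseudocode leaves undefined. A minimal sketch of what they could compute is given below, assuming 64-byte lines, one VAMD per 8-byte quadword, and a direct-mapped cache of 1024 lines. The direct-mapped indexing in particular is a hypothetical simplification; a set-associative cache would locate the line by tag lookup instead.

```cpp
#include <cassert>
#include <cstdint>

// Assumed geometry, for illustration only.
constexpr std::uint64_t LINESIZE = 64;     // bytes per cache line
constexpr std::uint64_t VAMD_STRIDE = 8;   // data bytes covered by one VAMD
constexpr std::uint64_t NLINES = 1024;     // lines in the (direct-mapped) cache
constexpr std::uint64_t NVAMDS = LINESIZE / VAMD_STRIDE;  // VAMDs per line

// Hypothetical direct-mapped line index for address 'va'.
constexpr std::uint64_t line_va(std::uint64_t va) {
    return (va / LINESIZE) % NLINES;
}

// Index, within the line, of the VAMD that covers the quadword holding 'va'.
constexpr std::uint64_t offset_va(std::uint64_t va) {
    return (va % LINESIZE) / VAMD_STRIDE;
}
```

Under these assumptions every byte of a quadword maps to the same VAMD, so metadata set through any address within the quadword is visible through all of them.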

    // VAMD_SET
    // Set the current VAMD for the datum at the specified address 'va'.
    // If the datum wasn't already in cache, it is now!
    instruction void vamd_set(VAMD vamd, VA va) {
        vamd_va(va) = vamd;
    }

The VAMD test instruction 380 fetches the VAMD for the particular address, performs an AND operation on the VAMD and the mask, and compares the result. In the base instruction set architecture of some CPUs, such a comparison result is typically written to a condition code register or to a general-purpose register. The VAMD test and set instruction 382 atomically tests and sets the VAMD for the address and then returns what was read before the set occurred. Example hardware instruction definition pseudocode for these two tests is shown below.

    // VAMD_TEST
    // Return true if all of the specified VAMD bits for the VAMD at 'va' are set.
    instruction bool vamd_test(VAMD vamd, VA va) {
        return (vamd_va(va) & vamd) == vamd;
    }

    // VAMD_TEST_AND_SET
    // Return true if all of the specified VAMD bits for the VAMD at 'va' are set;
    // then set the specified bits.
    instruction bool vamd_test_and_set(VAMD vamd, VA va) {
        atomic {
            bool ret = vamd_test(vamd, va);
            vamd_or(vamd, va);
            return ret;
        }
    }

The VAMD selective clear instruction 384 selectively clears VAMD bits, and the VAMD selective set instruction 386 selectively sets VAMD bits, as further illustrated in the hardware instructions below.

    // VAMD_AND
    // Bitwise-AND the VAMD mask against the VAMD for the specified address 'va'.
    // This may be used to clear certain VAMD bits.
    instruction VAMD vamd_and(VAMD vamd, VA va) {
        return vamd_va(va) &= vamd;
    }
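Because the AND form takes the mask of bits to retain rather than the bits to clear, it can help to see the mask arithmetic on plain values. The following C++ sketch is illustrative only: test_bits and and_bits mirror the comparisons made by VAMD_TEST and VAMD_AND on a single value, clear_bits is a hypothetical convenience wrapper, and the 8-bit type is an assumption.

```cpp
#include <cassert>
#include <cstdint>

using VAMD = std::uint8_t;  // assumed width for the example

// Mirrors VAMD_TEST: true only if every bit of 'mask' is set in 'v'.
constexpr bool test_bits(VAMD v, VAMD mask) {
    return (v & mask) == mask;
}

// Mirrors VAMD_AND: the bits of 'v' that are also in 'mask' are retained,
// all other bits are cleared.
constexpr VAMD and_bits(VAMD v, VAMD mask) {
    return static_cast<VAMD>(v & mask);
}

// Hypothetical helper: clear exactly the bits in 'bits' by AND-ing with
// the complemented mask.
constexpr VAMD clear_bits(VAMD v, VAMD bits) {
    return and_bits(v, static_cast<VAMD>(~bits));
}
```

For instance, with the value 0x6 a test against mask 0x3 fails because bit 0 is clear, and clearing bit 2 is expressed as an AND with the complement of 0x4.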

    // VAMD_OR
    // Bitwise-OR the VAMD mask against the VAMD for the specified address 'va'.
    instruction VAMD vamd_or(VAMD vamd, VA va) {
        return vamd_va(va) |= vamd;
    }

Alternatively or in addition to the individual instructions 372 for each VAMD cache line, whole-cache flash VAMD instructions 388 can be provided. For example, a flash clear ('AND_ALL') instruction 390 can be provided along with a flash set ('OR_ALL') instruction 392. In the example hardware instruction definition pseudocode shown below, the VAMD_AND_ALL instruction is designed to flash clear the designated VAMD bits in all VAMDs of every cache line for this logical processor, and the VAMD_OR_ALL instruction is designed to similarly flash set VAMD bits in all VAMDs of every cache line for this logical processor.

    // VAMD_AND_ALL
    // Flash bitwise-AND the specified mask over all the current logical processor's
    // VAMDs.
    instruction void vamd_and_all(VAMD vamd) {
        for (int i = 0; i < NLINES; i++)
            for (int j = 0; j < NVAMDS; j++)
                dcache.lines[i].vamds[j][lp] &= vamd;
    }

    // VAMD_OR_ALL
    // Flash bitwise-OR the specified mask over all the current logical processor's
    // VAMDs.
    instruction void vamd_or_all(VAMD vamd) {
        for (int i = 0; i < NLINES; i++)
            for (int j = 0; j < NVAMDS; j++)
                dcache.lines[i].vamds[j][lp] |= vamd;
    }

Referring to FIG. 11, a diagrammatic view of exemplary CLMD instructions 410 for the central processing unit 200 of FIG. 5 is shown. In one implementation, there are individual instructions 412, which access one specific CLMD at a time, and flash instructions 428, which apply to all CLMDs. The individual instructions can include instructions for CLMD get 416, CLMD set 418, CLMD test 420, CLMD selective clear (AND) 422, CLMD selective set (OR) 424, and CLMD conditional store 426. These individual instructions operate as described with respect to the VAMD instructions, but are discussed here briefly in the CLMD context. For example, the get and set instructions (416 and 418) get and set the value of a CLMD. The test instruction fetches the CLMD for the particular address, performs an AND operation with the CLMD and the mask, and compares the result. The test returns true if all of the bits in the mask are set. The selective clear and selective set instructions (422 and 424) perform a selective clear or set on the CLMD, respectively. Examples of hardware instruction definition pseudocode are shown below.

    // (Helper pseudo-function.)
    // Ensure the data for the cache line of data addressed by 'va' is valid in the cache; then
    // return a reference to the line's CLMD.
    private CLMD& clmd_va(VA va) {
        validate_line(va);
        return dcache.lines[line_va(va)].clmds[lp];
    }

    // CLMD_GET
    // Return the current CLMD for the specified address 'va'.
    instruction CLMD clmd_get(VA va) {
        return clmd_va(va);
    }

    // CLMD_SET
    // Set the current CLMD for the specified address 'va'.
    instruction void clmd_set(CLMD clmd, VA va) {
        clmd_va(va) = clmd;
    }

    // CLMD_TEST
    // Return true if all of the specified CLMD bits for the CLMD for the specified address
    // 'va' are set.
    instruction bool clmd_test(CLMD clmd, VA va) {
        return (clmd_va(va) & clmd) == clmd;
    }

    // CLMD_AND
    // Bitwise-AND the CLMD mask against the CLMD for the specified address 'va'.
    instruction CLMD clmd_and(CLMD clmd, VA va) {
        return clmd_va(va) &= clmd;
    }

    // CLMD_OR
    // Bitwise-OR the CLMD mask against the CLMD for the specified address 'va'.
    instruction CLMD clmd_or(CLMD clmd, VA va) {
        return clmd_va(va) |= clmd;
    }

In one implementation, the CLMD conditional store instruction 426 is used in a bounded transactional memory system, or for other purposes, as described in further detail herein. This instruction tests whether the property that was set earlier is still present, and if so, stores the value and returns true. Otherwise, false is returned and the value is not stored. In other words, data is stored at the address only if the CLMD for that address has the required bits set. An example of hardware instruction definition pseudocode for a conditional store is shown below.

// CLMD_COND_STORE
// (exemplary of one of a family of conditional store instructions, one for each data type)
instruction bool clmd_cond_store(CLMD clmd, VA va, VALUE value) {
    atomic {
        if (clmd_test(clmd, va)) {
            *va = value;
            return true;
        } else {
            return false;
        }
    }
}

Alternatively or additionally to the individual instructions 412 for the CLMD, whole-cache flash CLMD instructions 428 can be provided. For example, a flash clear (AND ALL) instruction 430 can be provided along with a flash set (OR ALL) instruction 432. In the example hardware instruction definition pseudocode shown below, the CLMD_AND_ALL instruction is designed to flash clear all of the CLMDs for the current logical processor, and the CLMD_OR_ALL instruction is designed to flash set all of the CLMDs for the current logical processor.

// CLMD_AND_ALL
// Flash bitwise-AND the specified mask over all the current logical processor's
// CLMDs.
instruction void clmd_and_all(CLMD clmd) {
    for (int i = 0; i < NLINES; i++)
        dcache.lines[i].clmds[lp] &= clmd;
}

// CLMD_OR_ALL
// Flash bitwise-OR the specified mask over all the current logical processor's
// CLMDs.
instruction void clmd_or_all(CLMD clmd) {
    for (int i = 0; i < NLINES; i++)
        dcache.lines[i].clmds[lp] |= clmd;
}

Figure 12 is a diagrammatic view of exemplary context switch save and restore extensions 450 for the central processing unit 200 of Figure 5. These are used at context switch time to save and restore the thread context registers (e.g., the architected, programmer-visible thread state, such as the various general purpose register files, special registers, etc.).
In one implementation, the context switch save and restore instructions each take a 512-byte context buffer as an argument. In one implementation, the context switch save and restore instructions can also save the logical processor's entire CMD state structure in some of the currently reserved fields of the context buffer.

Turning now to Figures 13 through 16, some exemplary systems and techniques are described that use some or all of the metadata, hardware instructions, and/or other techniques described in Figures 1 through 12. Figures 13 through 16 illustrate some uses of the techniques described herein with a bounded transactional memory application. As described further in Figures 13 through 16, a bounded transactional memory application can use a programmable subset of the CLMD bits such that, if any of those bits was ever set on a cache line, it indicates that the line has been transactionally read (and is therefore watched for subsequent writes by other logical processors) or has been speculatively written in a transaction. Speculatively written means that the line was written without knowing whether the values will actually be committed permanently. If such a speculatively written line is dirty and is then evicted, or read or written by another core, it is invalidated instead, so that the writes are discarded. The bounded transactional memory application can also include an instruction that atomically tests a subset of the cache metadata CLMD eviction summary bits and, if no such eviction has occurred, atomically clears a subset of the CLMD bits of all cache lines, thereby atomically committing the speculative writes.

Figure 13 is a diagrammatic view of a bounded transactional memory application of one implementation. In one implementation, bounded transactional memory application 470 is one of the application programs that reside on computing device 100 (e.g., one of the hardware assisted software applications 150).
However, it will be understood that bounded transactional memory application 470 can alternatively or additionally be embodied as computer-executable instructions on one or more computers and/or in different variations than shown in Figure 1. Alternatively or additionally, one or more parts of bounded transactional memory application 470 can be part of system memory 104, on other computers and/or applications 115, or other such variations as would occur to one in the computer software art.

Bounded transactional memory application 470 includes program logic 472, which is responsible for carrying out some or all of the techniques described herein. Program logic 472 includes: logic 474 for accessing the CMD and CLMD metadata in the CPU cache; logic 476 for setting the CLMD transaction-read bit on the address when performing a transactional read; logic 478 for setting the CLMD transaction-write bit on the address and doing a conditional store when performing a transactional write; logic 480 for testing whether any lines marked as transactionally read or written were evicted or invalidated and, if not, flash clearing all the speculative write bits, thereby committing them all atomically; logic 482 for accessing the metadata to determine whether a transaction is doomed; and other logic 484 for operating the application. In one implementation, program logic 472 is operable to be called programmatically from another program, such as by using a single call to a procedure in program logic 472.

Figure 14 illustrates one implementation of the stages involved in providing a bounded transactional memory application using cache metadata. In one form, the process of Figure 14 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 500 with the system providing a bounded transactional memory application with access to CPU cache metadata (stage 502).
For each transactional read, the system sets one of the CLMD bit positions (for example, CLMD[0]) as the CLMD transaction-read bit on the CLMD metadata word for the cache line at that address (stage 504). For each transactional write, the system sets another CLMD bit position (for example, CLMD[1]) as the CLMD transaction-write bit on the CLMD metadata word for the cache line at that address, and performs a conditional store (stage 506). At commit time, the system tests whether any lines marked with the CLMD transaction-read or CLMD transaction-write bit were evicted or invalidated (stage 508). If no evictions or invalidations are found (decision point 510), all the speculative write bits are flash cleared, thereby committing all the speculative writes (stage 512). If any evictions or invalidations are found (decision point 510), all speculatively written lines are discarded and a CMD invalidate is done to reset all CLMDs and eviction data (stage 514). The process ends at end point 516.

This algorithm correctly implements bounded transactional memory for concurrent transactions that fit in the cache of each logical processor. Since every transactional read of data is annotated by software with the transaction-read bit on its cache line, and every speculative transactional write of data is annotated by software with the transaction-write bit on its cache line, a transaction commits only if, during the time interval of its execution, there were no conflicting accesses to the data from other logical processors. Specifically, it commits only if there were no writes by other logical processors to data read in the transaction, and no reads by other logical processors of data written in the transaction. If a conflicting access occurs, the cache coherence actions of the multiprocessor invalidate the cached copy of the data held by the transaction's logical processor. Through behavior (258), this event becomes visible as an eviction/invalidation of non-zero CLMD data in the clmd_evictions register in the CMD (190).
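The read, write, and commit stages of Figure 14 can be sketched in software. The following Python model is only an illustration under stated assumptions, not the patented hardware: the class name `ToyCache`, its methods, and the dictionary-based cache are hypothetical, one cache line holds one value, and a single logical processor is modelled. In this toy, committed values live in `memory` and speculative values live only in the cached line, so "flash clearing" at commit is modelled as copying the speculative values to `memory`.

```python
TX_READ = 0b01    # CLMD[0]: transaction-read bit (bit position assumed)
TX_WRITE = 0b10   # CLMD[1]: transaction-write bit (bit position assumed)


class ToyCache:
    """Hypothetical single-logical-processor model of a cache with CLMD bits."""

    def __init__(self, memory):
        self.memory = memory      # committed values: address -> value
        self.lines = {}           # cached lines: address -> [value, clmd]
        self.clmd_evictions = 0   # OR of the CLMD bits of every line lost so far

    def _line(self, addr):
        # Load the line on a miss; a freshly loaded line starts with CLMD == 0.
        if addr not in self.lines:
            self.lines[addr] = [self.memory[addr], 0]
        return self.lines[addr]

    def tx_read(self, addr):
        line = self._line(addr)
        line[1] |= TX_READ                    # stage 504: mark as transactionally read
        return line[0]

    def tx_write(self, addr, value):
        line = self._line(addr)
        line[1] |= TX_WRITE                   # stage 506: mark as speculatively written
        if line[1] & TX_WRITE:                # conditional store; always passes in this toy
            line[0] = value
            return True
        return False

    def evict(self, addr):
        # Models eviction or invalidation by another processor: the line and any
        # speculative value in it are lost, and its CLMD bits reach the summary.
        _, clmd = self.lines.pop(addr)
        self.clmd_evictions |= clmd

    def commit(self):
        doomed = self.clmd_evictions & (TX_READ | TX_WRITE)   # stage 508
        for addr, line in list(self.lines.items()):
            if line[1] & TX_WRITE:
                if doomed:
                    del self.lines[addr]          # stage 514: discard speculative line
                else:
                    self.memory[addr] = line[0]   # stage 512: writes become permanent
            line[1] = 0                           # clear the CLMD bits of every line
        self.clmd_evictions = 0                   # reset the eviction summary
        return not doomed
```

A transaction that reads and writes without losing a marked line commits and its writes reach `memory`; calling `evict` on any marked line before `commit` makes `commit` return `False` and leaves `memory` unchanged, which mirrors decision point 510.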
The algorithm also correctly watches for the eviction of any cache line holding transactionally accessed data during the time interval from each first reference up to the transaction commit attempt. If a cache miss occurs during program execution and a cache line must be evicted, and that cache line has CLMD cache metadata with the transaction-read or transaction-write bit set, then this event, via behavior (258), also becomes visible as an eviction/invalidation in the clmd_evictions register. In either case, the combination of software and hardware detects any transaction conflict or capacity problem and thus guarantees correct transactional memory semantics.

Figure 15 illustrates one implementation of the stages involved in using the get CLMD evictions instruction to poll whether a transaction is doomed due to conflicting access or capacity. In one form, the process of Figure 15 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 530 with repeatedly issuing the GET_CLMD_EVICTIONS instruction to determine whether a transaction is doomed (stage 532). If it is doomed, an appropriate action is taken to handle the transaction, such as discarding all speculatively written lines and doing a CMD invalidate (stage 534). The process ends at end point 536.

Figure 16 illustrates one implementation of the stages involved in using an addition to the CMD structure to handle doomed transactions in hardware. In one form, the process of Figure 16 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 550 with initialization of the eviction handler and handler mask (stage 552).
Using the CLMD_SET_EVICTION_HANDLER and CLMD_SET_EVICTION_HANDLER_MASK instructions to initialize the clmd_eviction_handler and clmd_eviction_handler_mask CMD control registers, software configures the CPU hardware to transfer control to a software eviction handler routine whenever a cache line with the CLMD transaction-write bit set is evicted or invalidated (stage 552). Whenever such a transacted cache line is evicted or invalidated, program execution immediately jumps to the CLMD transaction failure handler (stage 554). An appropriate action is taken to handle the doomed transaction, such as discarding all speculatively written lines and doing a CMD invalidate (stage 556). The process ends at end point 558. Some implementations may prevent recursive invocation of the handler by clearing the eviction handler mask when the handler is first invoked.

Turning now to Figures 17 through 22, a hardware accelerated software transactional memory system that uses cache metadata is described. In one implementation, the cache metadata is used by the software transactional memory system to accelerate some expensive aspects of such a system, such as redundant Open_Read barrier filtering, redundant Write_Undo barrier filtering, read log validation, retry operations, and operations with nested transactions.

In the example software transactional memory system described herein, the transactional status of data is described with reference to a transactional memory word (TMW). A TMW describes the transactional synchronization status for associated data that may be accessed in a transaction. For example, the TMW can contain a version number, and/or a pointer to a transaction that has the data open for write, and/or a list/count and/or indicator of transactional readers (e.g., pessimistic readers).
In one implementation, the list/count and/or indicator of readers can include a count of the number of readers (e.g., pessimistic) accessing the particular value at a given point in time. In another implementation, the list/count and/or indicator of readers can include a list of the particular readers (e.g., pessimistic) accessing the particular value at a given point in time. In yet another implementation, the list/count and/or indicator of readers is simply a flag or other indicator that there are one or more readers (e.g., pessimistic) accessing the particular value at a given point in time. These implementations are only examples, and the term TMW as used herein is intended to cover a variety of mechanisms for tracking lock status.

Figure 17 is a diagrammatic view of a hardware accelerated software transactional memory application of one implementation operating on the computer system of Figure 1 (e.g., as one of the hardware assisted software applications 150). Hardware accelerated software transactional memory application 570 includes program logic 572, which is responsible for carrying out some or all of the techniques described herein.
Program logic 572 includes: logic 574 for reserving an open-for-read bit and a logged-for-undo bit on the VAMD of the CPU cache; logic 576 for reserving a TMW write-watch bit on the CLMD of the CPU cache; logic 577 for resetting cache metadata between transactions; logic 578 for providing Open_Read barrier filtering that uses the open-for-read bit to avoid redundant read logging; logic 580 for providing Write_Undo barrier filtering that uses the logged-for-undo bit to avoid redundant undo logging; logic 582 for bypassing read log validation when there have been no read set invalidations (e.g., no writes from other threads to data this transaction has read); logic 584 for providing retry operations that mark CLMD lines; logic 586 for providing nested transactions that use some cache metadata to avoid redundant filtering and unnecessary validation; and other logic 588 for operating the application. These operations are described and/or defined in more detail in Figures 18 through 22.

Because cache metadata state is used to accelerate filtering and to bypass various redundant or unnecessary transactional memory barriers and bookkeeping operations, and for other purposes, it is usually helpful to reset all cache metadata to zero between transactions, so that one transaction's filtering and write-watch state does not affect a later transaction's filtering and watching logic (577). In one form, all compressed metadata state can be quickly reset with a short sequence of instructions, such as CLMD_AND_ALL and VAMD_AND_ALL, issued either before the transaction begins or immediately after it ends.

Figure 18 illustrates one implementation of the stages involved in providing Open_Read barrier filtering that uses the open-for-read bit position on the VAMDs of the cache metadata to efficiently filter out redundant transactional read logging. In one form, the process of Figure 18 is at least partially implemented in the operating logic of computing device 100.
The process begins at start point 600 with the system starting an Open_Read sequence using a hardware accelerated software transactional memory system (stage 602). If the VAMD open-for-read bit is already set for the address of the TMW (decision point 604), this indicates that the TMW has already been opened for read in this transaction, and software jumps past the read barrier logging sequence (stages 606, 607, 608). Otherwise, the system sets the open-for-read bit on the VAMD of the address of the TMW (stage 606) and performs the transactional Open_Read logic. In one form, this logs the read access. In one form, stage 604 may be implemented with a VAMD_TEST instruction followed by a conditional jump, and stage 606 may be implemented with a VAMD_SET or VAMD_OR instruction. In another form, stages 604 and 606 may be implemented together with a single VAMD_TSET instruction (test then set) followed by a conditional jump. In stage 607, the system also sets the TMW write-watch bit on the CLMD metadata for the TMW's cache line. Stage 607 may be implemented with a CLMD_SET or CLMD_OR instruction. The process ends at end point 612.

Figure 19 illustrates one implementation of the stages involved in an Open_Write barrier that complements the cache metadata based Open_Read filtering described above. In one form, the process of Figure 19 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 630 with the system starting an Open_Write sequence using a hardware accelerated software transactional memory system (stage 632). In one form, opening a TMW for write also grants read access. Therefore, the system sets the open-for-read bit on the VAMD of the TMW using the VAMD_SET instruction (stage 634). The system then performs the transactional Open_Write logic (stage 636).
For example, in one form, the TMW is overwritten with a pointer to this transaction object (indicating that it owns the TMW's object data for writing). In another form, the TMW is overwritten with a pointer to an entry in a transactional object write log. In either form, a bit in the TMW is changed to indicate that the TMW has been opened for write by a transaction. The process ends at end point 640.

Figure 20 illustrates one implementation of the stages involved in providing Write_Undo barrier filtering that uses a logged-for-undo bit position in the VAMD cache metadata to efficiently filter out redundant undo logging. In one form, the process of Figure 20 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 650 with the system starting a write field sequence using a hardware accelerated software transactional memory system (stage 652). If the VAMD logged-for-undo bit is already set for the address of the data field to be overwritten (decision point 654), software jumps past the write undo logging sequence (stages 656, 658). Otherwise, the system sets the logged-for-undo bit on the VAMD of the address of the data field (stage 656) and performs the Write_Undo logging logic (stage 658). In one form, the granularity of the VAMD metadata is an aligned quadword of data. Since that is also the granularity of cache metadata based write undo filtering, the Write_Undo logging logic copies an aligned quadword of data to the log, even if the data field itself is smaller than a quadword. In one form, stage 654 may be implemented with a VAMD_TEST instruction followed by a conditional jump, and stage 656 may be implemented with a VAMD_SET or VAMD_OR instruction. In another form, stages 654 and 656 may be implemented together with a single VAMD_TSET instruction (test then set) followed by a conditional jump.
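The Write_Undo filtering of Figure 20 can be illustrated with a small software model. This is a sketch under assumptions rather than the described hardware: the class `UndoFilter`, the 8-byte quadword size, and the `rollback` helper are hypothetical, the VAMD logged-for-undo bits are modelled as a Python set of quadword base addresses, and each written field is assumed to fit inside one aligned quadword.

```python
QUADWORD = 8  # bytes covered by one VAMD entry (assumed size)


class UndoFilter:
    """Hypothetical model of logged-for-undo filtering at quadword granularity."""

    def __init__(self, memory):
        self.memory = memory    # bytearray standing in for transactional data
        self.logged = set()     # quadword bases whose logged-for-undo bit is set
        self.undo_log = []      # entries of (base address, original quadword bytes)

    def vamd_tset(self, addr):
        # Test-then-set of the logged-for-undo bit; returns the previous bit.
        base = addr - addr % QUADWORD
        was_set = base in self.logged
        self.logged.add(base)
        return was_set

    def write_field(self, addr, data):
        # Decision point 654 and stage 656 combined as one test-then-set.
        if not self.vamd_tset(addr):
            base = addr - addr % QUADWORD
            # Stage 658: log the whole aligned quadword, even for a smaller field.
            self.undo_log.append((base, bytes(self.memory[base:base + QUADWORD])))
        self.memory[addr:addr + len(data)] = data

    def rollback(self):
        # Illustrative use of the log: restore quadwords in reverse order, then
        # clear the filter bits so the next transaction starts with clean metadata.
        for base, original in reversed(self.undo_log):
            self.memory[base:base + QUADWORD] = original
        self.undo_log.clear()
        self.logged.clear()
```

Two writes into the same quadword produce a single undo log entry, which is the redundancy the filter removes. The same test-then-set shape applies to the Open_Read filtering of Figure 18, with the open-for-read bit in place of the logged-for-undo bit.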
The process of Figure 20 ends at end point 660.

Figure 21 illustrates one implementation of the stages involved in providing read log validation that uses the GET_CLMD_EVICTIONS instruction to bypass read log validation when there have been no read set invalidations (e.g., no conflicting writes from other threads to data this transaction has read). In one form, the process of Figure 21 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 680 with the system entering commit processing for a transaction using a hardware accelerated software transactional memory system (stage 682). The system then checks whether any cache lines marked with the CLMD write-watch bit were evicted or invalidated during this transaction. In one form, software issues the GET_CLMD_EVICTIONS instruction to retrieve the eviction summary and tests the write-watch bit; if it is zero, no watched line was evicted or overwritten by another thread. It follows that this transaction's read set never suffered a conflicting write access from another thread, and it is safe to skip the expensive read log validation. In this case, software conditionally jumps past the read log validation stage (stage 686). If the write-watch bit in the eviction summary is set, software performs read log validation as usual (stage 686). In either event, the transaction is committed or otherwise handled appropriately (stage 692). The process ends at end point 692.

Figure 22 illustrates one implementation of the stages involved in providing retry operations that mark CLMD lines. In one form, the process of Figure 22 is at least partially implemented in the operating logic of computing device 100. The process begins at start point 700 with determining that a transaction retry operation should be used in a hardware accelerated software transactional memory system (stage 702).
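The validation bypass of Figure 21 described above amounts to one bit test on the eviction summary before the expensive pass over the read log. The Python sketch below is a hedged illustration: the `WRITE_WATCH` bit position, the function names, and the representation of the read log as `(tmw, version_seen)` pairs are assumptions, and the eviction summary is passed in as a plain integer in place of the value returned by GET_CLMD_EVICTIONS.

```python
WRITE_WATCH = 0b100  # CLMD bit reserved for TMW write-watch (position assumed)


def validate_read_log(read_log, current_version):
    # Expensive path: re-check the version of every TMW recorded in the read log.
    return all(current_version[tmw] == seen for tmw, seen in read_log)


def commit_reads(clmd_evictions, read_log, current_version):
    """Return (consistent, validated) for a transaction's read set.

    'validated' reports whether the full read log validation actually ran.
    """
    if (clmd_evictions & WRITE_WATCH) == 0:
        # No watched line was evicted or invalidated, so no other thread wrote
        # to the read set and validation can be skipped.
        return True, False
    # A watched line was lost (conflict or capacity), so validate as usual.
    return validate_read_log(read_log, current_version), True
```

Note that a set write-watch bit does not by itself mean the transaction fails: a capacity eviction also sets it, in which case the normal validation runs and may still succeed.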
In this retry process, the system rolls back updates to data written in the transaction and releases any data write locks it may hold (stage 704). The system then uses the previously set CLMD write-watch bits (stage 607) to reduce the overhead of waiting, and to improve the latency of resuming, until another thread updates (writes) the data (stage 706). In one implementation, this is done by polling for CLMD cache evictions, using GET_CLMD_EVICTIONS to interrogate the eviction summary and then testing for evictions of CLMDs. In another implementation, software uses the CLMD_SET_EVICTION_HANDLER and CLMD_SET_EVICTION_HANDLER_MASK instructions to initialize the eviction handler so that program control transfers to a retry-wakeup handler when a cache line with the CLMD transaction-write bit set is evicted or invalidated (written) by another thread. The thread can then be put to sleep, in one form via a spin loop, or in another form by executing a pause instruction to enter a lower power state. When any line annotated with write-watch metadata is evicted or invalidated, the system wakes up the waiting transaction and retries it (stage 708). The process ends at end point 710. In another implementation, optional stage 705 may also be performed. Using CLMD_AND_ALL, software zeroes the write-watch bit position on all CLMDs in the cache metadata. Software then loops over every read log entry in the read log, re-establishing a write-watch CLMD bit on the cache line of the TMW address found in each read log entry.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
All equivalents, changes, and modifications that come within the spirit of the implementations as described herein and/or by the following claims are desired to be protected. For example, a person of ordinary skill in the computer software art will recognize that the client and/or server arrangements, user interface screen content, and/or data layouts described in the examples discussed herein could be organized differently on one or more computers to include fewer or additional options or features than as portrayed in the examples.

[Brief description of the drawings]
Figure 1 is a diagrammatic view of a computer system of one implementation.
Figure 2 is a more detailed diagrammatic view of a central processing unit of one implementation operating on the computer system of Figure 1.
Figure 3 is a diagram illustrating exemplary hardware structures that implement additional metadata per logical processor per cache line for the system of Figure 1.
Figure 4 is a diagram illustrating exemplary hardware structures that implement additional metadata per logical processor per cache for the system of Figure 1.
Figure 5 is a diagrammatic view of the central processing unit of the system of Figure 1 that illustrates an exemplary instruction set architecture and its interaction with cache metadata.
Figure 6 is a diagrammatic view of exemplary automatic cache and processor operation behaviors for the central processing unit of Figure 5.
Figure 7 is a process flow diagram for one implementation of the system of Figure 1 that illustrates the stages involved in loading a cache line and initializing some cache metadata to default values.
Figure 8 is a process flow diagram for one implementation of the system of Figure 1 that illustrates the stages involved in evicting or invalidating a cache line.
Figure 9 is a diagrammatic view of exemplary CMD instructions for the central processing unit of Figure 5.
Figure 10 is a diagrammatic view of an exemplary V AMD instruction for the central processing unit of Figure 5. Figure 11 is a diagrammatic view of an exemplary CLMD instruction for the central processing unit of Figure 5. Figure 12 is a graphical view of the exemplary content switching storage and recovery extension for the central processing unit of Figure 5. Figure 13 is a graphical view of a implemented limited transaction memory application operating on the computer system of Figure 1. Figure 14 is a flowchart of an implementation of the system of Figure 1, which illustrates the use of cache metadata to provide a limited transactional memory application. Figure 15 is a process flow diagram of an implementation of the system of Figure 1, which involves the use of a CLMD reclaim instruction to poll for a stage in which a transaction is determined. 44 200905474 Figure 16 is a process flow diagram of an implementation of the system of Figure 1, which illustrates the stage of a transaction involving the use of an add-on to the C M D structure for disposal at the hardware. Figure 17 is a graphical illustration of a real hardware accelerated software transactional memory application operating on the computer system of Figure 1. Figure 18 is a process flow diagram of an implementation of the system of Figure 1, which illustrates the stage of providing Open_Read barrier filtering (which uses open VAMD for reading bits on the CPU cache to avoid redundant filtering). . Figure 19 is a process flow diagram of an implementation of the system of Figure 1, which illustrates the stage of providing 〇pen_Read barrier filtering (which uses TMW bits on the CLMD of the CPU cache). Figure 20 is a process flow diagram of an implementation of the system of Figure 1, which illustrates the stage of providing Write_Undo barrier filtering (which uses records for cancellation of bits to effectively pass; considers redundant cancellation records). 
Figure 21 is a flowchart of an implementation of the system of Figure 1, which illustrates the stage of providing a read record valid (which uses the GET_CLMD_EVICTIONS instruction on the CPU to avoid unnecessary read records being valid). Figure 2 is a process flow diagram of an implementation of the system of Figure 1, which illustrates the stages of providing a retry operation to mark the CLMD column. [Main component symbol description] 104 memory 1 〇 5 yuan data 1 06 dotted line 100 computing device 102 central processing unit 103 CPU cache 45 200905474 1 0 6 virtual address metadata 1 0 7 cache data 1 1 8 fast The metadata storage control register 109 removable storage 1 1 0 non-removable storage 111 output device 1 12 input device 114 communication connection 1 1 5 computer / application 1 5 0 hardware-assisted software application 1 7 0 hardware instruction 190 register 200 central processing unit 202 hardware instruction set architecture 2 04 automatic cache and processor operation behavior 206 CMD instruction 208 VAMD instruction 2 1 0 individual instruction 2 1 2 flash instruction 214 CLMD instruction 2 1 6 individual instructions 2 1 8 flash instructions 2 2 0 line content switching storage / recovery extension 222 CPU cache 224 cache metadata 226 yuan data 228 VAMD metadata 230 CLMD metadata 2 5 0 automatic cache and processor Operational Behavior 2 5 2 Initialization (Valid) 2 52 Cache Column 2 5 2 Initialization Behavior 2 5 2 Initialization Instructions

254 default CLMD
256 default VAMD
258 eviction/invalidation operation
260 cache evictions summary
262 CLMD and VAMD bits
270 core reset instruction
272 metadata
330 CMD instructions
332, 334, 336, 338, 340, 342, 344, 346, 348, 349, 350, 352, 354, 356 instruction
370 VAMD instructions
372 individual instructions
376, 378 instruction
428 cache flash CLMD instructions
380, 382, 384, 386 instruction
388 cache flash VAMD instructions
390 flash clear (AND_ALL) instruction
392 flash set (OR_ALL) instruction
410 CLMD instructions
412 individual instructions
416, 418, 420, 422, 424, 426 instruction
430 flash clear (AND_ALL) instruction
432 flash set (OR_ALL) instruction
450 context switch save and restore extensions
470 bounded transactional memory application
472 program logic
474 logic for accessing CMD and CLMD metadata in the CPU cache
476 logic for setting the CLMD transaction-read bit on the address when performing a transactional read
478 logic for setting the CLMD transaction-write bit on the address and doing a conditional store when performing a transactional write
480 logic for testing whether any lines marked as transactional read or write were evicted or invalidated (and if not, flash clearing all the speculative-write bits, thereby atomically committing all the speculative writes)
482 logic for accessing the metadata to determine whether a transaction is doomed
484 other logic for operating the application
570 hardware accelerated software transactional memory application
572 program logic
574 logic for reserving an opened-for-read bit and a logged-for-undo bit on the VAMD of the CPU cache
576 logic for reserving a TMW write-watch bit on the CLMD of the CPU cache
577 logic for resetting cache metadata between transactions
578 logic for providing Open_Read barrier filtering that uses the opened-for-read bit to avoid redundant read logging
580 logic for providing Write_Undo barrier filtering that uses the logged-for-undo bit to avoid redundant undo logging
582 logic for bypassing read log validation when there have been no read set invalidations (e.g., no writes from other threads to data this transaction has read)
584 logic for providing retry operations that mark CLMD lines
586 logic for providing nested transactions that use some cache metadata to avoid redundant filtering and unnecessary validations
588 other logic for operating the application
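The bounded transactional memory logic listed above (elements 476 to 480) can be illustrated with a small software model. The following Python sketch is a toy simulation under stated assumptions, not the hardware design: the class `BoundedTxCache`, its method names, the two bit positions, the eviction summary kept as a bitwise OR of the CLMD bits of lost lines, and the treatment of each address as its own cache line are all hypothetical choices made for illustration.

```python
TX_READ = 0x1   # assumed bit position of the transaction-read bit
TX_WRITE = 0x2  # assumed bit position of the transaction-write (speculative-write) bit


class BoundedTxCache:
    """Toy model: each address is treated as its own cache line (an assumption)."""

    def __init__(self, memory):
        self.memory = memory    # committed values: address -> value
        self.clmd = {}          # address -> CLMD bits
        self.speculative = {}   # speculatively written values, not yet visible
        self.evictions = 0      # OR of CLMD bits of evicted/invalidated lines

    def tx_read(self, addr):
        """Transactional read: set the transaction-read bit, then read."""
        self.clmd[addr] = self.clmd.get(addr, 0) | TX_READ
        if addr in self.speculative:
            return self.speculative[addr]
        return self.memory[addr]

    def conditional_store(self, addr, value):
        """Store only while the transaction-write bit is still present on the line."""
        if self.clmd.get(addr, 0) & TX_WRITE:
            self.speculative[addr] = value
            return True
        return False

    def tx_write(self, addr, value):
        """Transactional write: set the transaction-write bit, then conditional store."""
        self.clmd[addr] = self.clmd.get(addr, 0) | TX_WRITE
        return self.conditional_store(addr, value)

    def invalidate(self, addr):
        """The line is evicted or invalidated: record its CLMD bits, lose its data."""
        self.evictions |= self.clmd.pop(addr, 0)
        self.speculative.pop(addr, None)

    def commit(self):
        """Commit if no marked line was lost; otherwise discard speculative lines."""
        doomed = bool(self.evictions & (TX_READ | TX_WRITE))
        if not doomed:
            # Models the flash clear of speculative-write bits: writes become visible.
            self.memory.update(self.speculative)
        self.speculative.clear()
        self.clmd.clear()
        self.evictions = 0
        return not doomed
```

A transaction in this model reads and writes through `tx_read` and `tx_write` and then calls `commit`; a `False` result corresponds to the doomed case, in which the caller would discard its work and typically re-execute the transaction.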

Claims

1. A computer-readable medium having computer-executable instructions for causing a computer to perform steps comprising: accessing, from a bounded transactional memory application, cache metadata in a cache of a central processing unit; when performing a transactional read from the bounded transactional memory application, software setting a cache line metadata transaction-read bit; and when performing a transactional write from the bounded transactional memory application, software setting a cache line metadata transaction-write bit and performing a conditional store, wherein the cache line metadata transaction-write bit is configured as a speculative-write bit.

2. The computer-readable medium of claim 1, further having computer-executable instructions for causing a computer to perform the step of: if no lines marked with the cache line metadata transaction-read bit or the cache line metadata transaction-write bit were evicted or invalidated, clearing all speculative-write bits.

3. The computer-readable medium of claim 2, wherein the speculative-write bits are cleared by a flash clear.

4. The computer-readable medium of claim 3, wherein the flash clear of the speculative-write bits immediately commits the speculatively written data.

5. The computer-readable medium of claim 1, further having computer-executable instructions for causing a computer to perform the step of: using the cache metadata of the cache to determine whether a particular transaction is doomed.

6. The computer-readable medium of claim 1, wherein the cache metadata includes metadata for each cache line.

7. The computer-readable medium of claim 1, wherein the cache metadata includes cache metadata control registers.

8. The computer-readable medium of claim 1, wherein the conditional store is performed by a hardware instruction included in an instruction set architecture on the central processing unit.

9. The computer-readable medium of claim 1, wherein the conditional store is conditioned on the continued existence of the cache line metadata transaction-write bit.

10. A method for using cache metadata in a cache of a central processing unit to improve the operation of a bounded transactional memory system, comprising the steps of: providing a bounded transactional memory application with access to cache metadata in a cache of a central processing unit; for each transactional read, setting a cache line metadata transaction-read bit; for each transactional write, setting a cache line metadata transaction-write bit, which also denotes a speculative write; and at commit time, if any lines marked with the cache line metadata transaction-read bit or the cache line metadata transaction-write bit were evicted or invalidated, discarding all speculatively written lines.

11. The method of claim 10, wherein, in addition to discarding all speculatively written lines, a cache metadata invalidate instruction is called.

12. The method of claim 11, wherein the cache metadata invalidate instruction is part of an instruction set architecture of the central processing unit.

13. The method of claim 10, wherein at commit time, if no lines marked with the cache line metadata transaction-read bit or the cache line metadata transaction-write bit were evicted or invalidated, all speculative transaction-write bits are flash cleared.

14. The method of claim 13, wherein flash clearing the speculative-write bits commits a respective one or more writes.

15. A computer-readable medium having computer-executable instructions for causing a computer to perform the steps recited in claim [number omitted in source].

16. A method for using cache metadata in a cache of a central processing unit to handle a doomed transaction, comprising the steps of: providing a bounded transactional memory application with access to cache metadata in a cache of a central processing unit; configuring an eviction handler to be invoked if a cache metadata transaction-read or transaction-write bit is evicted or invalidated; and once the transaction is doomed, performing an appropriate action.

17. The method of claim 16, wherein the appropriate action includes discarding all speculatively written lines.

18. The method of claim 16, wherein the appropriate action includes executing a cache metadata flash clear instruction.

19. The method of claim 18, wherein the cache metadata invalidate instruction is part of an instruction set architecture on the central processing unit.

20. A computer-readable medium having computer-executable instructions for causing a computer to perform the steps recited in claim 16.
