US20060106991A1 - Victim prefetching in a cache hierarchy - Google Patents
Victim prefetching in a cache hierarchy
- Publication number
- US20060106991A1 (application US 10/989,997)
- Authority
- US
- United States
- Prior art keywords
- level cache
- line
- processor
- cache
- based system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/122—Replacement control using replacement algorithms of the least frequently used [LFU] type, e.g. with individual count value
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
- G06F12/0897—Caches characterised by their organisation or structure with two or more cache hierarchy levels
Definitions
- the present invention relates to computer storage management, and, more particularly, to victim prefetching in a cache hierarchy.
- Multithreading refers to the ability of a processor to execute two programs almost simultaneously. This permits the processing of one program or “thread” while the other is waiting for data to be fetched.
- prefetching traditionally refers to retrieving data expected to be used in the future. Each results in increased complexity, as well as increased off-chip traffic. In the case of multithreading, the increased traffic is due to decreased per thread cache capacity, yielding higher miss rates. Prefetching increases traffic by fetching data which is not referenced before castout. As used herein, the term “castout” refers to the process by which data is removed from the cache to make room for data that is fetched.
- a number of approaches to prefetching are described in the prior art and/or are implemented in computer systems. These generally include the fetching of data based on currently observed access behavior such as strides (where accesses are to a series of addresses separated by a fixed increment), or via previously recorded access patterns, or by direct software instructions.
- Each of these approaches has some advantages. However, they either require increased program complexity (for software instructions), have problems dealing with nonregular access patterns (in the case of, for example, stride-based prefetching), or add unnecessary complexity (in the case of, for example, recorded access patterns). Further, these earlier approaches do not utilize information readily available to the processor, namely the identity of recently discarded cache lines coupled with their status and location in the storage hierarchy.
- a processor-based system for aiding in prefetching between a first level cache and a second level cache.
- the system includes a directory extension operatively connected to the first level cache and the second level cache; wherein the directory extension stores an entry, the entry (a) identifying at least one page in the first level cache including at least one line that has been recently ejected, and (b) for each of the at least one page, indicating which of the at least one line is prefetchable from the second level cache.
- a processor-based system for aiding in the prefetching between a first level cache and a second level cache.
- the system includes information associated with each line in the second level cache; wherein the information identifies whether each line was referenced during its most recent insertion into the first level cache.
- FIG. 1 depicts a block diagram of a processor-based system implementing a directory extension, in accordance with one exemplary embodiment of the present invention
- FIG. 2 depicts a flow diagram illustrating an exemplary method applying the directory extension of FIG. 1 on a L2 cache fault, in accordance with one exemplary embodiment of the present invention
- FIG. 3 depicts a flow diagram illustrating an exemplary method applying the directory extension of FIG. 1 on a L2 castout, in accordance with one exemplary embodiment of the present invention.
- FIG. 4 depicts a flow diagram illustrating an exemplary method applying the directory-extension of FIG. 1 on deletions of a line from the L3 cache or invalidations of the line in the L3 cache, in accordance with one exemplary embodiment of the present invention.
- the systems and methods described herein may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
- at least a portion of the present invention is preferably implemented as an application comprising program instructions that are tangibly embodied on one or more program storage devices (e.g., hard disk, magnetic floppy disk, RAM, ROM, CD ROM, etc.) and executable by any device or machine comprising suitable architecture, such as a general purpose digital computer having a processor, memory, and input/output interfaces.
- a page is 4K bytes, and is divided into lines of 128 bytes. It should be appreciated that the page and lines sizes are presented solely for the sake of simplicity; other configurations may be used as contemplated by those skilled in the art.
- the unit of storage allocation in the caches is a line. Lines are fetched or discarded from the caches as a function of references by the processors in the system.
- the storage management approach described herein yields substantial performance gains over prior art approaches with little additional complexity and acceptable levels of traffic.
- a cache hierarchy is a type of storage hierarchy.
- the DX may maintain (1) a list of pages which contains recently ejected lines from a given level in the cache hierarchy, and (2) for each page in this list, a set of ejected lines, provided these lines are prefetchable from, for example, the next level of the cache hierarchy. Given a cache fault to a line within a page in this list, the DX identifies other lines from this page which are advantageous to prefetch, and which may be prefetched without the substantial overhead to directory lookup which would otherwise be required.
- directory lookup refers to the process where a cache directory is searched for the presence and status of a line. Such searches are expensive as they are time consuming and tie up the directory mechanism. It should be understood that lines to be prefetched, as identified by the above-described directory extension, could also be identified by additions to the standard directory. In the following, however, we concentrate our description on implementations using the extension.
- a prefetch mechanism that exploits some properties of reference patterns, e.g., that parts of pages accessed together tend to be reaccessed in proximate time.
- a cache level L(i) may be viewed as a mechanism which stores lines that have been referenced recently.
- the cache at the next level towards memory, typically cache level L(i+1), stores lines referenced somewhat less recently than in cache level L(i). References at cache level L(i) may then suggest that some items from cache level L(i+1) should be prefetched, such as other lines from the same page.
- the exemplary embodiment described herein presents the interface between a level two cache (hereinafter “L2”) and a level three cache (hereinafter “L3”).
- L2 level two cache
- L3 level three cache
- the L2 directory and contents, and the L3 directory are on the processor chip; the L3 contents are off-chip.
- other embodiments include the case where a single chip holds multiple processors, each with a private L2.
- the DX is distinguishable from a victim cache, which stores recently discarded lines.
- rather than holding the discarded cache lines (in this exemplary embodiment, a purpose served by the L3), the DX maintains a record of lines which may be advantageous to prefetch, based on their ejection or victimization.
- the term “ejection” refers to the process where lines are removed from a cache as others are fetched. References which cause a prefetch are referred to herein as “prefetch events” or “PEs.”
- the PEs described in the exemplary embodiments below correspond to cache faults. It should be understood that in alternate embodiments, PEs may correspond to more general events, such as references to a given page.
- a line is prefetchable if it is present in L3. It should also be understood that information contained in the DX could be incorporated as part of the cache directories.
- lines that are missing from a given cache level and have recently been referenced are likely to be in the next cache level.
- lines missing from L2 are likely to be in L3.
- it is desirable in this exemplary embodiment to limit L2 prefetches to lines held in L3, as the expense of mistaken prefetches from lower levels such as main memory may be too high.
- the prefetch of a line is termed mistaken if the line is not referenced before being ejected.
- given an L2 cache fault to a line in a given page, one can simply fetch all lines from the given page currently in L3 but not L2.
- One problem with this approach is the load imposed on the cache directories. For example, suppose the page size is 4K, and the line size is 128 B. Thus, there are 32 lines in each page. The L3 cache directory is searched for the requested line, and both the L2 and L3 cache directories are searched for the other 31 lines. Performing such a search request can be quite costly for applications where typically only a modest fraction of lines from each page are present in the caches. Thus, servicing such search requests, as well as search requests for additional cache faults, is likely to cause substantial queuing delays for directory lookup. An additional consideration is that L3 directory lookups may be slow. Another problem with this approach is that it provides no way to identify which of the lines from this page are likely to be referenced. The present invention eliminates much of the directory lookup overhead, and provides a good quality of prefetching.
- the system 100 includes a processor chip 105 . Included on the processor chip 105 are two processors: processor A and processor B. Processor A includes a private L1 cache A that is located on the processor A. Processor B includes a private L1 cache B that is located on the processor B. Processor A is associated with its own private L2 cache A, and processor B is associated with its own private L2 cache B. Also included on the processor chip 105 is a L3 cache directory associated with off-chip L3 cache contents. The system 100 further includes a main memory.
- the system 100 includes DX-A (i.e., directory extension A) associated with processor A and L2 cache A, and a DX-B (i.e., directory extension B) associated with processor B and L2 cache B.
- a directory extension may be an n-way set associative cache.
- the DX-A and the DX-B are 8-way set associative caches.
- Each equivalence class in the DX contains “entries” or tags for the n pages in this equivalence class that have most recently had an ejection of a line, and for each such entry, a victim vector (“VV”) described below.
- VV victim vector
- the term “equivalence class” refers to the set of lines which can be mapped to the eight locations referred to above.
- a VV is a 32 bit vector that holds the identity of lines ejected from the L2 cache (but not necessarily referenced) during a current visit of this entry (as defined above) into the DX, and whether the ejected lines are present in L3.
- the ith bit in the victim vector is 1 if the ith line in the corresponding page has been ejected from L2, and is present in L3.
- in a second embodiment, the ith bit in the VV is 1 if the ith line in the corresponding page has been ejected from L2, is present in L3, and has a prefetch score of 1.
- the ith bit in the VV is 1 only if the line in question has not been invalidated, or otherwise rendered non-prefetchable by another processor.
- a suboptimal alternative to our preferred embodiment would include the above information in L3 directory entries associated with individual lines. Thus, given an L2 fault, L3 entries for other lines in this page can be scanned and prefetched, if appropriate.
- the processor-based system may be similar to the one illustrated in FIG. 1 .
- the L3 cache (i.e., the directory and contents combined)
- the L3 cache is a victim cache of the L2 cache. That is, each line that is evicted from the L2 cache is immediately inserted in the L3 cache, assuming the line is not in the L3 cache already. Further, as a line is fetched into the L2 cache from the L3 cache, the line is deleted from the L3 cache.
- the prefetch events herein are simply cache faults.
- an exemplary method applying a directory extension on a L2 cache fault is illustrated, in accordance with one embodiment of the present invention. It is assumed here that an L2 cache fault has occurred. That is, a requested line for an associated page is not present in the L2 cache. On the L2 cache fault, it is determined (at 205 ) whether the requested line is present in the L3 cache. If the requested line is present in the L3 cache, then it is fetched (at 210 ) from the L3 cache. If the requested line is not present in the L3 cache, then it is fetched (at 215 ) from the main memory.
- the DX is accessed to determine (at 220) whether it has an entry (e.g., a tag) for the associated page. Concurrency is beneficial as it minimizes delays. If the entry is not found, the method terminates (at 225). If the entry is found, lines with 1s in the VV are prefetched (at 230) into the L2 cache from the L3 cache. The related page tag is removed (at 235) from the DX.
- Lines that are prefetched into the L2 cache are placed at an appropriate level within their respective equivalence class. For example, in a cache where entries in an equivalence class are ordered according to how recently they have been used or referenced, in one embodiment, the least recently used (“LRU”) position may be chosen.
- LRU least recently used
- in FIG. 3, an exemplary method applying a directory extension on an L2 castout is illustrated, in accordance with one embodiment of the present invention. It is assumed here that an L2 castout has occurred. That is, a line ejected from L2 is written back to L3. The ejected line is included in an associated page. It is determined (at 305) whether a prefetch score for the ejected line is equal to one. If the prefetch score for the ejected line is not equal to one, then the method terminates (at 310). If the prefetch score for the ejected line is equal to one, then it is determined (at 315) whether a VV entry for the associated page is in the DX.
- if the entry is present in the DX, the VV entry (e.g., a bit) corresponding to the associated page in the DX is set (at 320) to 1. If the entry is not present in the DX, then it is determined (at 325) whether the equivalence class of the associated page in the DX is full. If the equivalence class is full, then the least recently created entry in the equivalence class is deleted (at 330). After either step 325 or step 330, a VV entry is created (at 335) in the DX for the associated page at the most recently used or ejected position in the equivalence class of the entry. Following step 335, the VV entry (e.g., a bit) is set (at 320) to 1.
- in FIG. 4, an exemplary method applying a directory extension on deletions of a line from the L3 cache or invalidations of the line in the L3 cache is illustrated, in accordance with one embodiment of the present invention. It is assumed here that a line is unavailable for one of the three following reasons: (1) an L3 castout; (2) the line was fetched by another processor sharing the L3 cache; or (3) invalidation or claim in exclusive mode by another processor.
- the associated page for this line is the page of which this line is a part.
- if the VV entry for the associated page is in the DX, then the VV corresponding to the requested line is updated (e.g., the bit corresponding to the line is set to zero) (at 415). It is determined (at 420) whether all the VV entries for the associated page are zero. If not all the VV entries for the associated page are zero, then the method terminates (at 425). If all the VV entries for the associated page are zero, then the entry for this page is deleted from the DX.
- L3 cache management is unchanged by prefetching. That is, lines fetched or prefetched into the L2 cache are deleted from the L3 cache. Lines ejected from L2 cache are placed in the most recently used (“MRU”) position in their respective equivalence class in L3.
- MRU most recently used
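The castout, cache-fault, and L3-removal flows described in the definitions above can be sketched as a small software model. This is only an illustrative sketch under stated assumptions, not the disclosed hardware: the class name `DirectoryExtension`, the method names, the set count `NUM_SETS`, the modulo index function, and the choice of bit `i` as `1 << i` are all hypothetical. It assumes the embodiment in which L3 acts as a victim cache of L2, so every line ejected from L2 is taken to be present in L3.

```python
from collections import OrderedDict

LINES_PER_PAGE = 32   # 4K page / 128 B lines, per the exemplary embodiment
WAYS = 8              # 8-way set associative DX, per the exemplary embodiment
NUM_SETS = 4          # hypothetical: the text does not give a set count


class DirectoryExtension:
    """Toy model of a DX: each set maps a page tag to a victim vector (int)."""

    def __init__(self):
        # Each OrderedDict keeps entries in creation order (oldest first).
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def _set_for(self, page):
        return self.sets[page % NUM_SETS]  # assumed index function

    def on_l2_castout(self, page, line, prefetch_score):
        """Castout flow: record an ejected line only if its prefetch score is 1."""
        if prefetch_score != 1:
            return
        entries = self._set_for(page)
        if page not in entries:
            if len(entries) >= WAYS:
                entries.popitem(last=False)  # drop least recently created entry
            entries[page] = 0                # create an empty victim vector
        entries[page] |= 1 << line           # set the bit for the ejected line

    def on_l2_fault(self, page, requested_line):
        """Fault flow: return other lines to prefetch, then remove the page tag."""
        vv = self._set_for(page).pop(page, None)
        if vv is None:
            return []
        return [i for i in range(LINES_PER_PAGE)
                if (vv >> i) & 1 and i != requested_line]

    def on_l3_removal(self, page, line):
        """L3 deletion/invalidation flow: clear the bit, delete an all-zero entry."""
        entries = self._set_for(page)
        if page not in entries:
            return
        entries[page] &= ~(1 << line)
        if entries[page] == 0:
            del entries[page]


dx = DirectoryExtension()
dx.on_l2_castout(page=5, line=2, prefetch_score=1)
dx.on_l2_castout(page=5, line=9, prefetch_score=1)
dx.on_l2_castout(page=5, line=11, prefetch_score=0)  # ignored: score is 0
dx.on_l3_removal(page=5, line=9)                     # line 9 no longer in L3
print(dx.on_l2_fault(page=5, requested_line=0))      # prints [2]
```

In this model a single DX access on a fault yields the prefetch candidates for the whole page, which is the point of the structure: no per-line search of the L2 and L3 directories is needed.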
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
- 1. Field of the Invention
- The present invention relates to computer storage management, and, more particularly, to victim prefetching in a cache hierarchy.
- 2. Description of the Related Art
- Memory latency in modern computers is not decreasing at a rate commensurate with increasing processor speeds. As a result, the computing device idly waits for fetches from memory to complete, and does not take full advantage of the faster processor speeds.
- Approaches to mitigating memory latency include multithreading and prefetching. The term “multithreading” refers to the ability of a processor to execute two programs almost simultaneously. This permits the processing of one program or “thread” while the other is waiting for data to be fetched. The term “prefetching,” as used herein, traditionally refers to retrieving data expected to be used in the future. Each results in increased complexity, as well as increased off-chip traffic. In the case of multithreading, the increased traffic is due to decreased per thread cache capacity, yielding higher miss rates. Prefetching increases traffic by fetching data which is not referenced before castout. As used herein, the term “castout” refers to the process by which data is removed from the cache to make room for data that is fetched.
- A number of approaches to prefetching are described in the prior art and/or are implemented in computer systems. These generally include the fetching of data based on currently observed access behavior such as strides (where accesses are to a series of addresses separated by a fixed increment), or via previously recorded access patterns, or by direct software instructions. Each of these approaches has some advantages. However, they either require increased program complexity (for software instructions), have problems dealing with nonregular access patterns (in the case of, for example, stride-based prefetching), or add unnecessary complexity (in the case of, for example, recorded access patterns). Further, these earlier approaches do not utilize information readily available to the processor, namely the identity of recently discarded cache lines coupled with their status and location in the storage hierarchy.
- In one aspect of the present invention, a processor-based system for aiding in prefetching between a first level cache and a second level cache is provided. The system includes a directory extension operatively connected to the first level cache and the second level cache; wherein the directory extension stores an entry, the entry (a) identifying at least one page in the first level cache including at least one line that has been recently ejected, and (b) for each of the at least one page, indicating which of the at least one line is prefetchable from the second level cache.
- In another aspect of the present invention, a processor-based system for aiding in the prefetching between a first level cache and a second level cache is provided. The system includes information associated with each line in the second level cache; wherein the information identifies whether each line was referenced during its most recent insertion into the first level cache.
- The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:
- FIG. 1 depicts a block diagram of a processor-based system implementing a directory extension, in accordance with one exemplary embodiment of the present invention;
- FIG. 2 depicts a flow diagram illustrating an exemplary method applying the directory extension of FIG. 1 on an L2 cache fault, in accordance with one exemplary embodiment of the present invention;
- FIG. 3 depicts a flow diagram illustrating an exemplary method applying the directory extension of FIG. 1 on an L2 castout, in accordance with one exemplary embodiment of the present invention; and
- FIG. 4 depicts a flow diagram illustrating an exemplary method applying the directory extension of FIG. 1 on deletions of a line from the L3 cache or invalidations of the line in the L3 cache, in accordance with one exemplary embodiment of the present invention.
- Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
- While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims. It should be understood that the systems and methods described herein may be implemented in various forms of hardware, software, firmware, or a combination thereof.
- It is to be understood that the systems and methods described herein may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In particular, at least a portion of the present invention is preferably implemented as an application comprising program instructions that are tangibly embodied on one or more program storage devices (e.g., hard disk, magnetic floppy disk, RAM, ROM, CD ROM, etc.) and executable by any device or machine comprising suitable architecture, such as a general purpose digital computer having a processor, memory, and input/output interfaces. It is to be further understood that, because some of the constituent system components and process steps depicted in the accompanying Figures are preferably implemented in software, the connections between system modules (or the logic flow of method steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations of the present invention.
- In the exemplary embodiment of a storage management approach presented herein, a page is 4K bytes, and is divided into lines of 128 bytes. It should be appreciated that the page and lines sizes are presented solely for the sake of simplicity; other configurations may be used as contemplated by those skilled in the art. The unit of storage allocation in the caches is a line. Lines are fetched or discarded from the caches as a function of references by the processors in the system. The storage management approach described herein yields substantial performance gains over prior art approaches with little additional complexity and acceptable levels of traffic.
- We present a “directory extension” (hereinafter “DX”) to aid in prefetching between proximate levels in a cache or storage hierarchy. A cache hierarchy is a type of storage hierarchy. The DX may maintain (1) a list of pages which contains recently ejected lines from a given level in the cache hierarchy, and (2) for each page in this list, a set of ejected lines, provided these lines are prefetchable from, for example, the next level of the cache hierarchy. Given a cache fault to a line within a page in this list, the DX identifies other lines from this page which are advantageous to prefetch, and which may be prefetched without the substantial overhead to directory lookup which would otherwise be required. As used herein, the term “directory lookup” refers to the process where a cache directory is searched for the presence and status of a line. Such searches are expensive as they are time consuming and tie up the directory mechanism. It should be understood that lines to be prefetched, as identified by the above-described directory extension, could also be identified by additions to the standard directory. In the following, however, we concentrate our description on implementations using the extension.
- In an exemplary embodiment of the present invention, we propose a prefetch mechanism that exploits some properties of reference patterns, e.g., that parts of pages accessed together tend to be reaccessed in proximate time. For example, a cache level L(i) may be viewed as a mechanism which stores lines that have been referenced recently. Similarly, the cache at the next level towards memory, typically cache level L(i+1), stores lines referenced somewhat less recently than in cache level L(i). References at cache level L(i) may then suggest that some items from cache level L(i+1) should be prefetched, such as other lines from the same page. Searching the cache directories for items missing from cache level L(i) but held in cache level L(i+1) can impose a substantial load on the cache directories. The structure we consider here alleviates this problem. It also permits the exploitation of other properties related to the desirability of prefetching a line. An example of this we consider here is that of noting which lines prefetched in the recent past were actually referenced.
- Although not so limiting, the exemplary embodiment described herein presents the interface between a level two cache (hereinafter “L2”) and a level three cache (hereinafter “L3”). As described herein in this exemplary embodiment, the L2 directory and contents, and the L3 directory are on the processor chip; the L3 contents are off-chip. It should be appreciated that other embodiments include the case where a single chip holds multiple processors, each with a private L2. It should further be appreciated that the DX, as described herein, is distinguishable from a victim cache, which stores recently discarded lines. Rather than holding the discarded cache lines (in this exemplary embodiment, a purpose served by the L3), the DX maintains a record of lines which may be advantageous to prefetch, based on their ejection or victimization. As used herein, the term “ejection” refers to the process where lines are removed from a cache as others are fetched. References which cause a prefetch are referred to herein as “prefetch events” or “PEs.” Although not so limiting, the PEs described in the exemplary embodiments below correspond to cache faults. It should be understood that in alternate embodiments, PEs may correspond to more general events, such as references to a given page. As described herein, a line is prefetchable if it is present in L3. It should also be understood that information contained in the DX could be incorporated as part of the cache directories.
- Generally, lines that are missing from a given cache level and have recently been referenced are likely to be in the next cache level. For example, lines missing from L2 are likely to be in L3. Thus, although not so limiting, it is desirable in this exemplary embodiment to limit L2 prefetches to lines held in L3, as the expense of mistaken prefetches from lower levels such as main memory may be too high. The prefetch of a line is termed mistaken if the line is not referenced before being ejected.
- Given an L2 cache fault to a line in a given page, one can simply fetch all lines from the given page currently in L3 but not L2. One problem with this approach is the load imposed on the cache directories. For example, suppose the page size is 4K, and the line size is 128B. Thus, there are 32 lines in each page. The L3 cache directory is searched for the requested line, and both the L2 and L3 cache directories are searched for the other 31 lines. Performing such a search request can be quite costly for applications where typically only a modest fraction of lines from each page are present in the caches. Thus, servicing such search requests, as well as search requests for additional cache faults, is likely to cause substantial queuing delays for directory lookup. An additional consideration is that L3 directory lookups may be slow. Another problem with this approach is that it provides no way to identify which of the lines from this page are likely to be referenced. The present invention eliminates much of the directory lookup overhead, and provides a good quality of prefetching.
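To make the directory load in the preceding paragraph concrete, the short sketch below counts directory searches for a single L2 fault under the naive fetch-everything scheme. The function name and the cost model of one search per line per directory are assumptions made for illustration; the page and line sizes are those given in the text.

```python
PAGE_BYTES = 4096   # 4K page, as in the example above
LINE_BYTES = 128    # 128 B line, as in the example above


def naive_directory_searches(page_bytes=PAGE_BYTES, line_bytes=LINE_BYTES):
    """Count directory searches for one L2 fault under the naive scheme.

    Assumed cost model (hypothetical): one search per line per directory.
    """
    lines_per_page = page_bytes // line_bytes
    other_lines = lines_per_page - 1
    return {
        "lines_per_page": lines_per_page,
        # Requested line: only the L3 directory is searched.
        "requested_line_searches": 1,
        # Every other line: both the L2 and L3 directories are searched.
        "other_line_searches": 2 * other_lines,
        "total": 1 + 2 * other_lines,
    }


result = naive_directory_searches()
print(result["lines_per_page"], result["total"])  # prints 32 63
```

Under these assumptions a single fault costs 63 directory searches, most of which are wasted when only a modest fraction of the page's lines are cached.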
- Referring now to FIG. 1, a block diagram of a processor-based system 100 is shown, in accordance with one embodiment of the present invention. The system 100 includes a processor chip 105. Included on the processor chip 105 are two processors: processor A and processor B. Processor A includes a private L1 cache A that is located on the processor A. Processor B includes a private L1 cache B that is located on the processor B. Processor A is associated with its own private L2 cache A, and processor B is associated with its own private L2 cache B. Also included on the processor chip 105 is an L3 cache directory associated with off-chip L3 cache contents. The system 100 further includes a main memory.
- The system 100 includes a DX-A (i.e., directory extension A) associated with processor A and L2 cache A, and a DX-B (i.e., directory extension B) associated with processor B and L2 cache B. A directory extension may be an n-way set associative cache. Although not so limiting, in the present exemplary embodiment, the DX-A and the DX-B are 8-way set associative caches. Each equivalence class in the DX contains “entries” or tags for the n pages in this equivalence class that have most recently had an ejection of a line, and for each such entry, a victim vector (“VV”) described below. As used herein, the term “equivalence class” refers to the set of lines which can be mapped to the eight locations referred to above.
- In one embodiment, a VV is a 32-bit vector that holds the identity of lines ejected from the L2 cache (but not necessarily referenced) during a current visit of this entry (as defined above) into the DX, and whether the ejected lines are present in L3. For example, the ith bit in the victim vector is 1 if the ith line in the corresponding page has been ejected from L2, and is present in L3. In a second embodiment, we may modify the VV from the above as a function of what we term a “prefetch score.” In the current implementation, the prefetch score is 1 if the line was referenced in the most recent visit to L2, and is zero otherwise. Thus, the ith bit in the VV is 1 if the ith line in the corresponding page has been ejected from L2, is present in L3, and has a prefetch score of 1. In each of the above examples, the ith bit in the VV is 1 only if the line in question has not been invalidated, or otherwise rendered non-prefetchable by another processor. As noted above, a suboptimal alternative to our preferred embodiment would include the above information in L3 directory entries associated with individual lines. Thus, given an L2 fault, L3 entries for other lines in this page can be scanned and prefetched, if appropriate.
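As an illustration of the victim vector described above, the sketch below models a VV as a 32-bit integer, one bit per line of a 4K page with 128B lines. The helper names are hypothetical, and numbering bits from the least significant end is an assumption; the text only speaks of "the ith bit."

```python
LINE_SIZE = 128                          # bytes per line (from the earlier example)
LINES_PER_PAGE = 32                      # one VV bit per line of the page
PAGE_SIZE = LINE_SIZE * LINES_PER_PAGE   # 4096 bytes
VV_MASK = (1 << LINES_PER_PAGE) - 1      # keep the vector within 32 bits


def page_tag(addr: int) -> int:
    """Page number of an address; used here as the DX tag (assumption)."""
    return addr // PAGE_SIZE


def line_index(addr: int) -> int:
    """Index i of the line within its page, 0..31."""
    return (addr % PAGE_SIZE) // LINE_SIZE


def vv_set(vv: int, i: int) -> int:
    """Mark line i as ejected from L2 and present in L3."""
    return (vv | (1 << i)) & VV_MASK


def vv_clear(vv: int, i: int) -> int:
    """Mark line i as no longer prefetchable."""
    return vv & ~(1 << i) & VV_MASK


def vv_lines(vv: int) -> list:
    """Line indices whose bit is 1, i.e. the prefetch candidates."""
    return [i for i in range(LINES_PER_PAGE) if (vv >> i) & 1]


vv = vv_set(vv_set(0, 3), 31)   # lines 3 and 31 of the page were ejected
candidates = vv_lines(vv)
```

Storing one bit per line keeps each DX entry small: a page tag plus four bytes, regardless of how many lines of the page have been ejected.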
- Although the embodiments described above prefetch from the next level in a cache hierarchy, it should be appreciated that the DX can be used for prefetching from other levels, as contemplated by one skilled in the art.
- We now describe in detail exemplary implementations of a VV in a processor-based system. The processor-based system may be similar to the one illustrated in FIG. 1. The L3 cache (i.e., the directory and contents combined) is a victim cache of the L2 cache. That is, each line that is evicted from the L2 cache is immediately inserted in the L3 cache, assuming the line is not in the L3 cache already. Further, as a line is fetched into the L2 cache from the L3 cache, the line is deleted from the L3 cache. As previously stated, the prefetch events herein are simply cache faults.
- Referring now to FIG. 2, an exemplary method applying a directory extension on an L2 cache fault is illustrated, in accordance with one embodiment of the present invention. It is assumed here that an L2 cache fault has occurred. That is, a requested line for an associated page is not present in the L2 cache. On the L2 cache fault, it is determined (at 205) whether the requested line is present in the L3 cache. If the requested line is present in the L3 cache, then it is fetched (at 210) from the L3 cache. If the requested line is not present in the L3 cache, then it is fetched (at 215) from the main memory. Concurrently, the DX is accessed to determine (at 220) whether it has an entry (e.g., a tag) for the associated page. Concurrency is beneficial as it minimizes delays. If the entry is not found, the method terminates (at 225). If the entry is found, lines with 1s in the VV are prefetched (at 230) into the L2 cache from the L3 cache. The related page tag is removed (at 235) from the DX.
- Lines that are prefetched into the L2 cache are placed at an appropriate level within their respective equivalence class. For example, in a cache where entries in an equivalence class are ordered according to how recently they have been used or referenced, in one embodiment, the least recently used (“LRU”) position may be chosen.
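The fault-handling flow of FIG. 2 can be sketched in software as follows. This is a simplified model, not the disclosed hardware: the caches are plain sets of (page, line) pairs, the DX is a dictionary from page to victim vector, the lookups run sequentially rather than concurrently, and cache capacity and the LRU placement of prefetched lines are not modeled. All names are hypothetical.

```python
LINES_PER_PAGE = 32


def handle_l2_fault(page, line, l2, l3, dx):
    """Service an L2 fault for (page, line).

    l2, l3: sets of (page, line) pairs currently held in each cache.
    dx:     dict mapping page -> victim vector (int, bit i = line i).
    Returns (source of the requested line, list of prefetched line indices).
    """
    key = (page, line)
    if key in l3:                 # 205 / 210: requested line is in L3
        l3.discard(key)           # victim cache: the line leaves L3
        source = "L3"
    else:                         # 215: fall back to main memory
        source = "memory"
    l2.add(key)

    prefetched = []
    vv = dx.pop(page, None)       # 220 / 235: look up the page tag and remove it
    if vv is None:                # 225: no entry, nothing to prefetch
        return source, prefetched
    for i in range(LINES_PER_PAGE):               # 230: prefetch lines with 1s
        candidate = (page, i)
        # The membership test is a defensive assumption of this sketch; a set
        # bit should already imply that the line is present in L3.
        if (vv >> i) & 1 and candidate in l3:
            l3.discard(candidate)
            l2.add(candidate)
            prefetched.append(i)
    return source, prefetched


l2, l3 = set(), {(7, 1), (7, 2), (7, 5)}
dx = {7: 0b100110}                # lines 1, 2 and 5 of page 7 were ejected
result = handle_l2_fault(7, 2, l2, l3, dx)
```

In this run the requested line 2 comes from L3 and lines 1 and 5 are prefetched alongside it, after which the page's entry is gone from the DX.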
- Referring now to FIG. 3, an exemplary method applying a directory extension on an L2 castout is illustrated, in accordance with one embodiment of the present invention. It is assumed here that an L2 castout has occurred. That is, a line ejected from L2 is written back to L3. The ejected line is included in an associated page. It is determined (at 305) whether a prefetch score for the ejected line is equal to one. If the prefetch score for the ejected line is not equal to one, then the method terminates (at 310). If the prefetch score for the ejected line is equal to one, then it is determined (at 315) whether a VV entry for the associated page is in the DX. If the entry is present in the DX, then the VV entry (e.g., a bit) corresponding to the associated page in the DX is set (at 320) to 1. If the entry is not present in the DX, then it is determined (at 325) whether the equivalence class of the associated page in the DX is full. If the equivalence class is full, then the least recently created entry in the equivalence class is deleted (at 330). After either steps step 335, the VV entry (e.g., a bit) is set (at 320) to 1.
- Referring now to FIG. 4, an exemplary method applying a directory extension on deletions of a line from the L3 cache or invalidations of the line in the L3 cache is illustrated, in accordance with one embodiment of the present invention. It is assumed here that a line is unavailable for one of the three following reasons: (1) an L3 castout; (2) the line was fetched by another processor sharing the L3 cache; or (3) invalidation or claim in exclusive mode by another processor. The associated page for this line is the page of which this line is a part. For each DX on the processor chip, it is determined (at 405) whether a VV entry for the associated page is in the DX. If the VV entry for the associated page is not in the DX, then the method terminates (at 410). If the VV entry for the associated page is in the DX, then the VV corresponding to the requested line is updated (e.g., the bit corresponding to the line is set to zero) (at 415). It is determined (at 420) whether all the VV entries for the associated page are zero. If all the VV entries for the associated page are not zero, then the method terminates (at 425). If all the VV entries for the associated page are zero, then the entry for this page is deleted from the DX.
- If a line is declared invalid, or claimed exclusive by another processor, the entry for this line in the DX is set to zero. If all the VV entries for the associated page are zero, the entry for this page is deleted from the DX.
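The two DX maintenance flows of FIGS. 3 and 4 can be sketched together. The model below is illustrative only: each DX is a dictionary of equivalence classes, each class is an ordered mapping from page to victim vector, and the number of sets is a made-up value because the text gives no DX size. Creating a fresh entry when the page is absent is this sketch's reading of the step the text refers to as 335, and is an assumption.

```python
from collections import OrderedDict

WAYS = 8        # 8-way set associative DX, as in the exemplary embodiment
NUM_SETS = 64   # hypothetical; the text does not state a DX size


def eq_class_of(dx, page):
    """Equivalence class (one set of the DX) that a page maps to."""
    return dx.setdefault(page % NUM_SETS, OrderedDict())


def on_l2_castout(dx, page, line, prefetch_score):
    """FIG. 3: record a line that was ejected from L2 and written back to L3."""
    if prefetch_score != 1:           # 305 / 310: line was not referenced
        return
    eq = eq_class_of(dx, page)
    if page not in eq:                # 315: no entry for this page yet
        if len(eq) >= WAYS:           # 325: equivalence class is full
            eq.popitem(last=False)    # 330: delete the least recently created entry
        eq[page] = 0                  # assumed: create an empty entry for the page
    eq[page] |= 1 << line             # 320: set the bit for the ejected line


def on_l3_line_unavailable(dxs, page, line):
    """FIG. 4: a line left L3 or was invalidated; update every DX on the chip."""
    for dx in dxs:
        eq = eq_class_of(dx, page)
        if page not in eq:            # 405 / 410: this DX does not track the page
            continue
        eq[page] &= ~(1 << line)      # 415: the line is no longer prefetchable
        if eq[page] == 0:             # 420: nothing left to prefetch for the page
            del eq[page]              # delete the page entry from the DX


dx_a, dx_b = {}, {}
on_l2_castout(dx_a, 64, 0, 1)
on_l2_castout(dx_a, 64, 3, 1)
on_l2_castout(dx_b, 64, 3, 1)
on_l3_line_unavailable([dx_a, dx_b], 64, 3)
```

An `OrderedDict` keeps entries in creation order, and updating the vector of an existing page does not reorder it, so `popitem(last=False)` removes the least recently created entry rather than the least recently updated one.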
- It should be appreciated that the L3 cache management is unchanged by prefetching. That is, lines fetched or prefetched into the L2 cache are deleted from the L3 cache. Lines ejected from L2 cache are placed in the most recently used (“MRU”) position in their respective equivalence class in L3.
- The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/989,997 US7716424B2 (en) | 2004-11-16 | 2004-11-16 | Victim prefetching in a cache hierarchy |
Publications (2)
Publication Number | Publication Date |
---|---|
US20060106991A1 (en) | 2006-05-18 |
US7716424B2 (en) | 2010-05-11 |
Family
ID=36387785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/989,997 Expired - Fee Related US7716424B2 (en) | 2004-11-16 | 2004-11-16 | Victim prefetching in a cache hierarchy |
Country Status (1)
Country | Link |
---|---|
US (1) | US7716424B2 (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7805575B1 (en) | 2006-09-29 | 2010-09-28 | Tilera Corporation | Caching in multicore and multiprocessor architectures |
US8347037B2 (en) * | 2008-10-22 | 2013-01-01 | International Business Machines Corporation | Victim cache replacement |
US8209489B2 (en) * | 2008-10-22 | 2012-06-26 | International Business Machines Corporation | Victim cache prefetching |
US8499124B2 (en) | 2008-12-16 | 2013-07-30 | International Business Machines Corporation | Handling castout cache lines in a victim cache |
US8225045B2 (en) * | 2008-12-16 | 2012-07-17 | International Business Machines Corporation | Lateral cache-to-cache cast-in |
US8117397B2 (en) * | 2008-12-16 | 2012-02-14 | International Business Machines Corporation | Victim cache line selection |
US8489819B2 (en) | 2008-12-19 | 2013-07-16 | International Business Machines Corporation | Victim cache lateral castout targeting |
US8949540B2 (en) | 2009-03-11 | 2015-02-03 | International Business Machines Corporation | Lateral castout (LCO) of victim cache line in data-invalid state |
US8285939B2 (en) * | 2009-04-08 | 2012-10-09 | International Business Machines Corporation | Lateral castout target selection |
US8327073B2 (en) * | 2009-04-09 | 2012-12-04 | International Business Machines Corporation | Empirically based dynamic control of acceptance of victim cache lateral castouts |
US8347036B2 (en) * | 2009-04-09 | 2013-01-01 | International Business Machines Corporation | Empirically based dynamic control of transmission of victim cache lateral castouts |
US8312220B2 (en) * | 2009-04-09 | 2012-11-13 | International Business Machines Corporation | Mode-based castout destination selection |
US9189403B2 (en) | 2009-12-30 | 2015-11-17 | International Business Machines Corporation | Selective cache-to-cache lateral castouts |
US8966185B2 (en) * | 2012-06-14 | 2015-02-24 | International Business Machines Corporation | Cache memory prefetching |
GB2506900A (en) | 2012-10-12 | 2014-04-16 | Ibm | Jump positions in recording lists during prefetching |
US10013357B2 (en) | 2016-05-09 | 2018-07-03 | Cavium, Inc. | Managing memory access requests with prefetch for streams |
US11403229B2 (en) * | 2019-05-24 | 2022-08-02 | Texas Instruments Incorporated | Methods and apparatus to facilitate atomic operations in victim cache |
US11442864B2 (en) | 2020-06-29 | 2022-09-13 | Marvell Asia Pte, Ltd. | Managing prefetch requests based on stream information for previously recognized streams |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4807110A (en) * | 1984-04-06 | 1989-02-21 | International Business Machines Corporation | Prefetching system for a cache having a second directory for sequentially accessed blocks |
US4980823A (en) * | 1987-06-22 | 1990-12-25 | International Business Machines Corporation | Sequential prefetching with deconfirmation |
US5551000A (en) * | 1993-03-18 | 1996-08-27 | Sun Microsystems, Inc. | I/O cache with dual tag arrays |
US5845101A (en) * | 1997-05-13 | 1998-12-01 | Advanced Micro Devices, Inc. | Prefetch buffer for storing instructions prior to placing the instructions in an instruction cache |
US6038645A (en) * | 1996-08-28 | 2000-03-14 | Texas Instruments Incorporated | Microprocessor circuits, systems, and methods using a combined writeback queue and victim cache |
US6134643A (en) * | 1997-11-26 | 2000-10-17 | Intel Corporation | Method and apparatus for cache line prediction and prefetching using a prefetch controller and buffer and access history |
US20010037432A1 (en) * | 1993-08-05 | 2001-11-01 | Takashi Hotta | Data processor having cache memory |
US6427188B1 (en) * | 2000-02-09 | 2002-07-30 | Hewlett-Packard Company | Method and system for early tag accesses for lower-level caches in parallel with first-level cache |
US20020138700A1 (en) * | 2000-04-28 | 2002-09-26 | Holmberg Per Anders | Data processing system and method |
US6535961B2 (en) * | 1997-11-21 | 2003-03-18 | Intel Corporation | Spatial footprint prediction |
US6560693B1 (en) * | 1999-12-10 | 2003-05-06 | International Business Machines Corporation | Branch history guided instruction/data prefetching |
US20030208659A1 (en) * | 1995-10-27 | 2003-11-06 | Kenji Matsubara | Information processing system with prefetch instructions having indicator bits specifying cache levels for prefetching |
US20030221072A1 (en) * | 2002-05-22 | 2003-11-27 | International Business Machines Corporation | Method and apparatus for increasing processor performance in a computing system |
US6678795B1 (en) * | 2000-08-15 | 2004-01-13 | International Business Machines Corporation | Method and apparatus for memory prefetching based on intra-page usage history |
US20040015683A1 (en) * | 2002-07-18 | 2004-01-22 | International Business Machines Corporation | Two dimensional branch history table prefetching mechanism |
US20040199740A1 (en) * | 2003-04-07 | 2004-10-07 | Nokia Corporation | Adaptive and recursive compression of lossily compressible files |
US6853643B1 (en) * | 2000-12-21 | 2005-02-08 | Cisco Technology, Inc. | Interleaved read/write operation in a data switch |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100325367A1 (en) * | 2009-06-19 | 2010-12-23 | International Business Machines Corporation | Write-Back Coherency Data Cache for Resolving Read/Write Conflicts |
US8996812B2 (en) * | 2009-06-19 | 2015-03-31 | International Business Machines Corporation | Write-back coherency data cache for resolving read/write conflicts |
CN111324556A (en) * | 2018-12-13 | 2020-06-23 | 国际商业机器公司 | Cache prefetch |
US10884938B2 (en) * | 2018-12-13 | 2021-01-05 | International Business Machines Corporation | Method and apparatus for prefetching data items to a cache |
US11243718B2 (en) * | 2019-12-20 | 2022-02-08 | SK Hynix Inc. | Data storage apparatus and operation method thereof |
Also Published As
Publication number | Publication date |
---|---|
US7716424B2 (en) | 2010-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7716424B2 (en) | Victim prefetching in a cache hierarchy | |
JP4486750B2 (en) | Shared cache structure for temporary and non-temporary instructions | |
US7383394B2 (en) | Microprocessor, apparatus and method for selective prefetch retire | |
US6212602B1 (en) | Cache tag caching | |
US5603004A (en) | Method for decreasing time penalty resulting from a cache miss in a multi-level cache system | |
US8458408B2 (en) | Cache directed sequential prefetch | |
US8041897B2 (en) | Cache management within a data processing apparatus | |
US8943272B2 (en) | Variable cache line size management | |
EP0695996B1 (en) | Multi-level cache system | |
US7657726B2 (en) | Context look ahead storage structures | |
US7334088B2 (en) | Page descriptors for prefetching and memory management | |
US6782453B2 (en) | Storing data in memory | |
US8924648B1 (en) | Method and system for caching attribute data for matching attributes with physical addresses | |
US7600098B1 (en) | Method and system for efficient implementation of very large store buffer | |
JP2603476B2 (en) | Data retrieval method | |
US5737751A (en) | Cache memory management system having reduced reloads to a second level cache for enhanced memory performance in a data processing system | |
JPH09190382A (en) | Contention cache for computer memory system | |
JP3262519B2 (en) | Method and system for enhancing processor memory performance by removing old lines in second level cache | |
EP0752662B1 (en) | Method and apparatus for tagging a multi-way associative cache | |
JPH09160827A (en) | Prefetch of cold cache instruction | |
JPH11328024A (en) | Dummy fine i-cache inclusivity for vertical cache | |
US11500779B1 (en) | Vector prefetching for computing systems | |
US6397298B1 (en) | Cache memory having a programmable cache replacement scheme | |
US11379368B1 (en) | External way allocation circuitry for processor cores | |
CN111198827B (en) | Page table prefetching method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FRANASZEK, PETER A.;LASTRAS-MONTANO, LUIS ALFONSO;ROBINSON, JOHN T.;AND OTHERS;REEL/FRAME:015594/0418;SIGNING DATES FROM 20041123 TO 20041207 Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION,NEW YO Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FRANASZEK, PETER A.;LASTRAS-MONTANO, LUIS ALFONSO;ROBINSON, JOHN T.;AND OTHERS;SIGNING DATES FROM 20041123 TO 20041207;REEL/FRAME:015594/0418 |
|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION,NEW YO Free format text: RECORD TO CORRECT THE ASSIGNOR ON AN ASSIGNMENT DOCUMENT PREVIOUSLY RECORDED AT REEL 011594, FRAME 0418;ASSIGNORS:FRANASZEK, PETER A.;KUNKEL, STEVEN R.;LASTRAS-MONTANO, LUIS ALFONSO;AND OTHERS;SIGNING DATES FROM 20041123 TO 20041207;REEL/FRAME:016002/0683 Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: RECORD TO CORRECT THE ASSIGNOR ON AN ASSIGNMENT DOCUMENT PREVIOUSLY RECORDED AT REEL 011594, FRAME 0418;ASSIGNORS:FRANASZEK, PETER A.;KUNKEL, STEVEN R.;LASTRAS-MONTANO, LUIS ALFONSO;AND OTHERS;REEL/FRAME:016002/0683;SIGNING DATES FROM 20041123 TO 20041207 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees | ||
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20140511 |