US10331573B2 - Detection of avoidable cache thrashing for OLTP and DW workloads


Info

Publication number
US10331573B2
Authority
US
United States
Prior art keywords
cache
memory
metadata entry
computer
fifo
Prior art date
Legal status
Active
Application number
US15/687,296
Other versions
US20180129612A1
Inventor
Justin Matthew Lewis
Zuoyu Tao
Jia Shi
Kothanda Umamageswaran
Current Assignee
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date
Filing date
Publication date
Application filed by Oracle International Corp
Priority to US15/687,296
Assigned to ORACLE INTERNATIONAL CORPORATION. Assignors: LEWIS, JUSTIN MATTHEW; SHI, JIA; TAO, ZUOYU; UMAMAGESWARAN, KOTHANDA
Publication of US20180129612A1
Priority to US16/388,955 (published as US11138131B2)
Application granted
Publication of US10331573B2
Legal status: Active
Anticipated expiration

Classifications

    • G06F12/121 Replacement control using replacement algorithms (within G06F12/08, addressing or allocation in hierarchically structured memory systems, e.g. virtual memory systems)
    • G06F12/0891 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using clearing, invalidating or resetting means
    • G06F2212/1021 Hit rate improvement (indexing scheme: providing a specific technical effect; performance improvement)
    • G06F2212/466 Caching storage objects of specific type in disk cache: metadata, control data

Definitions

  • This disclosure relates to cache control.
  • Presented herein are techniques that opportunistically maximize throughput by adjusting the behavior of a cache based on statistics such as a count of cache misses for items recently evicted, a count of cache hits, and/or similar metrics.
  • A computer system typically has a natural hierarchy of storage tiers.
  • For example, a computer may have volatile dynamic random access memory (DRAM) for fast access to currently active data, a non-volatile flash drive for moderate-speed access to recently used data that may be needed again soon, and a mechanical disk or network storage for slow access to bulk durable storage.
  • Because the storage tiers have different latencies and capacities, some data items may be replicated or moved between various storage tiers to ensure that data is dynamically distributed to storage tiers according to actual demand.
  • Random access tiers, such as volatile and non-volatile memory, may contain a cache that optimizes data residency based on a policy of the cache. For example, a least recently used (LRU) cache may evict data not recently used to make room for recently used data not already cached. Caching policies decide which data items are cacheable and how many data items are cached.
  • If a caching policy does not match the current workload, cache thrashing may result.
  • For example, a stressed cache may repeatedly evict and reload a same data item that would perform better if not evicted.
  • Likewise, an idle cache may consume memory that would increase throughput if instead used for other purposes.
  • Although cache performance more or less depends on the nature of the workload, cache sizes typically do not dynamically adjust to suit a fluctuating workload.
  • For example, a database table scan may trample the current cache contents as the scan passes over copious data that far exceeds the capacity of the cache.
  • The scanned data may be read once during the scan and then not accessed again once read into the cache.
  • Degraded performance, such as thrashing, costs extra time (latency) to load needed data.
  • Furthermore, if the cache is non-volatile, then thrashing may cause additional wear that may reduce the life of the cache's physical medium.
  • Thus, cache thrashing may pose a capital cost and a reliability hazard, in addition to costing latency and energy.
  • FIG. 1 is a block diagram that depicts an example computer that adjusts the behavior of a cache based on a count of cache misses for items recently evicted, in an embodiment
  • FIG. 2 is a flow diagram that depicts an example process that adjusts the behavior of a cache based on a count of cache misses for items recently evicted, in an embodiment
  • FIG. 3 is a block diagram that depicts an example computer that uses a first-in first-out (FIFO) to track recent evictions, in an embodiment
  • FIG. 4 is a block diagram that depicts an example computer that uses thrashing thresholds to maintain equilibrium, in an embodiment
  • FIG. 5 is a block diagram that depicts an example computer that operates a doubly-linked list as a metadata FIFO, in an embodiment
  • FIG. 6 is a block diagram that illustrates an example computer system upon which an embodiment of the invention may be implemented.
  • FIG. 7 is a block diagram that illustrates an example software system that may be employed for controlling the operation of a computing system.
  • In an embodiment, a computer responds to evicting a particular item from a cache by storing, in metadata, an entry identifying the particular item. The metadata is stored in low latency random access memory (RAM). If the particular item is not subsequently requested, the metadata entry for the particular item may be removed under some conditions.
  • In response to a cache miss for the particular item, the computer detects whether or not an entry in metadata identifies that particular item. When a metadata entry for the particular item is detected, the computer increments a victim hit counter. An increment represents a cache hit that could have occurred had the particular item been retained in cache (e.g. if the cache were somewhat larger).
  • The victim hit counter may be used to calculate how much avoidable thrashing the cache is experiencing, which represents how much thrashing could be reduced if the cache were expanded. Either immediately or arbitrarily later, the computer adjusts a policy of the cache based on the value of the victim hit counter. For example, the computer may increase or decrease the capacity of the cache based on the victim hit counter. Particular resizing scenarios are discussed in a section titled "Policy Tuning" below.
  • In an embodiment, metadata entries are temporarily retained in a first-in first-out (FIFO).
  • FIFO performance is accelerated with a data structure such as a linked list.
  • In an embodiment, the cache is primarily dedicated to online transaction processing (OLTP) data.
  • The cache may be dynamically tuned to opportunistically store lower priority data, such as binary large objects (BLOBs) or temporary sort data spills, when thrashing is low.
  • FIG. 1 is a block diagram that depicts an example computer 100 , in an embodiment.
  • Computer 100 adjusts the behavior of a cache based on a count of cache misses for items recently evicted.
  • Computer 100 may be a rack server such as a blade, a personal computer, a mainframe, a network appliance, a virtual machine, a smartphone, or other computing device.
  • Computer 100 contains cache 110 and memory 140 .
  • Memory 140 is random access memory (RAM) (e.g. byte addressable) that may be volatile such as dynamic RAM (DRAM) or static RAM (SRAM), or non-volatile such as flash.
  • Cache 110 is an associative cache of data that is persisted elsewhere (not shown) such as on a local disk, network storage, or other bulk storage tier.
  • Cache 110 may be a write-through cache or a write-back cache.
  • a write-through cache accelerates repeated reads and does not accelerate writes because a write-through cache flushes every write to a backing store (e.g. disk).
  • a write-back cache accelerates repeated reads and all writes because buffering allows flushing to be deferred and thus removed from the critical path of a write.
  • Cache 110 may have an implementation-specific associativity such as fully associative, set associative, or direct mapped.
  • a fully associative cache allows data from any memory address to be stored in any available line of cache 110 , thereby reducing a need to evict other data, but such flexibility needs additional silicon, which increases manufacturing cost, increases power consumption, and increases best-case (no eviction) latency.
  • Direct mapping needs less silicon but requires data from a particular memory address to be stored only in a particular line of cache 110. Because multiple memory addresses may be directly mapped to a same line of cache 110, there may be increased contention for a particular line of cache 110 and thus more frequent evictions. Set associativity is a compromise between full associativity and direct mapping, such that a particular memory address may be mapped to any of a subset of lines of cache 110.
  • Cache 110 may reside in volatile or non-volatile RAM, or on a local drive such as a magnetic disk, flash drive, or hybrid drive. In some embodiments (not shown), memory 140 contains cache 110 . Cache 110 contains many data items, such as 121 - 122 . Cache 110 may store data items of mixed sizes.
  • Item 121 may have a fixed size, such as a cache line, cache word, database block, disk block, or other atomic unit of stored data. Item 121 may have a variable size, such as a binary large object (BLOB) or other object, or database temporary data such as a sort spill (overflow) from volatile memory.
  • Each of items 121 - 122 is identifiable according to recordable metadata entries that may be stored in memory 140 , such as metadata entry 152 . Thus, metadata entry 152 identifies a data item (not shown).
  • Computer 100 controls which metadata entries reside in memory 140 .
  • In some embodiments, metadata entries in memory 140 are contained within an aggregation data structure (not shown), such as an array, linked list, or hash table, that resides within memory 140.
  • Computer 100 operates cache 110 according to policy 130 , which is dynamically adjustable.
  • Policy 130 may specify a capacity (size) of cache 110 , different treatment for different categories of data items, or other operational parameters that affect the configuration or behavior of cache 110 .
  • Cache 110 has limited capacity. Thus, inserting one item into cache 110 may require eviction (removal) of another item from cache 110 .
  • In FIG. 1, item 121 is drawn with dashed lines to show that it is being evicted from cache 110. If cache 110 is a write-back cache and item 121 is dirty (modified), then eviction includes computer 100 writing item 121 to persistent storage, such as disk.
  • In either case, computer 100 performs additional work with memory 140 during eviction.
  • Specifically, computer 100 records a metadata entry for the evicted item into memory 140.
  • For example, computer 100 stores metadata entry 151 into memory 140 to record that item 121 was recently evicted.
  • Metadata entry 151 is drawn with dashed lines to show that it is being inserted into memory 140 .
  • Subsequently, computer 100 may experience a cache miss when retrieving a data item that does not reside within cache 110. Because item 121 was evicted in this example, item 121 no longer resides in cache 110. Thus, a subsequent attempt to find item 121 within cache 110 will cause a cache miss. In addition to conventional handling of a cache miss, such as retrieving item 121 from durable storage and inserting item 121 into cache 110, computer 100 also does additional processing for a cache miss.
  • During that additional processing, computer 100 detects whether or not memory 140 contains a metadata entry for the item of the cache miss. For example, during a cache miss for item 121, computer 100 detects whether or not memory 140 contains metadata entry 151, which is the metadata entry for item 121.
  • In an embodiment, metadata entries in memory 140 are aggregated within a container (not shown) that resides within memory 140 and that encapsulates functionality for detecting the presence or absence of a particular metadata entry.
  • For example, metadata entries 151-152 may reside in a tree or hash table (within memory 140) in which a particular metadata entry may be looked up (e.g. by linear scan or random access).
  • When the metadata entry is found, computer 100 performs two more activities. First, computer 100 increments victim hit counter 160. Second, computer 100 removes metadata entry 151 from memory 140. It is that removal of a metadata entry during a cache miss that ensures that: a) an item may reside in cache 110, or b) the item's metadata may reside in memory 140, but both may not be simultaneously resident. Furthermore, sometimes neither may be resident.
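  • The following Java sketch illustrates the miss path just described. It is only an illustration under assumptions: the class and field names (VictimTrackingCache, backingStore) are invented here, an LRU LinkedHashMap stands in for cache 110, and an unbounded map stands in for the victim metadata in memory 140 (a bounded FIFO, per FIG. 3 below, would cap its growth).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: on a miss, consult the victim metadata, count a victim hit, and
// remove the metadata entry, so that an item and its victim metadata are
// never resident at the same time (though neither may be resident).
class VictimTrackingCache<K, V> {
    private final int capacity;                                   // policy 130: cache size
    private final LinkedHashMap<K, V> cache;                      // cache 110, LRU order
    private final Map<K, Long> victimMetadata = new HashMap<>();  // memory 140
    private final Map<K, V> backingStore;                         // durable storage (assumed to hold every key)
    private long victimHits = 0;                                  // victim hit counter 160

    VictimTrackingCache(int capacity, Map<K, V> backingStore) {
        this.capacity = capacity;
        this.backingStore = backingStore;
        this.cache = new LinkedHashMap<>(16, 0.75f, /*accessOrder=*/ true);
    }

    V get(K key) {
        V value = cache.get(key);
        if (value != null) {
            return value;                                    // ordinary cache hit
        }
        if (victimMetadata.remove(key) != null) {            // removal preserves the invariant
            victimHits++;                                    // this miss was avoidable
        }
        value = backingStore.get(key);                       // conventional miss handling
        if (cache.size() >= capacity) {
            K victim = cache.keySet().iterator().next();     // least recently used item
            cache.remove(victim);                            // eviction, e.g. of item 121
            victimMetadata.put(victim, System.nanoTime());   // record metadata entry 151
        }
        cache.put(key, value);
        return value;
    }

    long victimHitCount() { return victimHits; }
}
```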
  • Thrashing is turnover (replacement) of cache contents, which adds latency, such as for input/output (I/O). Thrashing occurs when demand for storage within cache 110 exceeds the capacity of cache 110, assuming that cache 110 is a fully associative cache. Without full associativity, thrashing may occur even when cache 110 has spare capacity, because some memory addresses must contend for a same line of cache 110. Generally, a small cache may thrash more than a large cache. Thus, increasing the size of a cache may decrease thrashing. However, it may be difficult to dynamically and accurately predict how much the size of cache 110 should be increased to decrease thrashing by a desired amount. Techniques herein may accurately recognize avoidable thrashing as a trigger for optimally resizing cache 110.
  • In an embodiment, avoidable thrashing of cache 110 is measured by victim hit counter 160, such that a large count indicates much avoidable thrashing.
  • In an embodiment, computer 100 periodically resets victim hit counter 160 to zero.
  • Thus, a victim hit rate may be calculated by dividing the value of victim hit counter 160 by the length of the duration between periodic counter resets. For a given duration, a ratio of victim hit count to cache hit count has the same value as a ratio of victim hit rate to cache hit rate, so the two ratios are interchangeable. Embodiments may use either a hit count ratio or a hit rate ratio, and examples herein that use one ratio may instead be implemented with the other.
  • Thrashing may be somewhat alleviated by tuning the operation of cache 110 .
  • Based on victim hit counter 160, computer 100 adjusts policy 130 to tune the performance of cache 110 in response to detecting too little or too much avoidable thrashing. A relationship between victim hit counter 160 and a count of actual hits within cache 110 is discussed later herein.
  • Ideally, there is no avoidable thrashing, such as when victim hit counter 160 is idle (zero).
  • In practice, however, cache 110 may be prone to avoidable thrashing.
  • Furthermore, computer 100 may need to prioritize caching for some data. Because cache demand may fluctuate with conditions, a need for preferential treatment of priority data may be only temporary. Thus, computer 100 may sometimes reserve cache 110 for more important data and sometimes not. For example, computer 100 may have a category of data that is sometimes excluded from cache 110 according to policy 130.
  • In that case, system throughput may be maximized by adjusting policy 130 to allow caching of excluded data when avoidable thrashing is low.
  • Additionally, adjusting policy 130 may cause the size (capacity) of cache 110 to change. For example, when avoidable thrashing is too high, computer 100 may adjust policy 130 to expand cache 110. Likewise, cache 110 may shrink when avoidable thrashing is too low. Thus, cache 110 may (more or less temporarily) surrender some of its underlying storage medium (e.g. flash) for low-value or low-priority uses when avoidable thrashing is low, and then reclaim the surrendered capacity when avoidable thrashing becomes too high.
  • In these ways, computer 100 may use policy 130 and victim hit counter 160 to dynamically tune cache 110 according to changing load and conditions to maximize system throughput. With such techniques, the operation of computer 100 itself may be opportunistically accelerated.
  • FIG. 2 is a flow diagram that depicts an example process that adjusts the behavior of a cache based on a count of cache misses for items recently evicted.
  • FIG. 2 is discussed with reference to FIG. 1 .
  • For demonstration, this example assumes that cache 110 is already warm (filled with data items). Thus, cache 110 has no room for additional items unless other items are evicted. However, neither the process of FIG. 2 nor the phenomenon of eviction requires that cache 110 be full to capacity.
  • Step 201 is responsive to eviction of a first item as caused by a cache miss for a second item.
  • In step 201, a metadata entry for the first item is stored into memory.
  • For example, computer 100 may receive a request to read item 122 when item 122 does not currently reside in the cache.
  • In response, computer 100 evicts item 121 and stores metadata entry 151 for item 121 into memory 140.
  • Thus, step 201 may be caused by an access request for item 122.
  • Whereas, step 202 may be caused by an access request for item 121.
  • That is, item 121 is evicted in step 201 and then requested for access in step 202.
  • The request for item 121 causes a cache miss.
  • In step 202, the computer detects whether or not a metadata entry for the missing item resides in memory.
  • For example, the request for item 121 causes a cache miss, which causes computer 100 to detect whether or not metadata entry 151 for item 121 resides in memory 140.
  • In step 203, the computer reacts based on whether or not the metadata of the missing item was found in memory. Because metadata entry 151 for item 121 was recently stored in memory 140 when item 121 was evicted in step 201, during steps 202-203 computer 100 does indeed find metadata entry 151 in memory 140.
  • In step 204, a victim hit counter is incremented.
  • For example, computer 100 increments victim hit counter 160 because computer 100 found metadata entry 151 in memory 140.
  • An arbitrary delay may separate steps 204-205.
  • For example, step 204 may be caused by the access request for item 121 in step 202, whereas when step 205 occurs, and what triggers it, depends on the implementation.
  • For example, step 205 may be hard coded to occur (perhaps after detecting additional conditions) more or less immediately after step 204, and perhaps in the same computational thread as step 204.
  • In other embodiments, step 205 occurs periodically, such as with an interval timer, and perhaps by a daemon process or thread.
  • In step 205, a policy of the cache is adjusted based on the victim hit counter. For example, a daemon thread may periodically awaken to inspect and/or reset (clear) victim hit counter 160.
  • For example, computer 100 detects whether the value of victim hit counter 160 indicates that a current amount of avoidable thrashing is acceptable, too high, or too low.
  • In some embodiments, characterization of thrashing as high or low depends on statistics that consider more measurements than victim hit counter 160 alone. For example, techniques based on a thrashing ratio are discussed later herein.
  • If the current amount of avoidable thrashing is acceptable, step 205 may complete without adjusting policy 130.
  • For example, cache 110 may be too big (have too much spare capacity), in which case some of the memory or flash that implements cache 110 may be temporarily reallocated to more productive use, such as storing BLOBs or sort spills.
  • That is, computer 100 may detect that victim hit counter 160 has fallen beneath a low threshold, in which case computer 100 may reduce the size of cache 110 by adjusting policy 130 accordingly, or achieve other effects by otherwise adjusting policy 130.
  • Conversely, computer 100 may detect that victim hit counter 160 exceeds a high threshold, in which case computer 100 may increase the size of cache 110 by adjusting policy 130 accordingly, or achieve other effects by otherwise adjusting policy 130.
  • In these ways, computer 100 may dynamically tune cache 110 by adjusting policy 130 to best allocate resources between cache 110 and other uses, as sketched below.
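  • A minimal sketch of step 205 as a periodic daemon follows. The threshold values, the period, and the CachePolicy callback interface are assumptions for illustration; the text does not prescribe them.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a daemon thread periodically inspects and resets the victim hit
// counter (step 205), then grows or shrinks the cache when avoidable
// thrashing crosses a threshold.
class PolicyTuner {
    private final AtomicLong victimHitCounter = new AtomicLong(); // counter 160

    void onVictimHit() {                      // step 204
        victimHitCounter.incrementAndGet();
    }

    void start(CachePolicy policy, long lowThreshold, long highThreshold) {
        ScheduledExecutorService daemon = Executors.newSingleThreadScheduledExecutor();
        daemon.scheduleAtFixedRate(() -> {
            long victimHits = victimHitCounter.getAndSet(0);  // inspect and reset
            if (victimHits > highThreshold) {
                policy.growCache();           // too much avoidable thrashing
            } else if (victimHits < lowThreshold) {
                policy.shrinkCache();         // spare capacity; release it
            }                                 // otherwise, leave policy 130 unchanged
        }, 1, 1, TimeUnit.MINUTES);
    }

    interface CachePolicy {                   // hypothetical hooks into policy 130
        void growCache();
        void shrinkCache();
    }
}
```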
  • FIG. 3 is a block diagram that depicts an example computer 300 , in an embodiment.
  • Computer 300 uses a first-in first-out (FIFO) to track recent evictions.
  • Computer 300 may be an implementation of computer 100 .
  • Computer 300 contains volatile memory 340 and non-volatile RAM 370 .
  • Non-volatile RAM 370 may have integrated circuitry based on technology such as flash, phase change memory, ferroelectric circuits, or other non-volatile RAM technology. Thus, computer 300 has increased reliability because dirty data (recent modifications) are preserved in non-volatile RAM 370 even if computer 300 crashes and needs rebooting. Non-volatile RAM 370 contains cache 310 that stores data items such as 321 . Computer 300 may dynamically adjust policy 330 to tune (optimize) the configuration and behavior of cache 310 .
  • Volatile memory 340 may be RAM of higher speed, higher density, higher capacity, and/or lower manufacturing cost than non-volatile RAM 370 .
  • Volatile memory 340 contains first-in first-out (FIFO) 380 , which computer 300 operates as a queue of metadata entries, such as 351 - 354 .
  • Because FIFO 380 may reside in inexpensive bulk RAM, such as DRAM, FIFO 380 may be huge.
  • FIFO 380 may have a physical capacity of more than two gigabytes. According to simulation, FIFO sizes ranging from two to five gigabytes are not too big to deliver substantial benefit for generating victim statistics. Indeed, FIFO 380 works well when sized to store hundreds of thousands of metadata entries. However when FIFO 380 is sized above some immense (e.g. 5 gigabyte) threshold, marginal benefit diminishes, and performance of cache 310 may cease to proportionally increase.
  • FIFO 380 stores a sequence of metadata entries that extends from head 391 to tail 392 .
  • In operation, computer 300 uses FIFO 380 as follows. If FIFO 380 is already filled to capacity with metadata, then computer 300 removes from FIFO 380 whichever metadata entry occupies head 391. In this example, when item 321 is evicted, computer 300 removes metadata entry 354 from FIFO 380.
  • Removal of a metadata entry at head 391 causes head 391 to be retargeted to the next metadata entry in FIFO 380.
  • For example, removal of metadata entry 354 causes head 391 to be retargeted (not shown) to the next metadata entry, which is 353.
  • Removal of a metadata entry at head 391 also causes a vacancy within FIFO 380.
  • Computer 300 uses that available capacity by appending metadata entry 351 (of item 321) onto the end of FIFO 380, which causes tail 392 to be retargeted to metadata entry 351 as shown.
  • Thus, metadata entries of the most recently evicted items are appended at tail 392, and metadata entries of the least recently evicted items are removed from head 391.
  • FIFO 380 has a bounded (fixed) capacity dedicated to metadata entries of the most recently evicted items.
  • Computer 300 increments victim hit counter 360 when FIFO 380 contains a metadata entry of an item that experiences a cache miss.
  • For example, metadata entry 351 of item 321 was appended to FIFO 380 when item 321 was evicted.
  • Thus, upon a subsequent cache miss for item 321, FIFO 380 contains metadata entry 351, which causes computer 300 to increment victim hit counter 360.
  • FIFO 380 should be searchable, such that computer 300 can efficiently detect the presence of a given metadata entry.
  • FIFO 380 may be content addressable for metadata lookup in constant time.
  • An alternative search mechanism such as brute-force linear scanning of FIFO 380 may or may not be fast enough.
  • Per conventional caching, when item 321 causes a cache miss, computer 300 loads item 321 into cache 310, regardless of whether or not FIFO 380 contains metadata entry 351. However, if FIFO 380 does contain metadata entry 351, then computer 300 removes metadata entry 351 from FIFO 380. Although not shown as such, metadata entry 351 may be in the middle of FIFO 380 when removed. Thus, FIFO 380 should support removal of any metadata entry, regardless of position within FIFO 380. FIG. 5, discussed later herein, depicts a structure that can be used to implement such a flexible FIFO.
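  • The required FIFO behavior can be sketched in Java as below. The VictimFifo name and the use of LinkedHashSet are illustrative choices (a hash table fused with a doubly-linked insertion order), not the patent's mandated structure; FIG. 5 later shows a doubly-linked list built explicitly.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

// Sketch of FIFO 380: append at the tail on eviction, drop the eldest entry
// from the head when full, and support constant-time membership tests and
// removal from any position (all provided by LinkedHashSet).
class VictimFifo<K> {
    private final int capacity;
    private final LinkedHashSet<K> entries = new LinkedHashSet<>();

    VictimFifo(int capacity) {
        this.capacity = capacity;
    }

    // Called upon eviction from the cache: record the victim's identifier.
    void recordEviction(K victim) {
        if (entries.size() >= capacity) {
            Iterator<K> head = entries.iterator(); // head 391: least recently evicted
            head.next();
            head.remove();                         // make room
        }
        entries.add(victim);                       // tail 392: most recently evicted
    }

    // Called upon a cache miss: true means a victim hit, and the metadata
    // entry is removed even if it sits in the middle of the FIFO.
    boolean consumeVictimHit(K item) {
        return entries.remove(item);
    }
}
```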
  • FIG. 4 is a block diagram that depicts an example computer 400 , in an embodiment.
  • Computer 400 uses thrashing thresholds to maintain equilibrium.
  • Computer 400 may be an embodiment of computer 100 .
  • Computer 400 contains cache 410 and thrashing thresholds 431 - 432 .
  • Computer 400 has a FIFO (not shown) for storing metadata entries of items that were recently evicted from cache 410 .
  • Computer 400 uses hit counters 461 - 462 to record statistics about the FIFO and cache 410 . If a request to access an item can be fulfilled from cache 410 , then computer 400 increments cache hit counter 462 . Otherwise there is a cache miss, and computer 400 increments victim hit counter 461 if the item's metadata entry resides in the FIFO.
  • In operation, computer 400 uses hit counters 461-462 to detect cache effectiveness (cache hit counter 462) and the potential effectiveness of increasing cache capacity (victim hit counter 461). Using those counters, computer 400 detects a level of avoidable thrashing of cache 410 by dividing the value of victim hit counter 461 by the value of cache hit counter 462 to calculate ratio 440.
  • Ratio 440 may fluctuate with the quantity and quality of the request workload.
  • In response, computer 400 may reconfigure cache 410 to opportunistically maximize throughput.
  • Cache 410 may have a small working set of items that are frequently hit.
  • In that case, cache hit counter 462 may be high, and ratio 440 may be low. If the value of ratio 440 drops below low threshold 432, then avoidable thrashing is low and cache 410 may be bigger than needed.
  • Thus, computer 400 may respond to ratio 440 falling beneath low threshold 432 by shrinking (reducing the capacity of) cache 410.
  • Shrinking cache 410 may increase system throughput by freeing up memory or flash for other purposes.
  • Conversely, cache 410 may be too small to avoid thrashing. The smaller cache 410 is, the more prone it is to evict items. If evicted items are frequently accessed, then victim hit counter 461 and ratio 440 may be high, which indicates avoidable thrashing. Thus, if ratio 440 exceeds high threshold 431, then cache 410 avoidably thrashes too much. Computer 400 may respond to ratio 440 exceeding high threshold 431 by growing (increasing the capacity of) cache 410. By reactively keeping ratio 440 between thresholds 431-432, computer 400 may dynamically resize cache 410 based on actual workload. Thus, computer 400 maintains homeostasis for the performance and efficiency of cache 410, despite fluctuating workload characteristics.
  • In an embodiment, a thrashing ratio may be used to characterize thrashing as excessive or acceptable.
  • Average input/output (I/O) latency may also or instead be used to characterize thrashing. I/O latency is relevant because, even with a high cache hit rate (e.g. >99%), each cache miss may be very costly if an involved hard disk is overwhelmed with activity and incurs latency much higher than that of a flash drive. Thus, I/O latency may be used as a proxy or estimation factor with which to characterize thrashing as acceptable or excessive, although I/O latency may or may not actually measure thrashing.
  • I/O latency more or less accurately measures the cost of a cache miss, which may be as relevant or more relevant than other predictive performance measurements discussed herein. Thus, I/O latency may be important to measure and integrate into thrashing detection instrumentation. Average I/O latency is calculated as follows:

    (average hard disk latency in the last time period) * (number of cache misses in the last time period) + (average flash drive latency in the last time period) * (number of cache hits in the last time period)

  • The above formula calculates average actual I/O latency as observed, without considering victim cache metrics. Actual (recent) I/O latency has some predictive utility for characterizing thrashing. Ideally, potential (future) I/O latency is also considered when characterizing thrashing. Potential I/O latency may be calculated by integrating observed I/O latency and victim cache metrics according to the following formula:

    (average hard disk latency in the last time period) * (number of cache misses in the last time period − additional potential hits according to the victim cache) + (average flash drive latency in the last time period) * (number of cache hits in the last time period + additional potential hits according to the victim cache)
  • Thus, potential I/O latency is more or less predictive, whereas actual I/O latency is retrospective.
  • Actual I/O latency may be compared to potential I/O latency to characterize thrashing.
  • Based on such a comparison, thrashing may be characterized as excessive even if the thrashing ratio does not indicate excessive thrashing.
  • In an embodiment, thrashing is characterized as excessive when either a thrashing ratio or an I/O latency ratio exceeds a respective threshold.
  • Thus, cache 410 may be tuned based on the performance of cache 410 and/or the performance of its backing store (hard disk). The sketch below transcribes the two latency formulas.
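  • The following sketch transcribes the two formulas above into Java. Method and parameter names are assumptions; the arithmetic is taken directly from the text.

```java
// Sketch: compute actual and potential average I/O latency per the formulas
// above. A victim hit is a miss that a larger cache would have turned into a
// (cheaper) flash hit, so victim hits shift from the miss term to the hit term.
class LatencyEstimator {
    static double actualIoLatency(double avgDiskLatency, long misses,
                                  double avgFlashLatency, long hits) {
        return avgDiskLatency * misses + avgFlashLatency * hits;
    }

    static double potentialIoLatency(double avgDiskLatency, long misses,
                                     double avgFlashLatency, long hits,
                                     long victimHits) {
        return avgDiskLatency * (misses - victimHits)
             + avgFlashLatency * (hits + victimHits);
    }

    // One possible comparison (an assumption, not from the text): thrashing is
    // excessive when actual latency exceeds potential latency by a threshold factor.
    static boolean excessive(double actual, double potential, double threshold) {
        return potential > 0 && actual / potential > threshold;
    }
}
```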
  • In an embodiment, cache 410 may store different categories of items.
  • Cache 410 may have a partition for storing items of each category, such as classes 421-422.
  • Each category may have a different priority and/or need a different quality of service (QoS).
  • For example, first class 421 may be online transaction processing (OLTP) data items, which are high priority.
  • Whereas, second class 422 may be low priority data items.
  • For example, second class 422 may store items such as binary large objects (BLOBs) or temporary data such as a sort spill (overflow) from volatile memory.
  • Indeed, first class 421 may be so important that second class 422 should be cached only when first class 421 would not be impacted by avoidable thrashing.
  • Thus, the cache adjustment logic of computer 400 may be dedicated to protecting the quality of service for OLTP items in first class 421.
  • For example, computer 400 may use the metadata FIFO and hit counters 461-462 to track only the performance of first class 421 within cache 410.
  • In an embodiment, hit counters 461-462 are incremented only when an OLTP (first class) data item is involved. For example, a cache hit would increment cache hit counter 462 only if the item resides in first class 421 and not second class 422.
  • As with whole-cache resizing, computer 400 may dynamically resize the cache partitions for classes 421-422 based on actual workload. For example, when ratio 440 falls below low threshold 432, first class partition 421 is bigger than actually needed, and computer 400 may resize the cache partitions of classes 421-422 by shifting some idle capacity from first class 421 to second class 422. Thus, second class 422 may opportunistically expand to accommodate more low priority items when first class 421 is underutilized. Whereas, if cache 410 begins to avoidably thrash and ratio 440 exceeds high threshold 431, then computer 400 may restore first class 421 by shifting capacity from second class 422 back to first class 421. In that way, the partition division between classes 421-422 may move back and forth to maximize system throughput despite fluctuating actual load, as sketched below.
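  • A sketch of that partition shifting follows. The step granularity, the block units, and all names are assumptions; only the decision rule (compare ratio 440 with thresholds 431-432) comes from the text.

```java
// Sketch: shift capacity between the OLTP partition (first class 421) and
// the low-priority partition (second class 422) as the thrashing ratio
// crosses high threshold 431 or low threshold 432.
class PartitionedCacheTuner {
    private long firstClassBlocks;            // OLTP partition capacity
    private long secondClassBlocks;           // BLOB / sort-spill partition capacity
    private static final long STEP = 1024;    // blocks moved per adjustment (assumed)

    PartitionedCacheTuner(long firstClassBlocks, long secondClassBlocks) {
        this.firstClassBlocks = firstClassBlocks;
        this.secondClassBlocks = secondClassBlocks;
    }

    void adjust(long victimHits, long cacheHits,
                double highThreshold, double lowThreshold) {
        double ratio = (cacheHits == 0) ? 0.0 : (double) victimHits / cacheHits; // ratio 440
        if (ratio > highThreshold && secondClassBlocks >= STEP) {
            secondClassBlocks -= STEP;        // OLTP is thrashing: reclaim capacity
            firstClassBlocks += STEP;
        } else if (ratio < lowThreshold && firstClassBlocks >= STEP) {
            firstClassBlocks -= STEP;         // OLTP is underutilized: lend capacity
            secondClassBlocks += STEP;
        }
    }
}
```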
  • FIG. 5 is a block diagram that depicts an example computer 500 , in an embodiment.
  • Computer 500 uses a doubly-linked list to implement a metadata FIFO.
  • Computer 500 may be an embodiment of computer 100 .
  • Computer 500 contains a metadata FIFO of metadata entries, such as 551 - 553 , having forward and backward pointers that link the elements.
  • Computer 500 may append metadata entries to the tail of the FIFO and remove metadata entries from the head of the FIFO.
  • Each metadata entry in the FIFO has a previous pointer that refers to the previous metadata entry in the FIFO.
  • For example, metadata entry 552 has previous 542, which points to metadata entry 551.
  • Thus, metadata entry 552 occurs in the FIFO after metadata entry 551.
  • Likewise, each metadata entry in the FIFO has a next pointer that refers to the next metadata entry in the FIFO.
  • For example, metadata entry 551 has next 561, which points to metadata entry 552.
  • When computer 500 appends a metadata entry to the tail of the FIFO, computer 500 should assign a previous pointer and a next pointer. For example, when metadata entry 553 is appended, computer 500 sets next 562 in metadata entry 552 to point to metadata entry 553, and sets previous 543 in metadata entry 553 to point to metadata entry 552. When computer 500 removes a metadata entry from the head of the FIFO, computer 500 should clear a previous pointer. For example, when metadata entry 551 is removed, computer 500 clears previous 542 in metadata entry 552 so that it no longer points to metadata entry 551 (which is removed).
  • To support lookups, each metadata entry should contain a unique identifier of the corresponding data item.
  • In an embodiment, the unique identifier is a compound identifier that consists of multiple fields.
  • For example, each cacheable data item may reside at a particular location within one of many disks.
  • For example, metadata entry 552 may correspond to a data item, such as a disk storage block or database page, that resides at a particular location within storage device 510, which may be a mechanical disk, a flash drive, a network drive, or other durable bulk storage.
  • Thus, metadata entry 552 may contain a compound unique identifier that includes storage device identifier 520 (e.g. an identifier of storage device 510) and a location field.
  • The location field may be implementation dependent, such as a physical block address 532 or a logical block address 533 (which a driver for storage device 510 may translate into a physical block address).
  • In another embodiment, the location field of the unique identifier does not identify a block, but may instead be a byte or word address or other offset into the capacity of storage device 510, such as offset 531.
  • Although location fields 531-533 are shown, a metadata implementation would have only one of those fields (in addition to storage device identifier 520).
  • In an embodiment, metadata entry 552 comprises a 4-byte storage device identifier 520, a 4-byte offset or address such as one of 531-533, and 8-byte pointers 542 and 562.
  • Thus, metadata entry 552 may be readily packed into as little as 24 bytes.
  • In another embodiment, the fields of metadata entry 552 are word aligned instead of packed into 24 bytes.
  • In that case, metadata entry 552 may need 32 bytes for well-aligned storage.
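  • Written out as a class, the fields described above might look like the following. This is schematic only: the 24-byte figure assumes C-style packing of 4 + 4 + 8 + 8 bytes, whereas a JVM adds object headers and uses references rather than raw pointers.

```java
// Schematic of one metadata entry (e.g. metadata entry 552) in the victim FIFO.
final class MetadataEntry {
    final int storageDeviceId;  // 4 bytes: storage device identifier 520
    final int location;         // 4 bytes: one of offset 531, physical block
                                // address 532, or logical block address 533
    MetadataEntry previous;     // 8-byte pointer, e.g. previous 542
    MetadataEntry next;         // 8-byte pointer, e.g. next 562

    MetadataEntry(int storageDeviceId, int location) {
        this.storageDeviceId = storageDeviceId;
        this.location = location;
    }
}
```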
  • As explained above, a victim FIFO should support removal of any metadata entry, regardless of position within the FIFO.
  • A linked list is often used for a general purpose FIFO due to fast mutation (append or remove).
  • However, a simple linked list does not support random access, and random access can accelerate finding a particular metadata entry within the FIFO.
  • Thus, the FIFO may supplement the linked list with an associative map, such as a hash table.
  • For example, Java has a LinkedHashMap that provides methods supporting the semantics of both a random-access map (e.g. containsKey) and a bounded-capacity FIFO (e.g. removeEldestEntry).
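  • For instance, a bounded victim FIFO can be sketched by subclassing LinkedHashMap; the class name and capacity policy here are illustrative assumptions, not the patent's mandated structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: insertion order gives FIFO order, the hash table gives random
// access, and overriding removeEldestEntry drops the entry at the head
// automatically once the FIFO exceeds its capacity.
class BoundedVictimMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    BoundedVictimMap(int capacity) {
        super(16, 0.75f, /*accessOrder=*/ false); // false: keep insertion (FIFO) order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict from the head when over capacity
    }
}
```

  • With such a map, containsKey gives a constant-time membership test, and remove(key) deletes a victim's metadata entry from any position upon a victim hit.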
  • According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices.
  • The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
  • Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
  • The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
  • FIG. 6 is a block diagram that illustrates a computer system 600 upon which an embodiment of the invention may be implemented.
  • Computer system 600 includes a bus 602 or other communication mechanism for communicating information, and a hardware processor 604 coupled with bus 602 for processing information.
  • Hardware processor 604 may be, for example, a general purpose microprocessor.
  • Computer system 600 also includes a main memory 606 , such as a random access memory (RAM) or other dynamic storage device, coupled to bus 602 for storing information and instructions to be executed by processor 604 .
  • Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604 .
  • Such instructions when stored in non-transitory storage media accessible to processor 604 , render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • Computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604 .
  • A storage device 610, such as a magnetic disk or optical disk, is provided and coupled to bus 602 for storing information and instructions.
  • Computer system 600 may be coupled via bus 602 to a display 612 , such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An input device 614 is coupled to bus 602 for communicating information and command selections to processor 604 .
  • Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • Computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610.
  • Volatile media includes dynamic memory, such as main memory 606.
  • Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
  • Storage media is distinct from but may be used in conjunction with transmission media.
  • Transmission media participates in transferring information between storage media.
  • For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 602.
  • Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution.
  • For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
  • The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • A modem local to computer system 600 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 602.
  • Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions.
  • The instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.
  • Computer system 600 also includes a communication interface 618 coupled to bus 602 .
  • Communication interface 618 provides a two-way data communication coupling to a network link 620 that is connected to a local network 622 .
  • For example, communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • As another example, communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • In any such implementation, communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 620 typically provides data communication through one or more networks to other data devices.
  • For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to data equipment operated by an Internet Service Provider (ISP) 626.
  • ISP 626 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 628.
  • Internet 628 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • The signals through the various networks and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.
  • Computer system 600 can send messages and receive data, including program code, through the network(s), network link 620 and communication interface 618.
  • In the Internet example, a server 630 might transmit a requested code for an application program through Internet 628, ISP 626, local network 622 and communication interface 618.
  • The received code may be executed by processor 604 as it is received, and/or stored in storage device 610 or other non-volatile storage for later execution.
  • FIG. 7 is a block diagram of a basic software system 700 that may be employed for controlling the operation of computing system 600 .
  • Software system 700 and its components, including their connections, relationships, and functions, is meant to be exemplary only, and not meant to limit implementations of the example embodiment(s).
  • Other software systems suitable for implementing the example embodiment(s) may have different components, including components with different connections, relationships, and functions.
  • Software system 700 is provided for directing the operation of computing system 600 .
  • Software system 700, which may be stored in system memory (RAM) 606 and on fixed storage (e.g., hard disk or flash memory) 610, includes a kernel or operating system (OS) 710.
  • The OS 710 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O.
  • One or more application programs, represented as 702A, 702B, 702C . . . 702N, may be "loaded" (e.g., transferred from fixed storage 610 into memory 606) for execution by the system 700.
  • The applications or other software intended for use on computer system 600 may also be stored as a set of downloadable computer-executable instructions, for example, for downloading and installation from an Internet location (e.g., a Web server, an app store, or other online service).
  • Software system 700 includes a graphical user interface (GUI) 715 , for receiving user commands and data in a graphical (e.g., “point-and-click” or “touch gesture”) fashion. These inputs, in turn, may be acted upon by the system 700 in accordance with instructions from operating system 710 and/or application(s) 702 .
  • The GUI 715 also serves to display the results of operation from the OS 710 and application(s) 702, whereupon the user may supply additional inputs or terminate the session (e.g., log off).
  • OS 710 can execute directly on the bare hardware 720 (e.g., processor(s) 604) of computer system 600.
  • a hypervisor or virtual machine monitor (VMM) 730 may be interposed between the bare hardware 720 and the OS 710 .
  • VMM 730 acts as a software “cushion” or virtualization layer between the OS 710 and the bare hardware 720 of the computer system 600 .
  • VMM 730 instantiates and runs one or more virtual machine instances (“guest machines”). Each guest machine comprises a “guest” operating system, such as OS 710 , and one or more applications, such as application(s) 702 , designed to execute on the guest operating system.
  • the VMM 730 presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems.
  • the VMM 730 may allow a guest operating system to run as if it is running on the bare hardware 720 of computer system 600 directly. In these instances, the same version of the guest operating system configured to execute on the bare hardware 720 directly may also execute on VMM 730 without modification or reconfiguration. In other words, VMM 730 may provide full hardware and CPU virtualization to a guest operating system in some instances.
  • In some instances, a guest operating system may be specially designed or configured to execute on VMM 730 for efficiency.
  • In these instances, the guest operating system is "aware" that it executes on a virtual machine monitor.
  • Thus, VMM 730 may provide para-virtualization to a guest operating system in some instances.
  • a computer system process comprises an allotment of hardware processor time, and an allotment of memory (physical and/or virtual), the allotment of memory being for storing instructions executed by the hardware processor, for storing data generated by the hardware processor executing the instructions, and/or for storing the hardware processor state (e.g. content of registers) between allotments of the hardware processor time when the computer system process is not running.
  • Computer system processes run under the control of an operating system, and may run under the control of other programs being executed on the computer system.
  • The term "cloud computing" is generally used herein to describe a computing model which enables on-demand access to a shared pool of computing resources, such as computer networks, servers, software applications, and services, and which allows for rapid provisioning and release of resources with minimal management effort or service provider interaction.
  • A cloud computing environment (sometimes referred to as a cloud environment, or a cloud) can be implemented in a variety of different ways to best suit different requirements.
  • For example, in a public cloud environment, the underlying computing infrastructure is owned by an organization that makes its cloud services available to other organizations or to the general public.
  • In contrast, a private cloud environment is generally intended solely for use by, or within, a single organization.
  • A community cloud is intended to be shared by several organizations within a community, while a hybrid cloud comprises two or more types of cloud (e.g., private, community, or public) that are bound together by data and application portability.
  • Generally, a cloud computing model enables some of those responsibilities which previously may have been provided by an organization's own information technology department, to instead be delivered as service layers within a cloud environment, for use by consumers (either within or external to the organization, according to the cloud's public/private nature).
  • The precise definition of components or features provided by or within each cloud service layer can vary, but common examples include: Software as a Service (SaaS), in which consumers use software applications that are running upon a cloud infrastructure, while a SaaS provider manages or controls the underlying cloud infrastructure and applications.
  • Platform as a Service (PaaS), in which consumers can develop and deploy applications using tools supported by the provider, while the PaaS provider manages or controls other aspects of the cloud environment (i.e., everything below the run-time execution environment).
  • Infrastructure as a Service (IaaS), in which consumers can deploy and run arbitrary software applications, and/or provision processing, storage, networks, and other fundamental computing resources, while an IaaS provider manages or controls the underlying physical cloud infrastructure (i.e., everything below the operating system layer).
  • Database as a Service (DBaaS), in which consumers use a database server or Database Management System that is running upon a cloud infrastructure, while a DBaaS provider manages or controls the underlying cloud infrastructure and applications.
  • The example embodiment(s), however, are not necessarily limited to any particular computing environment or computing device configuration. Instead, the example embodiment(s) may be implemented in any type of system architecture or processing environment that one skilled in the art, in light of this disclosure, would understand as capable of supporting the features and functions of the example embodiment(s) presented herein.

Abstract

Techniques are provided to adjust the behavior of a cache based on a count of cache misses for items recently evicted. In an embodiment, a computer responds to evicting a particular item (PI) from a cache by storing a metadata entry for the PI into memory. In response to a cache miss for the PI, the computer detects whether or not the metadata entry for the PI resides in memory. When the metadata entry for the PI is detected in memory, the computer increments a victim hit counter (VHC) that may be used to calculate how much avoidable thrashing the cache is experiencing, which is how much thrashing would be reduced if the cache were expanded. Either immediately or arbitrarily later, the computer adjusts a policy of the cache based on the VHC's value. For example, the computer may adjust the capacity of the cache based on the VHC.

Description

BENEFIT CLAIM
This application claims the benefit of Provisional Appln. 62/418,005, filed Nov. 14, 2016, the entire contents of which is hereby incorporated by reference as if fully set forth herein, under 35 U.S.C. § 119(e).
FIELD OF THE DISCLOSURE
This disclosure relates to cache control. Presented herein are techniques that opportunistically maximize throughput by adjusting the behavior of a cache based on statistics such as a count of cache misses for items recently evicted, a count of cache hits, and/or similar metrics.
BACKGROUND
A computer system typically has a natural hierarchy of storage tiers. For example, a computer may have volatile dynamic random access memory (DRAM) for fast access to currently active data, a non-volatile flash drive for moderate-speed access to recently used data that may be needed again soon, and a mechanical disk or network storage for slow access to bulk durable storage. Because the storage tiers have different latencies and capacities, some data items may be replicated or moved between various storage tiers to ensure that data is dynamically distributed to storage tiers according to actual demand.
Random access tiers, such as volatile and non-volatile memory, may contain a cache that optimizes data residency based on a policy of the cache. For example, a least recently used (LRU) cache may evict data not recently used to make room for recently used data not already cached. Caching policies decide which data items are cacheable and how many data items are cached.
However, the performance of cache policies, no matter how well designed, may not be optimal. For example, the quantity and quality of data access requests may fluctuate, such that at times a cache may be under- or over-utilized.
For example, if a caching policy does not match a current workload, then cache thrashing may result. For example, a stressed cache may repeatedly evict and reload a same data item that would perform better if not evicted.
Likewise, an idle cache may consume memory that would increase throughput if instead used for other purposes. Although cache performance more or less depends on the nature of the workload, cache sizes typically do not dynamically adjust to suit a fluctuating workload.
Thus, workload related performance degradations may arise. For example, a database table scan may trample the current cache contents as the scan passes over copious data that far exceeds the capacity of the cache. The scanned data may be read once during the scan and then not accessed again from the cache once read into cache.
Thus, a scan may not only evict frequently used data but may also further aggravate the problem by filling the cache with data that is unlikely to be accessed again while in the cache. Thus, a cache whose configuration and behavior are immutable will likely suffer degraded performance under various realistic loads.
Degraded performance, such as thrashing, costs extra time (latency) to load needed data. Furthermore if the cache is non-volatile, then thrashing may cause additional wear that may reduce the life of the cache's physical medium. Thus, cache thrashing may pose a capital cost and a reliability hazard, in addition to costing latency and energy.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings:
FIG. 1 is a block diagram that depicts an example computer that adjusts the behavior of a cache based on a count of cache misses for items recently evicted, in an embodiment;
FIG. 2 is a flow diagram that depicts an example process that adjusts the behavior of a cache based on a count of cache misses for items recently evicted, in an embodiment;
FIG. 3 is a block diagram that depicts an example computer that uses a first-in first-out (FIFO) to track recent evictions, in an embodiment;
FIG. 4 is a block diagram that depicts an example computer that uses thrashing thresholds to maintain equilibrium, in an embodiment;
FIG. 5 is a block diagram that depicts an example computer that operates a doubly-linked list as a metadata FIFO, in an embodiment;
FIG. 6 is a block diagram that illustrates an example computer system upon which an embodiment of the invention may be implemented;
FIG. 7 is a block diagram that illustrates an example software system that may be employed for controlling the operation of a computing system.
DETAILED DESCRIPTION
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Embodiments are described herein according to the following outline:
    • 1.0 General Overview
    • 2.0 Example Computer
      • 2.1 Cache
      • 2.2 Data Item
      • 2.3 Cache Policy
      • 2.4 Eviction
      • 2.5 Metadata
      • 2.6 Cache Miss
      • 2.7 Thrashing
      • 2.8 Victim Hit Counter
      • 2.9 Policy Tuning
    • 3.0 Thrash Detection Process
      • 3.1 Metadata Storage
      • 3.2 Metadata Detection
      • 3.3 Statistics
      • 3.4 Tuning
    • 4.0 Metadata First-In First-Out (FIFO)
      • 4.1 Tiered Memory
      • 4.2 FIFO Data Flow
      • 4.3 FIFO Inspection
      • 4.4 Metadata Removal
    • 5.0 Equilibrium Thresholds
      • 5.1 Thrashing Ratio
      • 5.2 I/O Latency
      • 5.3 Data Categories
      • 5.4 Cache Partitions
    • 6.0 Data Structures
      • 6.1 Linked List
      • 6.2 Metadata Details
      • 6.3 Random Access
    • 7.0 Hardware Overview
    • 8.0 Software Overview
    • 9.0 Cloud Computing
      1.0 General Overview
Techniques are provided to adjust the behavior of a cache based on a count of cache misses for items recently evicted. In an embodiment, a computer responds to evicting a particular item from a cache by storing, in memory, a metadata entry that identifies the particular item. Metadata is stored in low latency random access memory (RAM). If the particular item is not subsequently requested, the metadata entry for the particular item may be removed under some conditions. In response to a cache miss for the particular item, the computer detects whether or not a metadata entry in memory identifies that particular item. When a metadata entry for the particular item is detected, the computer increments a victim hit counter. An increment represents a cache hit that could have occurred had the particular item been retained in cache (e.g. if the cache were somewhat larger). The victim hit counter may be used to calculate how much avoidable thrashing the cache is experiencing, which represents how much thrashing could be reduced if the cache were expanded. Either immediately or arbitrarily later, the computer adjusts a policy of the cache based on the value of the victim hit counter. For example, the computer may increase or decrease the capacity of the cache based on the victim hit counter. Particular resizing scenarios are discussed in the section titled “Policy Tuning” below.
In some embodiments, metadata entries are temporarily retained in a first-in first-out (FIFO). In some embodiments, FIFO performance is accelerated with a data structure such as a linked list.
In some embodiments, the cache is primarily dedicated to online transaction processing (OLTP) data. The cache may be dynamically tuned to opportunistically store lower priority data, such as binary large objects (BLOBs) or temporary sort data spills, when thrashing is low.
2.0 Example Computer
FIG. 1 is a block diagram that depicts an example computer 100, in an embodiment. Computer 100 adjusts the behavior of a cache based on a count of cache misses for items recently evicted. Computer 100 may be a rack server such as a blade, a personal computer, a mainframe, a network appliance, a virtual machine, a smartphone, or other computing device. Computer 100 contains cache 110 and memory 140.
Memory 140 is random access memory (RAM) (e.g. byte addressable) that may be volatile such as dynamic RAM (DRAM) or static RAM (SRAM), or non-volatile such as flash. Cache 110 is an associative cache of data that is persisted elsewhere (not shown) such as on a local disk, network storage, or other bulk storage tier.
2.1 Cache
Cache 110 may be a write-through cache or a write-back cache. A write-through cache accelerates repeated reads but does not accelerate writes, because a write-through cache flushes every write to a backing store (e.g. disk). A write-back cache accelerates repeated reads and all writes, because buffering allows flushing to be deferred and thus removed from the critical path of a write. Cache 110 may have an implementation-specific associativity, such as fully associative, set associative, or direct mapped. A fully associative cache allows data from any memory address to be stored in any available line of cache 110, thereby reducing the need to evict other data, but such flexibility needs additional silicon, which increases manufacturing cost, power consumption, and best-case (no eviction) latency. Direct mapping needs less silicon but requires data from a particular memory address to be stored only in a particular line of cache 110. Because multiple memory addresses may be directly mapped to the same line of cache 110, there may be increased contention for a particular line of cache 110 and thus more frequent evictions. Set associativity is a compromise between full associativity and direct mapping, such that a particular memory address may be mapped to any of a subset of lines of cache 110. Cache 110 may reside in volatile or non-volatile RAM, or on a local drive such as a magnetic disk, flash drive, or hybrid drive. In some embodiments (not shown), memory 140 contains cache 110. Cache 110 contains many data items, such as 121-122. Cache 110 may store data items of mixed sizes.
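For instance, set associative placement may be illustrated with a short sketch (in Java, with hypothetical parameter names; nothing here is mandated by the techniques herein):
    // Illustrative sketch of set associative placement. Addresses whose set
    // indexes collide contend for the same set; with one line per set
    // (direct mapping), any collision forces an eviction.
    final class Placement {
        static long setIndex(long blockAddress, int lineSize, int numSets) {
            return (blockAddress / lineSize) % numSets;
        }
    }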
2.2 Data Item
Item 121 may have a fixed size, such as a cache line, cache word, database block, disk block, or other atomic unit of stored data. Item 121 may have a variable size, such as a binary large object (BLOB) or other object, or database temporary data such as a sort spill (overflow) from volatile memory. Each of items 121-122 is identifiable according to recordable metadata entries that may be stored in memory 140, such as metadata entry 152. Thus, metadata entry 152 identifies a data item (not shown).
Computer 100 controls which metadata entries reside in memory 140. In some embodiments, metadata entries in memory 140 are contained within an aggregation data structure (not shown) such as an array, linked list, or hash table that resides within memory 140.
2.3 Cache Policy
Computer 100 operates cache 110 according to policy 130, which is dynamically adjustable. Policy 130 may specify a capacity (size) of cache 110, different treatment for different categories of data items, or other operational parameters that affect the configuration or behavior of cache 110.
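Policy 130 might be represented as a small adjustable structure. The following is a minimal sketch with hypothetical field names, not a definitive layout:
    // Hypothetical sketch of an adjustable cache policy such as policy 130.
    final class CachePolicy {
        volatile long capacityBytes;            // adjustable capacity of the cache
        volatile boolean admitSecondClassItems; // e.g. BLOBs or sort spills admitted
                                                // only when avoidable thrashing is low
    }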
2.4 Eviction
Cache 110 has limited capacity. Thus, inserting one item into cache 110 may require eviction (removal) of another item from cache 110. In this example, item 121 is drawn with dashed lines to show that it is being evicted from cache 110. If cache 110 is a write-back cache and item 121 is dirty (modified), then eviction includes computer 100 writing item 121 to persistent storage, such as disk.
2.5 Metadata
However, regardless of whether cache 110 is write-back or write-through, computer 100 performs additional work with memory 140 during eviction. When computer 100 evicts an item from cache 110, computer 100 records the metadata entry for the item into memory 140. For example, when computer 100 evicts item 121, computer 100 stores metadata entry 151 into memory 140 to record that item 121 was recently evicted. Metadata entry 151 is drawn with dashed lines to show that it is being inserted into memory 140.
2.6 Cache Miss
In operation, computer 100 may experience a cache miss when retrieving a data item that does not reside within cache 110. Because item 121 was evicted in this example, item 121 no longer resides in cache 110. Thus, a subsequent attempt to find item 121 within cache 110 will cause a cache miss. In addition to conventional handling of a cache miss, such as retrieving item 121 from durable storage and inserting item 121 into cache 110, computer 100 also does additional processing for a cache miss.
During a cache miss, computer 100 detects whether or not memory 140 contains a metadata entry for the item of the cache miss. For example, during a cache miss for item 121, computer 100 detects whether or not memory 140 contains metadata entry 151, which is the metadata entry for item 121.
In some embodiments, metadata entries in memory 140 are aggregated within a container (not shown) that resides within memory 140 and that encapsulates functionality for detecting the presence or absence of a particular metadata entry. For example, metadata entries 151-152 may reside in a tree or hash table (within memory 140) that may look up (e.g. linearly scan or randomly access) a particular metadata entry.
If memory 140 contains metadata entry 151, then computer 100 performs two more activities. First, computer 100 increments victim hit counter 160. Second, computer 100 removes metadata entry 151 from memory 140. It is that removal of a metadata entry during a cache miss that ensures that: a) an item may reside in cache 110, or b) the item's metadata entry may reside in memory 140, but both may not be simultaneously resident. Sometimes, neither is resident.
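The eviction and cache-miss bookkeeping described above may be sketched as follows (Java, with hypothetical names; a bounded FIFO variant appears in section 4.0 below):
    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of victim tracking: record a metadata entry on eviction; on a cache
    // miss, detect the entry, count a victim hit, and remove the entry so that an
    // item and its metadata entry are never resident simultaneously.
    final class VictimTracker<K> {
        private final Set<K> evictedMetadata = new HashSet<>();
        final AtomicLong victimHitCounter = new AtomicLong();

        void onEvict(K itemId) {
            evictedMetadata.add(itemId);        // e.g. store metadata entry 151
        }

        void onCacheMiss(K itemId) {
            if (evictedMetadata.remove(itemId)) // entry found: a victim hit
                victimHitCounter.incrementAndGet();
        }
    }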
2.7 Thrashing
Thrashing is turnover (replacement) of cache contents, which adds latency, such as for input/output (I/O). Thrashing occurs when demand for storage within cache 110 exceeds the capacity of cache 110, assuming that cache 110 is a fully associative cache. Without full associativity, thrashing may occur even when cache 110 has spare capacity, because some memory addresses must contend for the same line of cache 110. Generally, a small cache may thrash more than a large cache. Thus, increasing the size of a cache may decrease thrashing. However, it may be difficult to dynamically and accurately predict how much the size of cache 110 should be increased to decrease thrashing by a desired amount. Techniques herein may accurately recognize avoidable thrashing as a trigger for optimally resizing cache 110.
2.8 Victim Hit Counter
According to an embodiment, avoidable thrashing of cache 110 is measured by victim hit counter 160, such that a large count indicates much avoidable thrashing. In some embodiments, computer 100 periodically resets victim hit counter 160 to zero.
In some embodiments, a victim hit rate may be calculated by dividing the value of victim hit counter 160 by the length of the duration between periodic counter resets. For a given duration, a ratio of victim hit count to cache hit count has the same value as a ratio of victim hit rate to cache hit rate. The two ratios are therefore interchangeable: embodiments may use either a hit count ratio or a hit rate ratio, and examples herein that use one such ratio may instead be implemented with the other.
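For example, writing T for the duration between counter resets, the shared duration cancels out of the rate ratio:
    (victim hit count / T) / (cache hit count / T) = victim hit count / cache hit count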
Thrashing may be somewhat alleviated by tuning the operation of cache 110. Based on victim hit counter 160, computer 100 adjusts policy 130 to tune the performance of cache 110 in response to detecting too little or too much avoidable thrashing. A relationship between victim hit counter 160 and a count of actual hits within cache 110 is discussed later herein.
2.9 Policy Tuning
Ideally there is no avoidable thrashing, such as when victim hit counter 160 remains idle (zero). In less than ideal conditions, cache 110 may be prone to avoidable thrashing. Because some processing is more important than other processing, computer 100 may need to prioritize caching for some data. Because cache demand may fluctuate with conditions, a need for preferential treatment of priority data may be only temporary. Thus, computer 100 may sometimes reserve cache 110 for more important data and sometimes not. For example, computer 100 may have a category of data that is sometimes excluded from cache 110 according to policy 130.
In some cases, system throughput may be maximized by adjusting policy 130 to allow caching of excluded data when avoidable thrashing is low. Likewise, adjusting policy 130 may cause the size (capacity) of cache 110 to change. For example when avoidable thrashing is too high, computer 100 may adjust policy 130 to expand cache 110. Likewise, cache 110 may shrink when avoidable thrashing is too low. Thus, cache 110 may (more or less temporarily) surrender some of its underlying storage medium (e.g. flash) for low-value or low-priority uses when avoidable thrashing is low, and then reclaim the surrendered capacity when avoidable thrashing becomes too high. Thus, computer 100 may use policy 130 and victim hit counter 160 to dynamically tune cache 110 according to changing load and conditions to maximize system throughput. With such techniques, the operation of computer 100 itself may be opportunistically accelerated.
3.0 Thrash Detection Process
FIG. 2 is a flow diagram that depicts an example process that adjusts the behavior of a cache based on a count of cache misses for items recently evicted. FIG. 2 is discussed with reference to FIG. 1.
3.1 Metadata Storage
For demonstrative purposes, this example assumes that cache 110 is already warm (filled with data items). Thus, cache 110 has no room for additional items unless other items are evicted. In some embodiments, neither the process of FIG. 2 nor the phenomenon of eviction requires that cache 110 be full to capacity.
Although a cache miss is not shown for step 201, step 201 is responsive to eviction of a first item as caused by a cache miss for a second item. In step 201, a metadata entry for the first item is stored into memory. For example, computer 100 may receive a request to read item 122 when item 122 does not currently reside in the cache. To make room in cache 110 for item 122, computer 100 evicts item 121 and stores metadata entry 151 for item 121 into memory 140.
3.2 Metadata Detection
An arbitrary delay may separate steps 201-202. For example, step 201 may be caused by an access request for item 122, whereas step 202 may be caused by an access request for item 121. Item 121 is evicted in step 201 and then requested for access in step 202. However, because item 121 no longer resides in cache 110, the request for item 121 causes a cache miss. In response to the cache miss, the computer detects whether or not a metadata entry for the missing item resides in memory. For example, the request for item 121 causes a cache miss, which causes computer 100 to detect whether or not metadata entry 151 for item 121 resides in memory 140.
In step 203, the computer reacts based on whether or not the metadata entry of the missing item was found in memory. Because metadata entry 151 for item 121 was recently stored in memory 140 when item 121 was evicted in step 201, during steps 202-203 computer 100 does indeed find metadata entry 151 in memory 140.
3.3 Statistics
Having found metadata entry 151 in memory 140, computer 100 proceeds from step 203 to step 204. In step 204, a victim hit counter is incremented. For example, computer 100 increments victim hit counter 160 because computer 100 found metadata entry 151 in memory 140. An arbitrary delay may separate steps 204-205. For example, step 204 may be caused by the access request for item 121 in step 202, whereas when step 205 occurs, and what triggers it, depends on the implementation.
3.4 Tuning
In some embodiments, step 205 may be hard coded to occur (perhaps after detecting additional conditions) more or less immediately after step 204, and perhaps in the same computational thread as step 204. In other embodiments, step 205 occurs periodically, such as with an interval timer, perhaps in a daemon process or thread. In step 205, a policy of the cache is adjusted based on the victim hit counter. For example, a daemon thread may periodically awaken to inspect and/or reset (clear) victim hit counter 160. In step 205, computer 100 detects whether the value of victim hit counter 160 indicates that the current amount of avoidable thrashing is acceptable, too high, or too low. In preferred embodiments, characterization of thrashing as high or low depends on statistics that consider more measurements than victim hit counter 160 alone. For example, techniques based on a thrashing ratio are discussed later herein.
If the value of victim hit counter 160 indicates an acceptable amount of avoidable thrashing, then step 205 may complete without adjusting policy 130. However, if the value of victim hit counter 160 indicates little or no avoidable thrashing, then cache 110 may be too big (have too much spare capacity), in which case some of the memory or flash that implements cache 110 may be temporarily reallocated to more productive use, such as storing BLOBs or sort spills. For example, computer 100 may detect that victim hit counter 160 has fallen beneath a low threshold, in which case computer 100 may reduce the size of cache 110, or achieve other effects, by adjusting policy 130 accordingly. Likewise, computer 100 may detect that victim hit counter 160 exceeds a high threshold, in which case computer 100 may increase the size of cache 110, or achieve other effects, by adjusting policy 130. Thus, in step 205, computer 100 may dynamically tune cache 110 by adjusting policy 130 to best allocate resources between cache 110 and other uses.
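As one possible implementation of step 205, a periodic daemon may inspect and reset the counter and then adjust the policy. The following is a minimal sketch, reusing the hypothetical VictimTracker and CachePolicy sketched above; the thresholds, the period, and the 10% resize step are illustrative assumptions:
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical tuning daemon for step 205: read and reset the victim hit
    // counter each period, then grow or shrink the cache by a 10% step.
    // A single tuning thread is assumed, so the compound updates are safe.
    final class TuningDaemon {
        static void start(VictimTracker<?> tracker, CachePolicy policy,
                          long lowThreshold, long highThreshold) {
            ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(() -> {
                long victimHits = tracker.victimHitCounter.getAndSet(0);
                if (victimHits > highThreshold)
                    policy.capacityBytes += policy.capacityBytes / 10;
                else if (victimHits < lowThreshold)
                    policy.capacityBytes -= policy.capacityBytes / 10;
            }, 1, 1, TimeUnit.MINUTES);
        }
    }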
4.0 Metadata First-in First-Out (FIFO)
FIG. 3 is a block diagram that depicts an example computer 300, in an embodiment. Computer 300 uses a first-in first-out (FIFO) to track recent evictions. Computer 300 may be an implementation of computer 100. Computer 300 contains volatile memory 340 and non-volatile RAM 370.
4.1 Tiered Memory
Non-volatile RAM 370 may have integrated circuitry based on technology such as flash, phase change memory, ferroelectric circuits, or other non-volatile RAM technology. Thus, computer 300 has increased reliability because dirty data (recent modifications) are preserved in non-volatile RAM 370 even if computer 300 crashes and needs rebooting. Non-volatile RAM 370 contains cache 310 that stores data items such as 321. Computer 300 may dynamically adjust policy 330 to tune (optimize) the configuration and behavior of cache 310.
Volatile memory 340 may be RAM of higher speed, higher density, higher capacity, and/or lower manufacturing cost than non-volatile RAM 370. Volatile memory 340 contains first-in first-out (FIFO) 380, which computer 300 operates as a queue of metadata entries, such as 351-354.
Because FIFO 380 may reside in inexpensive bulk RAM, such as DRAM, FIFO 380 may be huge. For example, FIFO 380 may have a physical capacity of more than two gigabytes. According to simulation, FIFO sizes ranging from two to five gigabytes still deliver substantial benefit for generating victim statistics. Indeed, FIFO 380 works well when sized to store hundreds of thousands of metadata entries. However, when FIFO 380 is sized above some immense (e.g. five gigabyte) threshold, its marginal benefit diminishes, and the performance of cache 310 may cease to increase proportionally.
4.2 FIFO Data Flow
FIFO 380 stores a sequence of metadata entries that extends from head 391 to tail 392. When item 321 is evicted from cache 310, computer 300 operates FIFO 380 as follows. If FIFO 380 is already filled to capacity with metadata, then computer 300 removes from FIFO 380 whichever metadata entry occupies head 391. In this example when item 321 is evicted, computer 300 removes metadata entry 354 from FIFO 380.
Removal of a metadata entry at head 391 causes head 391 to be retargeted to a next metadata entry in FIFO 380. In this example, removal of metadata entry 354 causes head 391 to be retargeted (not shown) to the next metadata entry, which is 353. Removal of a metadata entry at head 391 causes a vacancy within FIFO 380. Computer 300 uses that available capacity by appending metadata entry 351 (of item 321) onto the end of FIFO 380. That causes tail 392 to be retargeted to metadata entry 351 as shown. Thus, metadata entries of the most recently evicted items are appended to tail 392, and metadata entries of the least recently evicted items are removed from head 391. Thus, metadata entries flow into, along, and out of FIFO 380 in the direction shown by the vertical arrow labeled ‘flow’. Thus, FIFO 380 has a bounded (fixed) capacity dedicated to metadata entries of the most recently evicted items.
4.3 FIFO Inspection
Computer 300 increments victim hit counter 360 when FIFO 380 contains a metadata entry of an item that experiences a cache miss. In this example after 321 was evicted, metadata entry 351 of item 321 was appended to FIFO 380. Thus, a subsequent attempt to find item 321 within cache 310 will cause a cache miss because cache 310 no longer contains item 321. Computer 300 detects that FIFO 380 contains metadata entry 351, which causes computer 300 to increment victim hit counter 360. Thus, FIFO 380 should be searchable, such that computer 300 can efficiently detect the presence of a given metadata entry. For example, FIFO 380 may be content addressable for metadata lookup in constant time. An alternative search mechanism such as brute-force linear scanning of FIFO 380 may or may not be fast enough.
4.4 Metadata Removal
Per conventional caching, when item 321 causes a cache miss, computer 300 loads item 321 into cache 310, regardless of whether or not FIFO 380 contains metadata entry 351. However, if FIFO 380 does contain metadata entry 351, then computer 300 removes metadata entry 351 from FIFO 380. Although not shown as such, metadata entry 351 may be in the middle of FIFO 380 when removed. Thus, FIFO 380 should support removal of any metadata entry, regardless of its position within FIFO 380. FIG. 5, discussed later herein, depicts a data structure that can be used to implement such a flexible FIFO.
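A bounded, searchable FIFO with removal from the middle may also be sketched with Java's LinkedHashMap, the approach suggested in section 6.3 below; the capacity and key type here are hypothetical:
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of a bounded victim FIFO: insertion (FIFO) order, constant-time
    // lookup, automatic removal at the head when full, and removal from the
    // middle on a victim hit.
    final class VictimFifo<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        VictimFifo(int capacity) {
            super(16, 0.75f, false);   // false = iterate in insertion order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;  // drop the head entry when overflowing
        }
    }

    // Usage: put(itemId, entry) on eviction; remove(itemId) != null on a cache
    // miss indicates a victim hit and removes the entry wherever it resides.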
5.0 Equilibrium Thresholds
FIG. 4 is a block diagram that depicts an example computer 400, in an embodiment. Computer 400 uses thrashing thresholds to maintain equilibrium. Computer 400 may be an embodiment of computer 100. Computer 400 contains cache 410 and thrashing thresholds 431-432.
Computer 400 has a FIFO (not shown) for storing metadata entries of items that were recently evicted from cache 410. Computer 400 uses hit counters 461-462 to record statistics about the FIFO and cache 410. If a request to access an item can be fulfilled from cache 410, then computer 400 increments cache hit counter 462. Otherwise there is a cache miss, and computer 400 increments victim hit counter 461 if the item's metadata entry resides in the FIFO.
5.1 Thrashing Ratio
Thus, computer 400 uses hit counters 461-462 to detect cache effectiveness (cache hit counter 462) and the potential effectiveness of increasing cache capacity (victim hit counter 461). Using those counters, computer 400 detects a level of avoidable thrashing of cache 410 by dividing the value of victim hit counter 461 by the value of cache hit counter 462 to calculate ratio 440.
The value of ratio 440 may fluctuate with the quantity and quality of request workload. In response to fluctuation of ratio 440, computer 400 may reconfigure cache 410 to opportunistically maximize throughput. Cache 410 may have a small working set of items that are frequently hit. Thus, cache hit counter 462 may be high, and ratio 440 may be low. If the value of ratio 440 drops below low threshold 432, then avoidable thrashing is low and cache 410 may be bigger than needed. Thus, computer 400 may respond to ratio 440 falling beneath low threshold 432 by shrinking (reducing the capacity of) cache 410. Shrinking cache 410 may increase system throughput by freeing up memory or flash for other purposes. However, with a dynamically changing workload, cache 410 may be too small to avoid thrashing. The smaller cache 410 is, the more prone it will be to evict items. If evicted items are frequently accessed, then victim hit counter 461 and ratio 440 may be high, which indicates avoidable thrashing. Thus, if ratio 440 exceeds high threshold 431, then cache 410 avoidably thrashes too much. Computer 400 may respond to ratio 440 exceeding high threshold 431 by growing (increasing the capacity of) cache 410. By reactively keeping ratio 440 between thresholds 431-432, computer 400 may dynamically resize cache 410 based on actual workload. Thus, computer 400 maintains homeostasis for the performance and efficiency of cache 410, despite fluctuating workload characteristics.
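The equilibrium decision may be sketched as follows (hypothetical thresholds and a hypothetical 10% resize step; this complements the periodic daemon sketched in section 3.4):
    // Hypothetical sketch of ratio-based homeostasis between thresholds 431-432.
    final class Equilibrium {
        static long retune(long victimHits, long cacheHits, long cacheSize,
                           double highThreshold, double lowThreshold) {
            if (cacheHits == 0) return cacheSize;           // avoid dividing by zero
            double ratio = (double) victimHits / cacheHits; // e.g. ratio 440
            if (ratio > highThreshold) return cacheSize + cacheSize / 10; // grow
            if (ratio < lowThreshold)  return cacheSize - cacheSize / 10; // shrink
            return cacheSize;                               // within equilibrium band
        }
    }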
5.2 I/O Latency
As explained above, a thrashing ratio may be used to characterize thrashing as excessive or acceptable. In some embodiments, average input/output (I/O) latency may also or instead be used to characterize thrashing. I/O latency is relevant because even when there is a high cache hit rate (e.g. >99%), each cache miss may be very costly if an involved hard disk is overwhelmed with activity and incurs latency much higher than that of a flash drive. Thus, I/O latency may be used as a proxy or estimation factor of thrashing with which to characterize thrashing as acceptable or excessive. I/O latency may or may not actually measure thrashing. However, I/O latency more or less accurately measures the cost of a cache miss, which may be as relevant or more relevant than other predictive performance measurements discussed herein. Thus, I/O latency may be important to measure and integrate into thrashing detection instrumentation. Average I/O latency is calculated as follows.
average I/O latency=(average hard disk latency in the last time period)*(number of cache misses in the last time period)+(average flash drive latency in the last time period)*(number of cache hits in the last time period)
The above formula calculates average actual I/O latency as observed without considering victim cache metrics. Actual (recent) I/O latency has some predictive utility for characterizing thrashing. Ideally, potential (future) I/O latency is also considered when characterizing thrashing. Potential I/O latency may be calculated by integrating observed I/O latency and victim cache metrics according to the following formula.
potential I/O latency=(average hard disk latency in the last time period)*(number of cache misses in the last time period−additional potential hits according to the victim cache)+(average flash drive latency in the last time period)*(number of cache hits in the last time period+additional potential hits according to the victim cache)
Thus, potential I/O latency is more or less predictive, whereas actual I/O latency is retrospective.
Actual I/O latency may be compared to potential I/O latency to characterize thrashing. When the ratio between the actual I/O latency and the potential I/O latency exceeds a threshold, then thrashing may be characterized as excessive, even if the thrashing ratio does not indicate excessive thrashing. Thus, thrashing may be characterized as excessive when either a thrashing ratio or an I/O latency ratio exceeds a respective threshold. Thus, cache 410 may be accordingly tuned based on the performance of cache 410 and/or the performance of its backing store (hard disk).
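The two formulas and the resulting comparison may be sketched as follows (hypothetical names; latencies are per-period averages and counts are per-period totals):
    // Sketch of the actual vs. potential I/O latency comparison described above.
    // Victim hits approximate misses that a larger cache would convert into hits.
    final class LatencyModel {
        static boolean excessive(double diskLatency, double flashLatency,
                                 long misses, long hits, long victimHits,
                                 double latencyRatioThreshold) {
            double actual    = diskLatency * misses + flashLatency * hits;
            double potential = diskLatency * (misses - victimHits)
                             + flashLatency * (hits + victimHits);
            return actual / potential > latencyRatioThreshold;
        }
    }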
5.3 Data Categories
Furthermore, cache 410 may store different categories of items. Cache 410 may have a partition for storing items of each category, such as classes 421-422. Each category may have a different priority and/or need a different quality of service (QoS). For example, first class 421 may contain online transaction processing (OLTP) data items, which are high priority, whereas second class 422 may contain low priority data items, such as binary large objects (BLOBs) or temporary data such as a sort spill (overflow) from volatile memory. First class 421 may be so important that second class 422 should be cached only when first class 421 would not be impacted by avoidable thrashing. Thus, the cache adjustment logic of computer 400 may be dedicated to protecting the quality of service for OLTP items in first class 421. Thus, computer 400 may use the metadata FIFO and hit counters 461-462 to track only the performance of first class 421 within cache 410, such that hit counters 461-462 are incremented only when an OLTP (first class) data item is involved. For example, a cache hit would increment cache hit counter 462 only if the item resides in first class 421 and not second class 422.
5.4 Cache Partitions
To maximize system throughput, computer 400 may dynamically resize the cache partitions for classes 421-422 based on actual workload. For example, when ratio 440 falls below low threshold 432, the partition for first class 421 is bigger than actually needed, and computer 400 may resize the cache partitions of classes 421-422 by shifting some idle capacity from first class 421 to second class 422. Thus, second class 422 may opportunistically expand to accommodate more low priority items when first class 421 is underutilized. Whereas if cache 410 begins to avoidably thrash, and ratio 440 exceeds high threshold 431, then computer 400 may restore first class 421 by shifting capacity from second class 422 back to first class 421. In that way, the partition division between classes 421-422 may move back and forth to maximize system throughput despite fluctuating actual load.
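Shifting capacity between the partitions may be sketched as follows (hypothetical names; the delta and its trigger follow the threshold logic above):
    // Hypothetical sketch: move capacity between the first-class (e.g. OLTP)
    // partition and the second-class (e.g. BLOB or sort spill) partition.
    final class Partitions {
        long firstClassBytes, secondClassBytes;

        void shiftToSecondClass(long delta) {  // ratio fell below the low threshold
            delta = Math.min(delta, firstClassBytes);
            firstClassBytes  -= delta;
            secondClassBytes += delta;
        }

        void shiftToFirstClass(long delta) {   // ratio exceeded the high threshold
            delta = Math.min(delta, secondClassBytes);
            secondClassBytes -= delta;
            firstClassBytes  += delta;
        }
    }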
6.0 Data Structures
FIG. 5 is a block diagram that depicts an example computer 500, in an embodiment. Computer 500 uses a doubly-linked list to implement a metadata FIFO. Computer 500 may be an embodiment of computer 100. Computer 500 contains a metadata FIFO of metadata entries, such as 551-553, having forward and backward pointers that link the elements.
6.1 Linked List
Computer 500 may append metadata entries to the tail of the FIFO and remove metadata entries from the head of the FIFO. Each metadata entry in the FIFO has a previous pointer that refers to the previous metadata entry in the FIFO. For example, metadata entry 552 has previous 542 that points to metadata entry 551. Thus, metadata entry 552 occurs in the FIFO after metadata entry 551. Likewise, each metadata entry in the FIFO has a next pointer that refers to the next metadata entry in the FIFO. For example, metadata entry 551 has next 561 that points to metadata entry 552.
When computer 500 appends a metadata entry to the tail of the FIFO, computer 500 should assign a previous pointer and a next pointer. For example when metadata entry 553 is appended, computer 500 sets next 562 in metadata entry 552 to point to metadata entry 553, and sets previous 543 in metadata entry 553 to point to metadata entry 552. When computer 500 removes a metadata entry from the head of the FIFO, computer 500 should clear a previous pointer. For example when metadata entry 551 is removed, computer 500 clears previous 542 in metadata entry 552 so that it no longer points to metadata entry 551 (which is removed).
Implementing the FIFO with a linked list enables metadata entries to be added or removed quickly and with more predictable latency. If only the head and tail of the FIFO were mutable, then computer 500 could instead use a circular queue as the FIFO. However, computer 500 removes a metadata entry from the middle of the FIFO when that metadata entry is for a data item that suffers a cache miss and is reloaded into the cache. When a data item is reloaded into the cache, computer 500 should remove the associated metadata entry, even if the metadata entry resides in the middle of the FIFO. Because a circular buffer may be slow at removing an entry from the middle, computer 500 instead uses a linked list (as shown) for the FIFO.
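Unlinking an entry from anywhere in the doubly-linked FIFO is constant-time pointer surgery. The following is a minimal sketch, with hypothetical field names corresponding to the previous and next pointers described above:
    // Sketch of removing a metadata entry from the head, middle, or tail of the
    // doubly-linked FIFO by repairing its neighbors' pointers.
    final class MetadataEntry {
        MetadataEntry previous, next;  // e.g. previous 542-543 and next 561-562
        int storageDeviceId;           // e.g. storage device identifier 520
        long location;                 // e.g. one of location fields 531-533
    }

    final class MetadataFifo {
        MetadataEntry head, tail;

        void unlink(MetadataEntry e) {
            if (e.previous != null) e.previous.next = e.next; else head = e.next;
            if (e.next != null) e.next.previous = e.previous; else tail = e.previous;
            e.previous = e.next = null;  // detach the removed entry
        }
    }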
6.2 Metadata Details
Also shown are possible implementations of metadata content. Some metadata implementations may perform better than others. For best performance, each metadata entry should contain a unique identifier of the corresponding data item. In this example, the unique identifier is a compound identifier that consists of multiple fields. For example, each cacheable data item may reside at a particular location within one of many disks. For example, metadata entry 552 may correspond to a data item, such as a disk storage block or database page, that resides at a particular location within storage device 510, which may be a mechanical disk, a flash drive, a network drive, or other durable bulk storage. Thus, metadata entry 552 may contain a compound unique identifier that includes storage device identifier 520 (e.g. the identifier of storage device 510) and a location field. The location field may be implementation dependent, such as a physical block address 532 or a logical block address 533 (that a driver for storage device 510 may translate into a physical block address).
In some implementations, the location field of the unique identifier does not identify a block, but may instead be a byte or word address or other offset into the capacity of storage device 510, such as offset 531. Thus, although location fields 531-533 are shown, a metadata implementation would only have one of those fields (in addition to storage device identifier 520). In an embodiment, metadata entry 552 comprises a 4-byte storage device identifier 520, a 4-byte offset or address such as one of 531-533, and 8-byte pointers 542 and 562. Thus, metadata entry 552 may be readily packed into as little as 24 bytes. In an embodiment, the fields of metadata entry 552 are word aligned instead of packed into 24 bytes. Thus, metadata entry 552 may need 32 bytes for well-aligned storage.
6.3 Random Access
As explained above, a victim FIFO should support removal of any metadata entry, regardless of position within the FIFO. A linked list is often used for a general purpose FIFO due to fast mutation (append or remove). However, a simple linked list does not support random access. Random access can accelerate finding a particular metadata entry within the FIFO. Thus, the FIFO may supplement the linked list with an associative map, such as a hash table. For example, Java has a LinkedHashMap that provides methods that support semantics of both a random-access map (e.g. containsKey) and a bounded-capacity FIFO (e.g. removeEldestEntry).
7.0 Hardware Overview
According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
For example, FIG. 6 is a block diagram that illustrates a computer system 600 upon which an embodiment of the invention may be implemented. Computer system 600 includes a bus 602 or other communication mechanism for communicating information, and a hardware processor 604 coupled with bus 602 for processing information. Hardware processor 604 may be, for example, a general purpose microprocessor.
Computer system 600 also includes a main memory 606, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 602 for storing information and instructions to be executed by processor 604. Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604. Such instructions, when stored in non-transitory storage media accessible to processor 604, render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions.
Computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604. A storage device 610, such as a magnetic disk or optical disk, is provided and coupled to bus 602 for storing information and instructions.
Computer system 600 may be coupled via bus 602 to a display 612, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 614, including alphanumeric and other keys, is coupled to bus 602 for communicating information and command selections to processor 604. Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
Computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610. Volatile media includes dynamic memory, such as main memory 606. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 600 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 602. Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions. The instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.
Computer system 600 also includes a communication interface 618 coupled to bus 602. Communication interface 618 provides a two-way data communication coupling to a network link 620 that is connected to a local network 622. For example, communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to data equipment operated by an Internet Service Provider (ISP) 626. ISP 626 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 628. Local network 622 and Internet 628 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.
Computer system 600 can send messages and receive data, including program code, through the network(s), network link 620 and communication interface 618. In the Internet example, a server 630 might transmit a requested code for an application program through Internet 628, ISP 626, local network 622 and communication interface 618.
The received code may be executed by processor 604 as it is received, and/or stored in storage device 610, or other non-volatile storage for later execution.
8.0 Software Overview
FIG. 7 is a block diagram of a basic software system 700 that may be employed for controlling the operation of computing system 600. Software system 700 and its components, including their connections, relationships, and functions, is meant to be exemplary only, and not meant to limit implementations of the example embodiment(s). Other software systems suitable for implementing the example embodiment(s) may have different components, including components with different connections, relationships, and functions.
Software system 700 is provided for directing the operation of computing system 600. Software system 700, which may be stored in system memory (RAM) 606 and on fixed storage (e.g., hard disk or flash memory) 610, includes a kernel or operating system (OS) 710.
The OS 710 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O. One or more application programs, represented as 702A, 702B, 702C . . . 702N, may be “loaded” (e.g., transferred from fixed storage 610 into memory 606) for execution by the system 700. The applications or other software intended for use on computer system 600 may also be stored as a set of downloadable computer-executable instructions, for example, for downloading and installation from an Internet location (e.g., a Web server, an app store, or other online service).
Software system 700 includes a graphical user interface (GUI) 715, for receiving user commands and data in a graphical (e.g., “point-and-click” or “touch gesture”) fashion. These inputs, in turn, may be acted upon by the system 700 in accordance with instructions from operating system 710 and/or application(s) 702. The GUI 715 also serves to display the results of operation from the OS 710 and application(s) 702, whereupon the user may supply additional inputs or terminate the session (e.g., log off).
OS 710 can execute directly on the bare hardware 720 (e.g., processor(s) 604) of computer system 600. Alternatively, a hypervisor or virtual machine monitor (VMM) 730 may be interposed between the bare hardware 720 and the OS 710. In this configuration, VMM 730 acts as a software “cushion” or virtualization layer between the OS 710 and the bare hardware 720 of the computer system 600.
VMM 730 instantiates and runs one or more virtual machine instances (“guest machines”). Each guest machine comprises a “guest” operating system, such as OS 710, and one or more applications, such as application(s) 702, designed to execute on the guest operating system. The VMM 730 presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems.
In some instances, the VMM 730 may allow a guest operating system to run as if it is running on the bare hardware 720 of computer system 600 directly. In these instances, the same version of the guest operating system configured to execute on the bare hardware 720 directly may also execute on VMM 730 without modification or reconfiguration. In other words, VMM 730 may provide full hardware and CPU virtualization to a guest operating system in some instances.
In other instances, a guest operating system may be specially designed or configured to execute on VMM 730 for efficiency. In these instances, the guest operating system is “aware” that it executes on a virtual machine monitor. In other words, VMM 730 may provide para-virtualization to a guest operating system in some instances.
A computer system process comprises an allotment of hardware processor time, and an allotment of memory (physical and/or virtual), the allotment of memory being for storing instructions executed by the hardware processor, for storing data generated by the hardware processor executing the instructions, and/or for storing the hardware processor state (e.g. content of registers) between allotments of the hardware processor time when the computer system process is not running. Computer system processes run under the control of an operating system, and may run under the control of other programs being executed on the computer system.
9.0 Cloud Computing
The term “cloud computing” is generally used herein to describe a computing model which enables on-demand access to a shared pool of computing resources, such as computer networks, servers, software applications, and services, and which allows for rapid provisioning and release of resources with minimal management effort or service provider interaction.
A cloud computing environment (sometimes referred to as a cloud environment, or a cloud) can be implemented in a variety of different ways to best suit different requirements. For example, in a public cloud environment, the underlying computing infrastructure is owned by an organization that makes its cloud services available to other organizations or to the general public. In contrast, a private cloud environment is generally intended solely for use by, or within, a single organization. A community cloud is intended to be shared by several organizations within a community; while a hybrid cloud comprises two or more types of cloud (e.g., private, community, or public) that are bound together by data and application portability.
Generally, a cloud computing model enables some of those responsibilities which previously may have been provided by an organization's own information technology department, to instead be delivered as service layers within a cloud environment, for use by consumers (either within or external to the organization, according to the cloud's public/private nature). Depending on the particular implementation, the precise definition of components or features provided by or within each cloud service layer can vary, but common examples include: Software as a Service (SaaS), in which consumers use software applications that are running upon a cloud infrastructure, while a SaaS provider manages or controls the underlying cloud infrastructure and applications. Platform as a Service (PaaS), in which consumers can use software programming languages and development tools supported by a PaaS provider to develop, deploy, and otherwise control their own applications, while the PaaS provider manages or controls other aspects of the cloud environment (i.e., everything below the run-time execution environment). Infrastructure as a Service (IaaS), in which consumers can deploy and run arbitrary software applications, and/or provision processing, storage, networks, and other fundamental computing resources, while an IaaS provider manages or controls the underlying physical cloud infrastructure (i.e., everything below the operating system layer). Database as a Service (DBaaS), in which consumers use a database server or Database Management System that is running upon a cloud infrastructure, while a DBaaS provider manages or controls the underlying cloud infrastructure and applications.
The above-described basic computer hardware and software and cloud computing environment are presented for the purpose of illustrating the basic underlying computer components that may be employed for implementing the example embodiment(s). The example embodiment(s), however, are not necessarily limited to any particular computing environment or computing device configuration. Instead, the example embodiment(s) may be implemented in any type of system architecture or processing environment that one skilled in the art, in light of this disclosure, would understand as capable of supporting the features and functions of the example embodiment(s) presented herein.
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Claims (22)

What is claimed is:
1. A method comprising:
storing in memory, in response to evicting a particular item from a cache, a metadata entry for the particular item;
detecting, in response to a cache miss for the particular item, whether or not the metadata entry for the particular item resides in memory;
incrementing, when the metadata entry for the particular item is detected in memory, a victim hit counter;
incrementing, when a second metadata entry for a second item is detected in memory, same said victim hit counter;
adjusting, based on the victim hit counter, a policy of the cache.
2. The method of claim 1 wherein:
the memory comprises a first-in first-out (FIFO) of metadata for a plurality of items, wherein the FIFO comprises a head and a tail;
storing the metadata entry for the particular item in memory comprises:
removing a metadata entry for another item from the head of the FIFO;
appending the metadata entry for the particular item to the tail of the FIFO.
3. The method of claim 1 further comprising:
storing, after evicting the particular item, the particular item into the cache;
removing the metadata entry for the particular item from the memory.
4. The method of claim 1 wherein:
the method further comprises incrementing, during a hit in the cache, a cache hit counter;
the adjusting is further based on the cache hit counter.
5. The method of claim 4 wherein the adjusting is further based on at least one of: a ratio of the victim hit counter to the cache hit counter, or a ratio of a victim hit rate to a cache hit rate.
6. The method of claim 5 wherein:
adjusting the policy of the cache comprises at least one of:
increasing, when the ratio exceeds a first threshold, a size of the cache;
decreasing, when a second threshold exceeds the ratio, the size of the cache.
7. The method of claim 5 wherein:
the policy of the cache includes distinguishing first-class items from second-class items;
adjusting the policy of the cache comprises at least one of:
decreasing, when the ratio exceeds a first threshold, how much of the cache is reserved for first-class items;
increasing, when a second threshold exceeds the ratio, how much of the cache is reserved for first-class items.
8. The method of claim 4 wherein only items of a particular subset of classes are counted during at least one of: said incrementing the victim hit counter, or said incrementing the cache hit counter.
9. The method of claim 1 wherein the policy of the cache includes distinguishing first-class items from second-class items.
10. The method of claim 9 wherein adjusting the policy of the cache comprises adjusting how much of the cache is reserved for first-class items.
11. The method of claim 1 wherein adjusting the policy of the cache comprises adjusting a size of the cache.
12. The method of claim 1 wherein said adjusting a policy of the cache is further based on an input/output (I/O) latency metric of a backing store of the cache.
13. The method of claim 2 wherein the FIFO contains more metadata entries than the cache contains items.
14. The method of claim 1 wherein said metadata entry contains at least one of: a device identifier, a block identifier, or a reference to a second metadata entry.
15. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause:
storing in memory, in response to evicting a particular item from a cache, a metadata entry for the particular item;
detecting, in response to a cache miss for the particular item, whether or not the metadata entry for the particular item resides in memory;
incrementing, when the metadata entry for the particular item is detected in memory, a victim hit counter;
incrementing, when a second metadata entry for a second item is detected in memory, same said victim hit counter;
adjusting, based on the victim hit counter, a policy of the cache.
16. The one or more non-transitory computer-readable media of claim 15 wherein:
the memory comprises a first-in first-out (FIFO) of metadata for a plurality of items, wherein the FIFO comprises a head and a tail;
storing the metadata entry for the particular item in memory comprises:
removing a metadata entry for another item from the head of the FIFO;
appending the metadata entry for the particular item to the tail of the FIFO.
17. The one or more non-transitory computer-readable media of claim 15 wherein the instructions, when executed by one or more processors, further cause:
storing, after evicting the particular item, the particular item into the cache;
removing the metadata entry for the particular item from the memory.
18. The one or more non-transitory computer-readable media of claim 15 wherein:
the instructions, when executed by one or more processors, further cause incrementing, during a hit in the cache, a cache hit counter;
the adjusting is further based on the cache hit counter.
19. The one or more non-transitory computer-readable media of claim 18 wherein the adjusting is further based on at least one of: a ratio of the victim hit counter to the cache hit counter, or a ratio of a victim hit rate to a cache hit rate.
20. The one or more non-transitory computer-readable media of claim 18 wherein only items of a particular subset of classes are counted during at least one of: said incrementing the victim hit counter, or said incrementing the cache hit counter.
21. The one or more non-transitory computer-readable media of claim 15 wherein adjusting the policy of the cache comprises adjusting a size of the cache.
22. The one or more non-transitory computer-readable media of claim 15 wherein said adjusting a policy of the cache is further based on an input/output (I/O) latency metric of a backing store of the cache.
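For readers tracing the claimed mechanism, the sketches below restate the claims' logic as short, non-normative Python; every identifier is hypothetical and the claims remain the sole authoritative statement. This first sketch covers the victim-metadata FIFO and victim hit counter of claims 15-17, sized so that it may hold more entries than the cache holds items (claim 13):

    from collections import OrderedDict

    class VictimFifo:
        """Bounded FIFO of metadata entries for items evicted from a cache."""

        def __init__(self, capacity):
            self.capacity = capacity      # may exceed the cache's item count (claim 13)
            self.entries = OrderedDict()  # key -> metadata entry; head is oldest
            self.victim_hits = 0          # the claimed victim hit counter

        def on_evict(self, key, metadata):
            # Store a metadata entry for the evicted item (claims 15-16): remove
            # an entry from the head when full, append the new one at the tail.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[key] = metadata

        def on_miss(self, key):
            # On a cache miss, detect whether the item's metadata entry still
            # resides in memory; every detected victim bumps the same counter.
            if key in self.entries:
                self.victim_hits += 1
                return True
            return False

        def on_admit(self, key):
            # When an evicted item is stored back into the cache, its metadata
            # entry is removed from memory (claim 17).
            self.entries.pop(key, None)

A cache would call on_evict when it discards an item, on_miss when a lookup fails, and on_admit when a previously evicted item is stored again.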
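Claim 14 lets each stored metadata entry carry a device identifier, a block identifier, or a reference to a second metadata entry; one hypothetical layout:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class MetadataEntry:
        device_id: int                                # backing device the item came from
        block_id: int                                 # block of that device
        next_entry: Optional["MetadataEntry"] = None  # reference to a second metadata entry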
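Claims 4-7 and 18-19 pair the victim hit counter with a cache hit counter and steer policy by their ratio. One plausible reading of the two-threshold rule of claim 7, with hypothetical constants HIGH, LOW, and STEP:

    HIGH, LOW, STEP = 0.5, 0.1, 0.05  # hypothetical tuning constants

    def adjust_reservation(victim_hits, cache_hits, reserved):
        """Return an updated fraction of the cache reserved for first-class items."""
        if cache_hits == 0:
            return reserved
        ratio = victim_hits / cache_hits
        if ratio > HIGH:
            # Many avoidable misses: shrink the first-class reservation so more
            # of the cache serves the items that are thrashing.
            reserved = max(0.0, reserved - STEP)
        elif ratio < LOW:
            # Few avoidable misses: reserve more of the cache for first-class items.
            reserved = min(1.0, reserved + STEP)
        return reserved

Per claims 8 and 20, either counter may also be restricted to a particular subset of item classes, for example counting only OLTP-class items while ignoring data-warehouse scans.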
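Claims 11-12 and 21-22 permit the adjustment to resize the cache outright and to weigh an input/output latency metric of the backing store; a minimal sketch under those assumptions, again with hypothetical thresholds:

    VICTIM_FLOOR = 1000      # victim hits suggesting the thrashing is avoidable
    LATENCY_FLOOR_US = 500   # backing-store latency at which misses become costly

    def maybe_grow(cache_size, victim_hits, backing_latency_us, step):
        # Grow the cache only when misses are both avoidable (high victim hits)
        # and expensive (slow backing store); otherwise leave the size alone.
        if victim_hits > VICTIM_FLOOR and backing_latency_us > LATENCY_FLOOR_US:
            return cache_size + step
        return cache_size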
US15/687,296 2016-11-04 2017-08-25 Detection of avoidable cache thrashing for OLTP and DW workloads Active US10331573B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/687,296 US10331573B2 (en) 2016-11-04 2017-08-25 Detection of avoidable cache thrashing for OLTP and DW workloads
US16/388,955 US11138131B2 (en) 2016-11-04 2019-04-19 Detection of avoidable cache thrashing for OLTP and DW workloads

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662418005P 2016-11-04 2016-11-04
US15/687,296 US10331573B2 (en) 2016-11-04 2017-08-25 Detection of avoidable cache thrashing for OLTP and DW workloads

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/388,955 Continuation US11138131B2 (en) 2016-11-04 2019-04-19 Detection of avoidable cache thrashing for OLTP and DW workloads

Publications (2)

Publication Number Publication Date
US20180129612A1 (en) 2018-05-10
US10331573B2 (en) 2019-06-25

Family

ID=62063966

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/687,296 Active US10331573B2 (en) 2016-11-04 2017-08-25 Detection of avoidable cache thrashing for OLTP and DW workloads
US16/388,955 Active 2037-10-18 US11138131B2 (en) 2016-11-04 2019-04-19 Detection of avoidable cache thrashing for OLTP and DW workloads

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/388,955 Active 2037-10-18 US11138131B2 (en) 2016-11-04 2019-04-19 Detection of avoidable cache thrashing for OLTP and DW workloads

Country Status (1)

Country Link
US (2) US10331573B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11201914B2 (en) * 2018-08-10 2021-12-14 Wangsu Science & Technology Co., Ltd. Method for processing a super-hot file, load balancing device and download server

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10229161B2 (en) 2013-09-20 2019-03-12 Oracle International Corporation Automatic caching of scan and random access data in computing systems
US10331573B2 (en) 2016-11-04 2019-06-25 Oracle International Corporation Detection of avoidable cache thrashing for OLTP and DW workloads
US20200241792A1 (en) * 2019-01-29 2020-07-30 Sap Se Selective Restriction of Large Object Pages in a Database
US11113192B2 (en) * 2019-11-22 2021-09-07 EMC IP Holding Company LLC Method and apparatus for dynamically adapting cache size based on estimated cache performance
US11550732B2 (en) * 2020-02-22 2023-01-10 International Business Machines Corporation Calculating and adjusting ghost cache size based on data access frequency
US11281594B2 (en) 2020-02-22 2022-03-22 International Business Machines Corporation Maintaining ghost cache statistics for demoted data elements
US11694758B2 (en) 2021-08-09 2023-07-04 Micron Technology, Inc. Changing scan frequency of a probabilistic data integrity scan based on data quality
US11740956B2 (en) * 2021-08-09 2023-08-29 Micron Technology, Inc. Probabilistic data integrity scan with an adaptive scan frequency
US11545229B1 (en) 2021-08-09 2023-01-03 Micron Technology, Inc. Probabilistic data integrity scan with dynamic scan frequency
CN113810298B (en) * 2021-09-23 2023-05-26 长沙理工大学 OpenFlow virtual flow table elastic acceleration searching method supporting network flow jitter
US11842085B1 (en) * 2022-03-31 2023-12-12 Amazon Technologies, Inc. Up-sized cluster performance modeling for a tiered data processing service
US11886354B1 (en) 2022-05-20 2024-01-30 Apple Inc. Cache thrash detection

Citations (107)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4425615A (en) 1980-11-14 1984-01-10 Sperry Corporation Hierarchical memory system having cache/disk subsystem with command queues for plural disks
US4463424A (en) 1981-02-19 1984-07-31 International Business Machines Corporation Method for dynamically allocating LRU/MRU managed memory among concurrent sequential processes
WO1993018461A1 (en) 1992-03-09 1993-09-16 Auspex Systems, Inc. High-performance non-volatile ram protected write cache accelerator system
US5394531A (en) * 1989-04-03 1995-02-28 International Business Machines Corporation Dynamic storage allocation system for a prioritized cache
US5717893A (en) 1989-03-22 1998-02-10 International Business Machines Corporation Method for managing a cache hierarchy having a least recently used (LRU) global cache and a plurality of LRU destaging local caches containing counterpart datatype partitions
US5765034A (en) 1995-10-20 1998-06-09 International Business Machines Corporation Fencing system for standard interfaces for storage devices
US6044367A (en) 1996-08-02 2000-03-28 Hewlett-Packard Company Distributed I/O store
US6457105B1 (en) 1999-01-15 2002-09-24 Hewlett-Packard Company System and method for managing data in an asynchronous I/O cache memory
JP2002278704A (en) 2001-03-19 2002-09-27 Toshiba Corp Method for optimizing processing, computer and storage device
US20020143755A1 (en) 2000-11-28 2002-10-03 Siemens Technology-To-Business Center, Llc System and methods for highly distributed wide-area data management of a network of data sources through a database interface
US6526483B1 (en) 2000-09-20 2003-02-25 Broadcom Corporation Page open hint in transactions
JP2003150419A (en) 2001-11-14 2003-05-23 Hitachi Ltd Storage device having means for obtaining execution information of database management system
US20030115324A1 (en) 1998-06-30 2003-06-19 Steven M Blumenau Method and apparatus for providing data management for a storage system coupled to a network
US20040003087A1 (en) 2002-06-28 2004-01-01 Chambliss David Darden Method for improving performance in a computer storage system by regulating resource requests from clients
JP2004038758A (en) 2002-07-05 2004-02-05 Hitachi Ltd Storage controller, control method for storage controller, and program
US20040054860A1 (en) 2002-09-17 2004-03-18 Nokia Corporation Selective cache admission
US20040062106A1 (en) 2002-09-27 2004-04-01 Bhashyam Ramesh System and method for retrieving information from a database
US6728823B1 (en) 2000-02-18 2004-04-27 Hewlett-Packard Development Company, L.P. Cache connection with bypassing feature
US20040117441A1 (en) 2002-12-09 2004-06-17 Infabric Technologies, Inc. Data-aware data flow manager
US20040148486A1 (en) 2003-01-29 2004-07-29 Burton David Alan Methods and systems of host caching
US20040215626A1 (en) 2003-04-09 2004-10-28 International Business Machines Corporation Method, system, and program for improving performance of database queries
US20040225845A1 (en) 2000-10-06 2004-11-11 Kruckemyer David A. Cache coherent protocol in which exclusive and modified data is transferred to requesting agent from snooping agent
US20040230753A1 (en) 2003-05-16 2004-11-18 International Business Machines Corporation Methods and apparatus for providing service differentiation in a shared storage environment
US20040254943A1 (en) 1998-07-31 2004-12-16 Blue Coat Systems, Inc., A Delaware Corporation Multiple cache communication and uncacheable objects
US20050056520A1 (en) 2003-09-12 2005-03-17 Seagle Donald Lee Sensor position adjusting device for a coin dispenser
US20050120025A1 (en) 2003-10-27 2005-06-02 Andres Rodriguez Policy-based management of a redundant array of independent nodes
GB2409301A (en) 2003-12-18 2005-06-22 Advanced Risc Mach Ltd Error correction within a cache memory
US20050160224A1 (en) 2001-12-21 2005-07-21 International Business Machines Corporation Context-sensitive caching
US20050193160A1 (en) 2004-03-01 2005-09-01 Sybase, Inc. Database System Providing Methodology for Extended Memory Support
US20050210202A1 (en) 2004-03-19 2005-09-22 Intel Corporation Managing input/output (I/O) requests in a cache memory system
US20050283637A1 (en) 2004-05-28 2005-12-22 International Business Machines Corporation System and method for maintaining functionality during component failures
US7036147B1 (en) 2001-12-20 2006-04-25 Mcafee, Inc. System, method and computer program product for eliminating disk read time during virus scanning
US20060106890A1 (en) 2004-11-16 2006-05-18 Vipul Paul Apparatus, system, and method for cache synchronization
US7069324B1 (en) 2000-06-30 2006-06-27 Cisco Technology, Inc. Methods and apparatus slow-starting a web cache system
US7093162B2 (en) 2001-09-04 2006-08-15 Microsoft Corporation Persistent stateful component-based applications via automatic recovery
US20060209444A1 (en) 2005-03-17 2006-09-21 Dong-Hyun Song Hard disk drive with reduced power consumption, related data processing apparatus, and I/O method
US20060218123A1 (en) 2005-03-28 2006-09-28 Sybase, Inc. System and Methodology for Parallel Query Optimization Using Semantic-Based Partitioning
US20060224551A1 (en) 2005-04-01 2006-10-05 International Business Machines Corporation Method, system and program for joining source table rows with target table rows
US20060224451A1 (en) 2004-10-18 2006-10-05 Xcelerator Loyalty Group, Inc. Incentive program
US20060271605A1 (en) 2004-11-16 2006-11-30 Petruzzo Stephen E Data Mirroring System and Method
US20060271740A1 (en) 2005-05-31 2006-11-30 Mark Timothy W Performing read-ahead operation for a direct input/output request
US20060277439A1 (en) 2005-06-01 2006-12-07 Microsoft Corporation Code coverage test selection
US7159076B2 (en) 2003-06-24 2007-01-02 Research In Motion Limited Cache operation with non-cache memory
US20070067575A1 (en) 2005-09-20 2007-03-22 Morris John M Method of managing cache memory based on data temperature
US20070124415A1 (en) 2005-11-29 2007-05-31 Etai Lev-Ran Method and apparatus for reducing network traffic over low bandwidth links
US7237027B1 (en) 2000-11-10 2007-06-26 Agami Systems, Inc. Scalable storage system
US20070220348A1 (en) 2006-02-28 2007-09-20 Mendoza Alfredo V Method of isolating erroneous software program components
US20070260819A1 (en) 2006-05-04 2007-11-08 International Business Machines Corporation Compiler assisted victim cache bypassing
US20070271570A1 (en) 2006-05-17 2007-11-22 Brown Douglas P Managing database utilities to improve throughput and concurrency
US20080046736A1 (en) 2006-08-03 2008-02-21 Arimilli Ravi K Data Processing System and Method for Reducing Cache Pollution by Write Stream Memory Access Patterns
CN101150483A (en) 2007-11-02 2008-03-26 华为技术有限公司 Route table adjustment method, route query method and device and route table storage device
US20080104329A1 (en) 2006-10-31 2008-05-01 Gaither Blaine D Cache and method for cache bypass functionality
US20080104283A1 (en) 2006-10-31 2008-05-01 George Shin Method and system for achieving fair command processing in storage systems that implement command-associated priority queuing
US20080147599A1 (en) 2006-12-18 2008-06-19 Ianywhere Solutions, Inc. Load balancing for complex database query plans
US20080155229A1 (en) 2006-12-21 2008-06-26 Kevin Scott Beyer System and method for generating a cache-aware bloom filter
US20080177803A1 (en) 2007-01-24 2008-07-24 Sam Fineberg Log Driven Storage Controller with Network Persistent Memory
US20080244209A1 (en) 2007-03-27 2008-10-02 Seelam Seetharami R Methods and devices for determining quality of services of storage systems
US7461147B1 (en) 2002-08-26 2008-12-02 NetApp, Inc. Node selection within a network based on policy
US20080307266A1 (en) 2004-09-24 2008-12-11 Sashikanth Chandrasekaran Techniques for automatically tracking software errors
US20090164536A1 (en) 2007-12-19 2009-06-25 Network Appliance, Inc. Using The LUN Type For Storage Allocation
US20090182960A1 (en) 2008-01-10 2009-07-16 International Business Machines Corporation Using multiple sidefiles to buffer writes to primary storage volumes to transfer to corresponding secondary storage volumes in a mirror relationship
US20090193189A1 (en) 2008-01-30 2009-07-30 Formation, Inc. Block-based Storage System Having Recovery Memory to Prevent Loss of Data from Volatile Write Cache
US20090248871A1 (en) 2008-03-26 2009-10-01 Fujitsu Limited Server and connecting destination server switch control method
US7636814B1 (en) 2005-04-28 2009-12-22 Symantec Operating Corporation System and method for asynchronous reads of old data blocks updated through a write-back cache
US20100017556A1 (en) 2008-07-19 2010-01-21 Nanostar Corporation, U.S.A. Non-volatile memory storage system with two-stage controller architecture
US7660945B1 (en) 2004-03-09 2010-02-09 Seagate Technology, Llc Methods and structure for limiting storage device write caching
US20100077107A1 (en) 2008-09-19 2010-03-25 Oracle International Corporation Storage-side storage request management
US20100158486A1 (en) 2008-12-19 2010-06-24 Seagate Technology Llc Storage device and controller to selectively activate a storage media
US7769802B2 (en) 2003-12-04 2010-08-03 Microsoft Corporation Systems and methods that employ correlated synchronous-on-asynchronous processing
US20100199042A1 (en) 2009-01-30 2010-08-05 Twinstrata, Inc System and method for secure and reliable multi-cloud data replication
US20100205367A1 (en) 2009-02-09 2010-08-12 Ehrlich Richard M Method And System For Maintaining Cache Data Integrity With Flush-Cache Commands
US20100274962A1 (en) 2009-04-26 2010-10-28 Sandisk Il Ltd. Method and apparatus for implementing a caching policy for non-volatile memory
US7836262B2 (en) 2007-06-05 2010-11-16 Apple Inc. Converting victim writeback to a fill
US20100332901A1 (en) 2009-06-30 2010-12-30 Sun Microsystems, Inc. Advice-based feedback for transactional execution
US20110022801A1 (en) 2007-12-06 2011-01-27 David Flynn Apparatus, system, and method for redundant write caching
US20110040861A1 (en) 2009-08-17 2011-02-17 At&T Intellectual Property I, L.P. Integrated Proximity Routing for Content Distribution
US20110047084A1 (en) 2008-04-14 2011-02-24 Antonio Manzalini Distributed service framework
US20110066791A1 (en) 2009-09-14 2011-03-17 Oracle International Corporation Caching data between a database server and a storage system
US20110153719A1 (en) 2009-12-22 2011-06-23 At&T Intellectual Property I, L.P. Integrated Adaptive Anycast for Content Distribution
US20110173325A1 (en) 2008-09-15 2011-07-14 Dell Products L.P. System and Method for Management of Remotely Shared Data
US20110191543A1 (en) 2010-02-02 2011-08-04 Arm Limited Area and power efficient data coherency maintenance
US20110238899A1 (en) 2008-12-27 2011-09-29 Kabushiki Kaisha Toshiba Memory system, method of controlling memory system, and information processing apparatus
US20110320804A1 (en) 2010-06-24 2011-12-29 International Business Machines Corporation Data access management in a hybrid memory server
US20120124296A1 (en) * 2010-11-17 2012-05-17 Bryant Christopher D Method and apparatus for reacquiring lines in a cache
US20120144234A1 (en) 2010-12-03 2012-06-07 Teradata Us, Inc. Automatic error recovery mechanism for a database system
US20120159480A1 (en) 2010-12-21 2012-06-21 Hitachi, Ltd. Data processing method and apparatus for remote storage system
US8244984B1 (en) 2008-12-08 2012-08-14 Nvidia Corporation System and method for cleaning dirty data in an intermediate cache using a data class dependent eviction policy
US8326839B2 (en) 2009-11-09 2012-12-04 Oracle International Corporation Efficient file access in a large repository using a two-level cache
US8327080B1 (en) 2010-09-28 2012-12-04 Emc Corporation Write-back cache protection
US8359429B1 (en) 2004-11-08 2013-01-22 Symantec Operating Corporation System and method for distributing volume status information in a storage system
US8370452B2 (en) 2010-12-27 2013-02-05 Limelight Networks, Inc. Partial object caching
US20130086330A1 (en) 2011-09-30 2013-04-04 Oracle International Corporation Write-Back Storage Cache Based On Fast Persistent Memory
US20130275402A1 (en) 2012-04-17 2013-10-17 Oracle International Corporation Redistributing Computation Work Between Data Producers And Data Consumers
US20130326152A1 (en) 2012-05-31 2013-12-05 Oracle International Corporation Rapid Recovery From Loss Of Storage Device Cache
US20140089565A1 (en) 2012-09-27 2014-03-27 Arkologic Limited Solid state device write operation management system
US20140149638A1 (en) 2012-11-26 2014-05-29 Lsi Corporation System and method for providing a flash memory cache input/output throttling mechanism based upon temperature parameters for promoting improved flash life
US20140281167A1 (en) 2013-03-15 2014-09-18 Skyera, Inc. Compressor resources for high density storage units
US20140281272A1 (en) 2013-03-13 2014-09-18 Oracle International Corporation Rapid Recovery From Downtime Of Mirrored Storage Device
US8868707B2 (en) 2009-06-16 2014-10-21 Oracle International Corporation Adaptive write-back and write-through caching for off-line data
US20150012690A1 (en) 2013-03-15 2015-01-08 Rolando H. Bruce Multi-Leveled Cache Management in a Hybrid Storage System
US20150089121A1 (en) 2013-09-20 2015-03-26 Oracle International Corporation Managing A Cache On Storage Devices Supporting Compression
US9003159B2 (en) 2009-10-05 2015-04-07 Marvell World Trade Ltd. Data caching in non-volatile memory
US9256542B1 (en) 2008-09-17 2016-02-09 Pmc-Sierra Us, Inc. Adaptive intelligent storage controller and associated methods
US20160117125A1 (en) * 2014-10-24 2016-04-28 Spectra Logic Corporation Authoritative power management
US9361232B2 (en) 2008-09-19 2016-06-07 Oracle International Corporation Selectively reading data from cache and primary storage
US20170322886A1 (en) * 2016-05-09 2017-11-09 Cavium, Inc. Admission control for memory access requests
US20170357588A1 (en) * 2016-06-13 2017-12-14 Advanced Micro Devices, Inc. Scaled set dueling for cache replacement policies

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6408292B1 (en) 1999-08-04 2002-06-18 Hyperroll, Israel, Ltd. Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions
US6683873B1 (en) 1999-12-27 2004-01-27 Cisco Technology, Inc. Methods and apparatus for redirecting network traffic
US6823377B1 (en) 2000-01-28 2004-11-23 International Business Machines Corporation Arrangements and methods for latency-sensitive hashing for collaborative web caching
WO2004015873A1 (en) * 2002-08-13 2004-02-19 Vanu, Inc. Convolutional decoding
US6931504B2 (en) * 2003-05-06 2005-08-16 Sun Microsystems, Inc. Method and apparatus for relocating objects within an object-addressed memory hierarchy
US9805077B2 (en) 2008-02-19 2017-10-31 International Business Machines Corporation Method and system for optimizing data access in a database using multi-class objects
US8156374B1 (en) * 2009-07-23 2012-04-10 Sprint Communications Company L.P. Problem management for outsized queues
JP5447980B2 (en) * 2010-07-16 2014-03-19 株式会社マキタ Brake mechanism selection structure for electric tools
US8407419B2 (en) 2010-11-30 2013-03-26 Open Text S.A. System and method for managing a cache using file system metadata
WO2013123589A1 (en) 2012-02-24 2013-08-29 Rayan Zachariassen General storage functionality extension
US9433623B2 (en) * 2012-09-14 2016-09-06 Elizabeth LEVINA Medicinal drug with activity against gram positive bacteria, mycobacteria and fungi
US9009439B2 (en) 2013-03-12 2015-04-14 Sap Se On-disk operations on fragments to support huge data sizes
US9330055B2 (en) 2013-06-04 2016-05-03 International Business Machines Corporation Modular architecture for extreme-scale distributed processing applications
JP2015011421A (en) * 2013-06-27 2015-01-19 ソニー株式会社 Storage controller, storage device and storage control method thereof
US10229161B2 (en) 2013-09-20 2019-03-12 Oracle International Corporation Automatic caching of scan and random access data in computing systems
US9460143B2 (en) * 2014-02-14 2016-10-04 Oracle International Corporation Methods, systems, and computer readable media for a multi-view data construct for lock-free operations and direct access
US10331573B2 (en) 2016-11-04 2019-06-25 Oracle International Corporation Detection of avoidable cache thrashing for OLTP and DW workloads
US10715443B2 (en) 2017-10-26 2020-07-14 Cisco Technology, Inc. Effective handling of WCCP reject traffic

Patent Citations (125)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4425615A (en) 1980-11-14 1984-01-10 Sperry Corporation Hierarchical memory system having cache/disk subsystem with command queues for plural disks
US4463424A (en) 1981-02-19 1984-07-31 International Business Machines Corporation Method for dynamically allocating LRU/MRU managed memory among concurrent sequential processes
US5717893A (en) 1989-03-22 1998-02-10 International Business Machines Corporation Method for managing a cache hierarchy having a least recently used (LRU) global cache and a plurality of LRU destaging local caches containing counterpart datatype partitions
US5394531A (en) * 1989-04-03 1995-02-28 International Business Machines Corporation Dynamic storage allocation system for a prioritized cache
WO1993018461A1 (en) 1992-03-09 1993-09-16 Auspex Systems, Inc. High-performance non-volatile ram protected write cache accelerator system
US5765034A (en) 1995-10-20 1998-06-09 International Business Machines Corporation Fencing system for standard interfaces for storage devices
US6044367A (en) 1996-08-02 2000-03-28 Hewlett-Packard Company Distributed I/O store
US20030115324A1 (en) 1998-06-30 2003-06-19 Steven M Blumenau Method and apparatus for providing data management for a storage system coupled to a network
US20040254943A1 (en) 1998-07-31 2004-12-16 Blue Coat Systems, Inc., A Delaware Corporation Multiple cache communication and uncacheable objects
US6457105B1 (en) 1999-01-15 2002-09-24 Hewlett-Packard Company System and method for managing data in an asynchronous I/O cache memory
US6728823B1 (en) 2000-02-18 2004-04-27 Hewlett-Packard Development Company, L.P. Cache connection with bypassing feature
US7069324B1 (en) 2000-06-30 2006-06-27 Cisco Technology, Inc. Methods and apparatus slow-starting a web cache system
US6526483B1 (en) 2000-09-20 2003-02-25 Broadcom Corporation Page open hint in transactions
US20040225845A1 (en) 2000-10-06 2004-11-11 Kruckemyer David A. Cache coherent protocol in which exclusive and modified data is transferred to requesting agent from snooping agent
US7237027B1 (en) 2000-11-10 2007-06-26 Agami Systems, Inc. Scalable storage system
US20020143755A1 (en) 2000-11-28 2002-10-03 Siemens Technology-To-Business Center, Llc System and methods for highly distributed wide-area data management of a network of data sources through a database interface
JP2002278704A (en) 2001-03-19 2002-09-27 Toshiba Corp Method for optimizing processing, computer and storage device
US7093162B2 (en) 2001-09-04 2006-08-15 Microsoft Corporation Persistent stateful component-based applications via automatic recovery
US6928451B2 (en) 2001-11-14 2005-08-09 Hitachi, Ltd. Storage system having means for acquiring execution information of database management system
JP2003150419A (en) 2001-11-14 2003-05-23 Hitachi Ltd Storage device having means for obtaining execution information of database management system
US7036147B1 (en) 2001-12-20 2006-04-25 Mcafee, Inc. System, method and computer program product for eliminating disk read time during virus scanning
US20050160224A1 (en) 2001-12-21 2005-07-21 International Business Machines Corporation Context-sensitive caching
US7228354B2 (en) 2002-06-28 2007-06-05 International Business Machines Corporation Method for improving performance in a computer storage system by regulating resource requests from clients
US20040003087A1 (en) 2002-06-28 2004-01-01 Chambliss David Darden Method for improving performance in a computer storage system by regulating resource requests from clients
JP2004038758A (en) 2002-07-05 2004-02-05 Hitachi Ltd Storage controller, control method for storage controller, and program
US6886084B2 (en) 2002-07-05 2005-04-26 Hitachi, Ltd. Storage controlling device and control method for a storage controlling device
US7461147B1 (en) 2002-08-26 2008-12-02 NetApp, Inc. Node selection within a network based on policy
US20040054860A1 (en) 2002-09-17 2004-03-18 Nokia Corporation Selective cache admission
US20040062106A1 (en) 2002-09-27 2004-04-01 Bhashyam Ramesh System and method for retrieving information from a database
US6922754B2 (en) 2002-12-09 2005-07-26 Infabric Technologies, Inc. Data-aware data flow manager
US20040117441A1 (en) 2002-12-09 2004-06-17 Infabric Technologies, Inc. Data-aware data flow manager
US20040148486A1 (en) 2003-01-29 2004-07-29 Burton David Alan Methods and systems of host caching
US20040215626A1 (en) 2003-04-09 2004-10-28 International Business Machines Corporation Method, system, and program for improving performance of database queries
US20040230753A1 (en) 2003-05-16 2004-11-18 International Business Machines Corporation Methods and apparatus for providing service differentiation in a shared storage environment
US7290090B2 (en) 2003-06-24 2007-10-30 Research In Motion Limited Cache operation with non-cache memory
US7506103B2 (en) 2003-06-24 2009-03-17 Research In Motion Limited Cache operation with non-cache memory
US7159076B2 (en) 2003-06-24 2007-01-02 Research In Motion Limited Cache operation with non-cache memory
US20080016283A1 (en) 2003-06-24 2008-01-17 Madter Richard C Cache operation with non-cache memory
US20050056520A1 (en) 2003-09-12 2005-03-17 Seagle Donald Lee Sensor position adjusting device for a coin dispenser
US20050120025A1 (en) 2003-10-27 2005-06-02 Andres Rodriguez Policy-based management of a redundant array of independent nodes
US7769802B2 (en) 2003-12-04 2010-08-03 Microsoft Corporation Systems and methods that employ correlated synchronous-on-asynchronous processing
GB2409301A (en) 2003-12-18 2005-06-22 Advanced Risc Mach Ltd Error correction within a cache memory
US20050193160A1 (en) 2004-03-01 2005-09-01 Sybase, Inc. Database System Providing Methodology for Extended Memory Support
US7660945B1 (en) 2004-03-09 2010-02-09 Seagate Technology, Llc Methods and structure for limiting storage device write caching
US20050210202A1 (en) 2004-03-19 2005-09-22 Intel Corporation Managing input/output (I/O) requests in a cache memory system
US7165144B2 (en) 2004-03-19 2007-01-16 Intel Corporation Managing input/output (I/O) requests in a cache memory system
US20050283637A1 (en) 2004-05-28 2005-12-22 International Business Machines Corporation System and method for maintaining functionality during component failures
US20080307266A1 (en) 2004-09-24 2008-12-11 Sashikanth Chandrasekaran Techniques for automatically tracking software errors
US20060224451A1 (en) 2004-10-18 2006-10-05 Xcelerator Loyalty Group, Inc. Incentive program
US8359429B1 (en) 2004-11-08 2013-01-22 Symantec Operating Corporation System and method for distributing volume status information in a storage system
US20060106890A1 (en) 2004-11-16 2006-05-18 Vipul Paul Apparatus, system, and method for cache synchronization
US20060271605A1 (en) 2004-11-16 2006-11-30 Petruzzo Stephen E Data Mirroring System and Method
US20060209444A1 (en) 2005-03-17 2006-09-21 Dong-Hyun Song Hard disk drive with reduced power consumption, related data processing apparatus, and I/O method
US20060218123A1 (en) 2005-03-28 2006-09-28 Sybase, Inc. System and Methodology for Parallel Query Optimization Using Semantic-Based Partitioning
US20060224551A1 (en) 2005-04-01 2006-10-05 International Business Machines Corporation Method, system and program for joining source table rows with target table rows
US7636814B1 (en) 2005-04-28 2009-12-22 Symantec Operating Corporation System and method for asynchronous reads of old data blocks updated through a write-back cache
US20060271740A1 (en) 2005-05-31 2006-11-30 Mark Timothy W Performing read-ahead operation for a direct input/output request
US20060277439A1 (en) 2005-06-01 2006-12-07 Microsoft Corporation Code coverage test selection
US20070067575A1 (en) 2005-09-20 2007-03-22 Morris John M Method of managing cache memory based on data temperature
US20070124415A1 (en) 2005-11-29 2007-05-31 Etai Lev-Ran Method and apparatus for reducing network traffic over low bandwidth links
US20070220348A1 (en) 2006-02-28 2007-09-20 Mendoza Alfredo V Method of isolating erroneous software program components
US20070260819A1 (en) 2006-05-04 2007-11-08 International Business Machines Corporation Compiler assisted victim cache bypassing
US20070271570A1 (en) 2006-05-17 2007-11-22 Brown Douglas P Managing database utilities to improve throughput and concurrency
US20080046736A1 (en) 2006-08-03 2008-02-21 Arimilli Ravi K Data Processing System and Method for Reducing Cache Pollution by Write Stream Memory Access Patterns
US20080104283A1 (en) 2006-10-31 2008-05-01 George Shin Method and system for achieving fair command processing in storage systems that implement command-associated priority queuing
US20080104329A1 (en) 2006-10-31 2008-05-01 Gaither Blaine D Cache and method for cache bypass functionality
US8683139B2 (en) 2006-10-31 2014-03-25 Hewlett-Packard Development Company, L.P. Cache and method for cache bypass functionality
US20080147599A1 (en) 2006-12-18 2008-06-19 Ianywhere Solutions, Inc. Load balancing for complex database query plans
US20080155229A1 (en) 2006-12-21 2008-06-26 Kevin Scott Beyer System and method for generating a cache-aware bloom filter
US20080177803A1 (en) 2007-01-24 2008-07-24 Sam Fineberg Log Driven Storage Controller with Network Persistent Memory
US20080244209A1 (en) 2007-03-27 2008-10-02 Seelam Seetharami R Methods and devices for determining quality of services of storage systems
US7836262B2 (en) 2007-06-05 2010-11-16 Apple Inc. Converting victim writeback to a fill
CN101150483A (en) 2007-11-02 2008-03-26 华为技术有限公司 Route table adjustment method, route query method and device and route table storage device
US20110022801A1 (en) 2007-12-06 2011-01-27 David Flynn Apparatus, system, and method for redundant write caching
US20090164536A1 (en) 2007-12-19 2009-06-25 Network Appliance, Inc. Using The LUN Type For Storage Allocation
US20090182960A1 (en) 2008-01-10 2009-07-16 International Business Machines Corporation Using multiple sidefiles to buffer writes to primary storage volumes to transfer to corresponding secondary storage volumes in a mirror relationship
US20090193189A1 (en) 2008-01-30 2009-07-30 Formation, Inc. Block-based Storage System Having Recovery Memory to Prevent Loss of Data from Volatile Write Cache
US20090248871A1 (en) 2008-03-26 2009-10-01 Fujitsu Limited Server and connecting destination server switch control method
US7904562B2 (en) 2008-03-26 2011-03-08 Fujitsu Limited Server and connecting destination server switch control method
US20110047084A1 (en) 2008-04-14 2011-02-24 Antonio Manzalini Distributed service framework
US20100017556A1 (en) 2008-07-19 2010-01-21 Nanostar Corporation, U.S.A. Non-volatile memory storage system with two-stage controller architecture
US20110173325A1 (en) 2008-09-15 2011-07-14 Dell Products L.P. System and Method for Management of Remotely Shared Data
US9256542B1 (en) 2008-09-17 2016-02-09 Pmc-Sierra Us, Inc. Adaptive intelligent storage controller and associated methods
US20140337314A1 (en) 2008-09-19 2014-11-13 Oracle International Corporation Hash join using collaborative parallel filtering in intelligent storage with offloaded bloom filters
US8874807B2 (en) 2008-09-19 2014-10-28 Oracle International Corporation Storage-side storage request management
US8521923B2 (en) 2008-09-19 2013-08-27 Oracle International Corporation Storage-side storage request management
US20100082648A1 (en) 2008-09-19 2010-04-01 Oracle International Corporation Hash join using collaborative parallel filtering in intelligent storage with offloaded bloom filters
US9361232B2 (en) 2008-09-19 2016-06-07 Oracle International Corporation Selectively reading data from cache and primary storage
US20100077107A1 (en) 2008-09-19 2010-03-25 Oracle International Corporation Storage-side storage request management
US8145806B2 (en) 2008-09-19 2012-03-27 Oracle International Corporation Storage-side storage request management
US8244984B1 (en) 2008-12-08 2012-08-14 Nvidia Corporation System and method for cleaning dirty data in an intermediate cache using a data class dependent eviction policy
US20100158486A1 (en) 2008-12-19 2010-06-24 Seagate Technology Llc Storage device and controller to selectively activate a storage media
US20110238899A1 (en) 2008-12-27 2011-09-29 Kabushiki Kaisha Toshiba Memory system, method of controlling memory system, and information processing apparatus
US20100199042A1 (en) 2009-01-30 2010-08-05 Twinstrata, Inc System and method for secure and reliable multi-cloud data replication
US20100205367A1 (en) 2009-02-09 2010-08-12 Ehrlich Richard M Method And System For Maintaining Cache Data Integrity With Flush-Cache Commands
US20100274962A1 (en) 2009-04-26 2010-10-28 Sandisk Il Ltd. Method and apparatus for implementing a caching policy for non-volatile memory
US8868707B2 (en) 2009-06-16 2014-10-21 Oracle International Corporation Adaptive write-back and write-through caching for off-line data
US20100332901A1 (en) 2009-06-30 2010-12-30 Sun Microsystems, Inc. Advice-based feedback for transactional execution
US20110040861A1 (en) 2009-08-17 2011-02-17 At&T Intellectual Property I, L.P. Integrated Proximity Routing for Content Distribution
US20150006813A1 (en) 2009-09-14 2015-01-01 Oracle International Corporation Caching data between a database server and a storage system
US8868831B2 (en) 2009-09-14 2014-10-21 Oracle International Corporation Caching data between a database server and a storage system
US9405694B2 (en) 2009-09-14 2016-08-02 Oracle International Corporation Caching data between a database server and a storage system
US20110066791A1 (en) 2009-09-14 2011-03-17 Oracle International Corporation Caching data between a database server and a storage system
US9003159B2 (en) 2009-10-05 2015-04-07 Marvell World Trade Ltd. Data caching in non-volatile memory
US8326839B2 (en) 2009-11-09 2012-12-04 Oracle International Corporation Efficient file access in a large repository using a two-level cache
US20110153719A1 (en) 2009-12-22 2011-06-23 At&T Intellectual Property I, L.P. Integrated Adaptive Anycast for Content Distribution
US20110191543A1 (en) 2010-02-02 2011-08-04 Arm Limited Area and power efficient data coherency maintenance
US20110320804A1 (en) 2010-06-24 2011-12-29 International Business Machines Corporation Data access management in a hybrid memory server
US8327080B1 (en) 2010-09-28 2012-12-04 Emc Corporation Write-back cache protection
US20120124296A1 (en) * 2010-11-17 2012-05-17 Bryant Christopher D Method and apparatus for reacquiring lines in a cache
US20120144234A1 (en) 2010-12-03 2012-06-07 Teradata Us, Inc. Automatic error recovery mechanism for a database system
US20120159480A1 (en) 2010-12-21 2012-06-21 Hitachi, Ltd. Data processing method and apparatus for remote storage system
US8370452B2 (en) 2010-12-27 2013-02-05 Limelight Networks, Inc. Partial object caching
US20130086330A1 (en) 2011-09-30 2013-04-04 Oracle International Corporation Write-Back Storage Cache Based On Fast Persistent Memory
US20130275402A1 (en) 2012-04-17 2013-10-17 Oracle International Corporation Redistributing Computation Work Between Data Producers And Data Consumers
US20130326152A1 (en) 2012-05-31 2013-12-05 Oracle International Corporation Rapid Recovery From Loss Of Storage Device Cache
US20140089565A1 (en) 2012-09-27 2014-03-27 Arkologic Limited Solid state device write operation management system
US20140149638A1 (en) 2012-11-26 2014-05-29 Lsi Corporation System and method for providing a flash memory cache input/output throttling mechanism based upon temperature parameters for promoting improved flash life
US20140281272A1 (en) 2013-03-13 2014-09-18 Oracle International Corporation Rapid Recovery From Downtime Of Mirrored Storage Device
US20140281167A1 (en) 2013-03-15 2014-09-18 Skyera, Inc. Compressor resources for high density storage units
US20150012690A1 (en) 2013-03-15 2015-01-08 Rolando H. Bruce Multi-Leveled Cache Management in a Hybrid Storage System
US20150089121A1 (en) 2013-09-20 2015-03-26 Oracle International Corporation Managing A Cache On Storage Devices Supporting Compression
US20160117125A1 (en) * 2014-10-24 2016-04-28 Spectra Logic Corporation Authoritative power management
US20170322886A1 (en) * 2016-05-09 2017-11-09 Cavium, Inc. Admission control for memory access requests
US20170357588A1 (en) * 2016-06-13 2017-12-14 Advanced Micro Devices, Inc. Scaled set dueling for cache replacement policies

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
IBM TDB, "A Scheduling Algorithm for Processing Mutually Exclusive Workloads in a multi-system Configuration", ip.com dated Aug. 19, 2002 (3 pages).
Kakarla, U.S. Appl. No. 14/489,221, filed Sep. 17, 2014, Interview Summary, dated Nov. 9, 2017.
Kakarla, U.S. Appl. No. 14/489,221, filed Sep. 17, 2014, Notice of Allowance, dated Oct. 25, 2018.
Loizos, M., et al., "Improving distributed join efficiency with extended bloom filter operations", Advanced Information Networking and Applications (AINA '07), 21st International Conference on, IEEE, May 1, 2007, pp. 187-194, ISBN: 978-0-7695-2846-5.
Mackert, Lothar F., et al., "R* optimizer validation and performance evaluation for local queries", SIGMOD Record, ACM, New York, NY, US, vol. 15, No. 2, Jun. 1, 1986, pp. 84-95, ISSN: 0163-5808.
O'Neil, P., et al., "Multi-table joins through bitmapped join indices", SIGMOD Record, ACM, New York, NY, US, vol. 24, No. 3, Sep. 1, 1995, pp. 8-11, ISSN: 0163-5808.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Decision on Appeal, dated Sep. 29, 2015.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Examiners Answers, dated Apr. 29, 2013.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Examiners Answers, dated Dec. 1, 2016.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Final Office Action, dated Apr. 20, 2016.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Final Office Action, dated Nov. 1, 2012.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Interview Summary, dated Mar. 21, 2016.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Office Action, dated Aug. 17, 2012.
U.S. Appl. No. 12/691,146, filed Jan. 21, 2010, Office Action, dated Dec. 15, 2015.
Li, Zhe, et al., "PERF join: an alternative to two-way semijoin and Bloomjoin", Proceedings of the 1995 ACM CIKM International Conference on Information and Knowledge Management, ACM, New York, NY, US, 1995, pp. 137-144, ISBN: 0-89791-812-6.


Also Published As

Publication number Publication date
US20190243783A1 (en) 2019-08-08
US20180129612A1 (en) 2018-05-10
US11138131B2 (en) 2021-10-05

Similar Documents

Publication Publication Date Title
US11138131B2 (en) Detection of avoidable cache thrashing for OLTP and DW workloads
Shan et al. LegoOS: A disseminated, distributed OS for hardware resource disaggregation
Raybuck et al. HeMem: Scalable tiered memory management for big data applications and real NVM
US8943272B2 (en) Variable cache line size management
US8874854B2 (en) Method for selectively enabling and disabling read caching in a storage subsystem
US9798655B2 (en) Managing a cache on storage devices supporting compression
US9176870B2 (en) Off-heap direct-memory data stores, methods of creating and/or managing off-heap direct-memory data stores, and/or systems including off-heap direct memory data store
Saemundsson et al. Dynamic performance profiling of cloud caches
US9348752B1 (en) Cached data replication for cache recovery
US20160253269A1 (en) Spatial Sampling for Efficient Cache Utility Curve Estimation and Cache Allocation
US20160357674A1 (en) Unified Online Cache Monitoring and Optimization
US9984004B1 (en) Dynamic cache balancing
US8606994B2 (en) Method for adapting performance sensitive operations to various levels of machine loads
US9658957B2 (en) Systems and methods for managing data input/output operations
US9229869B1 (en) Multi-lock caches
US8301836B2 (en) Methods for determining alias offset of a cache memory
Laga et al. Lynx: A learning Linux prefetching mechanism for SSD performance model
US8285931B2 (en) Methods for reducing cache memory pollution during parity calculations of RAID data
Min et al. VMMB: Virtual machine memory balancing for unmodified operating systems
US8219751B2 (en) Methods for optimizing performance of transient data calculations
Venkatesan et al. Ex-Tmem: Extending transcendent memory with non-volatile memory for virtual machines
US9542318B2 (en) Temporary cache memory eviction
US20170293570A1 (en) System and methods of an efficient cache algorithm in a hierarchical storage system
Peng et al. UMap: Enabling application-driven optimizations for memory mapping persistent store
Fukuda et al. Cache Management with Fadvise Based on LFU

Legal Events

Date Code Title Description
AS Assignment

Owner name: ORACLE INTERNATIONAL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEWIS, JUSTIN MATTHEW;TAO, ZUOYU;SHI, JIA;AND OTHERS;SIGNING DATES FROM 20170823 TO 20170824;REEL/FRAME:043420/0750

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
CC Certificate of correction
MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4