WO2014138370A1 - Fast cache reheat - Google Patents

Info

Publication number
WO2014138370A1
WO2014138370A1 PCT/US2014/021136 US2014021136W WO2014138370A1 WO 2014138370 A1 WO2014138370 A1 WO 2014138370A1 US 2014021136 W US2014021136 W US 2014021136W WO 2014138370 A1 WO2014138370 A1 WO 2014138370A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
data
storage
index
data store
Prior art date
Application number
PCT/US2014/021136
Other languages
French (fr)
Inventor
Rodney G. Harrison
Jason P. O'broin
Original Assignee
Drobo, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/790,163 (US10922225B2)
Application filed by Drobo, Inc.
Publication of WO2014138370A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1441Resetting or repowering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/4401Bootstrapping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/31Providing disk cache in a specific location of a storage system
    • G06F2212/313In storage device

Definitions

  • the present invention relates generally to data storage systems, and, more particularly, to initialization of a cache memory following a system reset or other event.
  • a cache is commonly used in a computer system to provide fast access to part of a dataset.
  • a cache memory is significantly faster than the main data store, often by more than an order of magnitude.
  • Cache memories are usually quite small relative to a larger data store from which their contents are obtained.
  • a CPU may have a cache of 2MiB used to accelerate access to 16GiB of DRAM, or a 4TiB hard disk may have 64MiB of DRAM as its cache.
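To put those sizes in proportion, the ratios can be worked out directly. The short sketch below only restates the arithmetic of the two examples above; the helper name `coverage_ratio` is illustrative and not taken from the source.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def coverage_ratio(cache_bytes: int, store_bytes: int) -> int:
    """Return how many bytes of backing store each cache byte stands in for."""
    return store_bytes // cache_bytes


# 2 MiB CPU cache in front of 16 GiB of DRAM.
cpu_ratio = coverage_ratio(2 * MiB, 16 * GiB)    # 8192

# 64 MiB DRAM cache in front of a 4 TiB hard disk.
disk_ratio = coverage_ratio(64 * MiB, 4 * TiB)   # 65536
```

In both examples the cache covers well under a tenth of a percent of the larger store, which is why the choice of what to keep in it matters.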
  • a computer system may have several levels of cache, perhaps of differing speed and size, and also may have several types of cache.
  • Some caches may be generic and able to hold any data in the system, e.g. a processor L2 cache, and some caches may be specialized and able to only hold very specific types of data, e.g. a processor's translation look-aside buffer used to hold only address translation tables.
  • Some caches are built from special hardware, e.g. processor L2 and TLB caches, while other caches may be ordinary DRAM used to accelerate access to data normally held on a slower medium, e.g. a magnetic disk.
  • Some caches may hold data that are expected to cycle through very quickly (e.g. a processor L2 cache, Host Logical Block information) and some hold data that may stay in cache for a long time (e.g., some page address translations in a TLB, Cluster Lookup Translations).
  • Those data frequently accessed so as to be held in a cache are often referred to as a "hot set" or "hot data." As the set of hot data changes, the data in the cache will be accessed less frequently and the data in the main store will be accessed more frequently. This can be viewed as a cooling of the temperature of the data in the cache and is a sign that some data should be evicted from the cache in order to make way for new, hotter, data to be cached.
  • Non-volatile cache memories (e.g., NVRAM, DRAM with battery backup, etc.)
  • a method for fast cache reheat in a data storage system involves periodically storing, in a first data store, a snapshot of an index identifying storage locations associated with contents of a cache, and, upon a restart of the data storage system, retrieving the index from the last snapshot stored prior to the restart, retrieving, from a second data store, data from storage locations identified in the index, and storing the retrieved data in the cache.
  • the first data store and the second data store may be the same data store or may be different data stores.
  • the first data store and/or the second data store may include a set of block storage devices.
  • the index may identify physical storage locations and/or virtual storage addresses. Retrieving data from storage locations identified in the index may involve translating a virtual storage address to a physical storage address for the second data store.
  • the cache may include an application program interface, in which case the cache may provide the index via the application program interface. Retrieving data from storage locations identified in the index may involve providing the retrieved index to the cache via the application program interface and retrieving the data by the cache.
  • the data storage system may include a cache miss handler, in which case retrieving the data by the cache may include making calls to the cache miss handler by the cache.
  • a data storage system including a storage processor configured to manage storage of data in at least one data store and a cache, wherein the storage processor is configured to periodically store an index identifying storage locations associated with contents of the cache in a first data store and, upon a restart of the data storage system, retrieve the index from the last snapshot stored prior to the restart.
  • the storage processor and/or the cache is configured to retrieve data from storage locations identified in the index from a second data store and store the retrieved data in the cache.
  • the first data store and the second data store may be the same data store or may be different data stores.
  • the first data store and/or the second data store may include a set of block storage devices.
  • the index may identify physical storage locations and/or virtual storage addresses. Retrieving data from storage locations identified in the index may involve translating a virtual storage address to a physical storage address for the second data store.
  • the cache may include an application program interface, in which case the cache may provide the index via the application program interface and the storage processor may store the snapshot of the index in the second data store. Retrieving data from storage locations identified in the index may involve retrieving the index from the second data store by the storage processor, providing the retrieved index by the storage processor to the cache via the application program interface, and retrieving the data by the cache.
  • the storage processor may include a cache miss handler, in which case retrieving the data by the cache may include making calls to the cache miss handler by the cache.
  • FIG. 1 is a schematic block diagram of a data storage system, in accordance with one exemplary embodiment of the present invention;
  • FIG. 2 is a logic flow diagram for a fast cache reheat process, in accordance with one exemplary embodiment;
  • FIG. 3 is a schematic diagram depicting production and storage of the cache index, in accordance with one exemplary embodiment;
  • FIG. 4 is a schematic diagram depicting retrieval of the index and reheating of the cache, in accordance with the exemplary embodiment of FIG. 3;
  • FIG. 5 is a schematic block diagram of a data storage system, in accordance with an exemplary embodiment of the present invention in which the cache provides an API for managing the cache indexes;
  • FIG. 6 is a schematic transaction flow diagram for fast cache reheat as discussed with reference to FIG. 5, in accordance with one exemplary embodiment.
  • a “data store” is a non-volatile storage system that may contain one or more non-volatile storage devices, such as disk drives, SSD drives, NVRAM, etc.
  • the term "cache" refers generally to a memory that is used to temporarily store copies of certain data that is also stored in a data store, for example, to improve the speed of access to such data and/or reduce the number of accesses made to the data store relative to such data.
  • a "snapshot" is a process by which certain information is written to a data store on a periodic basis.
  • the term "periodic" with regard to a snapshot just means that a snapshot is taken from time to time (e.g., roughly every minute, every ten minutes, or every hour, etc.) and does not necessarily mean that the snapshot is taken at precise periodic intervals.
  • a “data storage system” can be any system that stores data in one or more data stores and also includes a cache.
  • the data store(s) may be integral to the data storage system or external to the data storage system (e.g., accessed via a communication interface).
  • a data storage system may be a file server, a NAS device, a disk array system, a disk drive, a computer, etc.
  • a "restart" of a data storage system may include any event for which fast cache reheat is performed in a particular embodiment. Without limitation, examples of such events include power-on or power cycle, soft reboot of the system, or reboot of a hardware or software component of the system that manages a cache.
  • a “block storage device” is a type of data store that includes a block storage interface.
  • block level storage devices may include certain disk drives, SSD drives, storage appliances, storage arrays, etc.
  • a “set” includes one or more members.
  • Embodiments of the present invention allow for fast cache reheat by periodically storing a snapshot of information identifying the contents of the cache at the time of the snapshot (referred to hereinafter as an "index" of the contents of the cache), and then using the information from the last snapshot to restore the contents of the cache following an event that causes loss or corruption of cache contents such as a loss of power or system reset. Since there can be a time gap between the taking of a snapshot and such an event, the actual contents of the cache, and hence the corresponding data stored in a data store, may have changed since the last snapshot was taken.
  • the index stored at the last snapshot is used to retrieve current data from the data store for use in restoring the contents of the cache, as opposed to periodically storing the actual contents of the cache and restoring those contents back to the cache.
  • the ability to rapidly reheat the cache with valid contents allows the system to immediately come back online at full operational performance so as to avoid the penalties due to cache misses that would result from restarting with a cold cache.
  • the snapshots do not contain the actual cached data but instead contain the identity of the cached data.
  • the cache's indexing information, rather than the actual data in the cache, is stored in the snapshot.
  • for example, if the cache contains the contents of certain disk blocks, the snapshots would contain information identifying the disk blocks rather than the content of those disk blocks. When the cache is reheated, the current contents of the disk blocks are retrieved and loaded into the cache, so the cache will contain the current contents of the blocks even if the contents had changed since the last snapshot was taken.
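The difference between snapshotting cache contents and snapshotting only their identities can be seen in a minimal sketch. Everything here is an assumption made for illustration: the data store is modeled as a plain dictionary, and the block numbers and values are invented.

```python
from typing import Dict, List

# Hypothetical stand-in for the data store: block number -> current contents.
data_store: Dict[int, str] = {10: "v1 of block 10", 11: "v1 of block 11"}

# The cache holds copies of some blocks, keyed by the same block numbers.
cache: Dict[int, str] = {block: data_store[block] for block in (10, 11)}

# Option A (not the approach described above): snapshot the cached contents.
content_snapshot: Dict[int, str] = dict(cache)

# Option B (the approach described above): snapshot only the identities.
index_snapshot: List[int] = sorted(cache)

# After the snapshot, block 11 changes in the data store; then the cache is lost.
data_store[11] = "v2 of block 11"
cache.clear()

# Restoring from option A would bring back the outdated copy of block 11.
restored_from_contents = dict(content_snapshot)

# Reheating from option B re-reads whatever each indexed block holds now.
reheated = {block: data_store[block] for block in index_snapshot}
```

Under these assumptions `restored_from_contents[11]` still holds the first version, while `reheated[11]` holds the second. The index snapshot is also smaller, since it carries block numbers rather than block contents.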
  • FIG. 1 is a schematic block diagram of a data storage system 100, in accordance with one exemplary embodiment of the present invention.
  • the data storage system 100 includes a storage processor 102, one or more data stores 104, and a cache 106. Based on any of a variety of caching schemes (some of which may be driven by the storage processor 102, some of which may be driven by hardware external to the storage processor 102), certain data from the data store(s) is stored in the cache 106 based on accesses to that data.
  • the storage processor 102 includes various hardware and/or software components configured to manage storage of data in the data store(s) 104, such as, for example, managing the physical and logical storage constructs of a file system managed in the data store(s) 104 such as clusters, blocks, and zones.
  • FIG. 2 is a logic flow diagram for a fast cache reheat process, in accordance with one exemplary embodiment.
  • the storage processor 102 runs a timer 211 to determine when to take snapshots of the cache index.
  • when the storage processor 102 determines that it is time to take a snapshot of the cache index (YES in block 211), the storage processor 102 produces an index of the cache contents, in block 212, and stores the index in the data store(s) 104, in block 213.
  • the logic returns from block 213 to block 211 to await the next snapshot interval.
  • the system enters a restart state 220 in which the cache is reheated, specifically by retrieving the index from the last snapshot, in block 221, and retrieving data from the data store(s) 104 based on the index and repopulating the cache with the retrieved data, in block 222.
  • the logic moves from block 222 to the normal state 210, and more specifically to block 211, to await the next snapshot interval.
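The flow of FIG. 2 can be sketched as a small controller. This is one possible reading under stated assumptions, not the patented implementation: the class name `ReheatController`, the injectable clock, and the in-memory `saved_index` (standing in for an index written to a data store) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class ReheatController:
    """Sketch of the FIG. 2 flow; names and structure are illustrative only."""

    def __init__(self, store: Dict[int, bytes], interval_s: float,
                 now: Callable[[], float]) -> None:
        self.store = store
        self.cache: Dict[int, bytes] = {}
        # Stands in for the snapshot persisted in a data store.
        self.saved_index: Optional[List[int]] = None
        self.interval_s = interval_s
        self.now = now
        self.next_snapshot = now() + interval_s

    def tick(self) -> bool:
        """Block 211: check the timer. Blocks 212-213: produce and store the index."""
        if self.now() < self.next_snapshot:
            return False
        self.saved_index = sorted(self.cache)
        self.next_snapshot = self.now() + self.interval_s
        return True

    def restart(self) -> None:
        """State 220: block 221 retrieves the index, block 222 reloads the cache."""
        self.cache = {}  # cache contents are lost across the restart
        for block in self.saved_index or []:
            if block in self.store:
                self.cache[block] = self.store[block]
        self.next_snapshot = self.now() + self.interval_s


clock = [0.0]
ctl = ReheatController({1: b"a", 2: b"b"}, interval_s=60.0, now=lambda: clock[0])
ctl.cache[1] = ctl.store[1]
early = ctl.tick()            # timer has not expired, no snapshot
clock[0] = 61.0
taken = ctl.tick()            # snapshot of the index is taken
ctl.cache[2] = ctl.store[2]   # cached after the snapshot, so absent from the index
ctl.restart()
```

Block 2 is not reloaded after the restart because it entered the cache after the last snapshot; that is the cost of the time gap between snapshot and restart.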
  • the snapshots may be stored in the same data store as the data represented in the cache or may be stored in a different data store than the data represented in the cache.
  • the snapshots may be stored in a NVRAM or portion of battery-backup RAM, while the data may be stored in a disk drive or disk array.
  • FIG. 3 is a schematic diagram depicting production and storage of the cache index, in accordance with one exemplary embodiment.
  • the cache 106 contains the contents of two blocks, specifically data X from Block X 304 and data Y from Block Y 306.
  • an index 302 that identifies the contents of the cache is produced as represented by the arrow 308, and the index 302 is stored in the data store(s) 104, as represented by the arrow 310.
  • FIG. 4 is a schematic diagram depicting retrieval of the index and reheating of the cache, in accordance with the exemplary embodiment of FIG. 3.
  • the index 302 is retrieved from the data store(s) 104, as indicated by the arrow 408.
  • the index 302 is used to retrieve the contents of Block X 304 and Block Y 306 from the data store(s) 104 and store the retrieved data in the cache 106, as indicated by the arrow 410.
  • the data in Block Y 306 was changed from a state Y to a state Y'.
  • the cache 106 is repopulated with contents X and Y'.
  • the data in the snapshots are described in the same address space as the indexes of the cache itself. For but one example, if address translations for certain host logical blocks (e.g., host logical blocks 100 to 200) are stored in a certain disk block (e.g., disk block 652), then information from which that disk block can be accessed (e.g., a physical or logical address, a block number, etc.) is stored in the snapshot rather than storing the actual address translations from the disk block. This further ensures that if the usage of a block previously used to describe host logical block address (LBA) translations is changed after a snapshot is taken, the cache will not load stale translations when reheated.
  • disk block 652 may be repurposed after the most recent snapshot was taken. In this case, the contents of disk block 652 would still be loaded into cache as part of the reheating process. However, since disk block 652 no longer describes the address translations for host LBAs 100 to 200, one of two things may happen. Either disk block 652 contains host LBA translations for a different range or it contains some other data not related to host LBA translations. In the former case, the data loaded during reheat is valid for the cache and may be accessed or aged out normally. In the latter case, the data loaded is not valid for the cache, i.e., it is not data of the same type as the rest of the data in the cache. This would seem to be a potential cause for corruption.
  • however, since the cache is indexed in terms of the same address space stored in the snapshots, it can be demonstrated that whatever index table is mapping host LBA ranges to disk blocks must no longer contain an entry for the stale block. In other words, the cache will never be asked to return data for disk block 652, at least not until that disk block has once again been repurposed and rewritten with valid translation data.
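The repurposed-block argument can be traced in a short sketch. The index table, the replacement block number 700, and the `lookup` helper are assumptions introduced for illustration; only disk block 652 and host LBAs 100 to 200 come from the text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index table: host LBA range -> disk block holding its translations.
lba_table: Dict[Tuple[int, int], int] = {(100, 200): 652}
disk: Dict[int, str] = {652: "translations for LBAs 100-200", 700: "free"}

# The snapshot records the disk block address, not the block's contents.
snapshot = [652]

# After the snapshot, block 652 is repurposed; the translations move to block 700.
disk[700] = "translations for LBAs 100-200"
disk[652] = "unrelated data"
lba_table[(100, 200)] = 700

# Restart: reheat loads whatever the snapshotted blocks contain now.
cache: Dict[int, str] = {block: disk[block] for block in snapshot}


def lookup(lba: int) -> Optional[str]:
    """Resolve a host LBA via the index table, filling the cache on a miss."""
    for (low, high), block in lba_table.items():
        if low <= lba <= high:
            if block not in cache:
                cache[block] = disk[block]  # ordinary cache miss
            return cache[block]
    return None


result = lookup(150)
```

The cache does hold the unrelated contents of block 652 after reheat, but no lookup reaches them because the index table no longer points there. In this sketch the entry simply sits unused until it is aged out.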
  • data to be restored to the cache may be indexed using physical addresses, virtual addresses, file names, file handles, data store object numbers, and/or other information that allows the data to be retrieved from the data store(s) 104.
  • data to be restored to the cache may be indexed using a virtual address space, such as addresses in the form of a zone and offset tuple (e.g., zone 96, offset 16384), in which case the snapshot saved to the data store would contain those same addresses.
  • any virtualized addresses would be converted into physical addresses, e.g., disk number and block(s), in order to retrieve the corresponding data from the data store(s) 104 and repopulate the cache 106 with the retrieved data.
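A minimal sketch of that translation step follows. The zone table layout, the 4096-byte block size, and the function name `to_physical` are assumptions; the source gives only the form of the virtual address (zone 96, offset 16384) and the form of the physical address (disk number and block).

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block size in bytes

# Hypothetical zone table: zone number -> (disk number, first block of the zone).
zone_table: Dict[int, Tuple[int, int]] = {96: (2, 1_000_000)}


def to_physical(zone: int, offset: int) -> Tuple[int, int]:
    """Translate a (zone, offset) virtual address to (disk number, block number)."""
    disk, first_block = zone_table[zone]
    return disk, first_block + offset // BLOCK_SIZE


# The snapshot stores virtual addresses, exactly as the cache indexes them.
snapshot: List[Tuple[int, int]] = [(96, 16384)]

# During reheat, each address is translated before reading from the data store.
physical = [to_physical(zone, offset) for zone, offset in snapshot]
```

Because translation happens at reheat time, a zone that has moved to a different disk since the snapshot would still resolve to its current location, provided the zone table is current.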
  • transactional data may be stored in a physical or logical transactional storage tier.
  • Transactional performance is heavily gated by the hit rate on cluster access table (CAT) records, which are stored in nonvolatile storage, and which translate between logical host addresses and the corresponding locations of clusters in storage zones.
  • ZMDT Zone MetaData Tracker
  • After a system restart, the ZMDT memory will naturally be empty, and so transactional I/O will pay the large penalty of cache misses caused by the additional I/O required to load the array's metadata.
  • the addresses of the cluster lookup table (CLT) sectors in the ZMDT cache may be stored during a snapshot, allowing those CLT sectors to be pre-loaded after a restart so as to enable the system to boot with an instantly hot ZMDT cache.
  • the data that needs to be saved is already in the cache's index structure, implemented in an exemplary embodiment as a splay tree.
  • the storage processor 102 does not need to have knowledge of the internal structure or workings of the cache 106, but rather the cache 106 manages the index and reheating of the cache contents based on the index. Specifically, in order for the storage processor 102 to take a snapshot of the cache index, the cache 106 provides the index to the storage processor 102 via an application program interface (API), and the storage processor stores the index in the data store(s) 104.
  • the storage processor 102 retrieves the index from the last snapshot and provides the index to the cache 106, which uses data in the snapshot to make calls to a cache miss handler of the storage processor 102 in order to effect the repopulation required for reheat, i.e., the cache 106 can simply look up the required data as if it were being asked to do so by one of its usual consumers.
  • FIG. 5 is a schematic block diagram of a data storage system 500, in accordance with an exemplary embodiment of the present invention in which the cache 106 provides an API for managing the cache indexes as discussed above.
  • the cache 106 presents an API 508 through which the storage processor 102 obtains the cache index from the cache 106 and provides a cache index back to the cache 106 for cache reheat.
  • the storage processor 102 may produce the cache index in block 212 of FIG. 2 by making a call to the cache 106 via the API 508, and the storage processor 102 may provide a cache index to the cache 106 for reheat by similarly making a call to the cache 106 via the API 508.
  • FIG. 6 is a schematic transaction flow diagram for fast cache reheat as discussed with reference to FIG. 5, in accordance with one exemplary embodiment.
  • in transaction (1), the cache 106 provides a cache index to the storage processor 102 via the API 508, and, in transaction (2), the storage processor 102 stores the cache index in the data store(s) 104.
  • Transactions (1) and (2) can be repeated numerous times during normal operation of the data storage system.
  • the storage processor 102 retrieves the cache index for the last snapshot from the data store(s) 104 in transaction (3) and provides the cache index to the cache 106 via the API 508 in transaction (4).
  • Transaction (5) represents the calls made by the cache 106 to the cache miss handler of the storage processor 102 based on the cache index, the retrieval of data by the storage processor 102 from the data store(s) in response to the calls from the cache 106, and the storage processor 102 providing the retrieved data to the cache 106 for fast cache reheat.
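The transaction flow of FIG. 6 might be sketched as follows. The method names `export_index`, `load_index`, and `handle_miss`, and the in-memory `snapshot_area` standing in for the data store(s), are hypothetical; the source specifies only that the cache exposes an API and calls the storage processor's cache miss handler. The numbering of transaction (4) in the comments is inferred from the surrounding text.

```python
from typing import Callable, Dict, Hashable, List


class Cache:
    """Cache whose index format stays private to the cache (illustrative API)."""

    def __init__(self, miss_handler: Callable[[Hashable], bytes]) -> None:
        self._entries: Dict[Hashable, bytes] = {}
        self._miss_handler = miss_handler

    def get(self, key: Hashable) -> bytes:
        """Normal consumer path: a miss is filled through the miss handler."""
        if key not in self._entries:
            self._entries[key] = self._miss_handler(key)
        return self._entries[key]

    def export_index(self) -> List[Hashable]:
        """API call used for a snapshot: identities of the cached entries."""
        return list(self._entries)

    def load_index(self, index: List[Hashable]) -> None:
        """API call used for reheat: look up each key like a usual consumer."""
        for key in index:
            self.get(key)


class StorageProcessor:
    def __init__(self, data_store: Dict[Hashable, bytes]) -> None:
        self.data_store = data_store
        self.snapshot_area: List[Hashable] = []  # stands in for stored snapshot
        self.miss_calls = 0
        self.cache = Cache(self.handle_miss)

    def handle_miss(self, key: Hashable) -> bytes:
        self.miss_calls += 1
        return self.data_store[key]

    def take_snapshot(self) -> None:
        index = self.cache.export_index()   # transaction (1)
        self.snapshot_area = list(index)    # transaction (2)

    def restart(self) -> None:
        self.cache = Cache(self.handle_miss)  # cache contents are lost
        index = list(self.snapshot_area)      # transaction (3)
        self.cache.load_index(index)          # transactions (4) and (5)


sp = StorageProcessor({"a": b"1", "b": b"2", "c": b"3"})
sp.cache.get("a")
sp.cache.get("b")
sp.take_snapshot()
sp.data_store["b"] = b"2-new"   # changes after the snapshot
sp.restart()
```

The storage processor never inspects the index it stores, and the reheat path reuses the ordinary miss handler, so no separate bulk-load code is needed in this sketch.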
  • Double-ended arrows generally indicate that activity may occur in both directions (e.g., a command/request in one direction with a corresponding reply back in the other direction, or peer-to-peer communications initiated by either entity), although in some situations, activity may not necessarily occur in both directions.
  • Single-ended arrows generally indicate activity exclusively or predominantly in one direction, although it should be noted that, in certain situations, such directional activity actually may involve activities in both directions (e.g., a message from a sender to a receiver and an acknowledgement back from the receiver to the sender, or establishment of a connection prior to a transfer and termination of the connection following the transfer).
  • the type of arrow used in a particular drawing to represent a particular activity is exemplary and should not be seen as limiting.
  • Such devices typically include one or more network interfaces for communicating over a communication network and a processor (e.g., a microprocessor with memory and other peripherals and/or application-specific hardware) configured accordingly to perform device functions.
  • Communication networks generally may include public and/or private networks; may include local-area, wide-area, metropolitan-area, storage, and/or other types of networks; and may employ communication technologies including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
  • devices may use communication protocols and messages (e.g., messages created, transmitted, received, stored, and/or processed by the device), and such messages may be conveyed by a communication network or medium.
  • a communication message generally may include, without limitation, a frame, packet, datagram, user datagram, cell, or other type of communication message.
  • references to specific communication protocols are exemplary, and it should be understood that alternative embodiments may, as appropriate, employ variations of such communication protocols (e.g., modifications or extensions of the protocol that may be made from time-to-time) or other protocols either known or developed in the future.
  • logic flows may be described herein to demonstrate various aspects of the invention, and should not be construed to limit the present invention to any particular logic flow or logic implementation.
  • the described logic may be partitioned into different logic blocks (e.g., programs, modules, functions, or subroutines) without changing the overall results or otherwise departing from the true scope of the invention.
  • logic elements may be added, modified, omitted, performed in a different order, or implemented using different logic constructs (e.g., logic gates, looping primitives, conditional logic, and other logic constructs) without changing the overall results or otherwise departing from the true scope of the invention.
  • the present invention may be embodied in many different forms, including, but in no way limited to, computer program logic for use with a processor (e.g., a microprocessor, microcontroller, digital signal processor, or general purpose computer), programmable logic for use with a programmable logic device (e.g., a Field Programmable Gate Array (FPGA) or other PLD), discrete components, integrated circuitry (e.g., an Application Specific Integrated Circuit (ASIC)), or any other means including any combination thereof.
  • Computer program logic implementing some or all of the described functionality is typically implemented as a set of computer program instructions that is converted into a computer executable form, stored as such in a computer readable medium, and executed by a
  • Hardware-based logic implementing some or all of the described functionality may be implemented using one or more appropriately configured FPGAs.
  • Source code may include a series of computer program instructions implemented in any of various programming languages (e.g., an object code, an assembly language, or a high-level language such as Fortran, C, C++, JAVA, or HTML) for use with various operating systems or operating environments.
  • the source code may define and use various data structures and communication messages.
  • the source code may be in a computer executable form (e.g., via an interpreter), or the source code may be converted (e.g., via a translator, assembler, or compiler) into a computer executable form.
  • Computer program logic implementing all or part of the functionality previously described herein may be executed at different times on a single processor (e.g., concurrently) or may be executed at the same or different times on multiple processors and may run under a single operating system process/thread or under different operating system processes/threads.
  • the term "computer process" refers generally to the execution of a set of computer program instructions regardless of whether different computer processes are executed on the same or different processors and regardless of whether different computer processes run under the same operating system process/thread or different operating system processes/threads.
  • the computer program may be fixed in any form (e.g., source code form, computer executable form, or an intermediate form) either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), a PC card (e.g., PCMCIA card), or other memory device.
  • the computer program may be fixed in any form in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
  • the computer program may be distributed in any form as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).
  • Hardware logic (including programmable logic for use with a programmable logic device) implementing all or part of the functionality previously described herein may be designed using traditional manual methods, or may be designed, captured, simulated, or documented electronically using various tools, such as Computer Aided Design (CAD), a hardware description language (e.g., VHDL or AHDL), or a PLD programming language (e.g., PALASM, ABEL, or CUPL).
  • Programmable logic may be fixed either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), or other memory device.
  • the programmable logic may be fixed in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
  • the programmable logic may be distributed as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).
  • a computer system e.g., on system ROM or fixed disk
  • a server or electronic bulletin board over the communication system
  • some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention allow for fast cache reheat by periodically storing a snapshot of information identifying the contents of the cache at the time of the snapshot, and then using the information from the last snapshot to restore the contents of the cache following an event that causes loss or corruption of cache contents such as a loss of power or system reset. Since there can be a time gap between the taking of a snapshot and such an event, the actual contents of the cache, and hence the corresponding data stored in a data store, may have changed since the last snapshot was taken. Thus, the information stored at the last snapshot is used to retrieve current data from the data store for use in restoring the contents of the cache.

Description

FAST CACHE REHEAT
CROSS-REFERENCE TO RELATED APPLICATION(S)
This application claims priority from United States Patent Application No. 13/790,163 entitled FAST CACHE REHEAT filed on March 8, 2013, which is hereby incorporated herein by reference in its entirety.
FIELD OF THE INVENTION
The present invention relates generally to data storage systems, and, more particularly, to initialization of a cache memory following a system reset or other event.
BACKGROUND OF THE INVENTION
A cache is commonly used in a computer system to provide fast access to part of a dataset. Typically, a cache memory is significantly faster than the main data store, often by more than an order of magnitude. Cache memories are usually quite small relative to a larger data store from which their contents are obtained. For example, a CPU may have a cache of 2MiB used to accelerate access to 16GiB of DRAM, or a 4TiB hard disk may have 64MiB of DRAM as its cache.
Because of the large disparity in the size of a typical cache and the dataset being accelerated, the choice of which data to cache, and when, is critical. Equally critical is the choice of which data to evict from a cache and when such an eviction should take place.
A computer system may have several levels of cache, perhaps of differing speed and size, and also may have several types of cache. Some caches may be generic and able to hold any data in the system, e.g. a processor L2 cache, and some caches may be specialized and able to only hold very specific types of data, e.g. a processor's translation look-aside buffer (TLB) used to hold only address translation tables. Some caches are built from special hardware, e.g. processor L2 and TLB caches, while other caches may be ordinary DRAM used to accelerate access to data normally held on a slower medium, e.g. a magnetic disk. Some caches may hold data that are expected to cycle through very quickly (e.g. a processor L2 cache, Host Logical Block information) and some hold data that may stay in cache for a long time (e.g., some page address translations in a TLB, Cluster Lookup Translations).
Some caches hold datasets that may take considerable time to build, e.g., translations from host address blocks in frequently accessed data to internal locations in a complex virtualized disk system. For example, address translations for a large database file, e.g. a mail server's index database, may take hours or even days to build into a hot set. However, once those translations are cached, some of them may remain stable for a considerable time, days or even weeks. Maintaining such long lived cached data can provide a significant performance win but also can be difficult as older cached data will naturally tend to age out and be replaced with more recently accessed, but transient, data. Splitting the cache into parts and driving each part with an algorithm tuned specifically for its duty, e.g. long lived versus highly transient data, can successfully alleviate this premature eviction issue.
Those data frequently accessed so as to be held in a cache are often referred to as a "hot set" or "hot data." As the set of hot data changes, the data in the cache will be accessed less frequently and the data in the main store will be accessed more frequently. This can be viewed as a cooling of the temperature of the data in the cache and is a sign that some data should be evicted from the cache in order to make way for new, hotter, data to be cached.
Under certain circumstances, such as a loss of power or a system reset, contents of a cache may be lost or corrupted. When this happens, any cached information that was long-lived in the previous boot of the system is lost and the cache must be repopulated. In many cases, the same data are once again brought into cache as the data are accessed, but the cache miss penalty must be paid on the first access to each piece of data, and performance suffers as a result. For datasets such as a mail server's index, this loss of performance can persist for many hours as heat is slowly built back into the cache.
Some systems attempt to avoid such loss or corruption of cache contents by using non-volatile cache memories (e.g., NVRAM, DRAM with battery backup, etc.).
Other systems attempt to avoid such loss or corruption of cache contents by periodically storing a copy (snapshot) of the contents of the cache and, when needed, restoring the contents of the cache to the contents of the last snapshot. However, since there can be a time gap between storing a snapshot copy of the contents of the cache and an event that causes loss or corruption of cache contents such as a loss of power or system reset, the actual contents of the cache, and hence the corresponding data stored in a data store, may have changed since the last snapshot was taken.
Restoring a stale snapshot into a cache would likely cause stale data to be loaded into the cache with disastrous consequences. For example, if an address translation had changed between the time of the last snapshot and the time of a system restart, the snapshot would contain stale data.
SUMMARY OF EXEMPLARY EMBODIMENTS
In one embodiment there is provided a method for fast cache reheat in a data storage system. The method involves periodically storing, in a first data store, a snapshot of an index identifying storage locations associated with contents of a cache, and, upon a restart of the data storage system, retrieving the index from the last snapshot stored prior to the restart, retrieving, from a second data store, data from storage locations identified in the index, and storing the retrieved data in the cache.
In various alternative embodiments, the first data store and the second data store may be the same data store or may be different data stores. The first data store and/or the second data store may include a set of block storage devices. The index may identify physical storage locations and/or virtual storage addresses. Retrieving data from storage locations identified in the index may involve translating a virtual storage address to a physical storage address for the second data store.
In certain embodiments, the cache may include an application program interface, in which case the cache may provide the index via the application program interface. Retrieving data from storage locations identified in the index may involve providing the retrieved index to the cache via the application program interface and retrieving the data by the cache. The data storage system may include a cache miss handler, in which case retrieving the data by the cache may include making calls to the cache miss handler by the cache.
In another embodiment there is provided a data storage system including a storage processor configured to manage storage of data in at least one data store and a cache, wherein the storage processor is configured to periodically store an index identifying storage locations associated with contents of the cache in a first data store and, upon a restart of the data storage system, retrieve the index from the last snapshot stored prior to the restart. The storage processor and/or the cache is configured to retrieve data from storage locations identified in the index from a second data store and store the retrieved data in the cache.
In various alternative embodiments, the first data store and the second data store may be the same data store or may be different data stores. The first data store and/or the second data store may include a set of block storage devices. The index may identify physical storage locations and/or virtual storage addresses. Retrieving data from storage locations identified in the index may involve translating a virtual storage address to a physical storage address for the second data store.
In certain embodiments, the cache may include an application program interface, in which case the cache may provide the index via the application program interface and the storage processor may store the snapshot of the index in the second data store. Retrieving data from storage locations identified in the index may involve retrieving the index from the second data store by the storage processor, providing the retrieved index by the storage processor to the cache via the application program interface, and retrieving the data by the cache. The storage processor may include a cache miss handler, in which case retrieving the data by the cache may include making calls to the cache miss handler by the cache.
Additional embodiments may be disclosed and claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing features of embodiments will be more readily understood by reference to the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a schematic block diagram of a data storage system, in accordance with one exemplary embodiment of the present invention;
FIG. 2 is a logic flow diagram for a fast cache reheat process, in accordance with one exemplary embodiment;
FIG. 3 is a schematic diagram depicting production and storage of the cache index, in accordance with one exemplary embodiment;
FIG. 4 is a schematic diagram depicting retrieval of the index and reheating of the cache, in accordance with the exemplary embodiment of FIG. 3;
FIG. 5 is a schematic block diagram of a data storage system, in accordance with an exemplary embodiment of the present invention in which the cache provides an API for managing the cache indexes; and
FIG. 6 is a schematic transaction flow diagram for fast cache reheat as discussed with reference to FIG. 5, in accordance with one exemplary embodiment.
It should be noted that the foregoing figures and the elements depicted therein are not necessarily drawn to consistent scale or to any scale. Unless the context otherwise suggests, like elements are indicated by like numerals.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Definitions. As used in this description and the accompanying claims, the following terms shall have the meanings indicated, unless the context otherwise requires:
A "data store" is a non-volatile storage system that may contain one or more non-volatile storage devices, such as disk drives, SSD drives, NVRAM, etc.
The term "cache" refers generally to a memory that is used to temporarily store copies of certain data that is also stored in a data store, for example, to improve the speed of access to such data and/or reduce the number of accesses made to the data store relative to such data.
A "snapshot" is a process by which certain information is written to a data store on a periodic basis.
The term "periodic" with regard to a snapshot just means that a snapshot is taken from time to time (e.g., roughly every minute, every ten minutes, or every hour, etc.) and does not necessarily mean that the snapshot is taken at precise periodic intervals.
A "data storage system" can be any system that stores data in one or more data stores and also includes a cache. The data store(s) may be integral to the data storage system or external to the data storage system (e.g., accessed via a communication interface). Without limitation, a data storage system may be a file server, a NAS device, a disk array system, a disk drive, a computer, etc.
A "restart" of a data storage system may include any event for which fast cache reheat is performed in a particular embodiment. Without limitation, examples of such events include power-on or power cycle, soft reboot of the system, or reboot of a hardware or software component of the system that manages a cache.
A "block storage device" is a type of data store that includes a block storage interface. Without limitation, block level storage devices may include certain disk drives, SSD drives, storage appliances, storage arrays, etc.
A "set" includes one or more members.
Embodiments of the present invention allow for fast cache reheat by periodically storing a snapshot of information identifying the contents of the cache at the time of the snapshot (referred to hereinafter as an "index" of the contents of the cache), and then using the information from the last snapshot to restore the contents of the cache following an event that causes loss or corruption of cache contents such as a loss of power or system reset. Since there can be a time gap between the taking of a snapshot and such an event, the actual contents of the cache, and hence the corresponding data stored in a data store, may have changed since the last snapshot was taken. Thus, the index stored at the last snapshot is used to retrieve current data from the data store for use in restoring the contents of the cache, as opposed to periodically storing the actual contents of the cache and restoring those contents back to the cache. The ability to rapidly reheat the cache with valid contents allows the system to immediately come back online at full operational performance so as to avoid the penalties due to cache misses that would result from restarting with a cold cache.
Thus, in order to ensure proper cache coherency, the snapshots do not contain the actual cached data but instead contain the identity of the cached data. In other words, the cache's indexing information rather than the actual data in the cache are stored in the snapshot. When the cache is reheated, the data described by the snapshot is reloaded into the cache, thus ensuring only up-to-date data are loaded.
For example, if the cache contains the contents of certain disk blocks, then the snapshots would contain information identifying the disk blocks rather than the content of those disk blocks. When the cache is reheated, the current contents of the disk blocks are retrieved and loaded into cache and so the cache will contain the current contents of the blocks even if the contents had changed since the last snapshot was taken.
FIG. 1 is a schematic block diagram of a data storage system 100, in accordance with one exemplary embodiment of the present invention. Among other things, the data storage system 100 includes a storage processor 102, one or more data stores 104, and a cache 106. Based on any of a variety of caching schemes (some of which may be driven by the storage processor 102, some of which may be driven by hardware external to the storage processor 102), certain data from the data store(s) is stored in the cache 106 based on accesses to that data. For example, as blocks of data are read from the data store(s), those blocks may be stored in the cache 106 so that future accesses to those blocks can be satisfied from the cache 106 rather than from the data store(s) 104, increasing the speed of those accesses while reducing the load on the data store(s) 104. The storage processor 102 includes various hardware and/or software components configured to manage storage of data in the data store(s) 104, such as, for example, managing the physical and logical storage constructs of a file system managed in the data store(s) 104 such as clusters, blocks, and zones.
FIG. 2 is a logic flow diagram for a fast cache reheat process, in accordance with one exemplary embodiment. During a normal operational state 210, the storage processor 102 runs a timer 211 to determine when to take snapshots of the cache index. When the storage processor 102 determines that it is time to take a snapshot of the cache index (YES in block 211), the storage processor 102 produces an index of the cache contents, in block 212, and stores the index in the data store(s) 104, in block 213. The logic returns from block 213 to block 211 to await the next snapshot interval. When a restart occurs, the system enters a restart state 220 in which the cache is reheated, specifically by retrieving the index from the last snapshot, in block 221, and retrieving data from the data store(s) 104 based on the index and re-populating the cache with the retrieved data, in block 222. The logic moves from block 222 to the normal state 210, and more specifically to block 211 to await the next snapshot interval. It should be noted that the snapshots may be stored in the same data store as the data represented in the cache or may be stored in a different data store than the data represented in the cache. Thus, for example, the snapshots may be stored in an NVRAM or a portion of battery-backed RAM, while the data may be stored in a disk drive or disk array.
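The logic flow of FIG. 2 can be illustrated with a minimal Python sketch. It is not taken from the described embodiments: the class name ReheatController, its methods, the injectable clock, and the use of in-memory dictionaries in place of the data store(s) 104 and the cache 106 are all assumptions made for illustration. In particular, the saved index is held in an attribute that survives restart(), standing in for a snapshot written to non-volatile storage.

```python
import time
from typing import Callable, Dict, Hashable, List, Optional


class ReheatController:
    """Hypothetical controller mirroring blocks 211-213 and 221-222 of FIG. 2."""

    def __init__(self, data_store: Dict[Hashable, bytes], interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.data_store = data_store             # stands in for data store(s) 104
        self.cache: Dict[Hashable, bytes] = {}   # stands in for cache 106
        # Assumption: this attribute models the snapshot held in non-volatile storage.
        self.saved_index: Optional[List[Hashable]] = None
        self.interval_s = interval_s
        self.clock = clock
        self.next_due = clock() + interval_s

    def poll(self) -> bool:
        """Block 211: take a snapshot only if the timer has expired."""
        if self.clock() < self.next_due:
            return False
        index = list(self.cache)                 # block 212: produce the index
        self.saved_index = index                 # block 213: store the index
        self.next_due = self.clock() + self.interval_s
        return True

    def restart(self) -> None:
        """Restart state 220: the volatile cache is lost, then reheated."""
        self.cache = {}
        index = self.saved_index or []           # block 221: retrieve last index
        for key in index:                        # block 222: read current data
            if key in self.data_store:
                self.cache[key] = self.data_store[key]
        self.next_due = self.clock() + self.interval_s
```

Passing the clock as a parameter is only a convenience so that the snapshot interval can be exercised without waiting; an actual system would more likely drive block 211 from a hardware or operating system timer.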
FIG. 3 is a schematic diagram depicting production and storage of the cache index, in accordance with one exemplary embodiment. In this example, the cache 106 contains the contents of two blocks, specifically data X from Block X 304 and data Y from Block Y 306. When a snapshot is taken, an index 302 that identifies the contents of the cache is produced as represented by the arrow 308, and the index 302 is stored in the data store(s) 104, as represented by the arrow 310.
FIG. 4 is a schematic diagram depicting retrieval of the index and reheating of the cache, in accordance with the exemplary embodiment of FIG. 3. Here, the index 302 is retrieved from the data store(s) 104, as indicated by the arrow 408. The index 302 is used to retrieve the contents of Block X 304 and Block Y 306 from the data store(s) 104 and store the retrieved data in the cache 106, as indicated by the arrow 410. In this example, however, between the time the index 302 was produced and stored and the time the index was retrieved, the data in Block Y 306 was changed from a state Y to a state Y'. Thus, rather than repopulating the cache 106 with contents X and Y, the cache 106 is repopulated with contents X and Y'.
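The example of FIGs. 3 and 4 can be worked through in a few lines of Python. This is a sketch under stated assumptions: the function names take_index_snapshot and reheat are hypothetical, dictionaries stand in for the data store(s) 104 and the cache 106, and the content_copy variable is included only to contrast the index-based approach with a snapshot of the cached contents themselves.

```python
from typing import Dict, List


def take_index_snapshot(cache: Dict[str, str]) -> List[str]:
    """Record only which blocks are cached (index 302), not their contents."""
    return sorted(cache)


def reheat(index: List[str], data_store: Dict[str, str]) -> Dict[str, str]:
    """Rebuild the cache by reading each indexed block's current contents."""
    return {block: data_store[block] for block in index if block in data_store}


data_store = {"Block X": "X", "Block Y": "Y", "Block Z": "Z"}
cache = {"Block X": "X", "Block Y": "Y"}

index_302 = take_index_snapshot(cache)    # arrows 308 and 310
content_copy = dict(cache)                # what a content snapshot would hold

data_store["Block Y"] = "Y'"              # Block Y changes after the snapshot

reheated = reheat(index_302, data_store)  # arrows 408 and 410
print(reheated)      # {'Block X': 'X', 'Block Y': "Y'"}
print(content_copy)  # {'Block X': 'X', 'Block Y': 'Y'}, i.e. stale Y
```

Restoring content_copy would place the stale state Y back in the cache, whereas reheating from the index yields X and Y'.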
In certain exemplary embodiments, the data in the snapshots are described in the same address space as the indexes of the cache itself. For but one example, if address translations for certain host logical blocks (e.g., host logical blocks 100 to 200) are stored in a certain disk block (e.g., disk block 652), then information from which that disk block can be accessed (e.g., a physical or logical address, a block number, etc.) is stored in the snapshot rather than storing the actual address translations from the disk block. This further ensures that if the usage of a block previously used to describe host logical block address (LBA) translations is changed after a snapshot is taken, the cache will not load stale translations when reheated. In the example above, disk block 652 may be repurposed after the most recent snapshot was taken. In this case, the contents of disk block 652 would still be loaded into cache as part of the reheating process. However, since disk block 652 no longer describes the address translations for host LBAs 100 to 200, one of two things may happen. Either disk block 652 contains host LBA translations for a different range or it contains some other data not related to host LBA translations. In the former case, the data loaded during reheat is valid for the cache and may be accessed or aged out normally. In the latter case, the data loaded is not valid for the cache, i.e., it is not data of the same type as the rest of the data in the cache. This would seem to be a potential cause for corruption. However, since the cache is indexed in terms of the same address space stored in the snapshots, it can be demonstrated that whatever index table is mapping host LBA ranges to disk blocks must no longer contain an entry for the stale block. In other words, the cache will never be asked to return data for disk block 652, at least not until that disk block has once again been repurposed and rewritten with valid translation data.
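The repurposed-block argument above can be made concrete with a small Python sketch. Everything in it is hypothetical: the lba_table layout, the move of the translations to disk block 900, and the string payloads are assumptions chosen only to show that every lookup passes through the index table, so an entry loaded for a repurposed block is simply never requested.

```python
from typing import Dict, Optional, Tuple

LbaRange = Tuple[int, int]

# Hypothetical index table: host LBA range -> disk block holding its translations.
lba_table: Dict[LbaRange, int] = {(100, 200): 652}
# Disk contents, keyed by disk block number.
disk: Dict[int, str] = {652: "translations for LBAs 100-200"}
# Cache keyed by disk block number, the same address space as the snapshot.
cache: Dict[int, str] = {652: disk[652]}

snapshot = list(cache)  # the snapshot holds block numbers only: [652]

# After the snapshot, the translations move to block 900 and 652 is repurposed.
disk[900] = disk[652]
lba_table[(100, 200)] = 900
disk[652] = "unrelated data"

# Restart: reheat loads whatever block 652 holds now.
cache = {block: disk[block] for block in snapshot}


def lookup(lba: int) -> Optional[str]:
    """Resolve an LBA via the index table, then consult the cache by block."""
    for (lo, hi), block in lba_table.items():
        if lo <= lba <= hi:
            if block not in cache:
                cache[block] = disk[block]  # ordinary cache miss
            return cache[block]
    return None
```

Calling lookup(150) returns the translations now held in block 900. The reheated entry for block 652 remains in the cache until it ages out, but no path through lba_table leads to it.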
The above example refers to the cache index, and therefore the contents of the reheat snapshot, as single, or ranges of, disk blocks, e.g., disk block 652. However, it should be noted that the present invention is not limited to any particular type of indexing scheme. Thus, for example, data to be restored to the cache may be indexed using physical addresses, virtual addresses, file names, file handles, data store object numbers, and/or other information that allows the data to be retrieved from the data store(s) 104.
For another example, data to be restored to the cache may be indexed using a virtual address space, such as addresses in the form of a zone and offset tuple (e.g., zone 96, offset 16384), in which case the snapshot saved to the data store would contain those same addresses. Upon retrieval of the cache index from the data store(s) 104, any virtualized addresses would be converted into physical addresses, e.g., disk number and block(s), in order to retrieve the corresponding data from the data store(s) 104 and repopulate the cache 106 with the retrieved data.
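A sketch of this translation step in Python follows. The block size, the ZONE_TABLE layout, the disk number and starting block assigned to zone 96, and the names VirtualAddr, PhysicalAddr, to_physical, and reheat are all assumptions for illustration; the description does not specify how zones map to disks.

```python
from typing import Callable, Dict, Iterable, NamedTuple

BLOCK_SIZE = 4096  # assumed block size in bytes


class VirtualAddr(NamedTuple):
    zone: int
    offset: int  # byte offset within the zone


class PhysicalAddr(NamedTuple):
    disk: int
    block: int


# Hypothetical zone table: zone number -> (disk number, first block of the zone).
ZONE_TABLE: Dict[int, PhysicalAddr] = {96: PhysicalAddr(disk=2, block=50_000)}


def to_physical(addr: VirtualAddr) -> PhysicalAddr:
    """Convert a zone/offset tuple into a disk number and block number."""
    base = ZONE_TABLE[addr.zone]
    return PhysicalAddr(base.disk, base.block + addr.offset // BLOCK_SIZE)


def reheat(snapshot: Iterable[VirtualAddr],
           read_block: Callable[[PhysicalAddr], bytes]) -> Dict[VirtualAddr, bytes]:
    """Repopulate a cache keyed by virtual address from current disk contents."""
    return {vaddr: read_block(to_physical(vaddr)) for vaddr in snapshot}
```

With these assumed values, to_physical(VirtualAddr(96, 16384)) yields disk 2, block 50004, since 16384 bytes is four 4096-byte blocks past the start of the zone. The cache remains keyed by the virtual address, matching the address space saved in the snapshot.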
For yet another example, in certain exemplary embodiments as described in United States Patent Application No. 13/363,740, transactional data may be stored in a physical or logical transactional storage tier. Transactional performance is heavily gated by the hit rate on cluster access table (CAT) records, which are stored in nonvolatile storage, and which translate between logical host addresses and the corresponding locations of clusters in storage zones. The system maintains a cache of CAT records in the Zone MetaData Tracker (ZMDT) cache. A cache miss forces an extra read from disk for the host I/O, thereby essentially nullifying any advantage from storing data in a higher-performance transactional zone. Thus, in order to deliver reasonable transactional performance, the system effectively must sustain a high hit rate from this cache.
After a system restart, the ZMDT memory will naturally be empty and so transactional I/O will pay the large penalty of cache misses caused by the additional I/O required to load the array's metadata. Using fast cache reheat as described herein, the addresses of the cluster lookup table (CLT) sectors in the ZMDT cache may be stored during a snapshot, allowing those CLT sectors to be pre-loaded after a restart so as to enable the system to boot with an instantly hot ZMDT cache. In this exemplary embodiment, the data that needs to be saved is already in the cache's index structure, implemented in an exemplary embodiment as a splay tree.
In certain exemplary embodiments, the storage processor 102 does not need to have knowledge of the internal structure or workings of the cache 106, but rather the cache 106 manages the index and reheating of the cache contents based on the index. Specifically, in order for the storage processor 102 to take a snapshot of the cache index, the cache 106 provides the index to the storage processor 102 via an application program interface (API), and the storage processor stores the index in the data store(s) 104. When a restart occurs, the storage processor 102 retrieves the index from the last snapshot and provides the index to the cache 106, which uses data in the snapshot to make calls to a cache miss handler of the storage processor 102 in order to effect the repopulation required for reheat, i.e., the cache 106 can simply look up the required data as if it were being asked to do so by one of its usual consumers.
FIG. 5 is a schematic block diagram of a data storage system 500, in accordance with an exemplary embodiment of the present invention in which the cache 106 provides an API for managing the cache indexes as discussed above.
Specifically, the cache 106 presents an API 508 through which the storage processor 102 obtains the cache index from the cache 106 and provides a cache index back to the cache 106 for cache reheat. Thus, the storage processor 102 may produce the cache index in block 212 of FIG. 2 by making a call to the cache 106 via the API 508, and the storage processor 102 may provide a cache index to the cache 106 for reheat by similarly making a call to the cache 106 via the API 508.
FIG. 6 is a schematic transaction flow diagram for fast cache reheat as discussed with reference to FIG. 5, in accordance with one exemplary embodiment. In transaction (1), the cache 106 provides a cache index to the storage processor 102 via the API 508, and, in transaction (2), the storage processor 102 stores the cache index in the data store(s) 104. Transactions (1) and (2) can be repeated numerous times during normal operation of the data storage system. Upon a system restart, the storage processor 102 retrieves the cache index for the last snapshot from the data store(s) 104 in transaction (3) and provides the cache index to the cache 106 via the API 508 in transaction (4). Transaction (5) represents the calls made by the cache 106 to the cache miss handler of the storage processor 102 based on the cache index, the retrieval of data by the storage processor 102 from the data store(s) in response to the calls from the cache 106, and the storage processor 102 providing the retrieved data to the cache 106 for fast cache reheat.
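The division of labor between the cache 106 and the storage processor 102 can be sketched in Python as follows. The class and method names (Cache, StorageProcessor, export_index, reheat, handle_miss) are hypothetical and are not the API 508 of any actual product; dictionaries stand in for the data store(s) 104, and a separate snapshot_store dictionary is assumed as the place where the index is saved. The reheat path deliberately reuses the ordinary lookup path, so the cache repopulates itself through the cache miss handler exactly as it would for one of its usual consumers.

```python
from typing import Callable, Dict, Hashable, List

MissHandler = Callable[[Hashable], bytes]


class Cache:
    """Stand-in for cache 106; export_index and reheat play the role of API 508."""

    def __init__(self, miss_handler: MissHandler) -> None:
        self._entries: Dict[Hashable, bytes] = {}
        self._miss_handler = miss_handler

    def get(self, key: Hashable) -> bytes:
        """Ordinary lookup: on a miss, call back into the miss handler."""
        if key not in self._entries:
            self._entries[key] = self._miss_handler(key)
        return self._entries[key]

    def export_index(self) -> List[Hashable]:
        """Transaction (1): hand the current index to the caller."""
        return list(self._entries)

    def reheat(self, index: List[Hashable]) -> None:
        """Transaction (5): look up every indexed key as a normal consumer would."""
        for key in index:
            try:
                self.get(key)
            except KeyError:
                pass  # the location no longer exists in the data store; skip it


class StorageProcessor:
    """Stand-in for storage processor 102, which never inspects cache internals."""

    def __init__(self, data_store: Dict[Hashable, bytes]) -> None:
        self.data_store = data_store
        self.snapshot_store: Dict[str, List[Hashable]] = {}  # assumed index location
        self.cache = Cache(self.handle_miss)

    def handle_miss(self, key: Hashable) -> bytes:
        """Cache miss handler: read the current data from the data store."""
        return self.data_store[key]

    def snapshot(self) -> None:
        self.snapshot_store["last"] = self.cache.export_index()  # (1) and (2)

    def restart(self) -> None:
        self.cache = Cache(self.handle_miss)          # volatile contents are lost
        index = self.snapshot_store.get("last", [])   # transaction (3)
        self.cache.reheat(index)                      # transactions (4) and (5)
```

Because the storage processor only stores and returns an opaque list, the cache remains free to change its internal index structure without affecting the snapshot logic.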
It should be noted that headings are used above for convenience and are not to be construed as limiting the present invention in any way.
It should be noted that arrows may be used in drawings to represent communication, transfer, or other activity involving two or more entities. Double-ended arrows generally indicate that activity may occur in both directions (e.g., a command/request in one direction with a corresponding reply back in the other direction, or peer-to-peer communications initiated by either entity), although in some situations, activity may not necessarily occur in both directions. Single-ended arrows generally indicate activity exclusively or predominantly in one direction, although it should be noted that, in certain situations, such directional activity actually may involve activities in both directions (e.g., a message from a sender to a receiver and an acknowledgement back from the receiver to the sender, or establishment of a connection prior to a transfer and termination of the connection following the transfer). Thus, the type of arrow used in a particular drawing to represent a particular activity is exemplary and should not be seen as limiting.
It should be noted that terms such as "data storage system," "file server," "NAS device," "disk drive," and "computer" may be used herein to describe devices that may be used in certain embodiments of the present invention and should not be construed to limit the present invention to any particular device type unless the context otherwise requires. Such devices typically include one or more network interfaces for communicating over a communication network and a processor (e.g., a microprocessor with memory and other peripherals and/or application-specific hardware) configured accordingly to perform device functions. Communication networks generally may include public and/or private networks; may include local-area, wide-area, metropolitan-area, storage, and/or other types of networks; and may employ communication technologies including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
It should also be noted that devices may use communication protocols and messages (e.g., messages created, transmitted, received, stored, and/or processed by the device), and such messages may be conveyed by a communication network or medium. Unless the context otherwise requires, the present invention should not be construed as being limited to any particular communication message type, communication message format, or communication protocol. Thus, a communication message generally may include, without limitation, a frame, packet, datagram, user datagram, cell, or other type of communication message. Unless the context requires otherwise, references to specific communication protocols are exemplary, and it should be understood that alternative embodiments may, as appropriate, employ variations of such communication protocols (e.g., modifications or extensions of the protocol that may be made from time-to-time) or other protocols either known or developed in the future.
It should also be noted that logic flows may be described herein to demonstrate various aspects of the invention, and should not be construed to limit the present invention to any particular logic flow or logic implementation. The described logic may be partitioned into different logic blocks (e.g., programs, modules, functions, or subroutines) without changing the overall results or otherwise departing from the true scope of the invention. Oftentimes, logic elements may be added, modified, omitted, performed in a different order, or implemented using different logic constructs (e.g., logic gates, looping primitives, conditional logic, and other logic constructs) without changing the overall results or otherwise departing from the true scope of the invention.
The present invention may be embodied in many different forms, including, but in no way limited to, computer program logic for use with a processor (e.g., a microprocessor, microcontroller, digital signal processor, or general purpose computer), programmable logic for use with a programmable logic device (e.g., a Field Programmable Gate Array (FPGA) or other PLD), discrete components, integrated circuitry (e.g., an Application Specific Integrated Circuit (ASIC)), or any other means including any combination thereof. Computer program logic implementing some or all of the described functionality is typically implemented as a set of computer program instructions that is converted into a computer executable form, stored as such in a computer readable medium, and executed by a microprocessor under the control of an operating system. Hardware-based logic implementing some or all of the described functionality may be implemented using one or more appropriately configured FPGAs.
Computer program logic implementing all or part of the functionality previously described herein may be embodied in various forms, including, but in no way limited to, a source code form, a computer executable form, and various intermediate forms (e.g., forms generated by an assembler, compiler, linker, or locator). Source code may include a series of computer program instructions implemented in any of various programming languages (e.g., an object code, an assembly language, or a high-level language such as Fortran, C, C++, JAVA, or HTML) for use with various operating systems or operating environments. The source code may define and use various data structures and communication messages. The source code may be in a computer executable form (e.g., via an interpreter), or the source code may be converted (e.g., via a translator, assembler, or compiler) into a computer executable form.
Computer program logic implementing all or part of the functionality previously described herein may be executed at different times on a single processor (e.g., concurrently) or may be executed at the same or different times on multiple processors and may run under a single operating system process/thread or under different operating system processes/threads. Thus, the term "computer process" refers generally to the execution of a set of computer program instructions regardless of whether different computer processes are executed on the same or different processors and regardless of whether different computer processes run under the same operating system process/thread or different operating system processes/threads.
The computer program may be fixed in any form (e.g., source code form, computer executable form, or an intermediate form) either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), a PC card (e.g., PCMCIA card), or other memory device. The computer program may be fixed in any form in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies. The computer program may be distributed in any form as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).
Hardware logic (including programmable logic for use with a programmable logic device) implementing all or part of the functionality previously described herein may be designed using traditional manual methods, or may be designed, captured, simulated, or documented electronically using various tools, such as Computer Aided Design (CAD), a hardware description language (e.g., VHDL or AHDL), or a PLD programming language (e.g., PALASM, ABEL, or CUPL).
Programmable logic may be fixed either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), or other memory device. The programmable logic may be fixed in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies. The programmable logic may be distributed as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).
Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software.
The present invention may be embodied in other specific forms without departing from the true scope of the invention. Any references to the "invention" are intended to refer to exemplary embodiments of the invention and should not be construed to refer to all embodiments of the invention unless the context otherwise requires. The described embodiments are to be considered in all respects only as illustrative and not restrictive.

Claims

What is claimed is:
1. A method for fast cache reheat in a data storage system, the method comprising:
periodically storing, in a first data store, a snapshot of an index identifying storage locations associated with contents of a cache;
upon a restart of the data storage system, retrieving the index from the last snapshot stored prior to the restart;
retrieving, from a second data store, data from storage locations identified in the index; and
storing the retrieved data in the cache.
2. A method according to claim 1, wherein the first data store and the second data store are the same data store.
3. A method according to claim 1, wherein the first data store and the second data store are different data stores.
4. A method according to claim 1, wherein at least one of the first data store or the second data store includes a set of block storage devices.
5. A method according to claim 1, wherein the index identifies physical storage locations.
6. A method according to claim 1, wherein the index identifies virtual storage addresses.
7. A method according to claim 6, wherein retrieving data from storage locations identified in the index comprises:
translating a virtual storage address to a physical storage address for the second data store.
8. A method according to claim 1, wherein the cache includes an application program interface, and wherein the cache provides the index via the application program interface.
9. A method according to claim 8, wherein retrieving data from storage locations identified in the index comprises:
providing the retrieved index to the cache via the application program interface; and
retrieving the data by the cache.
10. A method according to claim 9, wherein the data storage system includes a cache miss handler, and wherein retrieving the data by the cache includes making calls to the cache miss handler by the cache.
11. A data storage system comprising:
a storage processor configured to manage storage of data in at least one data store; and
a cache, wherein:
the storage processor is configured to periodically store an index identifying storage locations associated with contents of the cache in a first data store and, upon a restart of the data storage system, retrieve the index from the last snapshot stored prior to the restart; and
at least one of the storage processor and the cache is configured to retrieve data from storage locations identified in the index from a second data store and store the retrieved data in the cache.
12. A system according to claim 11, wherein the first data store and the second data store are the same data store.
13. A system according to claim 11, wherein the first data store and the second data store are different data stores.
14. A system according to claim 11, wherein at least one of the first data store or the second data store includes a set of block storage devices.
15. A system according to claim 11, wherein the index identifies physical storage locations.
16. A system according to claim 11, wherein the index identifies virtual storage addresses.
17. A system according to claim 16, wherein retrieving data from storage locations identified in the index comprises:
translating a virtual storage address to a physical storage address for the second data store.
18. A system according to claim 11, wherein:
the cache includes an application program interface;
the cache provides the index to the storage processor via the application program interface; and
the storage processor stores the snapshot of the index in the second data store.
19. A system according to claim 18, wherein retrieving data from storage locations identified in the index comprises:
retrieving the index from the second data store by the storage processor;
providing the retrieved index by the storage processor to the cache via the application program interface; and
retrieving the data by the cache.
20. A system according to claim 19, wherein the storage processor includes a cache miss handler, and wherein retrieving the data by the cache includes making calls to the cache miss handler by the cache.
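By way of non-limiting illustration only, the method recited in claims 1, 6, 7, 9, and 10 might be sketched in Python as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the class and method names (Cache, StorageProcessor, export_index, reheat, handle_miss, snapshot_index, restart) are hypothetical and do not appear in the specification, the data stores are modeled as in-memory dictionaries, the snapshot is serialized as JSON purely for convenience, and the periodic snapshot is represented by an explicit call that a timer would make in practice.

```python
import json
from typing import Callable, Dict, List


class Cache:
    """Toy cache keyed by virtual storage address (hypothetical API)."""

    def __init__(self, miss_handler: Callable[[int], bytes]) -> None:
        self._entries: Dict[int, bytes] = {}
        self._miss_handler = miss_handler

    def read(self, address: int) -> bytes:
        # On a miss, call the cache miss handler and keep the result.
        if address not in self._entries:
            self._entries[address] = self._miss_handler(address)
        return self._entries[address]

    def export_index(self) -> List[int]:
        # The index records only storage addresses, not the cached data.
        return sorted(self._entries)

    def reheat(self, index: List[int]) -> None:
        # Repopulate the cache by replaying the index through the miss path.
        for address in index:
            self.read(address)


class StorageProcessor:
    """Toy storage processor that owns both data stores (an assumption)."""

    def __init__(self, blocks: Dict[int, bytes], v2p: Dict[int, int]) -> None:
        self.blocks = blocks                      # second data store: physical address -> data
        self.v2p = v2p                            # virtual -> physical translation table
        self.snapshot_store: Dict[str, str] = {}  # first data store: holds index snapshots
        self.cache = Cache(self.handle_miss)

    def handle_miss(self, virtual_address: int) -> bytes:
        # Translate the virtual address, then read from the second data store.
        return self.blocks[self.v2p[virtual_address]]

    def snapshot_index(self) -> None:
        # Intended to be called periodically; only the latest snapshot is kept here.
        self.snapshot_store["last"] = json.dumps(self.cache.export_index())

    def restart(self) -> None:
        # The cache contents are lost on restart; the stored index survives.
        self.cache = Cache(self.handle_miss)
        snapshot = self.snapshot_store.get("last")
        if snapshot is not None:
            self.cache.reheat(json.loads(snapshot))


sp = StorageProcessor(
    blocks={100: b"alpha", 200: b"beta", 300: b"gamma"},
    v2p={0: 100, 1: 200, 2: 300},
)
sp.cache.read(0)
sp.cache.read(2)
sp.snapshot_index()
sp.cache.read(1)   # cached after the last snapshot, so not captured in it
sp.restart()
print(sp.cache.export_index())  # prints [0, 2]
```

In this sketch the snapshot holds only addresses, so it remains small relative to the cache contents, and an address cached after the last snapshot (address 1 above) is simply not reheated and is fetched again on its next access.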
PCT/US2014/021136 2013-03-08 2014-03-06 Fast cache reheat WO2014138370A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/790,163 US10922225B2 (en) 2011-02-01 2013-03-08 Fast cache reheat
US13/790,163 2013-03-08

Publications (1)

Publication Number Publication Date
WO2014138370A1 true WO2014138370A1 (en) 2014-09-12

Family

ID=51491942

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/021136 WO2014138370A1 (en) 2013-03-08 2014-03-06 Fast cache reheat

Country Status (1)

Country Link
WO (1) WO2014138370A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060107006A1 (en) * 2002-01-22 2006-05-18 Green Robbie A Persistent snapshot management system
WO2007021997A2 (en) * 2005-08-18 2007-02-22 Emc Corporation Snapshot indexing
US20070260846A1 (en) * 2003-05-16 2007-11-08 Burton David A Methods of prefetching data in data storage systems
US20100082547A1 (en) * 2008-09-22 2010-04-01 Riverbed Technology, Inc. Log Structured Content Addressable Deduplicating Storage
WO2011082138A1 (en) * 2009-12-31 2011-07-07 Commvault Systems, Inc. Systems and methods for performing data management operations using snapshots

Similar Documents

Publication Publication Date Title
US10922225B2 (en) Fast cache reheat
CN109074317B (en) Adaptive deferral of lease for an entry in a translation look-aside buffer
US10176057B2 (en) Multi-lock caches
US9940023B2 (en) System and method for an accelerator cache and physical storage tier
US10540279B2 (en) Server-based persistence management in user space
KR102273622B1 (en) Memory management to support huge pages
US11119923B2 (en) Locality-aware and sharing-aware cache coherence for collections of processors
US9514054B2 (en) Method to persistent invalidation to ensure cache durability
US10078588B2 (en) Using leases for entries in a translation lookaside buffer
CN109240945B (en) Data processing method and processor
US9977760B1 (en) Accessing data on distributed storage systems
JP2018504694A (en) Cache accessed using virtual address
JP2018504694A5 (en)
US10049036B2 (en) Reliable distributed messaging using non-volatile system memory
US9229869B1 (en) Multi-lock caches
WO2014011481A1 (en) Solid state drives as a persistent cache for database systems
US10152422B1 (en) Page-based method for optimizing cache metadata updates
EP3072052A1 (en) Memory unit and method
Zhang et al. “Anti-Caching”-based elastic memory management for Big Data
US9830081B2 (en) System and method for synchronizing caches after reboot
WO2014138370A1 (en) Fast cache reheat
CN107423232B (en) FTL quick access method and device
KR20160068481A (en) Mobile device and management method of mobile device
Alwadi High Performance and Secure Execution Environments for Emerging Architectures

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14760917

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14760917

Country of ref document: EP

Kind code of ref document: A1