GB2493243A - Determining hot data in a storage system using counting bloom filters - Google Patents


Info

Publication number
GB2493243A
Authority
GB
United Kingdom
Legal status
Granted
Application number
GB1210250.5A
Other versions
GB2493243B (en)
GB201210250D0 (en)
Inventor
Xiao-Yu Hu
Ioannis Koltsidas
Roman Pletka
Robert Haas
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Publication of GB201210250D0
Publication of GB2493243A
Application granted
Publication of GB2493243B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12 Replacement control

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Complex Calculations (AREA)

Abstract

A characteristic of a data entity, based on the frequency of access to said data entity in a storage system, is determined using a counting bloom filter (CBF1) comprising a set (S1) of counters (C1), and a data structure having a set of elements, each corresponding to a counter. To avoid counter overflow, the counting bloom filter is operated for an interval in time, and the set of counters is reset at the start of the interval. Each time said data entity is accessed during the interval, the value of at least one counter (C1) to which said data entity is mapped in the counting bloom filter is increased. At the end of the interval, the value of each element in the data structure is updated based on the current value of that element and the value of the counter to which it is assigned. The interval in time may be defined by a predefined number of accesses. A plurality of counting bloom filters can be used. The method may produce a heat map, which is used for selectively populating a cache with "hot" data or for controlling the placement of "hot" data in the fastest tier of a tiered storage system.

Description

METHOD AND STORAGE CONTROLLER FOR DETERMINING AN ACCESS CHARACTERISTIC OF A DATA ENTITY
FIELD OF THE INVENTION
The present invention relates to methods and storage controllers for determining a characteristic of a data entity which characteristic is based on a frequency of access to said data entity in a storage system.
BACKGROUND
In the following, a characteristic of a data entity that represents the relative frequency at which the data entity is accessed is also denoted as the temperature of the data entity.
Determining the temperature of a particular data entity, in particular of the data at a given logical address, is a long-standing challenge in storage systems. The temperature of a particular data entity refers to its frequency of references, which may include read or write accesses, relative to its peers in the same storage system. The collection of temperature information for the whole storage system is also referred to as a heat map. A data entity is often called "hot" if it is frequently accessed, or "cold" if it is infrequently accessed. The temperature may measure quantitatively how frequently and how recently a data entity is accessed.
A simple and straightforward way to determine the temperature of data entities is to use a counter for each data entity to keep track of the number of references. However, this may be memory inefficient for large-capacity storage systems. In order to shrink the memory footprint of the heat map, a popular solution is to use a counter for a group of contiguous data entities, that is, track the temperature of data at a coarser granularity.
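To make the memory argument concrete, the back-of-the-envelope sketch below computes the footprint of a heat map with one counter per data entity and of a coarser one with one counter per group of contiguous entities. The 4 KiB entity size, 32-bit counters, 1 TiB capacity and 64-entity group size are illustrative assumptions, not values taken from this document.

```python
# Illustrative assumptions (not from the patent): 4 KiB data entities,
# one 32-bit counter per tracked unit, 1 TiB of capacity.
BLOCK_SIZE = 4 * 1024   # bytes addressed by one LBA
COUNTER_BYTES = 4       # size of one counter


def heat_map_bytes(capacity_bytes, group_blocks=1):
    """Memory for one counter per group of `group_blocks` contiguous entities."""
    entities = capacity_bytes // BLOCK_SIZE
    groups = -(-entities // group_blocks)  # ceiling division
    return groups * COUNTER_BYTES


capacity = 1 << 40  # 1 TiB
print(heat_map_bytes(capacity) / 2**20, "MiB")                   # 1024.0 MiB (per LBA)
print(heat_map_bytes(capacity, group_blocks=64) / 2**20, "MiB")  # 16.0 MiB (per 256 KiB group)
```

The coarse variant shrinks the footprint by the group size, at the price of assigning a single temperature to all entities in a group.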
BRIEF SUMMARY OF THE INVENTION
According to a first aspect of the invention, a method is provided for determining a characteristic of a data entity, which characteristic is based on a frequency of access to said data entity in a storage system. A counting bloom filter is provided for being operated for an interval in time, which counting bloom filter comprises a set of counters. A data structure is provided, the data structure comprising a set of elements, wherein each element of the set of elements is assigned to a counter of the set of counters. The characteristic of said data entity is determined subject to a value of at least one element of the set of elements.
For each individual interval in time in which the counting bloom filter is operated:
- the counters of the set of counters are reset prior to or at the beginning of the individual interval in time;
- each time said data entity is accessed during the individual interval in time, a value of at least one counter of a subset of counters, to which said data entity is mapped in the counting bloom filter, is increased; and
- the value of each individual element of the set of elements is updated at or after the end of the individual interval in time, subject to the value that the counter assigned to the individual element holds at the end of the individual interval in time and subject to the present value of the individual element.
In embodiments, this method may comprise one or more of the following features:
- the counting bloom filter is operated multiple times for consecutive intervals in time;
- the value of the individual element is updated subject to a weighted value that the counter assigned to the individual element holds at the end of the individual interval in time and subject to a weighted present value of the individual element;
- the value of the individual element is updated to the value that the counter assigned to the individual element holds at the end of the individual interval in time, weighted by a factor α, plus the present value of the individual element, weighted by a factor 1-α;
- the factor α has a value between 0.75 and 0.95;
- said data entity is mapped to the subset of counters by means of one or more hash functions;
- the subset of counters comprises multiple counters to which said data entity is mapped in the counting bloom filter, and only the value of a single counter in the subset is increased, namely the counter in the subset that presently shows the lowest value amongst the multiple counters in the subset;
- each element of the set of elements is assigned to a single counter of the set of counters, and each counter of the set of counters is assigned to a single element of the set of elements;
- the subset of counters comprises multiple counters to which said data entity is mapped in the counting bloom filter, a subset of elements contains the elements assigned to the counters of the subset of counters, and the characteristic of said data entity is determined subject to the value of one or more elements of the subset of elements;
- the characteristic of said data entity is determined subject to the value of the element that shows the lowest value amongst the multiple elements in the subset of elements;
- accessing said data entity includes at least one of reading said data entity and updating said data entity;
- said data entity represents data addressed by a single logical block address;
- subject to the determined characteristic of said data entity, said data entity is selected for being cached;
- subject to the determined characteristic of said data entity, said data entity is selected for being stored in a dedicated tier of a tiered storage system.
According to a second aspect of the present invention, a method is provided for determining a characteristic of a data entity, which characteristic is based on a frequency of access to said data entity in a storage system. A first counting bloom filter is provided for being active for a first interval in time, which first counting bloom filter comprises a set of first counters. Each time said data entity is accessed during the first interval in time, a value of at least one first counter of a subset of first counters, to which said data entity is mapped in the first counting bloom filter, is increased. A second counting bloom filter is provided for being active for a second interval in time, which second counting bloom filter comprises a set of second counters. Each time the data entity is accessed during the second interval in time, a value of at least one second counter of a subset of second counters, to which said data entity is mapped in the second counting bloom filter, is increased. The characteristic of the data entity is determined subject to a value of at least one first counter of the subset of first counters at the end of the first interval in time and subject to a value of at least one second counter of the subset of second counters at the end of the second interval in time.
In embodiments, this method may comprise one or more of the following features:
- overall, n counting bloom filters are provided, each of which is active for an associated interval in time, the associated intervals in time following each other; each of the n counting bloom filters is operated in the same way as the first or second counting bloom filter each time said data entity is accessed during the associated interval in time; and the characteristic of said data entity is determined subject to, for each of the n counting bloom filters, a value of at least one counter of a subset of counters associated with said data entity in the respective counting bloom filter at the end of the associated interval in time;
- the characteristic of said data entity is determined based on an average of the counter values selected from the n counting bloom filters;
- said data entity is mapped to the subset of first counters by means of one or more hash functions, and to the subset of second counters by means of the same one or more hash functions;
- the subset of first counters comprises multiple first counters to which said data entity is mapped in the first counting bloom filter, and only the value of a single first counter in the subset is increased, namely the first counter that presently shows the lowest value amongst the multiple first counters in the subset; likewise, the subset of second counters comprises multiple second counters to which said entity is mapped in the second counting bloom filter, and only the value of a single second counter in the subset is increased, namely the second counter that presently shows the lowest value amongst the multiple second counters in the subset;
- the subset of first counters comprises multiple first counters to which said data entity is mapped in the first counting bloom filter, and the subset of second counters comprises multiple second counters to which said entity is mapped in the second counting bloom filter; the characteristic of said data entity is determined subject to the value of a dedicated first counter, namely the first counter that shows the lowest value amongst the multiple first counters in the subset at the end of the first interval in time, and subject to the value of a dedicated second counter, namely the second counter that shows the lowest value amongst the multiple second counters in the subset at the end of the second interval in time;
- accessing said data entity includes at least one of reading said data entity and updating said data entity;
- said data entity represents data addressed by a single logical block address;
- subject to the determined characteristic of said data entity, said data entity is selected for being cached;
- subject to the determined characteristic of said data entity, said data entity is selected for being stored in a dedicated tier of a tiered storage system.
A further aspect of the invention refers to a computer program product comprising a computer readable medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to perform a method according to any one of the preceding aspects or embodiments.
A further aspect of the invention refers to a storage controller for determining a characteristic of a data entity which characteristic is based on a frequency of access to said data entity in a storage system, comprising a control unit adapted to execute a method according to any one of the preceding aspects or embodiments.
It is understood that method steps may be executed in a different order than listed in a method claim. Such a different order shall also be included in the scope of such a claim, as is the order of steps as presently listed.
Embodiments described in relation to the aspect of an apparatus shall also be considered as embodiments disclosed in connection with any of the other categories such as the method, the computer program product, etc.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention and its embodiments will be more fully appreciated by reference to the following detailed description of presently preferred but nonetheless illustrative embodiments in accordance with the present invention when taken in conjunction with the accompanying drawings.
The figures illustrate:
- FIG. 1, a diagram of a timing sequence of counting bloom filters applied according to an embodiment of the present invention;
- FIG. 2, a diagram of a first counting bloom filter applied according to an embodiment of the present invention;
- FIG. 3, a diagram of a second counting bloom filter applied according to an embodiment of the present invention;
- FIG. 4, a diagram of a tiered storage system;
- FIG. 5, a flow chart of a method according to an embodiment of the present invention; and
- FIG. 6, a flow chart of a method according to another embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
As an introduction to the following description, general aspects of the invention are first outlined, concerning methods and controllers for determining a characteristic of a data entity, which characteristic is based on a frequency of access to the data entity. Such methods and storage controllers make use of one or more bloom filters specifically adapted to the present application, and in particular of one or more counting bloom filters.
A bloom filter may be regarded as a simple, space-efficient, randomized data structure for representing a set in order to support membership queries. Bloom filters achieve space savings at the cost of allowing false positives; the probability of a false positive, however, can be bounded to a sufficiently low value. Bloom filters were introduced by Burton Bloom in the 1970s and have since found widespread adoption in database applications as well as in networking. A bloom filter may be regarded as a method for representing a set $S = \{s_1, s_2, \ldots, s_n\}$ of $n$ elements from a universe $U$ by using a bit vector $V$ of $m = O(n)$ bits. All the bits in the vector $V$ are initially set to 0. The bloom filter may use $k$ hash functions $h_1, h_2, \ldots, h_k$ for mapping elements from $U$ to the range $\{1, 2, \ldots, m\}$. For each element $s$ in $S$, the bits at positions $h_1(s), h_2(s), \ldots, h_k(s)$ in $V$ are set to 1. To query for an element, i.e. to test whether the element is in the set, the element preferably is fed to each of the $k$ hash functions to obtain $k$ bit positions. If any of the bits at these positions is 0, the element is not in the set; if it were, all of these bits would have been set to 1 when it was inserted. If all the identified bit positions are 1, then either the element is in the set, or the bits were set to 1 during the insertion of other elements; the latter case is called a false positive. The probability of a false positive depends on the choice of the parameters $m$, $n$ and $k$, and is minimized for $k = (m/n) \ln 2$. The bloom filter may be considered highly effective even for $m = cn$ with a small constant $c$. For $c = 8$, for example, the false positive rate is slightly higher than 2%.
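The insert and query rules described above can be sketched as a minimal Bloom filter, together with the standard approximation $(1 - e^{-k/c})^k$ for the false positive rate when $m = cn$. Deriving the $k$ hash functions by salting SHA-256 with the function index is an assumption made for illustration, not something the text prescribes; a real implementation would pack bits rather than use one byte per position.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter with m positions and k hash functions (sketch)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit position, for readability

    def _positions(self, item):
        # Assumption: hash function h_i is SHA-256 over "i:item", reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # False if any position is 0; True means "present or false positive".
        return all(self.bits[p] for p in self._positions(item))


def false_positive_rate(c, k):
    """Standard approximation (1 - e^(-k/c))^k for m = c * n bits."""
    return (1.0 - math.exp(-k / c)) ** k


c = 8
k_opt = round(c * math.log(2))  # (m/n) ln 2 is about 5.5, so 6 hash functions
print(k_opt, false_positive_rate(c, k_opt))
```

An item that was added always tests as present; an item that was never added usually tests as absent but may occasionally test as present, which is exactly the false positive case. For $c = 8$ and $k = 6$ the approximation gives about 2.2%, in line with the figure quoted above.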
Inserting a new element into a bloom filter, i.e. into the set of elements, is accomplished as follows: the new element is hashed k times by means of the k hash functions, and the bits resulting from this hashing are set to 1. However, an element cannot be deleted from the set by reversing this process. If the element to be deleted is hashed and the corresponding bits are set to 0, a bit position may be cleared that is also hashed to by some other element in the set. To avoid this problem, the counting bloom filter was developed in the art. In a counting bloom filter, each position of the bloom filter is represented not by a single bit but by a counter. When a new element is inserted into the set, the corresponding counters are incremented; when an element is deleted from the set, the corresponding counters are decremented. To avoid counter overflow, the counters are designed to be sufficiently large; for example, four bits per counter may suffice for most applications.
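A counting bloom filter along the lines just described can be sketched as follows, with 4-bit counters that saturate at 15 instead of wrapping around. The salted-SHA-256 hash family and the saturation policy (a saturated counter is never decremented, because its true count is unknown) are assumptions for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter with small saturating counters (sketch)."""

    def __init__(self, m, k, counter_bits=4):
        self.m, self.k = m, k
        self.max_value = (1 << counter_bits) - 1  # 15 for 4-bit counters
        self.counters = [0] * m

    def _positions(self, item):
        # Assumption: hash function h_i is SHA-256 over "i:item", reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for p in self._positions(item):
            if self.counters[p] < self.max_value:  # saturate rather than overflow
                self.counters[p] += 1

    def delete(self, item):
        # Only meaningful for items that were inserted; saturated counters
        # are left untouched because their true count is no longer known.
        for p in self._positions(item):
            if 0 < self.counters[p] < self.max_value:
                self.counters[p] -= 1

    def __contains__(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Deleting one item leaves other inserted items present, because shared counters only lose that item's own contribution. Inserting one item 100 times, however, pins its counters at 15, after which further accesses are indistinguishable; this is the short-term limitation that motivates resetting the filter per interval in time.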
In present storage applications, counting bloom filters may not be suited for generating heat maps directly, because counting bloom filters are inherently short-term. As more and more data entities are requested, i.e. recorded in the counting bloom filter, their corresponding counters are incremented; since these counters are of limited size, they may eventually overflow.
Hence, memory- and computation-efficient methods are proposed to estimate a characteristic related to the frequency of access of any data entity in a storage system.
According to the first aspect of a method, a single counting bloom filter comprising a set of counters preferably is applied repetitively to a sequence of individual intervals in time, to capture the frequency of access of a data entity, and preferably of any data entity, in each of these intervals in time. Preferably, the sequence of individual intervals in time forms a continuous period in time. Hence, for a specific data entity, each time said data entity is accessed during the individual interval in time, the value of at least one counter is increased, which counter is part of a subset of counters to which said data entity is mapped, preferably by means of one or more hash functions.
In addition, a data structure comprising a set of elements is provided.
Preferably, each element out of the set of elements is associated with a dedicated, single counter out of the set of counters and, preferably, each counter out of the set of counters is assigned only to a dedicated, single element.
When the method is started, the counting bloom filter is operated for the first time, for a first interval in time. At that point in time or prior to it, all counters of the set of counters and all elements of the set of elements are preferably reset, i.e. in a specific embodiment set to zero. Accordingly, zero values represent the present values of elements and counters at the beginning of the first interval in time. During the first interval in time, the counters may be increased according to the data entities accessed, such that at the end of the first interval in time the counter values indicate how often the various data entities were accessed during this interval. In contrast, element values typically do not change during an interval in time.
At, after, or in response to the end of the first interval in time, one or more, and preferably all, of the present values of the elements in the set of elements are updated. Such an update assigns a new value to an individual element, where the new value depends on the value of the counter assigned to the individual element and on the present value of the individual element. For determining the characteristic of a specific data entity at any point in time, the set of elements preferably is queried: the hash functions assigned to the data entity are applied and yield a specific subset of counters and/or a specific subset of elements, respectively.
The characteristic may then be derived from the values of the elements of the subset of elements at the given point in time.
In this embodiment, the counting bloom filter may be applied to relatively short intervals in time, so that it does not run the risk of being blocked by counter overflow. The data structure including the set of elements is used for determining an average of counter values over multiple intervals in time. Hence, in a preferred embodiment, the present value of an element of the data structure represents an average of the values the associated counter held at the end of previous intervals in time. Upon expiration of another interval in time, the counter value reached at the end of that interval preferably is set into relation to the present average value. In a preferred embodiment, this is achieved by weighting the present element value by a factor close to 1, weighting the new counter value by a factor close to zero, and adding both weighted values. By such means, only a single counting bloom filter is needed, together with a data structure holding long-term averaged counter values.
In this respect, the data structure may also be interpreted as a "long-term counting bloom filter", since it holds element values representing the time average of the associated counter values of the counting bloom filter, whose counters are limited in size. Once the long-term counting bloom filter is updated, the short-term counting bloom filter preferably is reset by initializing all its counters to zero, and a subsequent interval in time starts. The characteristic of a data entity may preferably be determined by reading out the minimum element value among those elements indexed by the hash values of the LBA of said data entity.
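The first aspect (one short-term counting bloom filter reset per interval, plus a long-term element array queried by minimum) can be sketched as below. `HeatMap` is a hypothetical name; the salted-SHA-256 hash family, the 4-bit saturating counters and α = 0.85 are assumptions, with α chosen inside the 0.75 to 0.95 range named in the summary and applied as in the summary's update rule (counter value weighted by α, present element value by 1-α).

```python
import hashlib


class HeatMap:
    """Short-term CBF plus long-term element array (first aspect, sketch)."""

    def __init__(self, m, k, alpha=0.85, counter_bits=4):
        self.m, self.k, self.alpha = m, k, alpha
        self.max_value = (1 << counter_bits) - 1
        self.counters = [0] * m    # short-term CBF, reset every interval
        self.elements = [0.0] * m  # long-term values, one per counter

    def _positions(self, lba):
        # Assumption: hash function h_i is SHA-256 over "i:lba", reduced mod m.
        return [int.from_bytes(hashlib.sha256(f"{i}:{lba}".encode()).digest()[:8], "big")
                % self.m for i in range(self.k)]

    def access(self, lba):
        # Increment each counter in the entity's subset, saturating at max_value.
        for p in self._positions(lba):
            self.counters[p] = min(self.counters[p] + 1, self.max_value)

    def end_interval(self):
        # Update every element from its counter and its present value,
        # then reset the short-term counters for the next interval.
        a = self.alpha
        self.elements = [a * c + (1 - a) * e for c, e in zip(self.counters, self.elements)]
        self.counters = [0] * self.m

    def temperature(self, lba):
        # Minimum over the elements indexed by the entity's hash values.
        return min(self.elements[p] for p in self._positions(lba))


hm = HeatMap(m=1024, k=2)
for _ in range(10):
    hm.access("lba-7")
hm.access("lba-9")
hm.end_interval()
print(hm.temperature("lba-7") > hm.temperature("lba-9"))  # hot vs. cold
```

Only the m counters and m elements are kept in memory, independent of how many intervals have elapsed.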
According to the second aspect of a method, a first counting bloom filter is applied only for a limited interval in time before another counting bloom filter is applied to a subsequent interval in time. Given two or more counting bloom filters, each reflecting the access patterns to the corresponding data entities during its associated interval, an averaging routine may preferably average the counting bloom filter results obtained at the end of each interval, i.e. average the counter values representing this multitude of counting bloom filter results over time. It is further noted that the results of the counting bloom filters are averaged by selecting, in each counting bloom filter, the counter values that correspond to the data entity whose access frequency is to be determined, and preferably averaging these counter values.
With respect to all aspects, it is noted that an increase of a counter or counter value may also include any other modification of the counter or counter value that may allow for an estimate of the number/frequency of accesses to the corresponding data entity.
The counting bloom filter in the first aspect may preferably use, for each interval in time, the same set of k independent hash functions for determining which counters are populated for a hashed data entity. The counting bloom filters in the second aspect, and specifically the first counting bloom filter and the second counting bloom filter, may likewise use the same set of k independent hash functions.
Preferably, a counting bloom filter is maintained over a span of requests, and the number of requests defines the interval in time during which the counting bloom filter is active.
The long-term counting bloom filter in both aspects is preferably represented by a smoothed or exponential moving average of some or all past short-term counting bloom filters, and the long-term counting bloom filter may be used as a heat map. The temperature of a particular data entity is obtained by querying the long-term counting bloom filter. In this respect, the temperature of a data entity again denotes its frequency of references, which may include read or write accesses, relative to its peers in the same storage system; this temperature may be one of the characteristics of interest to be determined in a storage system.
In particular, the temperature information for an entire storage system may also be referred to as a heat map. A data entity is often called "hot" if it is frequently accessed, or "cold" if it is infrequently accessed or updated. The temperature measures quantitatively how frequently and how recently a data entity is accessed. However, in another embodiment, a characteristic based on the access frequency of a data entity may refer to its absolute access frequency or number of accesses.
A sample data entity may preferably be a data chunk that is addressed by a logical block address (LBA).
Figure 1 illustrates a timing sequence of counting bloom filters CBF1 to CBF4 applied according to an embodiment of the present invention. A first counting bloom filter CBF1 is applied during a first interval in time t1 - t0, a second counting bloom filter CBF2 during a second interval in time t2 - t1, a third counting bloom filter CBF3 during a third interval in time t3 - t2, and a fourth counting bloom filter CBF4 during a fourth interval in time t4 - t3. Overall, n counting bloom filters CBF may be applied, each of which is active during an associated time interval. According to the first aspect of the present invention, all counting bloom filters CBF1 to CBF4 may physically be represented by a single counting bloom filter CBF that is reused and restarted right at the end of each interval in time, which restart may preferably include a prior reset of its counters.
Preferably, the time intervals do not overlap, and each time interval follows the preceding one without a gap in between. Each time interval may be of a defined, limited length, which may, for example, be represented by a pre-defined number of accesses during the interval. As a result, the various time intervals are not necessarily of equal length in time. The pre-defined number of accesses may be chosen to be as large as possible before a majority of the counters C of the corresponding counting bloom filter CBF overflow. Furthermore, even the number of accesses may differ between individual time intervals.
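Defining an interval by a pre-defined number of accesses, rather than by wall-clock time, only requires an access count and a callback at each boundary. The sketch below shows that bookkeeping; `AccessCountedEpochs`, `record` and `close_interval` are hypothetical names, and a plain `Counter` stands in for the counting bloom filter to keep the example short.

```python
from collections import Counter


class AccessCountedEpochs:
    """Hypothetical helper: ends an interval after a fixed number of accesses."""

    def __init__(self, accesses_per_interval, on_access, on_interval_end):
        self.limit = accesses_per_interval
        self.on_access = on_access
        self.on_interval_end = on_interval_end
        self.seen = 0

    def access(self, lba):
        self.on_access(lba)
        self.seen += 1
        if self.seen == self.limit:  # interval boundary reached
            self.on_interval_end()
            self.seen = 0            # next interval starts without a gap


counts = Counter()  # stand-in for the short-term counting bloom filter
finished = []       # per-interval snapshots, as kept in the second aspect


def record(lba):
    counts[lba] += 1


def close_interval():
    finished.append(dict(counts))
    counts.clear()  # reset before the next interval


epochs = AccessCountedEpochs(3, record, close_interval)
for lba in [1, 4, 1, 5, 5, 5, 1]:
    epochs.access(lba)
print(finished)  # [{1: 2, 4: 1}, {5: 3}]
```

The final access to LBA 1 belongs to a third interval that is still open, so it appears only in `counts`.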
Hence, either multiple uses of a single counting bloom filter or a single use of each of multiple counting bloom filters is considered, each use being active during a specific time interval, as shown in Fig. 1. For the latter, multi-filter aspect, at the beginning of each time interval a new counting bloom filter CBFx is initialized with all of its counters set to zero. For the first aspect, the single counting bloom filter is re-initialized with all of its counters set to zero at the beginning of each new interval in time.
A first counting bloom filter CBF1 is depicted in Figure 2. A number m of first counters $C^1_0$ to $C^1_{m-1}$ form a set S1 of first counters assigned to the first counting bloom filter CBF1. An input value, which in the present case may be a logical block address LBA representing a data entity, is mapped by preferably multiple hash functions $h_1(\mathrm{LBA}), h_2(\mathrm{LBA}), \ldots, h_k(\mathrm{LBA})$ (with k = 2 in the present example) to k first counters $C^1$ out of the set S1 of m first counters.
This means that, in the present case, two different hash functions are applied to each LBA once the LBA is accessed by a host, by the storage system itself, or by any other entity. In the present example, the LBA of value 1 is hashed to first counters $C^1_0$ and $C^1_1$, the LBA of value 4 to first counters $C^1_1$ and $C^1_4$, and the LBA of value 5 to first counters $C^1_3$ and $C^1_5$. Hence, a subset of two first counters out of the set S1 of first counters is assigned to each data entity represented by an LBA. With each access to an LBA, the corresponding first counters of its subset are incremented. If k hash functions are applied for building the first counting bloom filter CBF1, i.e. for mapping each data entity to k first counters, a subset of first counters typically consists of k first counters. In another embodiment, only a single first counter out of the subset of first counters is incremented for each access to the corresponding data entity, preferably the first counter of the subset with the lowest value. The rationale for this embodiment is to accommodate more accesses in the short-term CBF without overflowing its counters and to increase the accuracy of the frequency estimation.
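The two update policies can be compared on the counter subsets listed for Figure 2 (LBA 1 to $C^1_0$ and $C^1_1$, LBA 4 to $C^1_1$ and $C^1_4$, LBA 5 to $C^1_3$ and $C^1_5$). The subsets are taken from the example; the access sequence (LBA 1 three times, then LBA 4 five times) and the tie-breaking rule (the first counter in the subset wins) are assumptions for illustration.

```python
# Counter subsets for LBAs 1, 4 and 5 as listed in the Figure 2 example (k = 2).
SUBSETS = {1: (0, 1), 4: (1, 4), 5: (3, 5)}
ACCESSES = [1] * 3 + [4] * 5  # hypothetical workload


def increment_all(counters, subset):
    """Basic CBF update: every counter in the entity's subset is incremented."""
    for p in subset:
        counters[p] += 1


def increment_lowest(counters, subset):
    """Variant from the text: only the subset's lowest counter is incremented.
    Ties go to the first counter in the subset (min() keeps the first minimum)."""
    lowest = min(subset, key=lambda p: counters[p])
    counters[lowest] += 1


def estimate(counters, subset):
    """Query: minimum counter value over the entity's subset."""
    return min(counters[p] for p in subset)


def run(update, m=6):
    counters = [0] * m
    for lba in ACCESSES:
        update(counters, SUBSETS[lba])
    return counters


basic = run(increment_all)       # [3, 8, 0, 0, 5, 0]
lowest = run(increment_lowest)   # [2, 3, 0, 0, 3, 0]
print(sum(basic), max(basic))    # 16 8
print(sum(lowest), max(lowest))  # 8 3
```

With the lowest-counter variant the counter total equals the 8 accesses instead of k × 8 = 16, and the largest counter drops from 8 to 3, which is the overflow headroom the text refers to. The minimum-based estimates become 2 and 3 instead of 3 and 5, so under this variant the values are best read as relative temperatures; LBA 4 still ranks hotter than LBA 1.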
The first counting bloom filter according to Figure 2 may be used in the single counting bloom filter application repetitively.
A second counting bloom filter CBF2, as may be used in the multiple counting bloom filter application, is depicted in Figure 3. Its structure is identical to that of the first counting bloom filter CBF1. A set S2 of m second counters contains the second counters $C^2_0$ to $C^2_{m-1}$, which are assigned to the second counting bloom filter CBF2.
The input value, which again is a logical block address LBA accessed during the second interval in time in which the second counting bloom filter CBF2 is active, is mapped by the same k hash functions as used in the first counting bloom filter CBF1, i.e. $h_1(\mathrm{LBA}), h_2(\mathrm{LBA}), \ldots, h_k(\mathrm{LBA})$ (with k = 2), to second counters $C^2$ out of the set S2 of m second counters. In the present case, two different hash functions are applied to each LBA once the LBA is accessed by a host, by the storage system itself, or by any other entity. In the present example, the LBA of value 1 is hashed to second counters $C^2_0$ and $C^2_1$, the LBA of value 4 to second counters $C^2_1$ and $C^2_4$, and the LBA of value 5 to second counters $C^2_3$ and $C^2_5$. Hence, a subset of two second counters out of the set S2 of m second counters is assigned to each data entity represented by an LBA. With each access to an LBA, the corresponding second counters of its subset are incremented. If k hash functions are applied for building the second counting bloom filter CBF2, i.e. for mapping each data entity to k second counters, a subset of second counters typically consists of k second counters. In another embodiment, only a single second counter out of the subset of second counters is incremented for each access to the corresponding data entity, preferably the second counter of the subset with the lowest value. The rationale for this embodiment is to accommodate more accesses in the short-term CBF without overflowing its counters and to increase the accuracy of the frequency estimation.
In the present example, the second interval in time during which the second counting bloom filter is applied is defined by allowing the same given number of data entity accesses as is used for defining the length of the first interval in time.
In the same way, n short-term counting bloom filters CBF1 to CBFn may be applied for covering a large time interval according to Figure 1. Preferably, the value of each counter of a counting bloom filter CBFj at the end of its associated time interval is stored. Let $C_i^j$ be the value of the i-th counter in the j-th counting bloom filter CBFj. Then the value of the i-th counter $C_i$ of the long-term counting bloom filter CBF can be obtained by averaging $C_i^j$ over all short-term counting bloom filters CBF1 to CBFn, namely

$$C_i = \frac{1}{n} \sum_{j=1}^{n} C_i^j.$$

The resulting counter value $C_i$ may then be used as a temperature of a related data entity. By determining all counter values $C_0$ to $C_{m-1}$, a heat map of the underlying storage system can be obtained. Such a counter value $C_i$ may also be denoted more generally as an element of a data structure which supports the averaging of the individual counter values. The temperature of a specific data entity can be determined by hashing its LBA k times, resulting in a subset of k counter values out of $C_0$ to $C_{m-1}$ (this set of averaged values is also denoted as the long-term counting bloom filter), and taking the minimum value of the subset of k counter values as the estimated temperature of the corresponding data entity.
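The averaging formula and the minimum-based temperature query can be illustrated with a small worked example. The sketch below assumes that the n end-of-interval snapshots are kept as plain lists and that the entity's counter indices (here the hypothetical subset `[0, 3]`) have already been obtained by hashing its LBA; the function names are illustrative only.

```python
def average_counters(snapshots: list[list[int]]) -> list[float]:
    """Long-term counters: C_i = (1/n) * sum over j of C_i^j for n stored snapshots."""
    n = len(snapshots)
    m = len(snapshots[0])
    return [sum(s[i] for s in snapshots) / n for i in range(m)]


def temperature(long_term: list[float], subset: list[int]) -> float:
    """Estimated temperature: minimum averaged value over the entity's k counters."""
    return min(long_term[i] for i in subset)


# n = 2 stored snapshots of a filter with m = 4 counters.
snapshots = [[2, 4, 0, 6], [4, 0, 2, 2]]
long_term = average_counters(snapshots)   # [3.0, 2.0, 1.0, 4.0]
print(temperature(long_term, [0, 3]))     # 3.0 (hypothetical subset of one LBA)
```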
In another preferred embodiment of implementing a long-term counting bloom filter, a smoothed or exponential moving average of all past short-term counting bloom filter values is used. As a result, only a single short-term counting bloom filter CBF needs to be tracked. The single short-term counting bloom filter CBF is reused for each new epoch, i.e. each new interval in time, and is initialized to zero at the beginning of that epoch, i.e. its counters C are set to zero. Again, $C_i^j$ denotes the value of the i-th counter of the set of counters reached at the end of the most recent interval in time j, which may just have terminated. Note that counter values of earlier intervals in time are no longer accessible, since only a single counting bloom filter is used. The updated value of the i-th element $C_i$ of the set of elements can be obtained by weighting the assigned counter value $C_i^j$ and adding the weighted present value of the i-th element $C_i$, for example by using one of the following rules:

$$C_i \leftarrow \alpha\, C_i^j + (1-\alpha)\, C_i$$

$$C_i \leftarrow \frac{C_i^j + (j-1)\, C_i}{j}$$

where α is a weighting factor, typically set to a value between 0.75 and 0.95, and where, in the second rule, the weights are given by the index j of the interval in time that has just terminated. This operation preferably is performed for all elements $C_0$ to $C_{m-1}$ of the set of elements, resulting in m element values. Once the current short-term counting bloom filter CBF has been merged into the associated set of elements, all its counters are reset to zero. Hence, only a single counting bloom filter needs to be used for covering data entity accesses in the current interval in time. Upon expiry of the interval in time, the associated data structure is updated by applying the counter values to the assigned element values. Then the counting bloom filter is reset by initializing all its counters to zero, and a new interval in time is started, for which the counting bloom filter is operated anew.
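Both update rules can be written as element-wise functions. This is a sketch, assuming the counter values of the just-terminated interval j are available as a list; the orientation of the first rule (α applied to the new counter value) follows claim 4, and the function names are hypothetical. The loop shows that the second rule reproduces the plain average of all past snapshots when started from zero with j = 1.

```python
def ema_update(elements: list[float], counters: list[int], alpha: float) -> list[float]:
    """Rule 1: C_i <- alpha * C_i^j + (1 - alpha) * C_i for every element i."""
    return [alpha * c + (1.0 - alpha) * e for e, c in zip(elements, counters)]


def cumulative_update(elements: list[float], counters: list[int], j: int) -> list[float]:
    """Rule 2: C_i <- (C_i^j + (j - 1) * C_i) / j, with j >= 1 the interval index."""
    return [(c + (j - 1) * e) / j for e, c in zip(elements, counters)]


elements = [0.0, 0.0, 0.0]             # data structure, initially zero
counters = [0, 0, 0]                   # the single short-term CBF
for j, observed in enumerate([[2, 4, 0], [4, 0, 2]], start=1):
    counters[:] = observed             # counter values at the end of interval j
    elements = cumulative_update(elements, counters, j)
    counters[:] = [0] * len(counters)  # reset the CBF for the next interval
print(elements)  # [3.0, 2.0, 1.0]
```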
In this way, only a single short-term counting bloom filter and a data structure are needed, and thus the RAM requirement is drastically reduced.
The advantages of the periodically updated data structure are twofold. First, it requires main memory for only two stored counting bloom filters, thus drastically reducing the memory requirement. Second, the proposed long-term counting bloom filter CBF can adapt to the changing dynamics of workloads thanks to the use of an exponential moving average.
An accurate estimation of the temperature of a given data entity can help improve the performance and/or cost efficiency of storage systems. This information can be incorporated into one or more of a cache, a tiered storage system, or a Flash memory based device. For example, "hot" data, once identified, can be inserted into a cache to improve the cache hit rate and thus performance. A hierarchical, i.e. tiered, storage system comprises at least two storage media: one is typically expensive but fast, while the other is typically inexpensive but slower. "Hot" data, once identified, can be stored on the expensive but fast storage medium in a first tier of the tiered storage system, while "cold" data can be stored on the larger-capacity, inexpensive but slower storage medium in a second tier of the tiered storage system, aiming at high performance at a lower cost. When a flash memory device is used as the storage medium, data of similar update frequency may preferably be stored in the same flash erase unit in order to minimize write amplification.
The present idea may be applicable to any system that may benefit from tracking the value of a metric/characteristic for a very large population of data entities over a long time, while using a very small amount of memory.
In a preferred embodiment of the present invention, the present method may be applied for selectively populating a cache, and may preferably also be applied for deciding on block evictions from the cache. A cache typically is a portion of memory space that holds frequently accessed data entities in order to reduce access latency by avoiding multiple accesses to the underlying storage medium. A cache may be implemented as a read cache, a write cache, or a combined read and write cache.
Especially when a cache is implemented on flash memory, filtering the data entities that populate the cache is crucial: populating the cache with "cold" data entities not only pollutes the cache and may force potentially "hot" data entities out of the cache, but may also result in a large number of flash writes, typically random ones. The latter results in much lower cache performance, as it severely decreases the throughput of the cache and increases the latency of other read and write requests executed in parallel. Moreover, a high rate of writes to the flash cache causes the flash chips to wear out sooner and, therefore, results in a shorter lifetime of the device.
The present method can be used to efficiently maintain a cache by using a long-term counting bloom filter CBF, i.e. an averaging means over counter values stemming from counting bloom filters applied to limited periods in time. A corresponding storage controller may maintain a long-term CBF over the whole storage system address space at block granularity, i.e. with a data entity representing a data block, so that the temperature of all blocks in the system is tracked. On each and every access to a block, the system updates its temperature in the short-term CBF. At the same time, the storage controller may preferably keep track of the lowest temperature found among the data blocks in the cache.
In response to a request for accessing a data block, if the data block is found in the cache, it is served from the cache. If a requested data block is not found in the cache, the system reads the block from the underlying storage medium, which in one embodiment may be an HDD array. Subsequently, the storage controller uses the current short-term counting bloom filter CBF and the long-term counting bloom filter CBF to obtain a measure of the temperature of the block. If that temperature is higher than the minimum temperature in the cache, the block is admitted to the cache, i.e. a copy of the block is written to the cache, specifically to flash memory in case the cache is embodied as a flash cache. Otherwise, the block is served to the user but is not stored in the cache.
When admitting a block to the cache, the cache may be full, that is, a cached block needs to be evicted before the new block can be written into the cache. The system may or may not use the counting bloom filter CBF to select the block to be removed from the cache. In the former case, the block with the lowest temperature in the cache is selected for removal. In the latter case, the system may use any other existing page replacement policy to select a block for removal. That policy can be based on the recency of accesses, on the frequency of accesses, or on any other criterion the designer finds suitable. An advantage of this approach is that the internals of the cache need not be modified.
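The admission and eviction logic described for the cache can be sketched as a small wrapper class. It is a sketch under stated assumptions: `temperature_of` is a hypothetical hook standing in for the estimate obtained from the short-term and long-term counting bloom filters, `read_backend` stands in for the underlying storage medium, and the lowest cache temperature is recomputed on demand rather than tracked incrementally as a storage controller would do. Eviction uses the CBF-based option (remove the coldest block).

```python
from typing import Callable


class TemperatureFilteredCache:
    """Admits a missed block only if it is hotter than the coldest cached block."""

    def __init__(self, capacity: int, temperature_of: Callable[[int], float]) -> None:
        self.capacity = capacity
        self.temperature_of = temperature_of    # hypothetical CBF-based estimate
        self.blocks: dict[int, bytes] = {}

    def lowest_temperature(self) -> float:
        # An empty cache admits any block.
        return min((self.temperature_of(b) for b in self.blocks), default=float("-inf"))

    def read(self, lba: int, read_backend: Callable[[int], bytes]) -> bytes:
        if lba in self.blocks:                  # hit: serve from the cache
            return self.blocks[lba]
        data = read_backend(lba)                # miss: read from e.g. an HDD array
        if self.temperature_of(lba) > self.lowest_temperature():
            if len(self.blocks) >= self.capacity:
                # Cache full: evict the block with the lowest temperature.
                coldest = min(self.blocks, key=self.temperature_of)
                del self.blocks[coldest]
            self.blocks[lba] = data             # admit a copy of the block
        return data                             # served to the user in any case


temps = {1: 5.0, 2: 1.0, 3: 3.0, 4: 0.5}
cache = TemperatureFilteredCache(capacity=2, temperature_of=lambda lba: temps[lba])
for lba in (4, 2, 3, 4, 1):
    cache.read(lba, read_backend=lambda lba: f"block {lba}".encode())
print(sorted(cache.blocks))  # [1, 3]
```

Block 4 is evicted as soon as the hotter block 3 arrives, and its later re-read is served without being re-admitted, because it is colder than everything then cached.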
In another embodiment, a storage system may comprise a storage controller and tiered storage media. Such a system is also denoted as a tiered storage system. Storage systems comprising multiple tiers of persistent storage that differ in performance and capacity can also benefit from the present approach. In a typical tiered storage system, the storage media are ordered according to their performance characteristics. Naturally, the higher-performing a storage medium is, the more expensive it is per unit of storage and, consequently, the smaller its capacity is expected to be. Such a system is shown in the diagram of Figure 4. In this example, the system includes four tiers T0-T3, with a tape storage medium being the slowest medium with the most capacity in the lowest tier T0, while a flash storage medium is the fastest storage medium with the least capacity amongst the present storage media, residing in the premium tier T3. In between the two extremes there are two tiers T2 and T1 comprising magnetic disks; the second highest tier T2 comprises SAS disks configured in RAID 5, for example, while the second lowest tier T1 comprises SATA disks configured in a RAID 6 array, for example. Traversing the hierarchy from tier T0 to tier T3, performance improves both in terms of latency and throughput, while capacity shrinks.
Typically, in tiered storage systems the total capacity of the storage system is equal to the aggregate capacity of the individual tiers. This effectively means that all tiers are utilized by the system as persistent storage and no block is found in more than one of the tiers at any given time. Specifically, none of the tiers is used as a cache in the hierarchy. Of course, any entity of data can migrate from one tier to some other tier. To achieve a maximum performance, the storage controller of such a tiered storage system aims to store data blocks with the hottest temperature on the fastest tiers, while data blocks with the coldest temperature are pushed down to the less premium tiers.
In such a tiered storage system, the present approach of determining temperatures of data entities can be applied over the whole storage system address space at data block granularity, i.e., the temperature of all data blocks in the storage system is tracked by means of counting bloom filters. On each and every access to a data block, the system updates its temperature in the short-term counting bloom filter CBF. At the same time, the system may keep track of the highest and the lowest temperature(s) found in each tier of the system.
On each access to a block currently stored in tier j, the system may use the current short-term counting bloom filter CBF and the long-term counting bloom filter CBF to obtain a measure of the temperature of the block. If that temperature is higher than the lowest temperature in tier j+1, a migration of that block from tier j to tier j+1 is triggered. At the same time, the block with the lowest temperature in tier j+1 is demoted to tier j, assuming that tier j+1 is full, i.e., all its blocks have been allocated. Note that, as an alternative, the block can be moved to any tier j' > j+1 if its temperature is found to be higher than the lowest temperature of tier j'. A block is preferably demoted to a lower tier when it is replaced by some other block, i.e., when it is found to be the coldest block in its current tier. Initially, when a new block is allocated, it is placed in the highest tier that is not full yet.
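The promotion/demotion rule and the initial placement can be sketched as two functions over a list of tiers ordered from slowest (index 0, e.g. T0) to fastest. This sketch assumes in-memory sets of block identifiers, positive per-tier capacities, and a hypothetical `temp` callable standing in for the CBF-based temperature; it models only the single-step promotion to tier j+1, not the alternative move to a tier j' > j+1.

```python
from typing import Callable


def allocate(block: int, tiers: list[set[int]], capacity: list[int]) -> int:
    """Place a new block in the highest tier that is not full yet."""
    for t in range(len(tiers) - 1, -1, -1):
        if len(tiers[t]) < capacity[t]:
            tiers[t].add(block)
            return t
    raise RuntimeError("all tiers are full")


def on_access(block: int, tiers: list[set[int]], capacity: list[int],
              temp: Callable[[int], float]) -> None:
    """Promote `block` from tier j to j+1 if it is hotter than the coldest block there."""
    j = next(t for t, blocks in enumerate(tiers) if block in blocks)
    if j == len(tiers) - 1:
        return                                  # already in the premium tier
    upper = tiers[j + 1]
    coldest = min(upper, key=temp, default=None)
    if coldest is not None and temp(block) <= temp(coldest):
        return                                  # not hotter than tier j+1's coldest block
    tiers[j].remove(block)
    if len(upper) >= capacity[j + 1]:
        upper.remove(coldest)                   # tier j+1 full: demote its coldest block
        tiers[j].add(coldest)
    upper.add(block)


temps = {1: 1.0, 2: 2.0, 3: 3.0, 4: 9.0}
tiers: list[set[int]] = [set(), set(), set()]   # slowest -> fastest
capacity = [10, 2, 1]
for b in (1, 2, 3, 4):
    allocate(b, tiers, capacity)                # tiers now hold [4], [2, 3], [1]
on_access(4, tiers, capacity, temps.__getitem__)
print([sorted(t) for t in tiers])               # [[2], [3, 4], [1]]
```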
Figure 5 illustrates a flow chart representing a method according to an embodiment of the present invention. In step S0, the method is started by setting a counting bloom filter index i to 1. In step S1, a first counting bloom filter, according to the index i, is initiated by setting all counters of the first counting bloom filter to 0. In step S2, a new request is received for accessing a data entity of the present storage medium. In step S3, it is verified whether the first interval in time associated with the first counting bloom filter has expired. If the first interval has not expired (N), the data entity, or rather its identifier such as the LBA, is fed into the first counting bloom filter, and the subset of corresponding counters, identified by hashing the present LBA by means of k hash functions, is incremented in step S4. In step S5, the request for access may be served, and optionally, in step S6, the counters of the subset are compared with the lowest temperature value of a data entity in a cache of the storage system.
Then, the storage system continues with step S2 and waits for/receives a new request for data access.
If the first interval has expired (Y) in step S3, the counter values of the first counting bloom filter are stored in step S7, and in step S8 new average counter values are determined from all previously stored counter values. In the next step S9, the counting bloom filter index is incremented, and in step S1 the next counting bloom filter, i.e. the second counting bloom filter according to the index, is initialized.
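The Figure 5 flow can be sketched as a tracker that keeps one snapshot per terminated interval. This is a minimal sketch, assuming (as in the example of the second interval) that an interval ends after a fixed number of accesses, that the request which finds the interval expired is counted in the new interval, and that the hash family is BLAKE2b salted with the hash index; the class and method names are hypothetical.

```python
import hashlib


class MultiFilterTracker:
    """Figure 5 flow: a fresh CBF per interval; stored snapshots are averaged."""

    def __init__(self, m: int, k: int, interval_len: int) -> None:
        self.m, self.k, self.interval_len = m, k, interval_len
        self.index = 1                       # S0: filter index i = 1
        self.counters = [0] * m              # S1: counters of the first CBF set to 0
        self.accesses = 0
        self.snapshots: list[list[int]] = []
        self.averages = [0.0] * m

    def subset(self, lba: int) -> set[int]:
        # Hypothetical hash family: k salted BLAKE2b digests, reduced mod m.
        indices = set()
        for h in range(self.k):
            digest = hashlib.blake2b(f"{h}:{lba}".encode(), digest_size=8).digest()
            indices.add(int.from_bytes(digest, "big") % self.m)
        return indices

    def access(self, lba: int) -> None:
        # S2: a new access request for `lba` arrives.
        if self.accesses == self.interval_len:              # S3: interval expired (Y)
            self.snapshots.append(self.counters)            # S7: store counter values
            n = len(self.snapshots)
            self.averages = [sum(s[i] for s in self.snapshots) / n
                             for i in range(self.m)]        # S8: new averages
            self.index += 1                                 # S9: next filter index
            self.counters, self.accesses = [0] * self.m, 0  # S1: next CBF
        for i in self.subset(lba):                          # S4: increment the subset
            self.counters[i] += 1
        self.accesses += 1
        # S5 (serving the request) and S6 (comparison with the coldest
        # cached block) are outside the scope of this sketch.

    def temperature(self, lba: int) -> float:
        # Minimum averaged value over the entity's subset of counters.
        return min(self.averages[i] for i in self.subset(lba))


tracker = MultiFilterTracker(m=64, k=2, interval_len=4)
for lba in (7, 7, 7, 7, 9):
    tracker.access(lba)
print(tracker.index, tracker.temperature(7))  # 2 4.0
```

The list of snapshots grows by one entry per interval, which is the memory cost that the single-filter variant of Figure 6 avoids.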
Figure 6 illustrates a flow chart representing another method according to an embodiment of the present invention. In step S0, the method is started and x elements of a data structure corresponding to x counters of a counting bloom filter are set to zero. In step S1, the counting bloom filter is initiated by setting all x counters of the counting bloom filter to zero, and the counting bloom filter starts to operate for a defined interval in time, which is started in step S1. In step S2, a new request is received for accessing a data entity of the present storage medium. In step S3, it is verified whether the interval in time during which the counting bloom filter is expected to operate has already expired. If the interval in time has not expired (N), the data entity, or its respective identifier such as the LBA, is fed into the counting bloom filter, and the counters of a subset of counters identified by hashing the present LBA by means of k hash functions are incremented in step S4. In step S5, the request for access may be served, and optionally, in step S6, the counters of the subset are compared with the lowest temperature value of a data entity in a cache of the storage system. Then the storage system continues with step S2 and waits for/receives a new request for data access.
If the interval in time has expired (Y) in step S3, which may be determined, for example, by having reached a defined number of data entity accesses, new values of the elements of the data structure are determined in step S7 based on the present counter values of the counting bloom filter and on the present element values. Preferably, a new value is determined for each element in the data structure, given that each element corresponds to a counter of the counting bloom filter. The new element values are stored in step S8. In the following step S1, the counter values are reset, and a new interval in time is started. The pending request for access may have been temporarily stored and may be executed during the new interval in time.
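The Figure 6 flow can be sketched with one reusable counter array and one element array of the same length m. This is a sketch under assumptions: the interval length is counted in accesses, the element update uses the first rule $C_i \leftarrow \alpha C_i^j + (1-\alpha) C_i$ with α = 0.8 (a value chosen from the 0.75 to 0.95 range), the pending request is recorded in the new interval, and the hash family and all names are hypothetical.

```python
import hashlib


class SingleFilterTracker:
    """Figure 6 flow: one reusable CBF plus a data structure of m elements."""

    def __init__(self, m: int, k: int, interval_len: int, alpha: float = 0.8) -> None:
        self.m, self.k, self.interval_len, self.alpha = m, k, interval_len, alpha
        self.elements = [0.0] * m            # S0: elements of the data structure = 0
        self.counters = [0] * m              # S1: counters of the CBF = 0
        self.accesses = 0

    def subset(self, lba: int) -> set[int]:
        # Hypothetical hash family: k salted BLAKE2b digests, reduced mod m.
        indices = set()
        for h in range(self.k):
            digest = hashlib.blake2b(f"{h}:{lba}".encode(), digest_size=8).digest()
            indices.add(int.from_bytes(digest, "big") % self.m)
        return indices

    def access(self, lba: int) -> None:
        if self.accesses == self.interval_len:              # S3: interval expired (Y)
            # S7/S8: new element values from counter values and present element values.
            self.elements = [self.alpha * c + (1 - self.alpha) * e
                             for c, e in zip(self.counters, self.elements)]
            self.counters, self.accesses = [0] * self.m, 0  # S1: reset, new interval
        for i in self.subset(lba):                          # S4: increment the subset
            self.counters[i] += 1
        self.accesses += 1

    def temperature(self, lba: int) -> float:
        # Minimum element value over the entity's subset.
        return min(self.elements[i] for i in self.subset(lba))


tracker = SingleFilterTracker(m=64, k=2, interval_len=4)
for lba in (7, 7, 7, 7, 9):
    tracker.access(lba)
print(round(tracker.temperature(7), 6))  # 3.2
```

However many intervals elapse, the sketch holds only the two arrays of length m, mirroring the memory argument made for this embodiment.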
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention, in particular in the form of the controller, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention, such as the methods, may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (1)

  1. <claim-text>A computer-implemented method for determining a characteristic of a data entity which characteristic is based on a frequency of access to said data entity in a storage system, comprising providing a counting bloom filter (CBF1) for being operated for an interval in time which counting bloom filter (CBF1) comprises a set (S1) of counters (C1), providing a data structure comprising a set of elements wherein each element of the set of elements is assigned to a counter of the set of counters, determining the characteristic of said data entity subject to a value of at least one element of the set of elements, wherein for each individual interval in time the counting bloom filter is operated - the counters of the set of counters are reset prior to or at a beginning of the individual interval in time, - a value of at least one counter (C1) of a subset of counters (C1) to which subset of counters (C1) said data entity is mapped in the counting bloom filter (CBF1) is increased each time said data entity is accessed during the individual interval in time, - the value of each individual element of the set of elements is updated at or after an end of the individual interval in time, wherein the value of the individual element is updated subject to a value the counter assigned to the individual element holds at the end of the individual interval in time and subject to a present value of the individual element.</claim-text> <claim-text>2. The method according to claim 1, wherein the counting bloom filter is operated multiple times for consecutive intervals in time.</claim-text> <claim-text>3. The method according to claim 1 or claim 2, wherein the value of the individual element is updated subject to a weighted value the counter assigned to the individual element holds at the end of the individual interval in time and subject to a weighted present value of the individual element.</claim-text> <claim-text>4. 
The method according to claim 3, wherein the value of the individual element is updated by the value the counter assigned to the individual element holds at the end of the individual interval in time which value is weighted by a factor α, plus the present value of the individual element which present value is weighted by a factor 1-α.</claim-text> <claim-text>5. The method according to claim 4, wherein the factor α has a value between 0.75 and 0.95.</claim-text> <claim-text>6. The method according to any one of the preceding claims, wherein said data entity is mapped to the subset of counters (C1) by means of one or more hash functions (h).</claim-text> <claim-text>7. The method according to any one of the preceding claims, wherein the subset of counters (C1) comprises multiple counters (C1) to which said data entity is mapped in the counting bloom filter (CBF1), and wherein only the value of a single counter (C1) in the subset is increased, which single counter (C1) is the counter (C1) in the subset that presently shows a lowest value amongst the multiple counters (C1) in the subset.</claim-text> <claim-text>8. The method according to any one of the preceding claims, wherein each element of the set of elements is assigned to a single counter of the set of counters, and wherein each counter of the set of counters is assigned to a single element of the set of elements.</claim-text> <claim-text>9. The method according to any one of the preceding claims, wherein the subset of counters (C1) comprises multiple counters (C1) to which said data entity is mapped in the counting bloom filter (CBF1), wherein a subset of elements contains elements which are assigned to the counters of the subset of counters, and wherein the characteristic of said data entity is determined subject to the value of one or more elements of the subset of elements.</claim-text> <claim-text>10. 
The method according to claim 9, wherein the characteristic of said data entity is determined subject to the value of the element that shows the lowest value amongst the multiple elements in the subset of elements.</claim-text> <claim-text>11. A computer-implemented method for determining a characteristic of a data entity which characteristic is based on a frequency of access to said data entity in a storage system, comprising providing a first counting bloom filter (CBF1) being active for a first interval in time, which first counting bloom filter (CBF1) comprises a set (S1) of first counters (C1), each time said data entity is accessed during the first interval in time increasing a value of at least one first counter (C1) of a subset of first counters (C1) to which subset of first counters (C1) said data entity is mapped in the first counting bloom filter (CBF1), providing a second counting bloom filter (CBF2) being active for a second interval in time, which second counting bloom filter (CBF2) comprises a set (S2) of second counters (C2), each time the data entity is accessed during the second interval in time increasing a value of at least one second counter (C2) of a subset of second counters (C2) to which subset of second counters (C2) said data entity is mapped in the second counting bloom filter (CBF2), determining the characteristic of the data entity subject to a value of at least one first counter (C1) of the subset of first counters (C1) at the end of the first interval in time, and subject to a value of at least one second counter (C2) of the subset of the second counters (C2) at the end of the second interval in time (CBF2).</claim-text> <claim-text>12. 
The method according to claim 11, wherein overall n counting bloom filters (CBF) are provided each of which n counting bloom filters (CBF) being active for an associated interval in time, which associated intervals in time follow each other, wherein each of the n counting bloom filters CBF is operated according to the first or second counting bloom filter (CBF1, CBF2) each time said data entity is accessed during the associated interval in time, and wherein the characteristic of said data entity is determined subject to, for each of the n counting bloom filters (CBF), a value of at least one counter (C) of a subset of counters (C) associated with said data entity in the respective counting bloom filter (CBF) at the end of the associated interval in time.</claim-text> <claim-text>13. The method according to claim 12, wherein the characteristic of said data entity is determined based on an average of the counter values selected from the n counting bloom filters (CBF).</claim-text> <claim-text>14. The method according to any one of the preceding claims 11 to 13, wherein said data entity is mapped to the subset of first counters (C1) by means of one or more hash functions (h), and wherein said data entity is mapped to the subset of second counters (C2) by means of the same one or more hash functions (h).</claim-text> <claim-text>15. 
The method according to any one of the preceding claims 11 to 14, wherein the subset of first counters (C1) comprises multiple first counters (C1) to which said data entity is mapped in the first counting bloom filter (CBF1), and wherein only the value of a single first counter (C1) in the subset is increased, which single first counter (C1) is the first counter (C1) in the subset that presently shows a lowest value amongst the multiple first counters (C1) in the subset, and wherein the subset of second counters (C2) comprises multiple second counters (C2) to which said entity is mapped in the second counting bloom filter (CBF2), and wherein only the value of a single second counter (C2) in the subset is increased, which single second counter (C2) is the second counter (C2) that presently shows a lowest value amongst the multiple second counters (C2) in the subset.</claim-text> <claim-text>16. The method according to any one of the preceding claims 11 to 15, wherein the subset of first counters (C1) comprises multiple first counters (C1) to which said data entity is mapped in the first counting bloom filter (CBF1), wherein the subset of second counters (C2) comprises multiple second counters (C2) to which said entity is mapped in the second counting bloom filter (CBF2), and wherein the characteristic of said data entity is determined subject to a value of a dedicated first counter (C1) of the subset of first counters (C1) which dedicated first counter (C1) is the first counter (C1) that shows the lowest value amongst the multiple first counters (C1) in the subset at the end of the first interval in time, and subject to a value of a dedicated second counter (C2) of the subset of second counters (C2) which dedicated second counter (C2) is the second counter (C2) that shows the lowest value amongst the multiple second counters (C2) in the subset at the end of the second interval in time.</claim-text> <claim-text>17. 
The method according to any one of the preceding claims, wherein accessing said data entity includes at least one of reading said data entity and updating said data entity.</claim-text> <claim-text>18. The method according to any one of the preceding claims, wherein said data entity represents data addressed by a single logical block address (LBA).</claim-text> <claim-text>19. The method according to any one of the preceding claims, wherein subject to the determined characteristic of said data entity, said data entity is selected for being cached.</claim-text> <claim-text>20. The method according to any one of the preceding claims, wherein subject to the determined characteristic of said data entity, said data entity is selected for being stored in a dedicated tier (T) in a tiered storage system.</claim-text> <claim-text>21. A computer program product comprising a computer readable medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to perform a method according to any one of the preceding claims.</claim-text> <claim-text>22. A storage controller for determining a characteristic of a data entity which characteristic is based on a frequency of access to said data entity in a storage system, comprising a control unit adapted to execute a method according to any one of the preceding claims 1 to 19.</claim-text> <claim-text>23. A computer-implemented method for determining a characteristic of a data entity substantially as hereinbefore described, with reference to Figures 1-3 and 5-6 of the accompanying drawings.</claim-text> <claim-text>24. A storage controller substantially as hereinbefore described, with reference to Figures 1-3 and 5-6 of the accompanying drawings.</claim-text> <claim-text>25. A computer program substantially as hereinbefore described, with reference to Figures 1-3 and 5-6 of the accompanying drawings.</claim-text>
Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP11175348 2011-07-26

Publications (3)

Publication Number Publication Date
GB201210250D0 (en) 2012-07-25
GB2493243A (en) 2013-01-30
GB2493243B (en) 2014-04-23

Family

ID=46605713

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1210250.5A Active GB2493243B (en) 2011-07-26 2012-06-11 Method and storage controller for determining an access characteristic of a data entity

Country Status (3)

Country Link
CN (1) CN103150245B (en)
DE (1) DE102012212183B4 (en)
GB (1) GB2493243B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838850A (en) * 2014-03-11 2014-06-04 湖州师范学院 Hashing data representing and querying method based on dynamic counting type Bloom filter
US9285994B2 (en) 2014-06-05 2016-03-15 International Business Machines Corporation Block-level predictive data migration
US10108368B2 (en) 2017-01-09 2018-10-23 International Business Machines Corporation Heat map transfer in space-efficient storage
EP3494478A4 (en) * 2016-08-05 2020-04-01 Micron Technology, Inc. Proactive corrective actions in memory based on a probabilistic data structure
US11099849B2 (en) * 2016-09-01 2021-08-24 Oracle International Corporation Method for reducing fetch cycles for return-type instructions

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN105487823B (en) * 2015-12-04 2018-06-05 华为技术有限公司 A kind of method and device of Data Migration
KR20200021821A (en) * 2018-08-21 2020-03-02 에스케이하이닉스 주식회사 Memory controller and operating method thereof
CN109656901A (en) * 2018-10-15 2019-04-19 阿里巴巴集团控股有限公司 Data processing method and device, electronic equipment

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20090031082A1 (en) * 2006-03-06 2009-01-29 Simon Andrew Ford Accessing a Cache in a Data Processing Apparatus
CN101655861B (en) * 2009-09-08 2011-06-01 中国科学院计算技术研究所 Hashing method based on double-counting bloom filter and hashing device
US20110276744A1 (en) * 2010-05-05 2011-11-10 Microsoft Corporation Flash memory cache including for use with persistent key-value store

Non-Patent Citations (3)

Title
"Bloom filter-based dynamic wear leveling for phase-change RAM" Joosung Yun et al. Design, Automation & Test in Europe Conf. 12-16 March 2012. Pages 1513-1518 *
"Hot data identification for flash-based storage systems using multiple bloom filters" Dongchul Park. IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST) 2011. 23-27 May 2011. Pages 1-11 *
"Scavenger: A New Last Level Cache Architecture with Global Block Priority" Arkaprava Basu et al. 40th IEEE symposium on Microarchitecture. 1 Dec 2007. Pages 421-432 *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN103838850A (en) * 2014-03-11 2014-06-04 湖州师范学院 Hashing data representing and querying method based on dynamic counting type Bloom filter
CN103838850B (en) * 2014-03-11 2017-02-08 湖州师范学院 Hashing data representing and querying method based on dynamic counting type Bloom filter
US9285994B2 (en) 2014-06-05 2016-03-15 International Business Machines Corporation Block-level predictive data migration
US9557923B2 (en) 2014-06-05 2017-01-31 International Business Machines Corporation Block-level predictive data migration
EP3494478A4 (en) * 2016-08-05 2020-04-01 Micron Technology, Inc. Proactive corrective actions in memory based on a probabilistic data structure
US10929474B2 (en) 2016-08-05 2021-02-23 Micron Technology, Inc. Proactive corrective actions in memory based on a probabilistic data structure
US11586679B2 (en) 2016-08-05 2023-02-21 Micron Technology, Inc. Proactive corrective actions in memory based on a probabilistic data structure
US11099849B2 (en) * 2016-09-01 2021-08-24 Oracle International Corporation Method for reducing fetch cycles for return-type instructions
US10108368B2 (en) 2017-01-09 2018-10-23 International Business Machines Corporation Heat map transfer in space-efficient storage
US10838650B2 (en) 2017-01-09 2020-11-17 International Business Machines Corporation Heat map transfer in space-efficient storage

Also Published As

Publication number Publication date
GB2493243B (en) 2014-04-23
DE102012212183A1 (en) 2013-01-31
CN103150245A (en) 2013-06-12
GB201210250D0 (en) 2012-07-25
DE102012212183B4 (en) 2017-10-05
CN103150245B (en) 2016-06-08

Similar Documents

Publication Publication Date Title
GB2493243A (en) Determining hot data in a storage system using counting bloom filters
US10656838B2 (en) Automatic stream detection and assignment algorithm
US9665485B2 (en) Logical and physical block addressing for efficiently storing data to improve access speed in a data deduplication system
US20150356125A1 (en) Method for data placement based on a file level operation
US9164676B2 (en) Storing multi-stream non-linear access patterns in a flash based file-system
US8799601B1 (en) Techniques for managing deduplication based on recently written extents
CN104115134B (en) For managing the method and system to be conducted interviews to complex data storage device
US9569351B2 (en) Storing corresponding data units in a common storage unit
US9733991B2 (en) Deferred re-MRU operations to reduce lock contention
US8572325B2 (en) Dynamic adjustment of read/write ratio of a disk cache
US20110072225A1 (en) Application and tier configuration management in dynamic page reallocation storage system
US9959054B1 (en) Log cleaning and tiering in a log-based data storage system
US9921974B2 (en) Assigning cache control blocks and cache lists to multiple processors to cache and demote tracks in a storage system
Puttaswamy et al. Frugal storage for cloud file systems
JP2008027444A (en) Method, system, and product (using multiple data structures to manage data in cache)
GB2476536A (en) Modified B+ tree to map logical addresses to physical addresses in NAND flash memory
JP6167646B2 (en) Information processing apparatus, control circuit, control program, and control method
TW201812591A (en) SSD, driver, and method of using automatic stream detection & assignment algorithm
US20170315924A1 (en) Dynamically Sizing a Hierarchical Tree Based on Activity
US10621059B2 (en) Site recovery solution in a multi-tier storage environment
CN108228088B (en) Method and apparatus for managing storage system
CN105988720B (en) Data storage device and method
US9552298B2 (en) Smart pre-fetch for sequential access on BTree
US8832379B2 (en) Efficient cache volume SIT scans
US20140359228A1 (en) Cache allocation in a computerized system

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 2014-04-28