CN111475507A

CN111475507A - Key-value data indexing method for a workload-adaptive single-layer LSMT

Info

Publication number: CN111475507A
Application number: CN202010244527.4A
Authority: CN
Inventors: 陈珂; 周信静; 寿黎但; 骆歆远; 伍赛; 江大伟; 陈刚
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2020-07-31
Anticipated expiration: 2040-03-31
Also published as: CN111475507B

Abstract

The invention discloses a key-value data indexing method for a workload-adaptive single-layer LSMT. The method optimizes the traditional Log-Structured Merge Tree (LSMT): it removes the multi-layer design and the fixed-capacity memory table, and introduces a single-layer LSMT and a dynamic-capacity memory table.

Description

Key-value data indexing method for a workload-adaptive single-layer LSMT

Technical Field

The invention belongs to the technical field of database systems, and in particular relates to a key-value data indexing method for a workload-adaptive single-layer LSMT.

Background

Key-value storage systems based on the Log-Structured Merge Tree (LSMT) are widely used in data-intensive Internet applications because they handle write-intensive workloads very well. However, conventional LSMT-based storage systems generally suffer from two problems: read-write amplification and workload unawareness.

Workload unawareness refers to the inability of existing LSMT systems to optimize the storage structure according to the distribution of reads and writes in the workload.

Researchers have proposed many methods to address read-write amplification, but these methods generally reduce write amplification at the cost of higher read amplification (for example WiscKey and PebblesDB) and cannot keep both reads and writes efficient. The workload unawareness problem has rarely been studied or solved.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides a workload adaptive key value data indexing method on a block storage device. The method can effectively reduce read-write amplification, and meanwhile, self-adaptive storage structure optimization can be performed according to the working load, so that the read delay is further reduced.

The object of the invention is achieved by the following technical solution: a key-value data indexing method for a workload-adaptive single-layer LSMT, which specifically comprises the following steps:

(1) Modify the LSMT storage structure, comprising the following sub-steps:

(1.1) Remove the intermediate layers of the LSMT multi-layer structure, retain the last layer, and use it as the storage layer L₀. The original fixed-capacity memory table is replaced by a dynamic-capacity memory table whose capacity is M. A real-valued parameter R > 1 is introduced such that M, R and |L₀| satisfy a fixed relation (the formula is not reproduced in the source text), where |L₀| is the amount of data in the current storage layer.

(1.2) The storage layer L₀ is partitioned by key range into N sub-key spaces l₁, l₂, …, l_N. The sub-key spaces do not overlap, and the data of each sub-key space l_i (1 ≤ i ≤ N) is stored in an independent storage file. Each sub-key space l_i (1 ≤ i ≤ N) stores at most T Runs of data belonging to that sub-key space, each Run being written by an update from the dynamic-capacity memory table. γ(l_i) denotes the set of Runs contained in sub-key space l_i, and |γ(l_i)| is the size of that set. The key-value data within a Run is sorted by key, the Runs of one sub-key space may overlap in key range, and T ≥ 1.

(2) Merge the dynamic-capacity memory table. Specifically, when the amount of data in the dynamic-capacity memory table exceeds M, the table is converted into a read-only memory table, a merging process is started in a background thread to merge the read-only memory table into the storage layer L₀, and a new active memory table is created so that front-end read and write requests continue to be served. The merging process is as follows. According to the range partitioning of the sub-key spaces of L₀, the read-only memory table is divided into N Runs r₁, r₂, …, r_N, where r_i belongs to l_i. Each r_i is then written to the storage file of its sub-key space l_i. When the size of a sub-key space l_i exceeds a threshold β or |γ(l_i)| > T, the Runs in γ(l_i) are merged into a single Run, so that |γ(l_i)| = 1 after merging, and the sub-key space is divided into two sub-key spaces of equal data volume. When the merging is complete, the index writing step ends.

Further, the threshold β takes the value of 64 MB.

Further, the indexing method also comprises an adaptive reading optimization method, which comprises the following sub-steps:

(a) The read heat of sub-key space l_i at time t is computed by a formula that is not reproduced in the source text. In that formula, one term is the number of reads of sub-key space l_i between times t-1 and t, and α is a decay factor with 0 < α < 1. The condition introduced by the closing "and" is likewise not reproduced.

(b) The write heat of sub-key space l_i at time t is computed by a formula that is not reproduced in the source text. In that formula, one term is the number of writes to sub-key space l_i between times t-1 and t, and I^(t) is the time elapsed between the current time t and the previous time t-1. The condition introduced by the closing "and" is likewise not reproduced.

(c) The following process is run periodically. First, the sub-key spaces are clustered by write heat into four classes: Cold, Warm, WriteBalanced and WriteHeavy. The Cold class, which has the lowest write heat, is selected. Its sub-key spaces are sorted in descending order of read heat and screened by a condition on each sub-key space l_i (the condition is not reproduced in the source text).

Then the first P sub-key spaces l₁, l₂, …, l_P of the sorted and screened Cold set are selected, and for each of them the Runs in γ(l_i) (1 ≤ i ≤ P) are merged into a single Run.

Compared with the prior art, the invention has the following advantages. The indexing method removes the multi-layer design of the conventional LSMT and introduces a dynamic-capacity memory table and an adaptive read optimization mechanism:

1) Compared with the conventional LSMT, the indexing method has lower read-write amplification and gives the system higher read-write throughput, which also extends the service life of the storage device.

2) The indexing method provided by the invention can automatically make structure optimization according to the read-write distribution in the workload, thereby further reducing the system delay.

Drawings

FIG. 1 is a flow chart of an indexing method proposed by the present invention.

Detailed Description

The technical solutions of the present invention are further described below with reference to the accompanying drawings, and it should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.

Fig. 1 is a flowchart of a workload adaptive key value data indexing method on a block storage device according to the present invention, which specifically includes the following steps:

(1.1) Remove the intermediate layers of the LSMT multi-layer structure, retain the last layer, and use it as the storage layer L₀. The original fixed-capacity memory table is changed into a dynamic-capacity memory table. The other structures remain essentially unchanged; in particular, the log file used to recover the memory table after a crash is retained. The capacity of the dynamic-capacity memory table is M. A real-valued parameter R > 1 is introduced such that M, R and the data amount |L₀| of the storage layer satisfy a fixed relation (the formula is not reproduced in the source text).

R balances memory usage against read-write amplification, and M grows as the amount of data grows.
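The relation between M, R and |L₀| is missing from this text, so the following is only a minimal sketch under a loudly stated assumption: that the capacity is set as M = |L₀| / R, which is one relation consistent with the statement that M grows with the data amount. The function name and the `min_bytes` floor are hypothetical and are not taken from the patent.

```python
def memtable_capacity(l0_bytes: int, r: float, min_bytes: int = 4 * 1024 * 1024) -> int:
    """Return a memory table capacity M for a storage layer of l0_bytes.

    ASSUMPTION: M = |L0| / R. The source text does not reproduce the real formula.
    min_bytes is a hypothetical floor so that an empty store still gets a usable table.
    """
    if r <= 1:
        raise ValueError("R must be greater than 1")
    return max(min_bytes, int(l0_bytes / r))


# Under this assumption, a larger R saves memory but makes each merge
# rewrite more existing data per byte of new data.
capacity = memtable_capacity(10 * 2**30, r=10.0)
```

Under this assumed rule, the ratio of stored data to buffered data stays near R as the store grows, which is why R can act as a single knob between memory usage and amplification.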

(1.2) To facilitate the subsequent adaptive read optimization, a dynamic partitioning mechanism is introduced for the storage layer L₀. L₀ is partitioned by key range into N sub-key spaces l₁, l₂, …, l_N. The sub-key spaces do not overlap, and the data of each sub-key space l_i (1 ≤ i ≤ N) is stored in an independent storage file. Each sub-key space l_i (1 ≤ i ≤ N) stores at most T Runs of data belonging to that sub-key space, each Run being written by an update from the dynamic-capacity memory table. γ(l_i) denotes the set of Runs contained in sub-key space l_i, and |γ(l_i)| is the size of that set. The key-value data within a Run is sorted by key, the Runs of one sub-key space may overlap in key range, and T ≥ 1.

Each sub-key space holds the metadata max_key, num_runs and index_data. max_key is the maximum key of the sub-key space and describes its range; num_runs is the number of Runs in the sub-key space; index_data stores the index data of each Run of the sub-key space. The index data of each Run in turn contains bloom_filter_data, the Bloom filter built for the Run, and block_index_data. The data stored in a Run is divided into 4 KB blocks, and block_index_data stores the maximum key of each block so that the block in which a query key may reside can be located quickly. Finally, the metadata is stored as <max_key, metadata> key-value pairs in a separate, conventional LSMT key-value storage system such as LevelDB or RocksDB. Because the metadata is much smaller than the data, it can generally be cached in memory, so reading it requires no IO.
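The metadata layout described above can be sketched as plain in-memory records. This is an illustrative sketch only: the class names `RunIndex` and `SubKeySpaceMeta` and both lookup helpers are hypothetical, a Python `set` stands in for a real Bloom filter, and `num_runs` is derived from `index_data` rather than stored separately.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunIndex:
    bloom_filter_data: set          # stand-in for a real Bloom filter
    block_index_data: List[bytes]   # maximum key of each 4 KB block, ascending

    def locate_block(self, key: bytes) -> Optional[int]:
        """Index of the first block whose maximum key is >= key, if any."""
        i = bisect_left(self.block_index_data, key)
        return i if i < len(self.block_index_data) else None


@dataclass
class SubKeySpaceMeta:
    max_key: bytes                                  # upper bound of the key range
    index_data: List[RunIndex] = field(default_factory=list)

    @property
    def num_runs(self) -> int:
        return len(self.index_data)


def locate_sub_key_space(metas: List[SubKeySpaceMeta], key: bytes) -> Optional[int]:
    """metas must be sorted by max_key; returns the first space with max_key >= key."""
    i = bisect_left([m.max_key for m in metas], key)
    return i if i < len(metas) else None
```

Keeping only the maximum key per sub-key space and per block is enough for a binary search, because the sub-key spaces do not overlap and the data inside a Run is sorted.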

(2) Merge the dynamic-capacity memory table. Specifically, when the amount of data in the dynamic-capacity memory table exceeds M, the table is converted into a read-only memory table, a merging process is started in a background thread to merge the read-only memory table into the storage layer L₀, and a new active memory table is created so that front-end read and write requests continue to be served. The merging process is as follows. According to the range partitioning of the sub-key spaces of L₀, the read-only memory table is divided into N Runs r₁, r₂, …, r_N, where r_i belongs to l_i: the maximum key in r_i is less than or equal to the maximum key of the key-value data stored in l_i, and the minimum key in r_i is greater than the maximum key of the key-value data stored in l_{i-1} (both bounds are obtained by querying the metadata LSMT). Each r_i is then written to the storage file of its sub-key space l_i. The metadata is updated at the same time, which includes modifying num_runs and index_data.

To limit the number of IOs per query within a sub-key space, when the size of a sub-key space l_i exceeds a threshold β (64 MB) or |γ(l_i)| > T, the Runs in γ(l_i) are merged into a single Run, so that |γ(l_i)| = 1 after merging, and the sub-key space is divided into two sub-key spaces of equal data volume; this process is called a split operation. By introducing split operations, each sub-key space is guaranteed (the remainder of this sentence is missing in the source text). A split operation involves writing the sub-key space files and updating the metadata LSMT. When all Runs have been written and any triggered split operations have finished, the whole merge is complete and the index writing step ends.
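The merge and split steps above can be sketched in memory as follows. This is a simplified illustration, not the patented implementation: all names are hypothetical, Runs are Python lists instead of files, sizes are counted in characters, the split is by entry count rather than by bytes, and the last sub-key space is assumed to carry a sentinel max_key that is greater than or equal to every key.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

KV = Tuple[str, str]


class SubKeySpace:
    def __init__(self, max_key: str):
        self.max_key = max_key
        self.runs: List[List[KV]] = []   # oldest first; each Run is sorted by key

    def size(self) -> int:
        return sum(len(k) + len(v) for run in self.runs for k, v in run)

    def compact(self) -> None:
        merged: Dict[str, str] = {}
        for run in self.runs:            # newer Runs overwrite older values
            merged.update(run)
        self.runs = [sorted(merged.items())]


def split(space: SubKeySpace) -> List[SubKeySpace]:
    data = space.runs[0]
    if len(data) < 2:
        return [space]
    mid = len(data) // 2                 # simplification: split by entry count
    left = SubKeySpace(data[mid - 1][0])
    left.runs = [data[:mid]]
    right = SubKeySpace(space.max_key)
    right.runs = [data[mid:]]
    return [left, right]


def merge_memtable(spaces: List[SubKeySpace], memtable: Dict[str, str],
                   beta: int, t: int) -> List[SubKeySpace]:
    bounds = [s.max_key for s in spaces]
    parts: List[List[KV]] = [[] for _ in spaces]
    for k, v in sorted(memtable.items()):
        parts[bisect_left(bounds, k)].append((k, v))   # r_i for sub-key space l_i
    result: List[SubKeySpace] = []
    for space, run in zip(spaces, parts):
        if run:
            space.runs.append(run)
        if space.size() > beta or len(space.runs) > t:
            space.compact()              # |gamma(l_i)| becomes 1
            result.extend(split(space))
        else:
            result.append(space)
    return result
```

A real system would also persist each new Run to the storage file of its sub-key space and update the <max_key, metadata> entries, which this sketch leaves out.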

One embodiment of the invention also comprises an adaptive read optimization method, which comprises the following sub-steps:

(a) The read heat of sub-key space l_i at time t is obtained by an exponential decay formula that is not reproduced in the source text. In that formula, one term is the number of reads of sub-key space l_i between times t-1 and t, and α is a decay factor with 0 < α < 1. The condition introduced by the closing "and" is likewise not reproduced.

Exponential decay is used here so that the system captures temporal locality: recently executed reads reflect the current state of the workload better than older reads and are therefore given more weight. The read heat information is updated periodically by the system.
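Because the read heat formula itself is missing from this text, the following is only a sketch of a common exponential decay update, assuming the form h_t = α · c_t + (1 − α) · h_{t−1}, where c_t is the number of reads in the latest interval. Both the function name and this exact form are assumptions, not the formula from the patent.

```python
def update_read_heat(prev_heat: float, reads: int, alpha: float) -> float:
    """One periodic read-heat update for a single sub-key space.

    ASSUMED form: alpha * reads + (1 - alpha) * prev_heat, with 0 < alpha < 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * reads + (1.0 - alpha) * prev_heat


# The contribution of an old burst of reads shrinks geometrically
# with every later period, so recent reads dominate the heat value.
heat = 0.0
for reads_in_period in (10, 0, 0):
    heat = update_read_heat(heat, reads_in_period, alpha=0.5)
```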

(b) The write heat of sub-key space l_i at time t is obtained by an exponential decay formula that is not reproduced in the source text. In that formula, one term is the number of writes to sub-key space l_i between times t-1 and t, and I^(t) is the time elapsed between the current time t and the previous time t-1. The condition introduced by the closing "and" is likewise not reproduced.

The times are defined as the start of each merge; that is, the write heat information is updated at merge time, which reduces the update overhead. I^(t) is introduced to handle the case in which the system receives no write requests for a long time, in which case the weight of the earlier writes should also be reduced.
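The write heat formula is also missing from this text. The sketch below shows one way an elapsed-time term such as I^(t) could reduce the weight of old writes, assuming the form h_t = α · w_t + (1 − α)^{I^(t)} · h_{t−1}. This form, the function name and the unit of the elapsed time are all assumptions made for illustration.

```python
def update_write_heat(prev_heat: float, writes: int, alpha: float, elapsed: float) -> float:
    """Write-heat update performed at the start of a merge.

    ASSUMED form: alpha * writes + (1 - alpha) ** elapsed * prev_heat.
    elapsed plays the role of I^(t): the longer the gap since the previous
    merge, the more the previous heat is discounted.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if elapsed < 0:
        raise ValueError("elapsed time cannot be negative")
    return alpha * writes + (1.0 - alpha) ** elapsed * prev_heat


# After a long idle gap with no writes, the old heat has almost vanished.
after_idle = update_write_heat(prev_heat=8.0, writes=0, alpha=0.5, elapsed=3.0)
```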

(c) Based on the read and write heat, an adaptive read optimization mechanism is introduced. First, the sub-key spaces are clustered by write heat into four classes: Cold, Warm, WriteBalanced and WriteHeavy. Only the sub-key spaces of the Cold class are optimized. The sub-key spaces of the other three classes still receive comparatively many writes, so leaving them untouched avoids affecting their write performance. The Cold sub-key spaces are sorted in descending order of read heat and screened by a condition on each sub-key space l_i (the condition is not reproduced in the source text).

Then the first P sub-key spaces l₁, l₂, …, l_P of the sorted and screened Cold set are selected, and for each of them the Runs in γ(l_i) (1 ≤ i ≤ P) are merged into a single Run.
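The selection step can be sketched as follows. The text does not name the clustering algorithm, so this sketch takes the class label as an input. The screening condition is missing from this text, so the default predicate (more than one Run, since a single Run gains nothing from merging) is an assumption. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpaceStats:
    name: str
    read_heat: float
    write_class: str   # "Cold", "Warm", "WriteBalanced" or "WriteHeavy"
    num_runs: int


def pick_spaces_to_compact(
    stats: List[SpaceStats],
    p: int,
    keep: Callable[[SpaceStats], bool] = lambda s: s.num_runs > 1,  # ASSUMED condition
) -> List[str]:
    """Return up to p Cold sub-key spaces, hottest reads first."""
    cold = [s for s in stats if s.write_class == "Cold"]
    cold.sort(key=lambda s: s.read_heat, reverse=True)
    return [s.name for s in cold if keep(s)][:p]


stats = [
    SpaceStats("A", 5.0, "Cold", 3),
    SpaceStats("B", 9.0, "Cold", 1),
    SpaceStats("C", 100.0, "Warm", 4),
    SpaceStats("D", 7.0, "Cold", 2),
    SpaceStats("E", 1.0, "Cold", 2),
]
chosen = pick_spaces_to_compact(stats, p=2)
```

Each returned sub-key space would then have its Runs merged into one, which lowers the number of Runs a point read has to inspect in the most frequently read, rarely written ranges.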

One embodiment of the present invention further includes read operations, which are divided into point reads and range reads. For a point read, the dynamic-capacity memory table is queried first; if the key is not found there, the read-only memory table is queried; if the key is found in either table, the point read ends. If neither memory table contains the key, the sub-key space that may contain the query key is determined from the metadata, and the Runs of that sub-key space are searched in reverse order of writing, newest first. If the key is found, the query ends; if it is found in no Run, the query reports that the key does not exist. To optimize point reads, the invention also allocates a Bloom filter to each Run of each sub-key space. Before a Run is searched, its Bloom filter is checked to decide whether the Run may contain the query key, which avoids useless IO at low cost. For a range read, the dynamic-capacity memory table and the Runs of the sub-key spaces are queried at the same time, and the query result is obtained by merging their sorted outputs.
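The two read paths can be sketched in memory as follows. This is an illustration under stated assumptions, not the patented implementation: all names are hypothetical, a Python `set` stands in for the Bloom filter (so it never yields false positives), the read-only memory table is also consulted by the range read, and deletions are not modeled.

```python
import heapq
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class Run:
    def __init__(self, items: List[Tuple[str, str]]):
        self.items = sorted(items)
        self.keys = [k for k, _ in self.items]
        self.bloom = set(self.keys)      # stand-in for a Bloom filter

    def get(self, key: str) -> Optional[str]:
        if key not in self.bloom:        # skip the Run without reading its data
            return None
        i = bisect_left(self.keys, key)
        return self.items[i][1] if i < len(self.keys) and self.keys[i] == key else None


def point_read(key: str, active: Dict[str, str], read_only: Dict[str, str],
               bounds: List[str], spaces: List[List[Run]]) -> Optional[str]:
    for table in (active, read_only):    # memory tables first
        if key in table:
            return table[key]
    i = bisect_left(bounds, key)         # sub-key space that may hold the key
    if i == len(bounds):
        return None
    for run in reversed(spaces[i]):      # newest Run first
        value = run.get(key)
        if value is not None:
            return value
    return None


def range_read(lo: str, hi: str, active: Dict[str, str], read_only: Dict[str, str],
               spaces: List[List[Run]]) -> List[Tuple[str, str]]:
    # Sources are listed newest first, so a smaller rank means newer data.
    sources = [sorted(active.items()), sorted(read_only.items())]
    for runs in spaces:
        sources.extend(run.items for run in reversed(runs))
    tagged = ([(k, rank, v) for k, v in src if lo <= k <= hi]
              for rank, src in enumerate(sources))
    result: List[Tuple[str, str]] = []
    for k, _, v in heapq.merge(*tagged):
        if not result or result[-1][0] != k:   # first occurrence is the newest
            result.append((k, v))
    return result
```

Searching Runs newest first lets a point read stop at the first hit, while the range read relies on every source already being sorted so that one merge pass produces the final ordering.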

Compared with WiscKey and LevelDB, the indexing method of the invention reduces the write amplification required to complete the same operations by up to 4 times while keeping read performance excellent, and the adaptive read optimization mechanism reduces point-read latency by a further 30%.

Claims

1. A key-value data indexing method for a workload-adaptive single-layer LSMT, characterized by comprising the following steps:

|L₀| is the amount of data in the current storage layer.

(1.2) The storage layer L₀ is partitioned by key range into N sub-key spaces l₁, l₂, …, l_N. The sub-key spaces do not overlap, and the data of each sub-key space l_i (1 ≤ i ≤ N) is stored in an independent storage file. Each sub-key space l_i (1 ≤ i ≤ N) stores at most T Runs of data belonging to that sub-key space, each Run being written by an update from the dynamic-capacity memory table. γ(l_i) denotes the set of Runs contained in sub-key space l_i, and |γ(l_i)| is the size of that set. The key-value data within a Run is sorted by key, the Runs of one sub-key space may overlap in key range, and T ≥ 1.

2. The key-value data indexing method of claim 1, wherein the threshold β takes on a value of 64 MB.

3. The key-value data indexing method of claim 1, further comprising an adaptive read optimization method comprising the sub-steps of:

(a) The read heat of sub-key space l_i at time t is computed by a formula that is not reproduced in the source text.

(b) The write heat of sub-key space l_i at time t is computed by a formula that is not reproduced in the source text. In that formula, one term is the number of writes to sub-key space l_i between times t-1 and t, and I^(t) is the time elapsed between the current time t and the previous time t-1.

(c) The following process is run periodically. First, the sub-key spaces are clustered by write heat into four classes: Cold, Warm, WriteBalanced and WriteHeavy. The Cold class, which has the lowest write heat, is selected. Its sub-key spaces are sorted in descending order of read heat and screened by a condition on each sub-key space l_i (the condition is not reproduced in the source text).