CN118069069B

CN118069069B - Data storage method and device, storage medium and computer program product

Info

Publication number: CN118069069B
Application number: CN202410459825.3A
Authority: CN
Inventors: 边文辉; 韦新伟; 李红
Original assignee: Lenovo Netapp Technology Ltd
Current assignee: Lenovo Netapp Technology Ltd
Priority date: 2024-04-17
Filing date: 2024-04-17
Publication date: 2024-06-21
Anticipated expiration: 2044-04-17
Also published as: CN118069069A

Abstract

The embodiments of the application provide a data storage method and device, a storage medium and a computer program product. The method comprises the following steps: writing a first value of a key value pair to be written into a first data shelving log file, and generating first address information of the first value in the first data shelving log file; writing a first key address combination, formed from a first key of the key value pair to be written and the first address information, into a first memory table of a data area; writing a first address key combination, formed from the first address information and the first key, into a second memory table of a reverse index area; writing the first memory table into a first effective character table file of the data area; writing the second memory table into a second effective character table file of the reverse index area; adding the first effective character table file into a first log-structured merge tree of the data area; and adding the second effective character table file into a second log-structured merge tree of the reverse index area.

Description

Data storage method and device, storage medium and computer program product

Technical Field

The present application relates to the field of data storage, and in particular, to a data storage method and apparatus, a storage medium, and a computer program product.

Background

With the widespread use of the non-volatile memory host controller interface specification (Non-Volatile Memory Express, NVMe) in file systems, log-structured merge trees (Log-Structured Merge Tree, LSM-Tree) are widely used in log file systems, and key-value separation has become a trend for LSM-Trees in order to reduce NVMe wear.

In conventional key-value separated storage, values are written to a data shelving log file (Data Place Log File, DataPlog), while the reverse index is first cached and then written to the DataPlog together. Because the reverse index can only be written to the DataPlog after being cached, the atomicity of the DataPlog write operation is broken, which in turn degrades service performance.

Disclosure of Invention

The embodiments of the application provide a data storage method and device, a storage medium and a computer program product, which can preserve the atomicity of the DataPlog write operation and improve service performance.

The technical scheme of the application is realized as follows:

In a first aspect, an embodiment of the present application proposes a data storage method, including:

writing a first value of a key value pair to be written into a first data shelving log file, and generating first address information of the first value in the first data shelving log file;

writing a first key address combination, formed from a first key of the key value pair to be written and the first address information, into a first memory table of a data area; writing a first address key combination, formed from the first address information and the first key, into a second memory table of a reverse index area;

writing the first memory table into a first effective character table file of the data area; writing the second memory table into a second effective character table file of the reverse index area;

adding the first effective character table file into a first log-structured merge tree of the data area; and adding the second effective character table file into a second log-structured merge tree of the reverse index area.

In a second aspect, an embodiment of the present application proposes a data storage device, the device comprising:

The writing unit is used for writing a first value in a key value pair to be written into a first data shelving log file and generating first address information of the first value in the first data shelving log file; writing a first key of the key value pair to be written and a first key address combination of the first address information into a first memory table of a data area; writing the first address information and the first address key combination of the first key into a second memory table of the reverse index area; writing the first memory table into a first effective character table file of a data area; writing the second memory table into a second effective character table file of the reverse index area; adding the first effective character table file into a first log structure merging tree of a data area; and adding the second valid character table file into a second log-structured merge tree of the reverse index region.

In a third aspect, an embodiment of the present application proposes a data storage device, the device comprising: a processor, a memory, and a communication bus; the processor implements the data storage method when executing the running program stored in the memory.

In a fourth aspect, an embodiment of the present application proposes a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-mentioned data storage method.

In a fifth aspect, an embodiment of the present application proposes a computer program product comprising a computer program which, when executed by a processor, implements the above-mentioned data storage method.

The embodiments of the application provide a data storage method and device, a storage medium and a computer program product, wherein the method comprises the following steps: writing a first value of a key value pair to be written into a first data shelving log file, and generating first address information of the first value in the first data shelving log file; writing a first key address combination, formed from a first key of the key value pair to be written and the first address information, into a first memory table of a data area; writing a first address key combination, formed from the first address information and the first key, into a second memory table of a reverse index area; writing the first memory table into a first effective character table file of the data area; writing the second memory table into a second effective character table file of the reverse index area; adding the first effective character table file into a first log-structured merge tree of the data area; and adding the second effective character table file into a second log-structured merge tree of the reverse index area. With this implementation, the log-structured merge tree is split into a data area and a reverse index area; when data is written, the key address combination and the address key combination are written into their respective areas, and the reverse index is managed using the characteristics of the log-structured merge tree. Because only the value of the key value pair is written to the data shelving log file, the atomicity of the DataPlog write operation can be maintained, which in turn improves service performance.

Drawings

FIG. 1 is a schematic diagram of a key-value pair writing process;

FIG. 2 is a schematic diagram of writing a Value and a reverse index into a DataPlog;

FIG. 3 is a schematic flow chart of a garbage collection method;

FIG. 4 is a flowchart illustrating a data storage method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an exemplary write of a Value into a DataPlog provided by an embodiment of the present application;

FIG. 6 is a flowchart of an exemplary data writing method according to an embodiment of the present application;

FIG. 7 is a flow chart of an exemplary method for data persistence provided by an embodiment of the application;

FIG. 8 is a schematic diagram of an exemplary key-value pair writing process according to an embodiment of the present application;

FIG. 9 is a second flowchart of a data storage method according to an embodiment of the present application;

FIG. 10 is a flow chart of an exemplary compaction (compression merging) method according to an embodiment of the present application;

FIG. 11 is a third flowchart of a data storage method according to an embodiment of the present application;

FIG. 12 is a schematic flow chart of an exemplary method for garbage collection according to an embodiment of the present application;

FIG. 13 is a schematic diagram of a data storage device according to an embodiment of the present application;

FIG. 14 is a schematic diagram of a second structure of a data storage device according to an embodiment of the present application.

Detailed Description

For a more complete understanding of the nature and the technical content of the embodiments of the present application, reference should be made to the following detailed description of embodiments of the application, taken in conjunction with the accompanying drawings, which are meant to be illustrative only and not limiting of the embodiments of the application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict. It should also be noted that the term "first\second\third" in relation to embodiments of the present application is used merely to distinguish similar objects and does not represent a particular ordering for the objects, it being understood that the "first\second\third" may be interchanged in a particular order or sequence, where allowed, to enable embodiments of the present application described herein to be practiced in an order other than that illustrated or described herein.

The performance of garbage collection (Garbage Collection, GC) in a key-value separated LSM-Tree affects overall system performance and space usage. The current GC scheme records the reverse index in the DataPlog together with the data. The entire key value pair write flow is shown in FIG. 1:

1. The value and the reverse index are written into the DataPlog;

2. PlogPtr is obtained;

3. Key + PlogPtr is written into a memory table (Memtable) in memory;

4. The Memtable is persisted into a valid character table file (Sorted String Table, SSTable) on disk.

When the value and the reverse index are written into the DataPlog, both the number of input/output (IO) operations and the alignment of the data written to the DataPlog must be considered. The reverse index therefore has to be written into a buffer first and then written together into the reverse index area of the DataPlog, whereas the value is written directly into the data area of the DataPlog. This breaks the atomicity of writing the data and its reverse index. See FIG. 2 for details:

1. The value is written into the DataPlog, and the reverse index is written into the buffer;

2. The reverse index in the buffer is written into the DataPlog.

Because data blocks undergo version updates, the same key has multiple records in the LSM-Tree, distributed across different SSTables. The GC process therefore has to traverse the reverse index and scan the whole LSM-Tree to confirm whether a data block is valid, which requires reading multiple SSTables. See FIG. 3 for details:

1. Iterate over the data area of the DataPlog;

2. Look up the key corresponding to each value in the reverse index area of the DataPlog;

3. Query the LSM-Tree for the validity of the key;

4. If the key is valid, write the value into a new DataPlog;

5. Update the LSM-Tree according to the new DataPlog.

The above method has the following problems:

1. The reverse index is written into the DataPlog together with the data, which breaks the original structure of the data block and requires 4k or 8k alignment to be guaranteed.

2. The reverse index is written into the DataPlog only after the data has been written, which breaks the atomicity of writing a piece of data together with its reverse index.

3. When a data block is garbage-collected, the whole LSM-Tree has to be scanned in reverse to check whether the block is still referenced, which hurts GC performance.

To solve the above problem, an embodiment of the present application provides a data storage method, as shown in fig. 4, which may include:

S101, writing a first value in a key value pair to be written into a first data shelving log file, and generating first address information of the first value in the first data shelving log file.

The data storage method provided by the embodiment of the application is suitable for a scene that a log file system (namely the data storage device provided by the application) stores a key value pair (KeyValue) to be written.

In the embodiment of the present application, referring to FIG. 5, only the Value is written into the first DataPlog and no reverse index information needs to be written. The written information is therefore complete and is not coupled with any other information, which guarantees the atomicity of the write operation.

In the embodiment of the application, after the first address information is generated, the first Key is obtained, and a first key address combination (Key-PlogPtr) is formed from the first Key and the first address information.

In an embodiment of the present application, PlogPtr includes a DataPlogID and an offset, where the offset is the offset of a data block and indicates which data block within the DataPlog holds the value.
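The address structure described above can be illustrated with a small sketch. The field names `data_plog_id` and `offset`, the `DataPlog` class and its `append`/`read` methods are assumptions made for illustration only; the source states only that PlogPtr contains a DataPlogID and a data block offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class PlogPtr:
    """Address of one value inside a DataPlog (field names are assumed)."""
    data_plog_id: int  # which DataPlog file holds the value
    offset: int        # index of the data block within that DataPlog


class DataPlog:
    """Toy append-only DataPlog that stores one value per data block."""

    def __init__(self, plog_id: int) -> None:
        self.plog_id = plog_id
        self.blocks: list[bytes] = []

    def append(self, value: bytes) -> PlogPtr:
        # Only the value is written; no reverse-index entry is stored here.
        self.blocks.append(value)
        return PlogPtr(self.plog_id, len(self.blocks) - 1)

    def read(self, ptr: PlogPtr) -> bytes:
        if ptr.data_plog_id != self.plog_id:
            raise KeyError("PlogPtr belongs to a different DataPlog")
        return self.blocks[ptr.offset]


plog = DataPlog(plog_id=7)
ptr = plog.append(b"hello")
value = plog.read(ptr)
```

Making the pointer immutable and orderable lets it serve directly as a sortable key in a reverse-index table, which is one plausible reason to model it as a value type.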

Further, between S101 and S102, the first key address combination is written into a write-ahead log (WAL, Write-Ahead Logging) system to generate a write result. S102 is executed only when the write result indicates that the first key address combination was successfully written into the write-ahead log system; when the write result indicates failure, the data storage process exits directly.

S102, writing a first key address combination, formed from a first key of the key value pair to be written and the first address information, into a first memory table of a data area; and writing a first address key combination, formed from the first address information and the first key, into a second memory table of a reverse index area.

In an embodiment of the present application, the data storage device includes a memory, in which a memory data area and a memory reverse index area are provided; the memory data area includes one or more memory tables, and the memory reverse index area includes one or more memory tables. The first key address combination is written into the first memory table of the data area, and the first address key combination is written into the second memory table of the reverse index area. A result is returned after the writes complete.

Referring to FIG. 6, S101 and S102 are implemented by the following specific steps:

1. The Value is written into the DataPlog to obtain PlogPtr;

2. Key-PlogPtr is generated;

3. Key-PlogPtr is written into the WAL to generate a write result;

4. If the write result indicates success, Key-PlogPtr is written into the Memtable of the data area;

5. PlogPtr-Key is written into the Memtable of the reverse index area;

6. If the write result indicates failure, the process exits.
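The six steps above can be sketched as follows. This is a minimal in-memory model, not the patented implementation: the `Store` class, the `wal_ok` switch used to simulate a WAL failure, and the representation of PlogPtr as a `(plog_id, offset)` tuple are all assumptions introduced for illustration.

```python
class WalWriteError(Exception):
    """Raised by the toy WAL when an append fails (hypothetical)."""


class Store:
    def __init__(self, wal_ok: bool = True) -> None:
        self.data_plog = []       # DataPlog: values only
        self.wal = []             # write-ahead log of Key-PlogPtr records
        self.data_memtable = {}   # data area: key -> PlogPtr
        self.index_memtable = {}  # reverse index area: PlogPtr -> key
        self.wal_ok = wal_ok      # switch that simulates a WAL failure

    def _wal_append(self, key, ptr) -> None:
        if not self.wal_ok:
            raise WalWriteError("WAL append failed")
        self.wal.append((key, ptr))

    def put(self, key: str, value: bytes, plog_id: int = 0) -> bool:
        # Step 1: write the Value into the DataPlog and obtain PlogPtr.
        self.data_plog.append(value)
        ptr = (plog_id, len(self.data_plog) - 1)
        # Steps 2-3: generate Key-PlogPtr and write it into the WAL.
        try:
            self._wal_append(key, ptr)
        except WalWriteError:
            # Step 6: the WAL write failed, so exit without touching
            # either Memtable.
            return False
        # Step 4: Key-PlogPtr goes into the data-area Memtable.
        self.data_memtable[key] = ptr
        # Step 5: PlogPtr-Key goes into the reverse-index-area Memtable.
        self.index_memtable[ptr] = key
        return True


store = Store()
ok = store.put("user:1", b"alice")
```

The DataPlog write in step 1 touches only the value, so it needs no coordination with the reverse index; the two Memtable updates happen afterwards and only in memory.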

S103, writing the first memory table into a first effective character table file of the data area; and writing the second memory table into a second effective character table file of the reverse index area.

In the embodiment of the application, the data storage device further comprises a disk, wherein a disk data area and a disk reverse index area are arranged in the disk, a first log structure merging tree consisting of a group of valid character table files is arranged in the disk data area, and a second log structure merging tree consisting of a group of valid character table files is arranged in the disk reverse index area.

In the embodiment of the present application, the Memtable of the data area is written into the first valid character table file of the data area, and the Memtable of the reverse index area is written into the second valid character table file of the reverse index area.

Further, if writing the first Memtable of the data area into the first valid character table file of the data area fails, or writing the second Memtable of the reverse index area into the second valid character table file of the reverse index area fails, the process ends directly.

S104, adding the first effective character table file into a first log-structured merge tree of the data area; and adding the second effective character table file into a second log-structured merge tree of the reverse index area.

Further, if adding the first effective character table file into the first log-structured merge tree of the data area fails, or adding the second effective character table file into the second log-structured merge tree of the reverse index area fails, the process ends after the first effective character table file and the second effective character table file are deleted.

Referring to FIG. 7, S103 and S104 are implemented by the following specific steps:

1. The first Memtable of the data area is written into a first SSTable of the data area;

2. If the write in step 1 succeeds, the second Memtable of the reverse index area is written into a second SSTable of the reverse index area;

3. If the write in step 2 succeeds, the first SSTable is added to the first LSM-Tree of the data area, and the second SSTable is added to the second LSM-Tree of the reverse index area;

4. If the write in step 2 fails, the first SSTable and the second SSTable are discarded.
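The persistence steps above can be sketched as a single function. The function name `flush`, the caller-supplied `write_sstable` callback, the use of `OSError` to signal a failed write, and the modelling of each LSM-Tree as a list of layers are assumptions of this sketch rather than details given in the source.

```python
def flush(data_memtable, index_memtable, data_tree, index_tree, write_sstable):
    """Persist both Memtables; return True only if both SSTables were added.

    write_sstable(entries) is a caller-supplied function (an assumption of
    this sketch) that returns an SSTable object or raises OSError.
    data_tree and index_tree are lists of layers; layer 0 receives new files.
    """
    # Step 1: write the data-area Memtable into a data-area SSTable.
    try:
        data_sst = write_sstable(sorted(data_memtable.items()))
    except OSError:
        return False
    # Step 2: only after step 1 succeeds, write the reverse-index Memtable.
    try:
        index_sst = write_sstable(sorted(index_memtable.items()))
    except OSError:
        # Step 4: discard what was produced; neither tree is modified.
        return False
    # Step 3: add each SSTable to the LSM-Tree of its own area.
    data_tree[0].append(data_sst)
    index_tree[0].append(index_sst)
    return True


def in_memory_sstable(entries):
    # Stand-in for a real file writer: an SSTable is a sorted list.
    return list(entries)


data_tree, index_tree = [[]], [[]]
ok = flush({"b": (0, 1), "a": (0, 0)},
           {(0, 0): "a", (0, 1): "b"},
           data_tree, index_tree, in_memory_sstable)
```

Deferring both tree updates until both files exist means a reader never sees a data-area SSTable whose reverse-index counterpart is missing.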

It can be understood that the log-structured merge tree is split into a data area and a reverse index area. When data is written, the key address combination and the address key combination are written into their respective areas, and the reverse index is managed using the characteristics of the log-structured merge tree. Because only the value of the key value pair is written to the data shelving log file, the atomicity of the DataPlog write operation can be maintained.

Based on the above embodiments, an embodiment of the present application proposes a key-value pair writing process, referring to fig. 8, which includes:

1. The Value is written to the DataPlog;

2. PlogPtr is returned;

3. A Key-PlogPtr combination is formed;

4. The Key-PlogPtr combination is written into the Memtable of the data area in memory;

5. The PlogPtr-Key combination is written into the Memtable of the reverse index area in memory;

6. The Memtable of the data area in memory is persisted to an SSTable of the data area on disk;

7. The Memtable of the reverse index area in memory is persisted to an SSTable of the reverse index area on disk.

The key value pair write flow writes the Value into the DataPlog and writes the reverse index first into the reverse index area in memory and then into the reverse index area on disk, which ensures the atomicity of writing the Value and the reverse index information.

Further, after S104, a compaction (compression merging) operation may be performed. Compaction is the process of merging and cleaning up SSTable files: when the total data size of the SSTables in a given layer of the LSM-Tree exceeds a preset data size threshold, they are merged and sorted with the SSTables of the next layer on disk, and the new files produced by the merge are written sequentially to the hard disk to replace the old-version SSTables of the next layer. See FIG. 9 for details.

S201, when it is detected that the total data amount of the Nth-layer effective character table files in the first log-structured merge tree is greater than a preset data amount threshold, iteratively traversing the Nth-layer effective character table files to obtain a corresponding group of key address combinations; N is an integer greater than or equal to zero.

S202, detecting the validity of each key address combination in the group.

In the embodiment of the application, each key address combination in the group is looked up in turn in the Nth layer and the (N+1)th layer of the first log-structured merge tree. If multiple key address combinations with the same key are found, the key address combination corresponding to the latest version of the key is valid and the remaining key address combinations are invalid. If the key of one or more key address combinations in the group is unique across the Nth layer and the (N+1)th layer, those key address combinations are valid.

Further, when a third key address combination in the group of key address combinations is detected to be invalid, a second address key combination corresponding to the third key address combination is constructed, and a delete operation for the second address key combination is submitted to the second log-structured merge tree.

S203, when a second key address combination in the group of key address combinations is detected to be valid, writing the second key address combination into a first new effective character table file of the data area.

S204, merging the first new effective character table file of the data area into the first log-structured merge tree.

Based on the above embodiments, a compaction (compression merging) method is provided, as shown in FIG. 10. The specific method includes:

1. Iteratively read the SSTables of the Nth layer of the data area;

2. Determine whether the traversal is finished;

3. If the traversal is not finished, obtain a Key-PlogPtr;

4. Determine the validity of the Key-PlogPtr;

5. If a second Key-PlogPtr is valid, write the second Key-PlogPtr into a new SSTable of the data area, and return to step 1;

6. If a third Key-PlogPtr is invalid, construct a second PlogPtr-Key corresponding to the third Key-PlogPtr, write a delete operation for the second PlogPtr-Key into a new SSTable of the reverse index area, and return to step 1;

7. If the traversal is finished, write the new SSTable of the data area into the first LSM-Tree.
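The compaction flow above can be sketched under several stated assumptions: each data-area entry is modelled as a `(key, version, plog_ptr)` tuple with a higher version meaning newer, a delete operation is represented by a `TOMBSTONE` marker, and entries of layer N and layer N+1 are examined in one pass. None of these representations is specified by the source; they are chosen only to make the validity rule concrete.

```python
TOMBSTONE = None  # marker for a delete operation (representation assumed)


def compact(level_n, level_n1):
    """Merge two layers of the data-area LSM-Tree.

    Each entry is (key, version, plog_ptr); a higher version is newer.
    Returns (new_data_sstable, new_index_sstable).
    """
    entries = level_n + level_n1
    # Find the newest version of every key across layers N and N+1.
    newest = {}
    for key, version, ptr in entries:
        if key not in newest or version > newest[key][0]:
            newest[key] = (version, ptr)

    new_data, new_index = [], []
    for key, version, ptr in entries:
        if newest[key] == (version, ptr):
            # Valid Key-PlogPtr: keep it in the new data-area SSTable.
            new_data.append((key, version, ptr))
        else:
            # Invalid Key-PlogPtr: build the matching PlogPtr-Key and record
            # a delete for it in the new reverse-index-area SSTable.
            new_index.append((ptr, TOMBSTONE))
    new_data.sort()
    new_index.sort(key=lambda entry: entry[0])
    return new_data, new_index


level_n = [("a", 2, (1, 0)), ("c", 1, (1, 1))]
level_n1 = [("a", 1, (0, 0)), ("b", 1, (0, 1))]
new_data, new_index = compact(level_n, level_n1)
```

In this example key "a" exists in both layers, so only its version 2 survives and a delete is recorded for the PlogPtr `(0, 0)` of the older version; keys "b" and "c" are unique and are kept as they are.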

Further, after S104, a GC operation may also be performed, where the GC operation may be a scheduled task. See FIG. 11 for details.

S301, when a preset time is reached, searching for a second data shelving log file whose garbage data amount meets a preset garbage data amount threshold.

In an alternative embodiment, a storage area for recording the garbage data amount of each DataPlog is provided. When the preset time arrives, the garbage data amount of each DataPlog is read from the storage area and compared with the preset garbage data amount threshold, and a second DataPlog whose garbage data amount meets the threshold is thereby determined.

S302, iterating through the second data shelving log file, and obtaining one or more pieces of address information in the second data shelving log file.

In an embodiment of the present application, the second DataPlog is iterated over, and each of its one or more PlogPtr values is used as a key to query the second LSM-Tree of the reverse index area.

Further, if no PlogPtr-Key corresponding to a PlogPtr is found in the second LSM-Tree, the corresponding data block is invalid and is discarded directly.

S303, if second address information in the one or more address information is found from the second log structure merging tree, writing a second value corresponding to the second address information into the new data shelving log file.

In the embodiment of the application, if the PlogPtr-Key corresponding to a second PlogPtr is found in the second LSM-Tree, the corresponding data block is valid, and the second Value of the data block is written into the new DataPlog.

S304, writing a second key corresponding to the second address information and a fourth key address combination of the second address information into a second new effective character table file of the data area; and writing a third address key combination corresponding to the fourth key address combination into the new effective character table file of the reverse index area.

S305, adding the second new effective character table file of the data area into the first log structure merging tree; and adding the new valid character table file of the reverse index region to the second log structured merge tree.

It can be understood that, because the log-structured merge tree is split into the data area and the reverse index area, the reverse index can be managed in the reverse index area. When a data block is garbage-collected, the reverse index data can be read and looked up in the second log-structured merge tree of the reverse index area, so that the data shelving log file does not need to be traversed, which speeds up the GC process and reduces performance loss.

Based on the above embodiments, a garbage collection method is provided, as shown in FIG. 12. The specific method includes:

1. Iteratively traverse a second DataPlog whose garbage data amount meets the preset garbage data amount threshold to obtain one or more PlogPtr;

2. Determine whether the traversal is finished;

3. If the traversal is not finished, use the PlogPtr to query the second LSM-Tree of the reverse index area;

4. If a record for a second PlogPtr is found in the second LSM-Tree, write the second Value corresponding to the second PlogPtr into the new DataPlog, and obtain a fourth Key-PlogPtr combination of the second Key corresponding to the second Value and the second PlogPtr;

5. Write the fourth Key-PlogPtr combination into a new SSTable of the data area; write a third PlogPtr-Key combination corresponding to the fourth Key-PlogPtr combination into a new SSTable of the reverse index area; and return to step 1;

6. If no record for a third PlogPtr is found in the second LSM-Tree, discard the data block corresponding to the third PlogPtr, and return to step 1;

7. If the traversal is finished, add the new SSTable of the data area to the first LSM-Tree, and add the new SSTable of the reverse index area to the second LSM-Tree.
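The garbage collection flow above can be sketched as follows. The function name `collect`, the use of a plain dict as a stand-in for the reverse-index LSM-Tree, and the `(plog_id, offset)` form of PlogPtr are assumptions. The sketch also assumes that the re-written combinations point at the value's location in the new DataPlog, which the source does not state explicitly.

```python
def collect(old_plog_id, old_blocks, index_tree, new_plog_id):
    """Garbage-collect one DataPlog.

    old_blocks: list of values; position i has PlogPtr (old_plog_id, i).
    index_tree: dict PlogPtr -> key, standing in for the reverse-index
                LSM-Tree (an assumption of this sketch).
    Returns (new_blocks, new_data_sstable, new_index_sstable).
    """
    new_blocks, new_data, new_index = [], [], []
    # Step 1: iterate over every PlogPtr of the DataPlog being collected.
    for offset, value in enumerate(old_blocks):
        old_ptr = (old_plog_id, offset)
        # Step 3: query the reverse-index LSM-Tree with the PlogPtr.
        key = index_tree.get(old_ptr)
        if key is None:
            # Step 6: no PlogPtr-Key record, so the block is garbage.
            continue
        # Step 4: copy the valid Value into the new DataPlog.
        new_blocks.append(value)
        new_ptr = (new_plog_id, len(new_blocks) - 1)
        # Step 5: stage Key-PlogPtr and PlogPtr-Key for the new SSTables.
        new_data.append((key, new_ptr))
        new_index.append((new_ptr, key))
    # Step 7: the caller adds both new SSTables to their LSM-Trees.
    return new_blocks, sorted(new_data), sorted(new_index)


old_blocks = [b"v0", b"stale", b"v2"]
index_tree = {(3, 0): "a", (3, 2): "c"}
new_blocks, new_data, new_index = collect(3, old_blocks, index_tree, 9)
```

Each validity check here is a single point lookup keyed by PlogPtr, in contrast to the conventional flow of FIG. 3, which first finds the key and then scans the whole LSM-Tree for it.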

The embodiment of the application provides a data storage device 1. As shown in fig. 13, the data storage device 1 includes:

a writing unit 10, configured to write a first value in a key value pair to be written into a first data pending log file, and generate first address information of the first value in the first data pending log file; writing a first key of the key value pair to be written and a first key address combination of the first address information into a first memory table of a data area; writing the first address information and the first address key combination of the first key into a second memory table of the reverse index area; writing the first memory table into a first effective character table file of a data area; writing the second memory table into a second effective character table file of the reverse index area; adding the first effective character table file into a first log structure merging tree of a data area; and adding the second valid character table file into a second log-structured merge tree of the reverse index region.

Optionally, the data storage device 1 includes a memory and a disk, a memory data area and a memory reverse index area are set in the memory, the memory data area includes one or more memory tables, and the memory reverse index area includes one or more memory tables; the disk is provided with a disk data area and a disk reverse index area, the disk data area comprises the first log structure merging tree formed by a group of valid character table files, and the disk reverse index area comprises the second log structure merging tree formed by a group of valid character table files.

Optionally, the data storage device includes: a traversing unit and a detecting unit;

The traversing unit is configured to iteratively traverse the nth layer valid character table file to obtain a corresponding set of key address combinations when detecting that the total data amount of the nth layer valid character table file in the first log structure merging tree is greater than a preset data amount threshold; n is an integer greater than or equal to zero;

the detection unit is used for respectively detecting the validity of the group of key address combinations;

the writing unit 10 is further configured to, when detecting that a second key address combination in the set of key address combinations is valid, write the second key address combination into a first new valid character table file in the data area; and merging the first new valid character table file of the data area into the first log structure merging tree.

Optionally, the data storage device includes: a construction unit and a deletion unit;

the construction unit is used for constructing a second address key combination corresponding to a third key address combination when the third key address combination in the group of key address combinations is detected to be invalid;

and the deleting unit is used for submitting the deleting operation aiming at the second address key combination to the second log-structured merging tree.

Optionally, the data storage device includes: a search unit and a selection unit;

The searching unit is used for sequentially searching the group of key address combinations at the Nth layer and the (n+1) th layer in the first log structure merging tree;

The selecting unit is used for determining, if multiple key address combinations with the same key are found, that the key address combination corresponding to the latest version of the key is valid and the remaining key address combinations are invalid; and for determining, if the key of one or more key address combinations in the group is unique across the Nth layer and the (N+1)th layer, that those key address combinations are valid.

Optionally, the searching unit is further configured to search for a second data shelving log file whose garbage data amount meets a preset garbage data amount threshold when the preset time arrives;

the traversing unit is further configured to iteratively traverse the second data shelving log file to obtain one or more address information in the second data shelving log file;

The writing unit 10 is further configured to, if second address information in the one or more address information is found from the second log-structured merge tree, write a second value corresponding to the second address information into a new data pending log file; writing a second key corresponding to the second address information and a fourth key address combination of the second address information into a second new effective character table file of the data area; writing a third address key combination corresponding to the fourth key address combination into a new effective character table file of the reverse index area; adding a second new valid character table file of the data area into the first log structure merging tree; and adding the new valid character table file of the reverse index area into the second log structure merge tree.

Optionally, the writing unit 10 is further configured to write the first key address combination into a write-ahead log (WAL) system to generate a write result; and, if the write result indicates that the first key address combination was successfully written into the write-ahead log system, to write the first key address combination, formed from the first key of the key value pair to be written and the first address information, into the first memory table of the data area, and to write the first address key combination, formed from the first address information and the first key, into the second memory table of the reverse index area.

The data storage device provided by the embodiment of the application writes a first value in a key value pair to be written into a first data shelving log file to generate first address information of the first value in the first data shelving log file; writes a first key address combination of the first key in the key value pair to be written and the first address information into a first memory table of the data area; writes a first address key combination of the first address information and the first key into a second memory table of the reverse index area; writes the first memory table into a first valid character table file of the data area; writes the second memory table into a second valid character table file of the reverse index area; adds the first valid character table file into a first log structure merging tree of the data area; and adds the second valid character table file into a second log structure merging tree of the reverse index area. Therefore, in the data storage device provided by this embodiment, the log structure merging tree is split into the data area and the reverse index area, the key address combination and the address key combination are written into their corresponding areas when data is written, and the reverse index is managed by using the characteristics of the log structure merging tree. Only the value in the key value pair is written into the data shelving log file (DataPlog), so the atomicity of the DataPlog write operation can be maintained and service performance is improved.
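The write path summarized above can be illustrated with a minimal in-memory sketch. It is not the implementation of the embodiment: the names `Address`, `Store`, `put`, `flush` and `get` are hypothetical, a single `bytearray` stands in for one data shelving log file, and each log structure merging tree is reduced to a list of sorted tables.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Address:
    """Hypothetical address of a value inside a data shelving log file."""
    file_id: int
    offset: int
    length: int


@dataclass
class Store:
    plog: bytearray = field(default_factory=bytearray)   # one data shelving log file
    data_memtable: dict = field(default_factory=dict)    # data area: key -> address
    index_memtable: dict = field(default_factory=dict)   # reverse index area: address -> key
    data_tree: list = field(default_factory=list)        # flushed tables of the data area
    index_tree: list = field(default_factory=list)       # flushed tables of the reverse index area

    def put(self, key: bytes, value: bytes) -> Address:
        # Only the value goes into the log file; its position is the address.
        addr = Address(0, len(self.plog), len(value))
        self.plog += value
        self.data_memtable[key] = addr     # key address combination
        self.index_memtable[addr] = key    # address key combination
        return addr

    def flush(self) -> None:
        # Each memory table becomes one sorted table in its own tree.
        self.data_tree.append(sorted(self.data_memtable.items()))
        self.index_tree.append(sorted(self.index_memtable.items()))
        self.data_memtable, self.index_memtable = {}, {}

    def get(self, key: bytes):
        addr = self.data_memtable.get(key)
        for table in reversed(self.data_tree):   # newest flushed table first
            if addr is not None:
                break
            addr = dict(table).get(key)
        if addr is None:
            return None
        return bytes(self.plog[addr.offset:addr.offset + addr.length])
```

In this sketch a lookup resolves the key to an address in the data area and then reads the value from the log file, while the reverse index area keeps the opposite mapping for later garbage collection.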

Fig. 14 is a second schematic structural diagram of the data storage device 1 according to the embodiment of the present application. In practical application, based on the same inventive concept as the above embodiments, as shown in Fig. 14, the data storage device 1 of this embodiment includes: a processor 11, a memory 12 and a communication bus 13.

In a specific embodiment, the processor 11 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It will be appreciated that, for different devices, other electronic components may be used to implement the above processor functions, and this embodiment is not particularly limited in this respect.

In the embodiment of the present application, the communication bus 13 is used to implement connection communication between the processor 11 and the memory 12; the processor 11 implements the following data storage method when executing the running program stored in the memory 12:

writing a first value in a key value pair to be written into a first data shelving log file, and generating first address information of the first value in the first data shelving log file; writing a first key address combination of the first key in the key value pair to be written and the first address information into a first memory table of a data area; writing a first address key combination of the first address information and the first key into a second memory table of the reverse index area; writing the first memory table into a first valid character table file of the data area; writing the second memory table into a second valid character table file of the reverse index area; adding the first valid character table file into a first log structure merging tree of the data area; and adding the second valid character table file into a second log structure merging tree of the reverse index area.

Further, the data storage device comprises a memory and a magnetic disk, wherein a memory data area and a memory reverse index area are arranged in the memory, one or more memory tables are arranged in the memory data area, and one or more memory tables are arranged in the memory reverse index area; the disk is provided with a disk data area and a disk reverse index area, the disk data area comprises the first log structure merging tree formed by a group of valid character table files, and the disk reverse index area comprises the second log structure merging tree formed by a group of valid character table files.

Further, the processor 11 is further configured to iterate through the N-th layer valid character table files to obtain a corresponding group of key address combinations when it is detected that the total data amount of the N-th layer valid character table files in the first log structure merging tree is greater than a preset data amount threshold, where N is an integer greater than or equal to zero; detect the validity of each key address combination in the group; write a second key address combination in the group of key address combinations into a first new valid character table file of the data area when the second key address combination is detected to be valid; and merge the first new valid character table file of the data area into the first log structure merging tree.

Further, the processor 11 is further configured to construct a second address key combination corresponding to a third key address combination in the group of key address combinations when detecting that the third key address combination is invalid; submitting a delete operation for the second address key combination to the second log structured merge tree.

Further, the processor 11 is further configured to sequentially search the N-th layer and the (N+1)-th layer in the first log structure merging tree for the group of key address combinations; if a plurality of key address combinations with the same key are found, select the key address combination corresponding to the latest version of the key from the plurality of key address combinations as valid, and treat the remaining key address combinations as invalid; and if the keys of one or more key address combinations in the group are unique in the N-th layer and the (N+1)-th layer, treat the one or more key address combinations as valid.
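The validity rule and the follow-up actions described in the three paragraphs above can be sketched as follows. This is only an illustration under stated assumptions: `KeyAddr`, `classify` and `compact` are hypothetical names, the `version` field is an assumed way to tell which combination is the latest, and a plain dictionary with `pop` stands in for submitting a delete operation to the second log structure merging tree.

```python
from typing import NamedTuple


class KeyAddr(NamedTuple):
    key: bytes
    addr: int      # hypothetical opaque address in a data shelving log file
    version: int   # hypothetical: a larger number means a newer version


def classify(level_n, level_n_plus_1):
    """Split the combinations of two adjacent layers into (valid, invalid)."""
    combos = list(level_n) + list(level_n_plus_1)
    newest = {}
    for ka in combos:
        cur = newest.get(ka.key)
        if cur is None or ka.version > cur.version:
            newest[ka.key] = ka
    valid, invalid = [], []
    for ka in combos:
        # A key that appears once is its own newest version, hence valid.
        (valid if newest[ka.key] is ka else invalid).append(ka)
    return valid, invalid


def compact(level_n, level_n_plus_1, reverse_index):
    """Return the new sorted table and drop stale entries from the reverse index."""
    valid, invalid = classify(level_n, level_n_plus_1)
    for ka in invalid:
        # Stand-in for submitting a delete of the address key combination.
        reverse_index.pop(ka.addr, None)
    return sorted(valid, key=lambda ka: ka.key)
```

After such a pass, the reverse index only contains addresses whose values are still referenced by the data area, which is what the later garbage collection of data shelving log files relies on.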

Further, the processor 11 is further configured to search, when a preset timing time arrives, for a second data shelving log file in which the garbage data amount meets a preset garbage data amount threshold; iteratively traverse the second data shelving log file to acquire one or more address information in the second data shelving log file; if second address information in the one or more address information is found in the second log structure merging tree, write a second value corresponding to the second address information into a new data shelving log file; write a fourth key address combination, formed from a second key corresponding to the second address information and the second address information, into a second new valid character table file of the data area; write a third address key combination corresponding to the fourth key address combination into a new valid character table file of the reverse index area; add the second new valid character table file of the data area into the first log structure merging tree; and add the new valid character table file of the reverse index area into the second log structure merging tree.
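The garbage collection flow above can be sketched in a few lines. The sketch makes several assumptions that are not stated in the embodiment: addresses are `(file_id, offset)` tuples, "meets the threshold" is read as greater than or equal to, the reverse index is a plain dictionary, and the function names `pick_victims` and `collect` are hypothetical.

```python
def pick_victims(garbage_bytes: dict, threshold: int) -> list:
    """Return ids of log files whose garbage amount reaches the threshold."""
    return sorted(fid for fid, amount in garbage_bytes.items() if amount >= threshold)


def collect(old_plog: dict, reverse_index: dict, new_file_id: int):
    """Relocate live values of one data shelving log file.

    old_plog maps each old address to its value; reverse_index maps
    address -> key and contains only addresses that are still referenced.
    """
    new_plog = bytearray()
    data_table = []    # new key address combinations for the data area
    index_table = []   # new address key combinations for the reverse index area
    for old_addr, value in old_plog.items():
        key = reverse_index.get(old_addr)
        if key is None:
            continue   # not in the reverse index: the value is garbage
        new_addr = (new_file_id, len(new_plog))
        new_plog += value
        data_table.append((key, new_addr))
        index_table.append((new_addr, key))
    return bytes(new_plog), sorted(data_table), sorted(index_table)
```

The two returned tables correspond to the new valid character table files that are then added to the first and second log structure merging trees; values whose addresses are absent from the reverse index are simply not copied.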

Further, the processor 11 is further configured to write the first key address combination into a pre-write log system, to generate a write result; if the writing result is that the first key address combination is successfully written into the pre-writing log system, writing the first key in the key value pair to be written and the first key address combination of the first address information into a first memory table of a data area; and writing the first address information and the first address key combination of the first key into a second memory table of the reverse index area.
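The ordering constraint in the paragraph above, where the memory tables are updated only after the pre-write log accepts the key address combination, can be sketched as below. `PreWriteLog` and `apply_write` are hypothetical names, the log is an in-memory list rather than a durable file, and the `fail` flag exists only to simulate an unsuccessful write.

```python
class PreWriteLog:
    """Toy stand-in for a pre-write log system."""

    def __init__(self, fail: bool = False):
        self.records = []
        self.fail = fail   # simulate a failed log write

    def append(self, record) -> bool:
        if self.fail:
            return False
        self.records.append(record)
        return True


def apply_write(log, data_memtable, index_memtable, key, addr) -> bool:
    """Update both memory tables only if the log write succeeded."""
    if not log.append((key, addr)):
        return False                 # neither memory table is touched
    data_memtable[key] = addr        # key address combination, data area
    index_memtable[addr] = key       # address key combination, reverse index area
    return True
```

Gating both memory table updates on the same log record keeps the data area and the reverse index area consistent with each other in this sketch: either both see the write or neither does.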

An embodiment of the present application provides a computer-readable storage medium applied to a data storage device. The storage medium stores one or more computer programs executable by one or more processors, and the one or more computer programs, when executed, implement the data storage method described above.

Based on the above embodiments, the present embodiments provide a computer program product comprising a computer program executable by one or more processors, the computer program implementing a data storage method as described above.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and may also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium (e.g., a magnetic disk or an optical disk) and including several instructions for causing a device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method described in the embodiments of the present application.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the present application.

Claims

1. A method of data storage, the method comprising:

2. The method of claim 1, wherein a memory data area and a memory reverse index area are provided in a memory, the memory data area including one or more memory tables therein, the memory reverse index area including one or more memory tables therein; the disk is provided with a disk data area and a disk reverse index area, the disk data area comprises the first log structure merging tree formed by a group of valid character table files, and the disk reverse index area comprises the second log structure merging tree formed by a group of valid character table files.

3. The method of claim 1, wherein after the adding the first valid character table file to a first log structure merging tree of a data area and the adding the second valid character table file to a second log structure merging tree of the reverse index area, the method further comprises:

Under the condition that the total data quantity of the N-th layer valid character table files in the first log structure merging tree is detected to be larger than a preset data quantity threshold value, iterating the N-th layer valid character table files to obtain a corresponding group of key address combinations; N is an integer greater than or equal to zero;

respectively detecting the validity of the group of key address combinations;

Writing a second key address combination of the set of key address combinations into a first new valid character table file of the data area under the condition that the second key address combination is detected to be valid;

and merging the first new valid character table file of the data area into the first log structure merging tree.

4. A method according to claim 3, wherein after said separately detecting the validity of said set of key address combinations, the method further comprises:

If the third key address combination in the group of key address combinations is detected to be invalid, constructing a second address key combination corresponding to the third key address combination;

Submitting a delete operation for the second address key combination to the second log structured merge tree.

5. A method according to claim 3, wherein said separately detecting validity of said set of key address combinations comprises:

Sequentially searching the group of key address combinations in an N layer and an N+1 layer in the first log structure merging tree;

if a plurality of key address combinations with the same keys are found, selecting a key address combination corresponding to the key of the latest version from the plurality of key address combinations to be effective, and invalidating the rest key address combinations;

if the keys of one or more key address combinations in the group of key address combinations are unique in the N-th layer and the (N+1)-th layer, the one or more key address combinations are valid.

6. The method of claim 1, wherein after the adding the first valid character table file to a first log structure merging tree of a data area and the adding the second valid character table file to a second log structure merging tree of the reverse index area, the method further comprises:

under the condition that the preset time is reached, searching a second data shelving log file of which the garbage data quantity meets a preset garbage data quantity threshold value;

iteratively traversing the second data shelving log file to acquire one or more address information in the second data shelving log file;

if second address information in the one or more address information is found out from the second log structure merging tree, writing a second value corresponding to the second address information into a new data shelving log file;

Writing a fourth key address combination, formed from a second key corresponding to the second address information and the second address information, into a second new valid character table file of the data area; writing a third address key combination corresponding to the fourth key address combination into a new valid character table file of the reverse index area;

Adding a second new valid character table file of the data area into the first log structure merging tree; and adding the new valid character table file of the reverse index area into the second log structure merge tree.

7. The method of claim 1, wherein after the writing a first value in a key value pair to be written into a first data shelving log file and generating first address information of the first value in the first data shelving log file, and before the writing the first key in the key value pair to be written and the first key address combination of the first address information into a first memory table of a data area and the writing the first address information and the first address key combination of the first key into a second memory table of the reverse index area, the method further comprises:

Writing the first key address combination into a pre-write log system to generate a writing result;

the writing the first key in the key value pair to be written and the first key address combination of the first address information into a first memory table of a data area, and the writing the first address information and the first address key combination of the first key into a second memory table of the reverse index area, comprise:

If the writing result is that the first key address combination is successfully written into the pre-writing log system, writing the first key in the key value pair to be written and the first key address combination of the first address information into a first memory table of a data area; and writing the first address information and the first address key combination of the first key into a second memory table of the reverse index area.

8. A data storage device, the device comprising:

9. A data storage device, the device comprising: a processor, a memory, and a communication bus; the processor, when executing a running program stored in the memory, implements the method of any one of claims 1-7.

10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-7.

11. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1 to 7.