CN114780500B

CN114780500B - Data storage method, device and equipment based on log merging tree and storage medium

Info

Publication number: CN114780500B
Application number: CN202210702791.7A
Authority: CN
Inventors: 瞿晓阳; 王健宗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-06-21
Filing date: 2022-06-21
Publication date: 2022-09-20
Anticipated expiration: 2042-06-21
Also published as: CN114780500A

Abstract

The invention relates to the technical field of blockchain and discloses a data storage method, apparatus, device and storage medium based on a log merge tree. The method comprises the following steps: recording written key-value pair data using a fixed-size sorted string table structure as an SSTable file, wherein the SSTable file comprises the key-value pair data and a key range corresponding to the key-value pair data; arranging a plurality of SSTable files in order of their key ranges to generate a fragment file, and storing the fragment file in a disk of the log merge tree; and, if a data layer of the disk meets a preset merging condition, merging all the fragment files in the data layer into a new fragment file, storing the new fragment file in the next data layer, and deleting all the fragment files in the data layer. Because fixed-size SSTable files are organized into fragments and arranged in order of key range, the data remains ordered, data rewriting is reduced, and the space occupied by merging is reduced, thereby improving the performance of the data storage system.

Description

Data storage method, device and equipment based on log merging tree and storage medium

Technical Field

The invention relates to the technical field of blockchain, and in particular to a data storage method, apparatus, device and storage medium based on a log merge tree.

Background

In recent years, key-value (KV) databases have been widely used to store blockchain data, but as data volume and user access increase, the performance of KV databases is being put to the test. Mainstream KV databases are built on the log-structured merge tree (LSM-Tree) structure, referred to herein as the log merge tree. The log merge tree architecture provides an algorithm of deferred updates and batch writes that converts random writes into batch writes, which improves the write performance of the database and better meets service requirements for high concurrency and high performance.

However, in order to order data to maintain good reading performance, the log merge tree needs to perform merge operation continuously in the background, and the existing data merge method has a large defect, which may cause problems such as an exponential increase in file data size, occupation of a large amount of storage space, frequent rewriting of data files, and the like, and may seriously affect the performance of the data storage system.

Disclosure of Invention

The invention provides a data storage method, apparatus, device and storage medium based on a log merge tree, to solve the problem that existing data merging methods occupy a large amount of storage space and frequently rewrite data, which degrades the performance of the data storage system.

The data storage method based on the log merge tree comprises the following steps:

recording the written key value pair data by adopting a fixed-size sorting character string table structure, and recording the key value pair data as an SSTable file, wherein the SSTable file comprises the key value pair data and a key range corresponding to the key value pair data;

orderly arranging the sizes of the key ranges of the SSTable files to generate fragment files, and storing the fragment files into a disk of a log merging tree;

determining whether a data layer of the disk meets a preset merging condition;

if the data layer meets the preset merging condition, merging all the fragmented files in the data layer into a new fragmented file;

and storing the new fragment file in the next data layer after the data layer, and deleting all the fragment files in the data layer.

Further, merging all the fragmented files in the data layer into a new fragmented file includes:

determining SSTable files in all the fragment files in the data layer, and recording the SSTable files as files to be merged;

determining whether the key ranges of the files to be merged in the data layer are overlapped;

and if the key ranges of the files to be merged in the data layer are not overlapped, arranging and merging all the files to be merged in the data layer into a new fragment file according to the key range sequence.

Further, after determining whether there is an overlap between the key ranges of the files to be merged in the data layer, the method further includes:

if the key ranges of the files to be merged in the data layer are overlapped, merging and sequencing the files to be merged by adopting a merging and sequencing algorithm to obtain a plurality of new SSTable files, wherein the key ranges of the new SSTable files are not overlapped;

and orderly arranging the plurality of new SSTable files according to the size of the key range to generate new fragment files.

Further, determining whether the data layer of the disk meets a preset merge condition includes:

determining whether the number of the fragmented files in the data layer reaches a preset number;

and if the number of the fragmented files in the data layer reaches the preset number, determining that the data layer meets the preset merging condition.

Further, an index structure is provided in the memory of the log merge tree, and after the plurality of SSTable files are arranged in order of their key ranges to generate the fragment file, the method further comprises the following steps:

generating an index key range according to all the fragment files in the data layer;

and updating the index key range serving as an index parameter corresponding to the data layer into an index structure to generate index data, wherein the index data comprises a plurality of index parameters.

Further, after storing the new fragmented file to the next data layer of the data layer and deleting all fragmented files in the data layer, the method further includes:

generating an index key range according to all the fragment files in the next data layer after the data layer, and updating the index key range into the index structure as the index parameter corresponding to the next data layer;

and deleting the index parameter corresponding to the data layer from the index structure.
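The index maintenance described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how the index key range is derived, so the sketch assumes it is the smallest range covering every SSTable key range in the layer, and the names `index_key_range` and `on_merge` as well as the integer keys are hypothetical.

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[int, int]  # (min_key, max_key); integer keys are an assumption


def index_key_range(shards: List[List[KeyRange]]) -> KeyRange:
    """Assumed definition: smallest range covering all SSTable key ranges in a layer."""
    lows = [lo for shard in shards for lo, _ in shard]
    highs = [hi for shard in shards for _, hi in shard]
    return (min(lows), max(highs))


def on_merge(index: Dict[int, KeyRange], layer: int,
             next_layer_shards: List[List[KeyRange]]) -> None:
    """After a merge, refresh the entry of the next layer and drop the emptied layer."""
    index[layer + 1] = index_key_range(next_layer_shards)
    index.pop(layer, None)


index = {0: (7, 45)}
on_merge(index, 0, [[(7, 13), (21, 45)]])
print(index)
```

Keeping one entry per data layer lets a lookup skip any layer whose index key range cannot contain the requested key.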

There is provided a log merge tree based data storage apparatus, comprising:

the recording module is used for recording the written key value pair data by adopting a fixed-size sorting character string table structure and recording the key value pair data as an SSTable file, and the SSTable file comprises the key value pair data and a key range corresponding to the key value pair data;

the generating module is used for orderly arranging the sizes of the key ranges of the SSTable files to generate fragment files and storing the fragment files into a disk of a log merging tree;

the determining module is used for determining whether the data layer of the disk meets a preset merging condition;

the merging module is used for merging all the fragmented files in the data layer into a new fragmented file if the data layer meets the preset merging condition;

and the deleting module is used for storing the new fragment file in the next data layer after the data layer and deleting all the fragment files in the data layer.

Further, the merging module is specifically configured to:

There is provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the log merge tree based data storage method when executing the computer program.

There is provided a computer readable storage medium storing a computer program, wherein the computer program is configured to implement the steps of the log merge-tree based data storage method when executed by a processor.

In one scheme provided by the data storage method, apparatus, device and storage medium based on the log merge tree, written key-value pair data is recorded using a fixed-size sorted string table structure as an SSTable file, the SSTable file comprising the key-value pair data and a key range corresponding to the key-value pair data. A plurality of SSTable files are then arranged in order of their key ranges to generate a fragment file, and the fragment file is stored in a disk of the log merge tree. It is determined whether a data layer of the disk meets a preset merging condition; if so, all the fragment files in the data layer are merged into a new fragment file, the new fragment file is stored in the next data layer, and all the fragment files in the data layer are deleted. In the invention, fixed-size SSTable files are organized into fragments, and the SSTable files within a fragment file are arranged in order of key range, so the data remains ordered. Unlike traditional merging, which merges the data of the current data layer together with the data of the next layer, only the files of the current data layer are merged into a new file that is stored in the next layer, and the files of the current data layer are then deleted. This reduces data rewriting and the space occupied by merging, thereby improving the performance of the data storage system.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

FIG. 1 is a schematic diagram of an application environment of a data storage method based on a log merge tree according to an embodiment of the present invention;

FIG. 2 is a flow chart of a data storage method based on a log merge tree according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a structure of a log merge tree according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating an implementation of step S30 in FIG. 2;

FIG. 5 is a flowchart illustrating an implementation of step S40 in FIG. 2;

FIG. 6 is a data merge diagram of data layers in accordance with an embodiment of the present invention;

FIG. 7 is a schematic flow chart of another implementation of step S40 in FIG. 2;

FIG. 8 is a diagram illustrating another data merge of data layers in an embodiment of the invention;

FIG. 9 is a schematic flow chart of a data storage method based on a log merge tree according to an embodiment of the present invention;

FIG. 10 is a schematic flow chart of a data storage method based on a log merge tree according to an embodiment of the present invention;

FIG. 11 is a block diagram of a data storage device based on a log merge tree according to an embodiment of the invention;

FIG. 12 is a block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The data storage method based on the log merge tree provided by the embodiment of the invention can be applied to the application environment shown in fig. 1, wherein the terminal device communicates with the server through the network.

The terminal device sends a storage instruction to the server. After receiving the storage instruction, the server converts metadata into key-value pair data, wherein the key-value pair data comprises keys and the metadata corresponding to the keys, and each key corresponds to one piece of metadata so that the metadata can later be looked up by key. After obtaining the key-value pair data, the server records the written key-value pair data using a fixed-size Sorted String Table (SSTable) structure as an SSTable file, wherein the SSTable file comprises the key-value pair data and a key range corresponding to the key-value pair data. The server then arranges a plurality of SSTable files in order of their key ranges to generate a fragment file, stores the fragment file in a disk of the log merge tree, and determines whether a data layer of the disk meets a preset merging condition. If the data layer meets the preset merging condition, all the fragment files in the data layer are merged into a new fragment file, the new fragment file is stored in the next data layer, and all the fragment files in the data layer are deleted. In the invention, fixed-size SSTable files are organized into fragments, and the SSTable files within a fragment file are arranged in order of key range, so the data remains ordered. Unlike traditional merging, which merges the data of the current data layer together with the data of the next layer, only the files of the current data layer are merged into a new file that is stored in the next layer, and the files of the current data layer are then deleted. This reduces data rewriting and the space occupied by merging, thereby improving the performance of the data storage system.

The log merge tree (LSM-Tree) is a storage structure that includes a memory and a disk, and data on the disk is stored in the SSTable structure. An SSTable is a file in which keys are sorted and key-value pairs are stored as strings; it provides a persistent, ordered, immutable mapping from keys to values, where keys and values (the values representing metadata) are byte strings of arbitrary length. An SSTable supports the following operations: looking up the value associated with a key, and traversing all key-value pair data within a specified key range. Internally, each SSTable consists of a series of data blocks, which are located using a block index stored at the end of the SSTable; when the SSTable file is opened, the block index is loaded into memory. A lookup first finds the corresponding block in the in-memory index and then reads the content of that block from the disk.
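The block-index lookup described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class name `SSTableSketch`, the use of the first key of each block as the index entry, and the block size counted in entries are all hypothetical choices, and no disk I/O is performed.

```python
import bisect
from typing import List, Optional, Tuple


class SSTableSketch:
    """Toy stand-in for an SSTable: sorted data blocks plus an in-memory block index."""

    def __init__(self, pairs: List[Tuple[str, str]], block_size: int = 2):
        ordered = sorted(pairs)
        # Split the sorted pairs into blocks of at most block_size entries.
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        # Block index: first key of each block (assumed index layout).
        self.block_index = [block[0][0] for block in self.blocks]

    def get(self, key: str) -> Optional[str]:
        # Locate the last block whose first key is <= key, then search that block.
        pos = bisect.bisect_right(self.block_index, key) - 1
        if pos < 0:
            return None
        for k, v in self.blocks[pos]:
            if k == key:
                return v
        return None

    def scan(self, low: str, high: str) -> List[Tuple[str, str]]:
        # Traverse all pairs whose key lies in the inclusive range [low, high].
        return [(k, v) for block in self.blocks for k, v in block if low <= k <= high]


table = SSTableSketch([("b", "2"), ("a", "1"), ("d", "4"), ("c", "3"), ("e", "5")])
print(table.get("c"), table.scan("b", "d"))
```

Only one block has to be examined per lookup, which is why loading the block index into memory when the file is opened keeps reads cheap.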

The terminal device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.

In an embodiment, as shown in fig. 2, a data storage method based on a log merge tree is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:

S10: Record the written key-value pair data using a fixed-size sorted string table structure as an SSTable file.

When key-value pair data is stored in the log merge tree (LSM-Tree), the written key-value pair data is recorded using a fixed-size sorted string table structure as an SSTable file. The key-value pair data comprises keys and the metadata corresponding to the keys. The SSTable file comprises the key-value pair data and a key range corresponding to the key-value pair data, where the key range consists of the maximum key and the minimum key in the key-value pair data and may serve as an index of the SSTable file.

S20: Arrange a plurality of SSTable files in order of their key ranges to generate a fragment file, and store the fragment file in a disk of the log merge tree.

After the written key-value pair data has been recorded as SSTable files using the fixed-size sorted string table structure, a plurality of SSTable files are arranged in order of their key ranges to generate a fragment file, and the fragment file is stored in a disk of the log merge tree. That is, a plurality of fixed-size SSTable files are organized into a fragment file (Shard). Within a Shard the SSTable files are ordered by key range, so the key ranges of different SSTable files in the same Shard cannot overlap; this keeps the data ordered, facilitates subsequent indexing, and reduces read amplification. The Shard is then stored in the disk of the log merge tree. The disk comprises a plurality of data layers, and Shards are stored in the data layers. One data layer may contain a plurality of Shards; the key ranges of Shards in the same data layer may overlap, and there is no ordering relationship between different Shards. This reduces the frequent data rewriting that would otherwise be needed to maintain key order and facilitates data writing.
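The two rules above (no overlap inside a Shard, overlap allowed between Shards of the same layer) can be sketched as follows. This is an illustrative sketch only: a Shard is modelled as a plain list of `(min_key, max_key)` tuples, and `build_shard` and `shards_overlap` are hypothetical helper names. The sample ranges are the ones the description gives for FIG. 3.

```python
from typing import List, Tuple

KeyRange = Tuple[int, int]  # (min_key, max_key) of one SSTable file


def build_shard(ranges: List[KeyRange]) -> List[KeyRange]:
    """Order SSTable key ranges and reject any overlap inside the shard."""
    ordered = sorted(ranges)
    for (_, prev_max), (next_min, _) in zip(ordered, ordered[1:]):
        if next_min <= prev_max:
            raise ValueError("key ranges inside one shard must not overlap")
    return ordered


def shards_overlap(a: List[KeyRange], b: List[KeyRange]) -> bool:
    """Shards in the same layer may overlap; compare their overall key spans."""
    return a[0][0] <= b[-1][1] and b[0][0] <= a[-1][1]


shard_a = build_shard([(14, 26), (1, 12)])
shard_b = build_shard([(21, 45), (7, 13)])
print(shard_a, shard_b, shards_overlap(shard_a, shard_b))
```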

The information of each SSTable file is recorded in the fragment file, and the information of the SSTable file comprises the information of the size of the SSTable file, key value pair data in the SSTable file, and the key range (including the maximum key and the minimum key) corresponding to the key value pair data.

The log merge tree (LSM-Tree) comprises two structural layers, a memory and a disk. The memory holds a memory table and/or an index structure (index); the memory table is divided into a MemTable and an Immutable MemTable, and SSTable files are stored on the disk. The complete storage process of the LSM-Tree is as follows: the server writes key-value pair data into the MemTable in memory; once the MemTable is full, it becomes read-only, that is, it is frozen into an Immutable MemTable; the key-value pair data of the frozen memory table is then written to the disk, producing fixed-size SSTable files. In this embodiment, a plurality of fixed-size SSTable files are then organized into a fragment file (Shard), and each data layer contains a plurality of Shards. As shown in fig. 3, for example, the disk has a data layer i that contains two Shards, and each Shard contains two SSTable files, where the left side of an SSTable file shows the minimum key of its key range and the right side shows the maximum key. In fig. 3, the key ranges of the four SSTable files, from left to right, are (1, 12), (14, 26), (7, 13) and (21, 45). It can be seen that the key ranges of the SSTable files within each Shard are arranged in order and do not overlap.
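The write path described above (MemTable, frozen memory table, fixed-size SSTable files) might look like the following sketch. It rests on assumptions that are not in the source: "fixed size" is counted in entries rather than bytes, keys are integers, and `flush_memtable` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def flush_memtable(memtable: Dict[int, str], sstable_size: int) -> List[dict]:
    """Freeze a full memtable and cut it into fixed-size SSTable records."""
    # The sorted snapshot plays the role of the frozen (immutable) memory table.
    frozen: List[Tuple[int, str]] = sorted(memtable.items())
    sstables = []
    for i in range(0, len(frozen), sstable_size):
        chunk = frozen[i:i + sstable_size]
        sstables.append({
            "pairs": chunk,
            # Key range = (minimum key, maximum key) of the chunk.
            "key_range": (chunk[0][0], chunk[-1][0]),
        })
    return sstables


tables = flush_memtable({14: "c", 1: "a", 26: "d", 12: "b"}, sstable_size=2)
print([t["key_range"] for t in tables])
```

Because the chunks are cut from one sorted snapshot, their key ranges are already ordered and disjoint, so they can be placed into a single Shard without further sorting.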

S30: Determine whether the data layer of the disk meets a preset merging condition.

After the fragment file has been generated by arranging the SSTable files in order of their key ranges and has been stored in the disk of the log merge tree, it is determined whether each data layer in the disk meets the preset merging condition. If a data layer does not meet the preset merging condition, the data layer is not yet full and does not need to be merged, and fragment files continue to be written to it.

S40: If the data layer meets the preset merging condition, merge all the fragment files in the data layer into a new fragment file.

If a data layer meets the preset merging condition, the data layer is full and needs to be merged so that it can be emptied for subsequent writes; all the fragment files in the data layer are therefore merged into a new fragment file.

S50: Store the new fragment file in the next data layer after the data layer, and delete all the fragment files in the data layer.

After all the fragmented files in the data layer are merged into a new fragmented file, the new fragmented file is stored to the next data layer of the data layer, and all the fragmented files in the data layer are deleted, so that data can be written into the data layer subsequently.
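Steps S30 to S50 can be tied together in a short sketch. It is a simplified model, not the patented implementation: layers are nested lists of key ranges, the merging condition is assumed to be a count of fragment files, the merge in S40 is reduced to ordering the key ranges (overlapping ranges would need a real merge sort), and `compact_layer` is a hypothetical name.

```python
from typing import List, Tuple

KeyRange = Tuple[int, int]
Shard = List[KeyRange]


def compact_layer(layers: List[List[Shard]], level: int, preset_number: int) -> bool:
    """Move a full layer into the next one as a single new shard."""
    # S30: the layer is considered full once it holds preset_number shards.
    if len(layers[level]) < preset_number:
        return False
    # S40 (simplified): gather every SSTable key range and order them.
    merged: Shard = sorted(r for shard in layers[level] for r in shard)
    # S50: append the new shard to the next layer, then empty the current layer.
    if level + 1 == len(layers):
        layers.append([])
    layers[level + 1].append(merged)
    layers[level].clear()
    return True


layers = [[[(7, 13)], [(21, 45)]], []]
print(compact_layer(layers, 0, preset_number=2), layers)
```

Note that the shards already present in the next layer are never read or rewritten; the new shard is only appended next to them.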

It should be understood that conventional LSM-Tree merging falls into two types: size-tiered merging (Sizing, also known as Tiering) and leveled merging (Leveling). Size-tiered merging allows the key ranges of the SSTable files in each layer to overlap, so data does not have to be rewritten frequently to maintain key order; however, the ordered SSTable files grow exponentially and occupy a large amount of storage space during merging, and the redundant data degrades indexing performance. Leveled merging solves the storage space problem, but to keep the keys in the SSTable files ordered the files must be rewritten frequently, which causes significant write amplification. In this embodiment, fixed-size SSTable files are organized into fragments, and the SSTable files within a fragment file are arranged in key order, which preserves key order and makes subsequent index queries easier. When data is merged, only the files in the current data layer are merged into a new file that is stored in the next layer, and the files in the current data layer are then deleted.

In this embodiment, a fixed-size sorting string table structure is used to record written key value pair data, which is recorded as an SSTable file, where the SSTable file includes key value pair data and key ranges corresponding to the key value pair data, then the key ranges of a plurality of SSTable files are arranged in order to generate fragment files, the fragment files are stored in a disk of a log merge tree, it is determined whether a data layer of the disk meets a preset merge condition, if the data layer meets the preset merge condition, all fragment files in the data layer are merged into a new fragment file, and finally the new fragment file is stored in a next data layer of the data layer, and all fragment files in the data layer are deleted. The SSTable files with fixed sizes are organized by the fragments, a plurality of SSTable files in the fragment files are orderly arranged according to the key range, data ordering is ensured, when data merging is carried out, the data merging is different from the traditional method of merging the data of the current data layer and the data of the next layer together, all the files of the current data layer are merged into a new file to be stored in the next layer, the files of the current data layer are deleted, data rewriting operation is reduced, space occupation caused by merging can be reduced, and therefore the performance of a data storage system is improved.

In an embodiment, as shown in fig. 4, in step S30, that is, determining whether the data layer of the magnetic disk meets the preset merge condition, the method specifically includes the following steps:

S31: Determine whether the number of fragment files in the data layer reaches a preset number.

S32: If the number of fragment files in the data layer reaches the preset number, determine that the data layer meets the preset merging condition.

S33: If the number of fragment files in the data layer does not reach the preset number, determine that the data layer does not meet the preset merging condition.

Determining whether the number of the fragment files in each data layer reaches a preset number, if the number of the fragment files in the data layer does not reach the preset number, indicating that the data layer is not full of files, determining that the data layer does not meet a preset merging condition, and if the number of the fragment files in the data layer reaches the preset number, indicating that the data layer is full of files, determining that the data layer meets the preset merging condition.

In other embodiments, whether the data layer meets the preset merging condition may instead be determined from the total data volume of all the fragment files in the data layer: the file sizes of the SSTable files in each fragment file are summed to obtain the total data volume of all fragment files in the data layer, and it is determined whether this total reaches a preset data volume. If so, the data layer meets the preset merging condition; if not, the data layer does not meet the preset merging condition.
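The two trigger conditions can be contrasted in a few lines. This is a sketch with assumed representations: each fragment file is modelled as a list of SSTable file sizes in bytes, and the function names and thresholds are hypothetical.

```python
from typing import List


def meets_by_count(shards: List[List[int]], preset_number: int) -> bool:
    """Count-based condition (S31 to S33): only the number of fragment files matters."""
    return len(shards) >= preset_number


def meets_by_volume(shards: List[List[int]], preset_volume: int) -> bool:
    """Volume-based alternative: sum the size of every SSTable file in the layer."""
    total = sum(size for shard in shards for size in shard)
    return total >= preset_volume


layer = [[64, 64], [64]]  # two fragment files holding three 64-byte SSTable files
print(meets_by_count(layer, 2), meets_by_volume(layer, 256))
```

The count-based check needs only the length of the layer's shard list, whereas the volume-based check has to visit every SSTable file.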

In this embodiment, whether the data layer meets the preset merging condition is determined from the number of fragment files in the data layer. Compared with the traditional approach of checking a data volume threshold, this is simpler and more intuitive, requires no data volume statistics, and reduces the computational burden on the storage system. In addition, under the fragment-file storage scheme, different fragment files may hold different amounts of data, so different data layers may hold different data volumes; using a single data volume threshold could trigger merging frequently and reduce data storage performance.

In this embodiment, it is determined whether the number of fragment files in the data layer reaches a preset number. If it does, the data layer is determined to meet the preset merging condition; if it does not, the data layer is determined not to meet the preset merging condition. This specifies how to determine whether a data layer of the disk meets the preset merging condition. Using the number of fragment files as the merge trigger is simpler and more intuitive than the traditional data volume threshold, requires no data volume statistics, and reduces the computational burden on the storage system.

In an embodiment, as shown in fig. 5, in step S40, merging all the sliced files in the data layer into a new sliced file specifically includes the following steps:

S41: Determine the SSTable files in all the fragment files in the data layer, and record them as files to be merged.

After the data layer is determined to meet the preset merging condition, the SSTable files in all the fragmented files in the data layer need to be determined, and each SSTable file in the data layer is recorded as a file to be merged.

S42: Determine whether the key ranges of the files to be merged in the data layer overlap.

After SSTable files in all the fragmented files in the data layer are recorded as files to be merged, whether the key ranges of the files to be merged in the data layer overlap or not is determined.

S43: If the key ranges of the files to be merged in the data layer do not overlap, arrange all the files to be merged in the data layer in key range order and merge them into a new fragment file.

After determining whether the key ranges of the files to be merged in the data layer are overlapped, if the key ranges of the files to be merged in the data layer are not overlapped, arranging and merging all the files to be merged in the data layer into a new fragment file according to the key range sequence.

For example, as shown in fig. 6, the data layers in the disk are, in sequence, L0 and L1, and the preset number is taken to be 2. Before merging, data layer L0 holds 2 fragment files (Shards), each containing 1 SSTable file. The minimum key (min) of the first SSTable file is 7 and its maximum key (max) is 13; the minimum key of the second SSTable file is 21 and its maximum key is 45. The key ranges of the two SSTable files in data layer L0 do not overlap, so during merging the two SSTable files are copied directly to layer L1 to form a new Shard containing 2 SSTable files.
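Steps S41 to S43 and the fig. 6 example can be sketched as follows. The helper names are hypothetical and an SSTable file is reduced to its `(min_key, max_key)` tuple. The overlap test sorts the ranges by minimum key and compares neighbours, which is sufficient because any overlapping pair implies an overlapping adjacent pair in that order.

```python
from typing import List, Tuple

KeyRange = Tuple[int, int]


def ranges_overlap(files: List[KeyRange]) -> bool:
    """S42: after sorting by minimum key, any overlap shows up between neighbours."""
    ordered = sorted(files)
    return any(nxt[0] <= prev[1] for prev, nxt in zip(ordered, ordered[1:]))


def merge_without_rewrite(shards: List[List[KeyRange]]) -> List[KeyRange]:
    """S41 and S43: collect the files to be merged and order them by key range."""
    files = [f for shard in shards for f in shard]
    if ranges_overlap(files):
        raise ValueError("overlapping key ranges need a merge sort instead")
    return sorted(files)


# Layer L0 from fig. 6: two shards, one SSTable file each.
print(merge_without_rewrite([[(7, 13)], [(21, 45)]]))
```

No key-value pair is read or rewritten in this path; only the grouping of existing SSTable files into a shard changes.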

After all the files to be merged in a data layer are arranged in key range order and merged into a new fragment file, the new fragment file is stored in the next data layer and the original fragment files in the data layer are deleted. This is equivalent to copying all the data in the data layer directly to the next data layer: the data in the next data layer does not participate in the merge, and the fragment files being moved out are simply transplanted to the next data layer. No complex merge operation is needed; only the metadata information contained in the fragment file is updated, and no new redundant duplicate data is produced. This effectively avoids the exponential growth of file data size seen in the traditional merging method, reduces write amplification in the storage process, and reduces data rewriting and the space occupied by merging.

In the embodiment, SSTable files in all fragmented files in a data layer are determined and recorded as files to be merged, and then whether key ranges of the files to be merged in the data layer are overlapped or not is determined; if the key ranges of the files to be merged in the data layer are not overlapped, all the files to be merged in the data layer are arranged and merged into a new fragment file according to the key range sequence, the specific process of merging all the fragment files in the data layer into the new fragment file is clarified, when the key ranges of the SSTable files to be merged are not overlapped, only the data information in the fragment files is updated, real merging operation is not generated, write amplification is reduced, and the new fragment file is directly stored in the next data layer instead of being merged with the SSTable file in the next data layer fragment file during subsequent merging, so that the space occupation of data rewriting and merging processes can be reduced.

In an embodiment, as shown in fig. 7, after step S42, that is, after determining whether there is an overlap between key ranges of files to be merged in the data layer, the method further includes the following steps:

S44: if the key ranges of the files to be merged in the data layer overlap, merging and sorting the files to be merged by using a merge sort algorithm to obtain a plurality of new SSTable files.

S45: arranging the plurality of new SSTable files in order of key range to generate a new fragment file.

After determining whether the key ranges of the files to be merged in the data layer overlap, if they do overlap, the files to be merged are merged and sorted by using a merge sort algorithm to obtain a plurality of new SSTable files whose key ranges do not overlap. The plurality of new SSTable files are then arranged in order of key range to generate a new fragment file.

Merging and sorting the files to be merged by using a merge sort algorithm to obtain a plurality of new SSTable files includes: determining the key range of each file to be merged; determining two overlapping key ranges; determining the key with the minimum value and the key with the maximum value in the two overlapping key ranges; generating a new key range from the key with the minimum value and the key with the maximum value; and generating a new SSTable file from the new key range and its corresponding key-value pair data. This is repeated until all pairs of overlapping key ranges have been traversed, yielding a plurality of new SSTable files, which are then arranged in order of key range to generate a new fragment file.
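The key range step of this procedure can be sketched as an interval union. This is only an illustration under stated assumptions: the function name `union_overlapping_ranges` is hypothetical, key ranges are treated as inclusive integer pairs, and the input values are illustrative.

```python
from typing import List, Tuple

KeyRange = Tuple[int, int]  # (min key, max key), both inclusive


def union_overlapping_ranges(ranges: List[KeyRange]) -> List[KeyRange]:
    """Collapse every group of overlapping key ranges into one key range."""
    merged: List[KeyRange] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous range: keep the smallest minimum key
            # and the largest maximum key of the group.
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            merged.append((lo, hi))
    return merged


print(union_overlapping_ranges([(14, 26), (1, 12), (21, 45), (7, 13)]))
```

Because the input is sorted first, the resulting key ranges come out already ordered, which is the order needed when the new SSTable files are arranged into a fragment file.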

For example, as shown in fig. 8, the data layers in the disk are L0, L1, and L2 in sequence, and the preset number is 2. The data layer L1 originally stores one fragment file containing two SSTable files whose key ranges, (1, 12) and (14, 26), do not overlap. After the data layer L0 is merged into the next data layer L1, the data layer L1 gains a new fragment file (Shard) whose SSTable file key ranges are (7, 13) and (21, 45). The key ranges (1, 12) and (7, 13) now overlap, as do the key ranges (14, 26) and (21, 45). For the key ranges (1, 12) and (7, 13), the key with the smallest value is 1 and the key with the largest value is 13, so a new key range (1, 13) is generated from the two overlapping key ranges, and the new key range (1, 13) together with its corresponding key-value pair data is used as a new SSTable file. Likewise, a new key range (14, 45) is generated from the two overlapping key ranges (14, 26) and (21, 45), and the new key range (14, 45) together with its corresponding key-value pair data is used as another new SSTable file. Finally, the two newly generated SSTable files are arranged in order of key range to generate a new fragment file (Shard) containing 2 SSTable files with the non-overlapping key ranges (1, 13) and (14, 45). The data layer L1 is thereby merged into the next data layer L2, and the fragment files of the L1 layer are cleared.

In this embodiment, after determining whether the key ranges of the files to be merged in the data layer overlap, if they do overlap, the files to be merged are merged and sorted by using a merge sort algorithm to obtain a plurality of new SSTable files whose key ranges do not overlap, and the plurality of new SSTable files are arranged in order of key range to generate a new fragment file.
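For the key-value pair data itself, the merge of two overlapping SSTable files can be sketched as a k-way merge of sorted runs. The patent does not state how duplicate keys are resolved; the rule "the later run wins" below is an assumption made only for this sketch, as are the function name `merge_sorted_runs` and the sample values.

```python
import heapq
from typing import List, Tuple

Entry = Tuple[int, str]  # (key, value)


def merge_sorted_runs(runs: List[List[Entry]]) -> List[Entry]:
    """K-way merge of sorted runs; for duplicate keys the later run wins.

    "Later run wins" is an assumption of this sketch, not a rule
    stated in the source.
    """
    # Tag each entry with its run index so equal keys are ordered by run.
    tagged = [[(key, idx, value) for key, value in run]
              for idx, run in enumerate(runs)]
    merged: List[Entry] = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value)  # later run overwrites the earlier value
        else:
            merged.append((key, value))
    return merged


older = [(1, "a"), (7, "old"), (12, "c")]  # SSTable with key range (1, 12)
newer = [(7, "new"), (9, "d"), (13, "e")]  # SSTable with key range (7, 13)

result = merge_sorted_runs([older, newer])
key_range = (result[0][0], result[-1][0])
print(result)
print(key_range)
```

The key range of the merged output is (1, 13), matching the new key range derived for the overlapping ranges (1, 12) and (7, 13) in the fig. 8 example.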

In an embodiment, as shown in fig. 3, an index structure (index) is provided in the memory of the log merge tree. The index structure is used to record index parameters of the data in each data layer of the disk, and may also be used to manage the merging of each data layer. As shown in fig. 9, after step S20, that is, after the plurality of SSTable files are arranged in order of key range to generate the fragment file, the method further includes the following steps:

S01: generating an index key range according to all the fragment files in the data layer.

After the plurality of SSTable files are arranged in order of key range to generate fragment files, the data in each data layer is stored in the form of fragment files. To facilitate subsequent index queries on the data, an index key range needs to be generated according to all the fragment files in the data layer.

Generating the index key range according to all the fragment files in each data layer includes: determining the key range corresponding to each fragment file in the data layer, where the key range corresponding to a fragment file is the total key range obtained by summarizing the key ranges of the SSTable files in that fragment file; summarizing the key ranges corresponding to the fragment files to generate the key range corresponding to the data layer; and taking the key range corresponding to the data layer as the index key range of the data layer.

S02: taking the index key range as the index parameter corresponding to the data layer, and updating the index parameter to the index structure to generate index data.

After the index key range is generated according to all the fragment files in the data layer, the index key range of the data layer is taken as the index parameter corresponding to the data layer, and the index parameter is updated to the index structure to generate index data. The index data includes a plurality of index parameters, each corresponding to one data layer. The corresponding key-value pair data can be looked up in the corresponding data layer according to the index parameter, so that the corresponding metadata can be obtained.

In this embodiment, an index structure is provided in the memory. After the plurality of SSTable files are arranged in order of key range to generate fragment files, an index key range is generated according to all the fragment files in each data layer, and the index key range is taken as the index parameter corresponding to the data layer and updated to the index structure to generate index data, where the index data includes a plurality of index parameters. This defines how the index data is generated. The corresponding index parameter can subsequently be looked up in the index structure as needed, and the data query is then performed in the corresponding data layer according to the index parameter, which reduces the amount of data queried and improves the data indexing performance of the storage system.
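Steps S01 and S02 can be sketched as follows. The representation is an assumption made for illustration: a fragment file is reduced to the list of key ranges of its SSTable files, the index structure is a plain dictionary keyed by layer name, and the names `shard_key_range` and `layer_index_key_range` are hypothetical.

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[int, int]    # (min key, max key), both inclusive
ShardRanges = List[KeyRange]  # key ranges of the SSTable files in one fragment file


def shard_key_range(shard: ShardRanges) -> KeyRange:
    """Total key range of one fragment file."""
    return (min(lo for lo, _ in shard), max(hi for _, hi in shard))


def layer_index_key_range(shards: List[ShardRanges]) -> KeyRange:
    """Index key range of a data layer, summarized over its fragment files."""
    per_shard = [shard_key_range(s) for s in shards]
    return (min(lo for lo, _ in per_shard), max(hi for _, hi in per_shard))


# Hypothetical in-memory index structure: layer name -> index parameter.
index: Dict[str, KeyRange] = {}

l1_shards = [[(1, 12), (14, 26)], [(7, 13), (21, 45)]]
index["L1"] = layer_index_key_range(l1_shards)
print(index)
```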

In an embodiment, after step S02, that is, after the index parameter is updated to the index structure to generate the index data, it is determined whether an index request input by the user through the index structure is received, where the index request includes a key to be indexed (i.e., an index key). If such an index request is received, the index parameter corresponding to the index key is determined in the index data according to the index request and recorded as the target index parameter. The data layer corresponding to the target index parameter is then determined, and the metadata meeting the index request, that is, the metadata corresponding to the index key, is queried in that data layer. Because the corresponding data layer is located directly through the index key, the data layers do not need to be traversed, which reduces the amount of data queried and improves the data indexing performance of the storage system.
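The lookup of a target index parameter can be sketched as a range containment test. The dictionary layout and the function name `layers_for_key` are hypothetical. The function returns a list because this sketch does not assume that the index key ranges of different data layers are disjoint.

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[int, int]  # (min key, max key), both inclusive


def layers_for_key(index: Dict[str, KeyRange], index_key: int) -> List[str]:
    """Return the data layers whose index key range contains the index key."""
    return [layer for layer, (lo, hi) in index.items() if lo <= index_key <= hi]


# Hypothetical index data: one index parameter (key range) per data layer.
index_data = {"L0": (50, 90), "L1": (1, 45)}

print(layers_for_key(index_data, 21))
print(layers_for_key(index_data, 47))
```

Only the data layers returned by the lookup need to be searched for the key; an empty result means that no data layer can contain it.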

In an embodiment, as shown in fig. 10, after step S50, that is, after storing a new fragmented file into a next data layer of the data layer and deleting all fragmented files in the data layer, the method further includes the following steps:

S61: generating an index key range according to all the fragment files in the next data layer of the data layer, and updating the index key range to the index structure as the index parameter corresponding to the next data layer;

S62: deleting the index parameter corresponding to the data layer from the index structure.

After the new fragment file is stored in the next data layer of the data layer and all fragment files in the data layer are deleted, the data in both the current data layer and the next data layer has changed, so the index data in the index structure needs to be updated for subsequent data indexing.

Specifically, once all fragment files in the data layer have been deleted, the data layer is empty and holds no data, so the index parameter corresponding to the data layer needs to be deleted from the index structure; otherwise, a subsequent user index request that follows the original index parameter would fail to find the corresponding metadata. At the same time, because the data of the data layer has been stored in the next data layer, an index key range needs to be generated according to all the fragment files in the next data layer and updated to the index structure as the index parameter corresponding to the next data layer; that is, the newly generated index key range of the next data layer replaces its original index parameter. Correct data indexing can then be performed according to the index key, which reduces the possibility of data indexing failure.

In this embodiment, after storing a new fragment file to a next data layer of the data layer and deleting all fragment files in the data layer, an index key range is generated according to all fragment files in the next data layer of the data layer, the index key range is updated to the index structure as an index parameter corresponding to the next data layer, the index parameter corresponding to the data layer is deleted in the index structure, and the index data in the index structure is updated in time according to data changes in each data layer in the disk, so that correct data indexing is performed according to the index key in the following process, the possibility of data indexing failure is reduced, and the indexing performance of the storage system is improved.
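Steps S61 and S62 can be sketched as a single index update. As before, the dictionary-based index and the function name `update_index_after_merge` are hypothetical, and the data layer contents are reduced to the key ranges of the SSTable files they hold.

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[int, int]  # (min key, max key), both inclusive


def update_index_after_merge(
    index: Dict[str, KeyRange],
    merged_layer: str,
    next_layer: str,
    next_layer_ranges: List[KeyRange],
) -> None:
    """Refresh the index data after a data layer is merged into the next one."""
    # The merged layer is now empty, so drop its stale index parameter (S62).
    index.pop(merged_layer, None)
    # Recompute the next layer's index key range from all SSTable key
    # ranges it now holds, replacing any previous entry (S61).
    index[next_layer] = (
        min(lo for lo, _ in next_layer_ranges),
        max(hi for _, hi in next_layer_ranges),
    )


index_data = {"L0": (7, 45), "L1": (1, 26)}
update_index_after_merge(
    index_data, "L0", "L1", [(1, 12), (14, 26), (7, 13), (21, 45)]
)
print(index_data)
```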

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

In an embodiment, a data storage device based on a log merge tree is provided, and the data storage device based on the log merge tree corresponds to the data storage method based on the log merge tree in the above embodiment one to one. As shown in fig. 11, the log merge-tree based data storage device includes a recording module 111, a generating module 112, a determining module 113, a merging module 114, and a deleting module 115. The detailed description of each functional module is as follows:

the recording module 111 is configured to record the written key value pair data by using a fixed-size sorting character string table structure, and record the key value pair data as an SSTable file, where the SSTable file includes the key value pair data and a key range corresponding to the key value pair data;

the generating module 112 is configured to sequentially arrange the sizes of the multiple SSTable file key ranges to generate fragmented files, and store the fragmented files in a disk of a log merge tree;

a determining module 113, configured to determine whether a data layer of the disk meets a preset merge condition;

a merging module 114, configured to merge all the fragmented files in the data layer into a new fragmented file if the data layer meets a preset merging condition;

and the deleting module 115 is configured to store the new fragmented file to a next data layer of the data layers, and delete all fragmented files in the data layers.

Further, the merging module 114 is specifically configured to:

Further, after determining whether there is an overlap between the key ranges of the files to be merged in the data layer, the merging module 114 is further specifically configured to:

Further, the determining module 113 is specifically configured to:

Further, after an index structure is arranged on the memory and the sizes of the plurality of SSTable file key ranges are arranged in order to generate the fragment file, the generating module 112 is further configured to:

Further, after storing the new fragment file to the next data layer of the data layers and deleting all the fragment files in the data layers, the generating module 112 is further specifically configured to:

For specific limitations of the data storage device based on the log merge tree, reference may be made to the above limitations of the data storage method based on the log merge tree, and details are not repeated here. The modules in the data storage device based on the log merge tree described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing the data generated and used by the data storage method based on the log merge tree. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the data storage method based on the log merge tree.

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

determining whether a data layer of the disk meets a preset merging condition;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

determining whether a data layer of the disk meets a preset merging condition;

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A data storage method based on log merge tree is characterized by comprising the following steps:

recording written key value pair data by adopting a fixed-size sorting character string table structure, and recording the written key value pair data as an SSTable file, wherein the SSTable file comprises the key value pair data and a key range corresponding to the key value pair data;

the SSTable files are orderly arranged according to the size of the key range to generate fragment files, the fragment files are stored in a disk of a log merging tree, the disk comprises a plurality of data layers, each data layer comprises a plurality of fragment files, and the fragment files in the same data layer have no sequential relation;

determining whether the data layer of the disk meets a preset merging condition;

if the data layer meets the preset merging condition, merging all the fragment files in the data layer into a new fragment file;

2. The log merge tree based data storage method of claim 1, wherein the merging all of the sharded files in the data tier into a new sharded file comprises:

determining the SSTable files in all the fragment files in the data layer, and recording as files to be merged;

if the key ranges of the files to be merged in the data layer are not overlapped, arranging and merging all the files to be merged in the data layer into a new fragment file according to the key range sequence.

3. The log merge-tree based data storage method of claim 2, wherein after determining whether there is an overlap of the key ranges of the files to be merged in the data tier, the method further comprises:

and arranging the plurality of new SSTable files in order according to the size of the key range to generate new fragment files.

4. The log merge-tree based data storage method of claim 1, wherein the determining whether the data layer of the disk meets a preset merge condition comprises:

and if the number of the fragment files in the data layer reaches the preset number, determining that the data layer meets the preset merging condition.

5. The log merge tree-based data storage method of any one of claims 1-4, wherein after providing an index structure on the memory of the log merge tree and generating sharded files by ordering the plurality of SSTable files by the size of the key range, the method further comprises:

and updating the index key range as an index parameter corresponding to the data layer to the index structure to generate index data, wherein the index data comprises a plurality of index parameters.

6. The log merge tree based data storage method of claim 5, wherein after storing the new sharded file to a next data tier of the data tiers and deleting all of the sharded files in the data tiers, the method further comprises:

generating the index key range according to all the fragment files in a next data layer of the data layers, and updating the index key range to the index structure as the index parameter corresponding to the next data layer;

deleting the corresponding index parameter of the data layer in the index structure.

7. A log merge tree based data storage device, comprising:

the recording module is used for recording the written key value pair data by adopting a fixed-size sorting character string table structure and recording the key value pair data as an SSTable file, wherein the SSTable file comprises the key value pair data and a key range corresponding to the key value pair data;

the generating module is used for orderly arranging the SSTable files according to the size of the key range to generate fragment files and storing the fragment files into a disk of a log merging tree, wherein the disk comprises a plurality of data layers, each data layer comprises a plurality of fragment files, and the fragment files in the same data layer have no sequential relation;

a merging module, configured to merge all the fragmented files in the data layer into a new fragmented file if the data layer meets the preset merging condition;

8. The log merge tree based data storage device of claim 7, wherein said merging all of the sharded files in the data tier into a new sharded file comprises:

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the log merge-tree based data storage method according to any one of claims 1 to 6.

10. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the log merge-tree based data storage method according to any one of claims 1 to 6.