CN118069043A - High-performance data storage software management method - Google Patents

High-performance data storage software management method

Info

Publication number
CN118069043A
Authority
CN
China
Prior art keywords
speed
user data
data
space
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311829229.1A
Other languages
Chinese (zh)
Inventor
王冬
任晓瑞
杨琼
张鹏
朱双四
赵艾琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202311829229.1A
Publication of CN118069043A
Legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The high-performance data storage software management method of the invention comprises the following steps: metadata and user data are stored separately, wherein the metadata uses a B+ tree key-value structure to manage the logical storage space, and the user data is stored directly in an independent storage medium; the physical storage space is managed hierarchically as a primary high-speed space, a secondary high-speed space, and a tertiary low-speed space, which store the write-ahead log, the metadata, and the user data respectively; and for multi-modal data, namely data that spans logical blocks, data aligned to logical blocks, and data not aligned to logical blocks, a write strategy combining copy-on-write with the write-ahead log is used. The invention can effectively improve the performance of data storage software on high-speed storage media.

Description

High-performance data storage software management method
Technical Field
The invention relates to the technical field of computer system software, in particular to a high-performance data storage software management method.
Background
With the rapid development of storage hardware, the read/write throughput of storage media has risen from hundreds of MB/s for traditional hard disks (such as SATA HDDs) to the GB/s level for current flash storage (such as NVMe SSDs and SATA SSDs), and further improvement is expected. Traditional data storage software stores metadata and user data together. As a result, small metadata reads and writes are frequently interleaved with user data reads and writes, and the performance overhead of metadata management is high. On high-speed storage media such as flash SSDs in particular, this prevents the hardware from delivering its full performance and ultimately degrades data storage performance considerably. How to effectively reduce the metadata management overhead of data storage software, so that the performance advantage of high-speed storage media can be fully exploited, is therefore a problem that needs study. Although the storage media used by data storage software have been upgraded to GB/s-level flash storage, the resulting storage performance still does not meet requirements.
Disclosure of Invention
In view of this, the high-performance data storage software management method provided by the invention solves the problem that, when the traditional method of storing and managing metadata and user data together is applied to high-speed storage media, the performance overhead of metadata management is too high and data storage performance is therefore low.
A high-performance data storage software management method, suitable for managing metadata and user data, comprises:
S101: storing the metadata and the user data separately, wherein the metadata uses a "B+ tree" key-value structure to manage the physical storage space, and the user data is stored directly in an independent storage medium;
S102: managing the physical storage space hierarchically, comprising: dividing the physical storage space into a primary high-speed space, a secondary high-speed space, and a tertiary low-speed space, wherein the primary high-speed space stores the write-ahead log, the secondary high-speed space stores the metadata, and the tertiary low-speed space stores the user data;
S103: during hierarchical management of the physical storage space, multi-modal data exists, namely data that spans logical blocks, data aligned to logical blocks, and data not aligned to logical blocks; the multi-modal data is written using a strategy that combines copy-on-write with the write-ahead log; the write-ahead log structure of the primary high-speed space is formatted, and the "B+ tree" key-value structure of the secondary high-speed space is formatted;
S104: initializing the write-ahead log management structure of the primary high-speed space, initializing the "B+ tree" key-value structure of the secondary high-speed space, recovering user data from the write-ahead log structure of the primary high-speed space, maintaining the write-ahead log structure of the primary high-speed space, and maintaining the "B+ tree" key-value structure of the secondary high-speed space;
S105: performing access operations on the multi-modal data, comprising:
overwriting cross-logical-block or logical-block-unaligned user data to the tertiary low-speed space, overwriting logical-block-aligned user data to the tertiary low-speed space, appending user data to the tertiary low-speed space, and reading user data from the tertiary low-speed space.
Advantageous effects
The method of the invention effectively reduces the metadata management overhead of data storage software, so that the performance advantage of high-speed storage media can be fully exploited and data storage performance is greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a high-performance data storage software management method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating an implementation of a method for managing high-performance data storage software according to an embodiment of the present application.
Detailed Description
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present disclosure will become readily apparent to those skilled in the art from the following disclosure, which describes embodiments of the present disclosure by way of specific examples. It will be apparent that the described embodiments are merely some, but not all embodiments of the present disclosure. The disclosure may be embodied or practiced in other different specific embodiments, and details within the subject specification may be modified or changed from various points of view and applications without departing from the spirit of the disclosure. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the disclosure by way of illustration, and only the components related to the disclosure are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that aspects may be practiced without these specific details.
A "B+ tree" is an n-ary tree data structure in which each node typically has multiple children. A B+ tree consists of a root node, internal nodes, and leaf nodes. The root node may be a leaf node, or it may be a node with two or more child nodes.
To ensure that user data is stored reliably and correctly, data storage software must manage free-space allocation and reclamation, logging, and so on; this internal management data is collectively called metadata. Traditional data storage software stores and manages metadata and user data together. During normal reads and writes of user data, small metadata reads and writes are frequently interleaved in order to locate information such as the mapping between logical space and physical space. User data reads and writes are therefore interrupted frequently, the performance advantage of high-speed storage media cannot be fully exploited, and data storage performance cannot improve further. To improve the performance of data storage software, the method combines separate metadata storage, hierarchical management of storage space, and a multi-modal data write strategy. The high-performance data storage software management method shown in FIG. 1 is suitable for managing metadata and user data and comprises the following steps:
S101: the metadata and the user data are stored separately, wherein the metadata uses a "B+ tree" key-value structure to manage the physical storage space, and the user data is stored directly in an independent storage medium;
S102: the physical storage space is managed hierarchically, comprising: dividing the physical storage space into a primary high-speed physical storage space (referred to as the "primary high-speed space"), a secondary high-speed physical storage space (referred to as the "secondary high-speed space"), and a tertiary low-speed physical storage space (referred to as the "tertiary low-speed space"), wherein the primary high-speed space stores the write-ahead log, the secondary high-speed space stores the metadata, and the tertiary low-speed space stores the user data;
S103: during hierarchical management of the physical storage space, multi-modal data exists, namely data that spans logical blocks, data aligned to logical blocks, and data not aligned to logical blocks; the multi-modal data is written using a strategy that combines copy-on-write with the write-ahead log; the write-ahead log structure of the primary high-speed space is formatted, and the "B+ tree" key-value structure of the secondary high-speed space is formatted;
S104: the write-ahead log management structure of the primary high-speed space is initialized, the "B+ tree" key-value structure of the secondary high-speed space is initialized, user data is recovered from the write-ahead log structure of the primary high-speed space, the write-ahead log structure of the primary high-speed space is maintained, and the "B+ tree" key-value structure of the secondary high-speed space is maintained;
S105: access operations, covering both reads and writes, are performed on the multi-modal data, comprising:
overwriting cross-logical-block or logical-block-unaligned user data to the tertiary low-speed space, overwriting logical-block-aligned user data to the tertiary low-speed space, appending user data to the tertiary low-speed space, and reading user data from the tertiary low-speed space.
As a specific embodiment provided herein, formatting the write-ahead log structure of the primary high-speed space comprises:
When the data storage software is formatted, the starting physical block number and the maximum physical block number of the primary high-speed space occupied by all write-ahead logs are calculated from the logical block size, the primary high-speed space physical block size, and the maximum number of concurrent command channels of the tertiary low-speed space storage medium, and the data in that range is invalidated, wherein:
The maximum number of write-ahead log entries equals the maximum number of concurrent command channels of the tertiary low-speed space storage medium, and each channel corresponds to one write-ahead log structure;
A single write-ahead log structure contains the starting physical block number to be written in the tertiary low-speed space, a valid-state flag, and the user data, and its size is the sum of the logical block size and the primary high-speed space physical block size.
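The entry layout described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the header field widths (`<QB`), the zero padding of the header to one primary physical block, and the names `pack_entry`, `unpack_entry`, and `wal_region` are all assumptions; only the three fields and the "physical block plus logical block" entry size come from the text.

```python
import struct

LOGICAL_BLOCK = 4096   # assumed logical block size in bytes
PRIMARY_BLOCK = 512    # assumed primary-space physical block size in bytes
ENTRY_SIZE = PRIMARY_BLOCK + LOGICAL_BLOCK  # one header block + one logical block

# Hypothetical header layout: starting physical block number (u64), valid flag (u8).
HEADER = struct.Struct("<QB")


def pack_entry(start_pbn: int, valid: bool, data: bytes) -> bytes:
    """Serialize one write-ahead log entry: padded header followed by user data."""
    if len(data) != LOGICAL_BLOCK:
        raise ValueError("payload must be exactly one logical block")
    header = HEADER.pack(start_pbn, 1 if valid else 0)
    return header.ljust(PRIMARY_BLOCK, b"\0") + data


def unpack_entry(raw: bytes):
    """Return (start_pbn, valid, data) from a serialized entry."""
    if len(raw) != ENTRY_SIZE:
        raise ValueError("entry has the wrong size")
    start_pbn, valid = HEADER.unpack_from(raw, 0)
    return start_pbn, bool(valid), raw[PRIMARY_BLOCK:]


def wal_region(channels: int, first_block: int = 0):
    """Return (first physical block, block count, total bytes) of the log region."""
    total_bytes = channels * ENTRY_SIZE
    return first_block, total_bytes // PRIMARY_BLOCK, total_bytes
```

Padding the header to a full physical block keeps the user data block-aligned inside each entry, which is one plausible reading of why the entry size is defined as a sum of the two block sizes.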
As a specific embodiment provided herein, formatting the "B+ tree" key-value structure of the secondary high-speed space comprises:
When the data storage software is formatted, the size of a single key-value pair and the total number of key-value pairs that a single "B+ tree" node can hold are calculated from the logical block size, the total number of logical blocks, and the physical block size of the secondary high-speed space, and this information is written into the root node of the "B+ tree" in the secondary high-speed space;
A single key-value pair has the form "<logical block number, data block set>", where the logical block number is the key and the data block set is the value; the value gives the starting physical block number and the number of consecutive blocks, and the state of a data block set is either free or allocated.
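A key-value pair of this form could be encoded as in the sketch below. The byte layout is an assumption chosen for illustration (an 8-byte key and a 16-byte value split into start block, block count, and state); the text fixes only the logical content of the pair, and the names `KeyValuePair` and `pairs_per_node` are hypothetical.

```python
import struct
from dataclasses import dataclass

FREE, ALLOCATED = 0, 1

# Hypothetical layout: key u64 | start physical block u64 | block count u32 | state u32
PAIR = struct.Struct("<QQII")


@dataclass(frozen=True)
class KeyValuePair:
    logical_block: int  # key: logical block number
    start_pbn: int      # value: starting physical block number
    block_count: int    # value: number of consecutive physical blocks
    state: int          # value: FREE or ALLOCATED

    def encode(self) -> bytes:
        return PAIR.pack(self.logical_block, self.start_pbn,
                         self.block_count, self.state)

    @classmethod
    def decode(cls, raw: bytes) -> "KeyValuePair":
        return cls(*PAIR.unpack(raw))


def pairs_per_node(node_size: int, header_size: int) -> int:
    """Number of whole key-value pairs that fit in one node after its header."""
    return (node_size - header_size) // PAIR.size
```

With a 512-byte node and a 32-byte node header, this layout holds 20 pairs per node.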
As a specific embodiment provided herein, initializing the write-ahead log management structure of the primary high-speed space comprises:
When the data storage software is started, a write-ahead log management structure is created in memory and populated with the starting physical block number of the write-ahead log in the primary high-speed space, the write-ahead log structure size, and the maximum number of write-ahead log entries;
Initializing the "B+ tree" key-value structure of the secondary high-speed space comprises: when the data storage software is started, a "B+ tree" key-value structure is created in memory and populated with the root node physical block number of the "B+ tree" in the secondary high-speed space, the size of a single key-value pair, and the total number of key-value pairs that a single "B+ tree" node can hold.
As a specific embodiment provided in the present application, the recovery and maintenance operations of S104 comprise:
Recovering user data from the write-ahead log structure of the primary high-speed space: when the data storage software is started, the starting physical block number of the write-ahead log in the primary high-speed space is obtained from the write-ahead log management structure in memory, and the entries are read from the primary high-speed space and parsed one by one in a loop, according to the write-ahead log structure size and the maximum number of write-ahead log entries; if the state flag of an entry is valid, its user data is recovered and written to the physical blocks of the tertiary low-speed space; otherwise the entry is skipped and processing continues with the next write-ahead log entry;
Maintaining the write-ahead log structure of the primary high-speed space: when the data storage software is started, each write-ahead log structure in the primary high-speed space is set to invalid after it has been recovered successfully; while the data storage software is running, when user data in the logical space is written across logical blocks or without logical block alignment, the write is delayed by means of the write-ahead log, that is, the user data that is smaller than one logical block and its physical block number are written into the write-ahead log structure of the corresponding channel in the primary high-speed space. If the write-ahead log structure at that position is already valid, it is first recovered and invalidated, and then the new log entry is written;
Maintaining the "B+ tree" key-value structure of the secondary high-speed space specifically comprises:
After the data storage software is formatted, the "B+ tree" of the secondary high-speed space treats the entire tertiary low-speed space as one contiguous set of free physical blocks and stores it as a free key-value pair;
When the data storage software writes user data, a search key is generated from the starting logical block number, the "B+ tree" of the secondary high-speed space is traversed from its root node, and an allocated key-value pair is searched for in the leaf nodes. If none is found, the write is an append: the free key-value pairs are traversed according to the search key, a contiguous set of free physical blocks is allocated, the user data is written to the tertiary low-speed space, and after the write succeeds the modified "B+ tree" nodes are written to the secondary high-speed space. Otherwise the write is an overwrite: the free key-value pairs are traversed according to the search key, a contiguous set of free physical blocks is allocated and the user data is written to the tertiary low-speed space, after the write succeeds the old key-value pair is updated to the newly allocated key-value pair and the old physical block set is released, and finally the modified "B+ tree" nodes are written to the secondary high-speed space.
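The append and overwrite branches described above can be sketched as follows. To stay short, the sketch replaces the on-disk B+ tree with plain Python containers (a list of free extents and a dictionary of allocated extents), does not coalesce freed extents, and omits persisting modified nodes; the class name `ExtentMap` and the `write_blocks` callback are hypothetical.

```python
class ExtentMap:
    """In-memory stand-in for the free/allocated key-value pairs of the B+ tree."""

    def __init__(self, total_physical_blocks: int):
        # After formatting, the whole tertiary space is one contiguous free extent.
        self.free = [(0, total_physical_blocks)]
        self.allocated = {}  # starting logical block -> (start_pbn, block_count)

    def _allocate(self, count: int) -> int:
        """Carve `count` contiguous blocks out of the first free extent that fits."""
        for i, (start, length) in enumerate(self.free):
            if length >= count:
                if length == count:
                    self.free.pop(i)
                else:
                    self.free[i] = (start + count, length - count)
                return start
        raise RuntimeError("no contiguous free extent is large enough")

    def write(self, logical_block: int, count: int, write_blocks) -> str:
        """Allocate new blocks, write the data, then update the mapping."""
        new_start = self._allocate(count)
        write_blocks(new_start, count)  # user data goes to the tertiary space first
        old = self.allocated.get(logical_block)
        self.allocated[logical_block] = (new_start, count)
        if old is None:
            return "append"
        self.free.append(old)  # the old block set is released only after success
        return "overwrite"
```

In both branches the data is written before the mapping changes, so a failed write leaves the previous mapping intact.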
As a specific embodiment provided herein, the operations of overwriting cross-logical-block or logical-block-unaligned user data to the tertiary low-speed space, overwriting logical-block-aligned user data to the tertiary low-speed space, appending user data to the tertiary low-speed space, and reading user data from the tertiary low-speed space are performed as follows:
Overwriting cross-logical-block or logical-block-unaligned user data to the tertiary low-speed space comprises: while the data storage software is running, cross-logical-block user data is data whose middle portion is an integer multiple of the logical block size but whose head or tail is smaller than one logical block; logical-block-unaligned user data is data smaller than one logical block. In both cases the portion smaller than one logical block is written with a delay: the old data of the corresponding physical blocks in the tertiary low-speed space is read into memory and merged with the new data to form one complete logical block of new data; the "B+ tree" of the secondary high-speed space is then traversed from its root node to find the allocated key-value pair in a leaf node and obtain the physical block number corresponding to the logical block number; finally, the complete logical block of new data and the physical block number are written into the write-ahead log structure of the corresponding channel in the primary high-speed space;
Overwriting logical-block-aligned user data to the tertiary low-speed space comprises: while the data storage software is running, logical-block-aligned user data is data whose size is an integer multiple of the logical block size. Under the copy-on-write mechanism, the original user data is not overwritten in place; instead, the "B+ tree" of the secondary high-speed space is traversed from its root node to find a free key-value pair in a leaf node, a new set of free physical blocks in the tertiary low-speed space is allocated and the user data is written to it, and after the write succeeds the old physical block set is released and the "B+ tree" nodes of the secondary high-speed space are updated.
Appending user data to the tertiary low-speed space comprises: while the data storage software is running, the "B+ tree" of the secondary high-speed space is traversed from its root node to find a free key-value pair in a leaf node, a set of free physical blocks in the tertiary low-speed space is allocated for the user data of the logical space, and the user data is then written directly to that physical block set.
Reading user data from the tertiary low-speed space comprises: while the data storage software is running, a search key is generated from the starting logical block number of the user data to be read, the "B+ tree" of the secondary high-speed space is traversed from its root node to find the allocated key-value pair in a leaf node, and the user data is read from the corresponding set of physical blocks in the tertiary low-speed space.
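How an overwrite request divides into the cases above can be sketched as follows. The function `split_request` and its tuple format are hypothetical; the sketch assumes byte offsets and a 4096-byte logical block, and labels each piece either "partial" (smaller than one logical block, handled by the delayed write through the write-ahead log) or "aligned" (whole logical blocks, handled by copy-on-write).

```python
def split_request(offset: int, length: int, block: int = 4096):
    """Split a byte range into ("partial" | "aligned", start, size) segments."""
    end = offset + length
    segments = []
    pos = offset
    while pos < end:
        if pos % block == 0 and end - pos >= block:
            # Run of whole logical blocks: round the end down to a block boundary.
            aligned_end = end // block * block
            segments.append(("aligned", pos, aligned_end - pos))
            pos = aligned_end
        else:
            # Head or tail fragment: stop at the next block boundary or at the end.
            block_end = (pos // block + 1) * block
            seg_end = min(block_end, end)
            segments.append(("partial", pos, seg_end - pos))
            pos = seg_end
    return segments
```

A request that starts and ends off a block boundary yields a partial head, an aligned middle, and a partial tail, matching the description of cross-logical-block data.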
As a specific embodiment provided herein, the logical block size is configured to be 2^n times the physical block size of the tertiary low-speed space storage medium, where n is a positive integer;
The primary high-speed space is configured as an NVMe-SSD storage medium, and the secondary high-speed space is configured as a SATA-SSD or NVMe-SSD storage medium;
The tertiary low-speed space is configured as a SATA-HDD, SATA-SSD, or NVMe-SSD storage medium.
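These configuration constraints can be checked with a short validation routine. This is an illustrative sketch: the function names and the dictionary-based configuration format are assumptions, and the power-of-two test follows the "n is a positive integer" wording of this embodiment, so a ratio of 1 is rejected.

```python
ALLOWED_MEDIA = {
    "primary": {"NVMe-SSD"},
    "secondary": {"SATA-SSD", "NVMe-SSD"},
    "tertiary": {"SATA-HDD", "SATA-SSD", "NVMe-SSD"},
}


def is_valid_logical_block(logical: int, physical: int) -> bool:
    """True when logical == physical * 2**n for some positive integer n."""
    if physical <= 0 or logical % physical != 0:
        return False
    ratio = logical // physical
    # A power of two has exactly one bit set; ratio >= 2 enforces n >= 1.
    return ratio >= 2 and (ratio & (ratio - 1)) == 0


def check_tiers(media: dict) -> bool:
    """True when every tier is assigned one of its permitted medium types."""
    return all(media.get(tier) in allowed for tier, allowed in ALLOWED_MEDIA.items())
```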
Example 1
Step 1: formatting the write-ahead log structure of the primary high-speed physical storage space;
Specifically, when the data storage software is formatted, the maximum size and the range of the primary high-speed physical storage space occupied by all write-ahead logs are calculated from the logical block size, the primary high-speed physical block size, and the maximum number of concurrent command channels of the tertiary low-speed space storage medium, and the data in that range is invalidated. The maximum number of write-ahead log entries equals the maximum number of concurrent command channels of the tertiary low-speed space storage medium, and each channel corresponds to one write-ahead log structure. A single write-ahead log structure contains the starting physical block number to be written in the tertiary low-speed space, a valid-state flag, and the user data; its size is the sum of the physical block size of the primary high-speed space and the logical block size.
Step 2: formatting the B+ tree key-value structure of the secondary high-speed physical storage space;
Specifically, when the data storage software is formatted, the size of a single key-value pair and the total number of key-value pairs that a single B+ tree node can hold are calculated from the logical block size, the total number of logical blocks, and the physical block size of the secondary high-speed space, and this information is written into the B+ tree root node of the secondary high-speed physical storage space. A single key-value pair has the form <starting offset, data block set>, where the key is the starting logical block number, the value gives the starting physical block number and the number of consecutive blocks, and the pair type is either free or allocated.
Step 3: initializing the write-ahead log management structure of the primary high-speed physical storage space;
Specifically, when the data storage software is started, a write-ahead log management structure is created in memory and populated with the starting physical block number of the write-ahead log in the primary high-speed space, the write-ahead log structure size, and the maximum number of write-ahead log entries.
Step 4: initializing the B+ tree key-value management structure of the secondary high-speed physical storage space;
Specifically, when the data storage software is started, a B+ tree key-value management structure is created in memory and populated with the physical block number of the B+ tree root node in the secondary high-speed space, the size of a single key-value pair, and the total number of key-value pairs that a single B+ tree node can hold.
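The two in-memory management structures of Steps 3 and 4 could look like the sketch below. The class names, the helper `entry_offset`, and the concrete values passed in `on_startup` are illustrative assumptions; only the three fields of each structure come from the text.

```python
from dataclasses import dataclass


@dataclass
class WalManager:
    start_block: int   # starting physical block number of the log region
    entry_size: int    # size in bytes of one write-ahead log structure
    max_entries: int   # one entry per concurrent command channel

    def entry_offset(self, channel: int, block_size: int) -> int:
        """Byte offset of a channel's entry inside the primary high-speed space."""
        if not 0 <= channel < self.max_entries:
            raise IndexError("channel out of range")
        return self.start_block * block_size + channel * self.entry_size


@dataclass
class BPlusTreeManager:
    root_block: int      # physical block number of the root node
    pair_size: int       # size in bytes of one key-value pair
    pairs_per_node: int  # key-value pairs held by one node


def on_startup():
    """Create both structures with assumed example values."""
    wal = WalManager(start_block=0, entry_size=512 + 4096, max_entries=32)
    tree = BPlusTreeManager(root_block=0, pair_size=24, pairs_per_node=20)
    return wal, tree
```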
Step 5: recovering user data from the write-ahead log structure of the primary high-speed physical storage space;
Specifically, when the data storage software is started, the starting physical block number of the write-ahead log in the primary high-speed space is obtained from the write-ahead log management structure in memory, and the entries are read from the primary high-speed space and parsed one by one in a loop, according to the write-ahead log structure size and the maximum number of write-ahead log entries: if the state flag of an entry is valid, its user data is recovered and written to the physical blocks of the tertiary low-speed space; otherwise the entry is skipped and processing continues with the next write-ahead log entry.
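The recovery loop can be sketched as follows, with both storage spaces modeled as `bytearray` objects. The header layout (`<QB`), the 512-byte and 4096-byte block sizes, and the function name `recover` are assumptions; clearing the valid flag after a successful restore reflects the start-up behavior the method describes for maintaining the write-ahead (pre-write) log.

```python
import struct

BLOCK = 512            # assumed physical block size of both spaces
LOGICAL = 4096         # assumed logical block size
ENTRY = BLOCK + LOGICAL
HEADER = struct.Struct("<QB")  # hypothetical: start physical block number, valid flag


def recover(primary: bytearray, tertiary: bytearray,
            start_block: int, max_entries: int) -> int:
    """Replay every valid log entry into the tertiary space; return the count."""
    restored = 0
    base = start_block * BLOCK
    for i in range(max_entries):
        off = base + i * ENTRY
        pbn, valid = HEADER.unpack_from(primary, off)
        if not valid:
            continue  # skip invalid entries and move on to the next one
        data = primary[off + BLOCK: off + ENTRY]
        tertiary[pbn * BLOCK: pbn * BLOCK + LOGICAL] = data
        HEADER.pack_into(primary, off, pbn, 0)  # invalidate after a successful restore
        restored += 1
    return restored
```

Because each entry is invalidated only after its data has been written, running the loop a second time restores nothing.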
Step 6: maintaining the write-ahead log structure of the primary high-speed physical storage space;
Specifically, when the data storage software is started, each write-ahead log structure in the primary high-speed space is set to invalid after it has been recovered successfully;
While the data storage software is running, when user data in the logical space is written across logical blocks or without logical block alignment, the write is delayed by means of the write-ahead log; that is, the user data that is smaller than one logical block and its physical block number are written into the write-ahead log structure of the corresponding channel in the primary high-speed space. If the write-ahead log structure at that position is already valid, it is first recovered and invalidated, and then the new log entry is written.
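The delayed write through a per-channel log slot can be sketched as follows. The sketch keeps the slots in memory rather than on the primary high-speed medium, and the names `merge_partial`, `WalSlots`, and the `flush` callback (which stands for writing a block to the tertiary space) are hypothetical.

```python
def merge_partial(old_block: bytes, new_data: bytes, offset_in_block: int) -> bytes:
    """Overlay new_data onto a copy of the old logical block."""
    end = offset_in_block + len(new_data)
    if offset_in_block < 0 or end > len(old_block):
        raise ValueError("new data does not fit inside one logical block")
    return old_block[:offset_in_block] + new_data + old_block[end:]


class WalSlots:
    """One log slot per channel; a slot holds (physical block number, block) or None."""

    def __init__(self, channels: int, flush):
        self.slots = [None] * channels
        self.flush = flush  # hypothetical callback: write a block to the tertiary space

    def log(self, channel: int, pbn: int, block: bytes) -> None:
        pending = self.slots[channel]
        if pending is not None:
            self.flush(*pending)        # recover the earlier entry first
            self.slots[channel] = None  # then invalidate it
        self.slots[channel] = (pbn, block)  # finally write the new entry
```

A slot is flushed only when the same channel is reused, which is what makes the small write "delayed".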
Step 7: maintaining the B+ tree key-value structure of the secondary high-speed physical storage space;
Specifically, after the data storage software is formatted, the B+ tree of the secondary high-speed space treats the entire tertiary low-speed space as one contiguous set of free physical blocks and stores it as a free key-value pair;
When the data storage software writes user data, a search key is generated from the starting logical block number, the B+ tree of the secondary high-speed space is traversed from its root node, and an allocated key-value pair is searched for in the leaf nodes. If none is found, the write is an append: the free key-value pairs are traversed according to the search key, a contiguous set of free physical blocks is allocated, the user data is written to the tertiary low-speed space, and after the write succeeds the modified B+ tree nodes are written to the secondary high-speed space. Otherwise the write is an overwrite: the free key-value pairs are traversed according to the search key, a contiguous set of free physical blocks is allocated and the user data is written to the tertiary low-speed space, after the write succeeds the old key-value pair is updated to the newly allocated key-value pair and the old physical block set is released, and finally the modified B+ tree nodes are written to the secondary high-speed space.
Step 8: overwriting cross-logical-block or logical-block-unaligned user data to the tertiary low-speed physical storage space;
Specifically, while the data storage software is running, cross-logical-block user data is data whose middle portion is an integer multiple of the logical block size but whose head or tail is smaller than one logical block; logical-block-unaligned user data is data smaller than one logical block. In both forms, the portion smaller than one logical block is written with a delay: the old data of the physical blocks corresponding to that logical block in the tertiary low-speed space is read into memory and merged with the new data to form one complete logical block of new data, a new physical block number is then obtained from the B+ tree key-value structure, and the complete logical block of new data is written into the write-ahead log structure of the corresponding channel in the primary high-speed space.
Step 9: overwriting logical-block-aligned user data to the tertiary low-speed physical storage space;
Specifically, while the data storage software is running, logical-block-aligned user data is data whose size is an integer multiple of the logical block size. A copy-on-write mechanism is used in this case: a new set of free physical blocks in the tertiary low-speed space is first allocated from the B+ tree key-value structure for the user data of the logical space, the user data is written directly to the new physical block set, and after the write succeeds the old physical block set is released and the B+ tree nodes of the secondary high-speed space are updated.
Step 10: appending user data to the tertiary low-speed physical storage space;
Specifically, when the data storage software appends user data, a set of free physical blocks in the tertiary low-speed space is allocated from the B+ tree key-value structure for the user data of the logical space, and the user data is written directly to that physical block set.
Step 11: reading user data from the tertiary low-speed physical storage space;
Specifically, when the data storage software reads user data, a search key is generated from the starting logical block number, the B+ tree of the secondary high-speed space is traversed from its root node to find the allocated key-value pair in a leaf node, and the user data is read from the corresponding set of physical blocks in the tertiary low-speed space.
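The read path can be sketched with a binary search over sorted keys standing in for the B+ tree traversal. The parallel `keys` and `values` lists, the function names, and the 512-byte physical block size are assumptions made for the example; a real implementation would descend from the root node to a leaf instead.

```python
import bisect


def find_allocated(keys, values, logical_block):
    """Look up the allocated extent whose key equals the search key."""
    i = bisect.bisect_left(keys, logical_block)
    if i < len(keys) and keys[i] == logical_block:
        return values[i]  # (start physical block, block count)
    return None


def read(keys, values, tertiary, logical_block, block_size=512):
    """Return the bytes of the extent mapped to logical_block, or None if unmapped."""
    hit = find_allocated(keys, values, logical_block)
    if hit is None:
        return None
    start_pbn, count = hit
    return bytes(tertiary[start_pbn * block_size:(start_pbn + count) * block_size])
```

Only the metadata lookup touches the index; the user data itself is read in one contiguous range from the tertiary space.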
The logical block size may be configured to be 2n times the physical block size of the three-level slow space storage medium, n being an integer not less than 0.
The primary high-speed physical storage space may be configured as an NVMe SSD storage medium.
The secondary high-speed physical storage space may be configured as a SATA SSD or NVMe SSD storage medium.
The tertiary slow physical storage space may be configured as a SATA HDD, SATA SSD, or NVMe SSD storage medium.
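The power-of-two relation between the logical block size and the tertiary physical block size can be checked with a short Python sketch. The function name `block_size_exponent` is hypothetical; it simply returns the exponent n when the ratio is a power of two and rejects any other configuration.

```python
def block_size_exponent(logical_size: int, physical_size: int) -> int:
    """Return n such that logical_size == 2**n * physical_size."""
    if physical_size <= 0 or logical_size % physical_size != 0:
        raise ValueError("logical size must be a multiple of physical size")
    ratio = logical_size // physical_size
    if ratio < 1 or ratio & (ratio - 1) != 0:
        raise ValueError("ratio of logical to physical size is not a power of two")
    return ratio.bit_length() - 1
```

With a 4096-byte logical block and a 512-byte physical block, the ratio is 8, so n is 3.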
Example two
Referring to Figs. 1 and 2, the high-performance data storage software management method of the present invention comprises the following steps:
1) The storage medium of the primary high-speed physical storage space is set to an NVMe SSD with a capacity of 128 MB, a physical block size of 512 bytes, and a total of 262144 physical blocks; the storage medium of the secondary high-speed physical storage space is set to a SATA SSD with a capacity of 512 MB, a physical block size of 512 bytes, and a total of 1048576 physical blocks; the storage medium of the tertiary slow physical storage space is set to a SATA HDD with a capacity of 256 GB, a physical block size of 512 bytes, a total of 536870912 physical blocks, a logical block size of 4096 bytes, a total of 67108864 logical blocks, and a maximum of 32 concurrent command channels.
2) When the data storage software is formatted, all write-ahead log structures in the primary high-speed physical storage space occupy 32 × (512 + 4096) = 147456 bytes; that is, the data in the range of physical blocks 0-288 is set to invalid.
3) When the data storage software is formatted, for the B+ tree key-value structure of the secondary high-speed physical storage space, a single key is 8 bytes and a single value is 16 bytes; one B+ tree node is 512 bytes, so after removing the 32 bytes of node header information it can hold 20 key-value pairs. This information is written into physical block 0 of the secondary high-speed physical storage space as the B+ tree root node.
4) When the data storage software is started, a write-ahead log management structure is created in memory, and the starting physical block number, the structure size, and the maximum entry count of the write-ahead log are set to 0, 4608, and 32, respectively.
5) When the data storage software is started, a B+ tree key value management structure is created in the memory, and the physical block number of the secondary high-speed space B+ tree root node, the size of a single key value pair and the total number of the key value pairs contained in the single node of the B+ tree are respectively set to be 0, 24 and 20.
6) When the data storage software is started, beginning at physical block 0 of the primary high-speed physical storage space, the write-ahead log structures are read sequentially, 4608 bytes (9 physical blocks) at a time, up to a maximum of 32 entries. Assume that only the first write-ahead log structure is currently valid: its first 512 bytes are the log header information and its last 4096 bytes are user data. If the parsed starting physical block number in the tertiary slow physical storage space is 384, the 4096 bytes of user data are written into the 8 physical blocks starting at physical block 384, which completes recovery of the write-ahead log; after success, the write-ahead log structure is set to invalid.
7) When the data storage software runs, assume that 11488 bytes of user data are to be overwritten starting at byte offset 800 of logical block 96 in the tertiary slow physical storage space, and that the write channel number is 13. The process is divided into two steps: a partial overwrite of 3296 bytes to logical block 96, and a full overwrite of 8192 bytes to logical blocks 97-98. For the first step, go to 8); for the second step, go to 9).
8) A search key is generated from the starting logical block number 96, and the B+ tree is traversed from its root node in physical block 0 of the secondary high-speed physical storage space to find the corresponding allocated key-value pair. Assume that parsing the key-value pair shows that logical block 96 corresponds to physical blocks 1216-1223 in the tertiary slow space. The old user data of logical block 96 is read, and its first 800 bytes are merged with the 3296 bytes to be written to form one complete block of data. Next, the starting physical block number 117 of the write-ahead log structure for channel 13 in the primary high-speed space is located, the tertiary slow space physical block number 1216 and the merged 4096 bytes of user data are written into that write-ahead log structure, and finally the valid state flag is written into it.
9) A search key is generated from the starting logical block number 97, and the B+ tree is traversed from its root node in physical block 0 of the secondary high-speed physical storage space to find the corresponding allocated old key-value pair. Assume that parsing the old key-value pair shows that logical blocks 97-98 correspond to physical blocks 1224-1239 in the tertiary slow space. The B+ tree is then traversed further to find and allocate free physical block space for two logical blocks; assume the allocated free physical blocks in the tertiary slow space are 1448-1463. The 8192 bytes of user data are written into the newly allocated physical blocks; after success, the old key-value pair for logical blocks 97-98 in the B+ tree of the secondary high-speed physical storage space is updated to point to physical blocks 1448-1463, and the space state of physical blocks 1224-1239 is then updated to free.
10) Assume that 8192 bytes of user data are appended starting at the head of logical block 97 in the tertiary slow physical storage space. A search key is generated from the starting logical block number 97, and the B+ tree is traversed from its root node in physical block 0 of the secondary high-speed physical storage space to allocate free physical block space for two logical blocks; assume the allocated free physical blocks in the tertiary slow space are 1224-1239. The 8192 bytes of user data are written into the newly allocated physical blocks; after success, a key-value pair for logical blocks 97-98 is added to the B+ tree of the secondary high-speed physical storage space, with its content set to physical blocks 1224-1239.
11) When the data storage software runs, assume that 8192 bytes of user data are read starting at the head of logical block 97 in the tertiary slow physical storage space. A search key is generated from the starting logical block number 97, and the B+ tree is traversed from its root node in physical block 0 of the secondary high-speed physical storage space to find the corresponding allocated key-value pair. Assume that parsing the key-value pair shows that logical blocks 97-98 correspond to physical blocks 1224-1239 in the tertiary slow space; the user data is then read from that range of physical blocks.
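The arithmetic behind steps 1) to 7) can be reproduced with a short Python sketch. It assumes that MB and GB denote binary units (2^20 and 2^30 bytes), which is what the logical and physical block counts of the tertiary space imply, and that the write-ahead log slot of channel c begins at c times the slot size in blocks, which matches the block number 117 given for channel 13. All variable and function names are illustrative only.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

PHYS = 512          # physical block size in bytes (all three tiers)
LOGICAL = 4096      # logical block size in bytes
CHANNELS = 32       # maximum concurrent command channels

# Step 1): block counts per tier
primary_blocks = 128 * MIB // PHYS          # 262144
secondary_blocks = 512 * MIB // PHYS        # 1048576
tertiary_blocks = 256 * GIB // PHYS         # 536870912
logical_blocks = 256 * GIB // LOGICAL       # 67108864

# Steps 2) and 4): write-ahead log layout in the primary space
wal_entry_size = PHYS + LOGICAL                  # 4608 bytes (header + data)
wal_entry_blocks = wal_entry_size // PHYS        # 9 physical blocks per slot
wal_total_bytes = CHANNELS * wal_entry_size      # 147456
wal_total_blocks = wal_total_bytes // PHYS       # 288


def wal_start_block(channel: int) -> int:
    """First primary physical block of the slot for a channel (assumed layout)."""
    return channel * wal_entry_blocks


# Steps 3) and 5): B+ tree node capacity
KEY, VALUE, NODE, HEADER = 8, 16, 512, 32
pair_size = KEY + VALUE                          # 24 bytes
pairs_per_node = (NODE - HEADER) // pair_size    # 20


def split_write(offset: int, length: int):
    """Split a write at byte `offset` within a logical block into
    (head partial bytes, aligned middle bytes, tail partial bytes)."""
    head = min((LOGICAL - offset) % LOGICAL, length)
    rest = length - head
    middle = rest // LOGICAL * LOGICAL
    return head, middle, rest - middle
```

For the write in step 7), `split_write(800, 11488)` yields a 3296-byte head for the write-ahead log path and an 8192-byte aligned middle for the copy-on-write path, with no tail.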
The above is merely a specific embodiment of the disclosure, but the protection scope of the disclosure is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the disclosure are intended to be covered in the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (8)

1. A high-performance data storage software management method, which is suitable for metadata and user data management, characterized in that the method comprises,
S101: the metadata and the user data are stored separately, wherein the metadata adopts a 'B+ tree' key value structure to manage physical storage space, and the user data is directly stored in an independent storage medium;
S102: hierarchical management of physical storage space, comprising: dividing a physical storage space into a first-level high speed, a second-level high speed and a third-level low speed, wherein the first-level high speed is used for storing a log before writing, the second-level high speed is used for storing metadata, and the third-level low speed is used for storing user data;
S103: in the hierarchical management of the physical storage space, for multi-modal data that crosses logical blocks, is logical-block-aligned, or is logical-block-unaligned, adopting a write mode that combines copy-on-write with a write-ahead log; formatting the primary high-speed write-ahead log structure and formatting the secondary high-speed B+ tree key-value structure;
S104: initializing the primary high-speed write-ahead log management structure, initializing the B+ tree key-value structure of the secondary high-speed physical storage space, recovering user data from the primary high-speed write-ahead log structure, maintaining the primary high-speed write-ahead log structure, and maintaining the secondary high-speed B+ tree key-value structure;
S105: performing access operations on the multi-modal data, comprising:
performing the operation of overwriting the tertiary slow space with user data that crosses logical blocks or is logical-block-unaligned, performing the operation of overwriting the tertiary slow space with logical-block-aligned user data, performing the operation of appending user data to the tertiary slow space, and performing the operation of reading user data from the tertiary slow space.
2. The method of claim 1, wherein formatting the primary high-speed write-ahead log structure comprises:
When the data storage software is formatted, calculating the starting physical block number and the maximum physical block number occupied by all write-ahead logs according to the logical block size, the physical block size of the primary high-speed space, and the maximum number of concurrent command channels of the tertiary slow space storage medium, and setting the data in the range between the starting physical block number and the maximum physical block number to invalid, wherein,
The maximum entry number of the write-ahead log is the maximum concurrent command channel number of the three-level slow space storage medium, and each channel corresponds to a write-ahead log structure;
The single write-ahead log structure comprises a starting physical block number to be written into the tertiary slow space, a valid state mark and user data, and the single write-ahead log structure size is the sum of the logical block size and the primary high-speed space physical block size.
3. The method of claim 2, wherein formatting the secondary high-speed "B+ tree" key-value structure comprises:
When the data storage software is formatted, calculating the size of a single key value pair and the total number of key value pairs contained in a single node of a B+ tree according to the size of a logic block, the total number of the logic blocks and the size of a physical block of a secondary high-speed space, and writing the key value pairs into a root node of the B+ tree of the secondary high-speed space;
The single key value pair is in the form of "< logical block number, data block set >", wherein the logical block number is used as a key, the data block set is used as a value to represent the starting physical block number and the number of continuous blocks, and the states of the data block set are divided into idle states and allocated states.
4. The method of claim 3, wherein initializing the primary high-speed write-ahead log management structure comprises:
when the data storage software is started, a write-ahead log management structure is created in memory and filled with the starting physical block number of the write-ahead log in the primary high-speed space, the write-ahead log structure size, and the maximum entry count of the write-ahead log;
initializing the secondary high-speed 'B+ tree' key-value structure comprises: when the data storage software is started, a 'B+ tree' key-value management structure is created in memory and filled with the physical block number of the 'B+ tree' root node in the secondary high-speed space, the size of a single key-value pair, and the total number of key-value pairs contained in a single 'B+ tree' node.
5. The high-performance data storage software management method according to claim 4, wherein S105: performing an access operation to the multimodal data, comprising:
Recovering user data from the primary high-speed write-ahead log structure comprises: when the data storage software is started, obtaining the starting physical block number of the write-ahead log in the primary high-speed space from the in-memory write-ahead log management structure, and reading and parsing entries from the primary high-speed space sequentially in a loop according to the write-ahead log structure size and the maximum entry count; if the state flag of a write-ahead log is valid, recovering its user data and writing it into the tertiary slow physical blocks, otherwise skipping it and continuing with the next write-ahead log;
Maintaining the primary high-speed write-ahead log structure comprises: when the data storage software is started, setting each primary high-speed write-ahead log structure to invalid after it has been successfully recovered; when the data storage software runs and user data in the logical space is overwritten across logical blocks or without logical block alignment, achieving delayed writing through the write-ahead log by writing the user data that is smaller than one logical block, together with its physical block number, into the write-ahead log structure of the corresponding primary high-speed channel; if the write-ahead log structure at that position is valid, it is first recovered and invalidated, and then the new log is written;
Maintaining the 'B+ tree' key-value structure of the secondary high-speed physical storage space specifically comprises:
after the data storage software is formatted, the 'B+ tree' of the secondary high-speed space treats the entire tertiary slow space as one contiguous set of free physical blocks and stores it as a free key-value pair;
when the data storage software writes user data, a search key is generated from the starting logical block number, and the 'B+ tree' of the secondary high-speed space is traversed from its root node to look for an allocated key-value pair in a leaf node: if none is found, the write is an append; the free key-value pairs are traversed according to the search key, a contiguous set of free physical blocks is allocated, the user data is written into the tertiary slow space, and after success the modified 'B+ tree' nodes are written into the secondary high-speed space; otherwise, the write is an overwrite; the free key-value pairs are traversed according to the search key, a contiguous set of free physical blocks is allocated and the user data is written into the tertiary slow space, after success the old key-value pair is updated to the newly allocated key-value pair and the old physical block set is released, and finally the modified 'B+ tree' nodes are written into the secondary high-speed space.
6. The high-performance data storage software management method of claim 5, wherein performing the operation of overwriting the tertiary slow space with user data that crosses logical blocks or is logical-block-unaligned, performing the operation of overwriting the tertiary slow space with logical-block-aligned user data, performing the operation of appending user data to the tertiary slow space, and performing the operation of reading user data from the tertiary slow space comprises:
Performing the operation of overwriting the tertiary slow space with user data that crosses logical blocks or is logical-block-unaligned comprises: when the data storage software runs, user data that crosses logical blocks means data whose middle portion is an integer multiple of the logical block size and whose head or tail is smaller than one logical block; logical-block-unaligned user data means data smaller than one logical block; the portion of such user data that is smaller than one logical block is written in a delayed manner, wherein the old physical block data of the corresponding logical block in the tertiary slow space is read into memory and merged with the new data to form complete new logical block data, the secondary high-speed "B+ tree" is then traversed from its root node to find the allocated key-value pair in a leaf node and obtain the physical block number corresponding to the logical block number, and finally the complete new logical block data and the physical block number are written into the write-ahead log structure of the corresponding channel in the primary high-speed space;
Performing the operation of overwriting the tertiary slow space with logical-block-aligned user data comprises: when the data storage software runs, logical-block-aligned user data means data whose size is an integer multiple of the logical block size; under the copy-on-write mechanism, the secondary high-speed "B+ tree" is traversed from its root node to find a free key-value pair in a leaf node, a new set of free tertiary slow physical blocks is allocated and the user data is written into it rather than directly overwriting the original user data, and after the write succeeds the old physical block set is released and the secondary high-speed "B+ tree" nodes are updated;
Performing the operation of appending user data to the tertiary slow physical storage space comprises: when the data storage software runs, the secondary high-speed "B+ tree" is traversed from its root node to find a free key-value pair in a leaf node, a segment of free physical blocks is allocated from the tertiary slow space for the user data of the logical space, and the user data is then written directly into that physical block set;
Performing the operation of reading user data from the tertiary slow space comprises: when the data storage software runs, a search key is generated from the starting logical block number of the user data to be read, the secondary high-speed "B+ tree" is traversed from its root node to find the allocated key-value pair in a leaf node, and the user data is read from the corresponding set of physical blocks in the tertiary slow space.
7. The method of claim 6, wherein the logical block size is 2^n times the physical block size of the three-level slow space storage medium, and n is a positive integer;
The primary high speed is configured as an NVMe-SSD storage medium and the secondary high speed is configured as a SATA-SSD or NVMe-SSD storage medium.
8. The high-performance data storage software management method of claim 7, wherein the tertiary slow space is configured as a SATA-HDD, SATA-SSD, or NVMe-SSD storage medium.
CN202311829229.1A 2023-12-27 2023-12-27 High-performance data storage software management method Pending CN118069043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311829229.1A CN118069043A (en) 2023-12-27 2023-12-27 High-performance data storage software management method

Publications (1)

Publication Number Publication Date
CN118069043A (en) 2024-05-24

Family

ID=91108333

Country Status (1)

Country Link
CN (1) CN118069043A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination