CN113568582A

CN113568582A - Data management method and device and storage equipment

Info

Publication number: CN113568582A
Application number: CN202110871528.6A
Authority: CN
Inventors: 吴小雄
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2021-10-29
Anticipated expiration: 2041-07-30
Also published as: CN113568582B

Abstract

The application provides a data management method, a data management device and a storage device. Index association information is constructed among logical objects, logical blocks and storage units, so that data at any position can be indexed through it. This gives an efficient metadata representation and, compared with the existing approach of using a path or URL string as an identifier, greatly reduces the storage space occupied by metadata. In addition, variable metadata and immutable metadata are stored separately, which avoids metadata confusion. Because the position of a variable metadata record itself serves as the address, the amount of metadata is further reduced compared with prior-art methods that record addresses inside the metadata.

Description

Data management method and device and storage equipment

Technical Field

The invention relates to the technical field of storage, in particular to a data management method, a data management device and storage equipment.

Background

Shingled magnetic recording (SMR) disks overlap adjacent tracks, which greatly increases disk capacity. A conventional SMR disk comprises a random write area and sequential write areas. The random write area has a small capacity, typically one zone of 256 MB, and all other zones can only be written sequentially in append mode.

The random write area is typically used as swap space for overwrites. When an SMR disk is used as an ordinary HDD (hard disk drive) and overwrites are issued directly without special handling, the disk controller itself reads the data of the adjacent tracks that would be overwritten and stores it in the swap space, writes the data requested by the host, and finally writes back the data previously stored in the swap space. This results in very poor SMR write performance.

Existing improvements include, for example, adding an extra SSD to store metadata, organizing disk metadata with inode-like structures and bitmaps, or storing metadata mixed with data in the data storage area. These approaches suffer from increased system cost, disordered metadata, a large amount of metadata, and similar problems.

Disclosure of Invention

The invention aims to provide a data management method, a data management device and a storage device, which can realize efficient metadata representation and reduce the storage space occupied by metadata.

Embodiments of the invention may be implemented as follows:

In a first aspect, the present invention provides a data management method applied to a storage device, where a disk included in the storage device is divided into a variable metadata area, an immutable metadata area, and a data storage area, the data storage area is divided into a plurality of storage areas, and each storage area is divided into a plurality of storage units, the method including:

creating a logical intermediate layer, wherein the logical intermediate layer comprises a plurality of logical objects each used for identifying a segment of storage space in the data storage area, each logical object is divided into a plurality of logical blocks, and the immutable metadata of each logical object and each logical block is written into the immutable metadata area, wherein the storage units are in one-to-one correspondence with the logical blocks;

receiving a write instruction, and determining data to be written, a first logical object and a first logical block based on the write instruction;

determining a first storage unit according to the first logical object, and writing the data to be written into the first storage unit;

writing the variable metadata of the first storage unit into the variable metadata area, wherein the writing position of the variable metadata in the variable metadata area corresponds to the position of the first storage unit in the data storage area, and the variable metadata comprises index association information of the first logical object, the first logical block and the first storage unit.

In an alternative embodiment, the method further comprises:

receiving a read instruction, and determining a second logical object and a second logical block based on the read instruction;

determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area;

and reading the data requested by the read instruction from the second storage unit.

In an optional embodiment, the step of determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area includes:

looking up the sequence number of the second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area;

and determining the address of the second storage unit according to the starting address of the data storage area, the length of each storage unit and the sequence number.
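The address computation in this step can be sketched as follows. This is a minimal illustration only, assuming 0-based sequence numbers and byte addressing; the function name `unit_address` and the layout constants (one variable-metadata zone followed by ten immutable-metadata zones) are hypothetical choices, not values fixed by the application.

```python
MiB = 1024 * 1024

def unit_address(data_area_start: int, unit_length: int, seq_no: int) -> int:
    """Return the byte address of storage unit `seq_no` (assumed 0-based)."""
    if seq_no < 0:
        raise ValueError("sequence number must be non-negative")
    return data_area_start + unit_length * seq_no

# Assumed layout: 1 variable-metadata zone + 10 immutable-metadata zones,
# each 256 MiB, placed before the data storage area.
DATA_START = 11 * 256 * MiB
addr = unit_address(DATA_START, 1 * MiB, 3)  # address of the 4th 1 MiB unit
```

Because the address is derived from the sequence number alone, no per-unit address needs to be stored in the metadata.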

In an optional embodiment, a relationship table containing a correspondence between a logical object and a storage area is further stored in the variable metadata area;

the step of determining a first storage unit according to the first logical object includes:

searching the relationship table for a storage area corresponding to the first logical object;

if none exists, determining a storage area in an idle state from the plurality of storage areas, taking the storage unit indicated by the write pointer of the idle storage area as the first storage unit, and writing the correspondence between the first logical object and the idle storage area into the relationship table;

if one exists, determining the storage area corresponding to the first logical object according to the correspondence, and taking the storage unit indicated by the write pointer of the corresponding storage area as the first storage unit.

In an alternative embodiment, the index association information contains a variable metadata field of a logical block;

the method further comprises the following steps:

receiving a control instruction, wherein the control instruction is a closing instruction or a deleting instruction;

determining a third logical object based on the control instruction, and marking a variable metadata field of a first logical block contained in the third logical object as a state indicated by the control instruction;

and deleting the correspondence between the third logical object and the storage area from the relationship table.

In an alternative embodiment, the immutable metadata area is divided into a plurality of metadata storage areas, and each time the immutable metadata of logical objects and logical blocks is appended to the immutable metadata area, it is first assembled in a buffer of one page in size;

the method further comprises the following steps:

when a metadata storage area is full, compressing and merging the valid immutable metadata of the logical objects and logical blocks contained in each buffer in that metadata storage area;

and writing the compressed and merged immutable metadata into another metadata storage area that is in an idle state, and releasing the data in the original metadata storage area.
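One way to picture this compress-and-merge step is the sketch below. It reads "compressing" as dropping records that are no longer valid and packing the survivors densely, which is an assumption; the page capacity `PAGE_RECORDS`, the function `compact` and the use of bare object ids as records are all hypothetical.

```python
PAGE_RECORDS = 4  # hypothetical number of records per page-sized buffer

def compact(pages, is_valid):
    """Merge the valid records of a full metadata storage area into dense pages."""
    live = [rec for page in pages for rec in page if is_valid(rec)]
    return [live[i:i + PAGE_RECORDS] for i in range(0, len(live), PAGE_RECORDS)]

# A full area made of three page-sized buffers; records are object ids here.
full_zone = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
deleted = {2, 3, 5, 8, 9, 10, 11}

# Valid records are rewritten into an idle area, then the old area is released.
idle_zone = compact(full_zone, lambda object_id: object_id not in deleted)
full_zone = []
```

In this toy run, three pages shrink to two, and the released area becomes available for later appends.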

In an alternative embodiment, the method further comprises:

for a metadata storage area into which compressed and merged immutable metadata has been written, detecting whether the metadata storage area is full and whether the proportion of the valid immutable metadata it contains, relative to its total capacity, is lower than a preset threshold;

if so, relocating the valid immutable metadata in that metadata storage area to other metadata storage areas that have remaining storage space, and releasing the original metadata storage area.
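The relocation condition above amounts to a two-part check, sketched below. The function name `needs_relocation` and the default threshold of 0.25 are illustrative assumptions; the application only says that the threshold is preset.

```python
def needs_relocation(used: int, valid: int, capacity: int,
                     threshold: float = 0.25) -> bool:
    """True when the area is full and its valid share of capacity is below threshold."""
    return used >= capacity and (valid / capacity) < threshold

# A full area holding only 10% valid metadata qualifies for relocation.
candidate = needs_relocation(used=100, valid=10, capacity=100)
```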

In an alternative embodiment, the method further comprises:

after the disk is restarted, reading the variable metadata in the variable metadata area, and recovering the index association information based on the read variable metadata;

and acquiring the write pointer of the immutable metadata area and, when the write pointer is not 0, reading the immutable metadata into memory up to the length indicated by the write pointer, and recovering all immutable metadata according to the header information of each piece of immutable metadata.
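The header-driven recovery of immutable metadata can be sketched as a simple scan. The header format used here (a 4-byte little-endian payload length) is purely an assumption for illustration, as are the helper names `append_record` and `recover`; the application does not specify the header contents.

```python
import struct

HEADER = struct.Struct("<I")  # hypothetical header: payload length in bytes

def append_record(area: bytearray, payload: bytes) -> None:
    """Append one immutable metadata record (header + payload)."""
    area.extend(HEADER.pack(len(payload)) + payload)

def recover(area: bytes, write_pointer: int) -> list:
    """Walk the records in area[:write_pointer], using each header to find the next."""
    records, pos = [], 0
    while pos + HEADER.size <= write_pointer:
        (length,) = HEADER.unpack_from(area, pos)
        start = pos + HEADER.size
        if start + length > write_pointer:
            break  # incomplete final record: stop rather than read past the pointer
        records.append(bytes(area[start:start + length]))
        pos = start + length
    return records

area = bytearray()
append_record(area, b"obj-a")
append_record(area, b"obj-bb")
restored = recover(area, len(area))
```

A write pointer of 0 yields an empty list, which matches the case where nothing needs to be read.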

In a second aspect, the present invention provides a data management apparatus, applied to a storage device, where a disk included in the storage device is divided into a variable metadata area, an immutable metadata area, and a data storage area, the data storage area is divided into a plurality of storage areas, and each storage area is divided into a plurality of storage units, the apparatus including:

a creating module, configured to create a logical intermediate layer, where the logical intermediate layer includes a plurality of logical objects each used for identifying a segment of storage space in the data storage area, each logical object is divided into a plurality of logical blocks, and the immutable metadata of each logical object and each logical block is written into the immutable metadata area, where the storage units and the logical blocks are in one-to-one correspondence;

a receiving module, configured to receive a write instruction and determine data to be written, a first logical object and a first logical block based on the write instruction;

a writing module, configured to determine a first storage unit according to the first logical object, write the data to be written into the first storage unit, and write variable metadata of the first storage unit into a variable metadata area, where a writing position of the variable metadata in the variable metadata area corresponds to a position of the first storage unit in a data storage area, and the variable metadata includes index association information of the first logical object, the first logical block, and the first storage unit.

In a third aspect, the present invention provides a storage device comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing processor-executable machine-executable instructions that, when executed by the processors, perform the method steps of any one of the preceding embodiments.

The beneficial effects of the embodiment of the invention include, for example:

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a schematic view of an application scenario of a data management method according to an embodiment of the present application;

FIG. 2 is a flowchart of a data management method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a logical intermediate layer provided in an embodiment of the present application;

FIG. 4 is a diagram illustrating a conventional magnetic disk;

FIG. 5 is a block diagram of a magnetic disk provided in an embodiment of the present application;

FIG. 6 is another block diagram of a magnetic disk provided in an embodiment of the present application;

FIG. 7 is a flowchart of sub-steps included in step S130 of FIG. 2;

FIG. 8 is a flowchart of a reading method according to an embodiment of the present disclosure;

FIG. 9 is a flow chart of a control response method provided in an embodiment of the present application;

FIG. 10 is a flow chart of a data compression method provided by an embodiment of the present application;

FIG. 11 is a diagram illustrating data compression and merging provided by an embodiment of the present application;

FIG. 12 is a flowchart of a data migration method according to an embodiment of the present application;

FIG. 13 is a flowchart of a data recovery method according to an embodiment of the present application;

FIG. 14 is a block diagram of a storage device according to an embodiment of the present application;

FIG. 15 is a functional block diagram of a data management apparatus according to an embodiment of the present application.

Reference numerals: 100-processor; 200-memory; 300-data management device; 310-creating module; 320-receiving module; 330-writing module.

Detailed Description

In the prior art, schemes for improving disk write performance mainly include, for example, additionally configuring an SSD to store the metadata of the SMR disk while data is written sequentially to the SMR disk. Because of the additional disk, the SMR disk's metadata can be organized freely. Another scheme uses the random write area to store the SMR disk's metadata while the data storage areas are written sequentially in append mode; the disk metadata is still organized in an inode- and bitmap-like manner. There are also schemes that store metadata mixed with data in the data storage area, and key-value storage schemes designed specifically for leveldb.

In the existing improvement schemes above, additionally configuring an SSD increases system cost because an extra SSD medium must be introduced. The scheme that stores the SMR disk's metadata in the random write area is limited to disks whose random write area has a relatively large preset capacity and whose data storage area is not large. Because it manages metadata in an inode- and bitmap-like manner, file continuity is poor, and scalability is poor in distributed scenarios. In practice, either the random write area is not large enough to hold the disk's metadata, or each metadata entry has to manage capacity at a coarser granularity, which easily wastes considerable space.

In addition, storing metadata mixed into data blocks easily leads to metadata inconsistency, and on restart after a fault the metadata loads slowly because the data storage area must be read. Current key-value storage schemes are limited to relatively narrow usage scenarios.

Based on these findings, the data management scheme provided by the application constructs index association information among logical objects, logical blocks and storage units, so that data at any position can be indexed through it. This gives an efficient metadata representation and, compared with the existing approach of using a path or URL string as an identifier, greatly reduces the storage space occupied by metadata. In addition, variable metadata and immutable metadata are stored separately, which avoids metadata confusion. Because the position of a variable metadata record itself serves as the address, the amount of metadata is further reduced compared with prior-art methods that record addresses inside the metadata.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present invention, it should be noted that the terms "first", "second", etc. are used only for distinguishing between descriptions and are not intended to indicate or imply relative importance.

It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.

Referring to fig. 1, a schematic view of an application scenario of a data management method provided in an embodiment of the present application is shown, where the application scenario includes a storage device and a plurality of clients, where each client may communicate with the storage device to implement transmission and interaction of data and information.

Each client can communicate with the storage device to write data to it and to read data from it. The storage device may be one that contains shingled (SMR) disks.

With reference to fig. 2, an embodiment of the present application further provides a data management method applicable to the storage device. The method steps defined by the flow of the data management method may be implemented by the storage device. In this embodiment, a disk included in the storage device may be divided into a variable metadata area, an immutable metadata area, and a data storage area, where the data storage area is divided into a plurality of storage areas, and each storage area is divided into a plurality of storage units. The specific process shown in fig. 2 is described in detail below.

Step S110, creating a logical intermediate layer, where the logical intermediate layer includes a plurality of logical objects each used for identifying a segment of storage space in the data storage area, each logical object is divided into a plurality of logical blocks, and the immutable metadata of each logical object and each logical block is written into the immutable metadata area, where the storage units and the logical blocks are in one-to-one correspondence.

Step S120, receiving a write instruction, and determining data to be written, a first logical object, and a first logical block based on the write instruction.

Step S130, determining a first storage unit according to the first logical object, and writing the data to be written into the first storage unit.

Step S140, writing the variable metadata of the first storage unit into a variable metadata area, where the writing position of the variable metadata in the variable metadata area corresponds to the position of the first storage unit in the data storage area, and the variable metadata includes the index association information of the first logical object, the first logical block, and the first storage unit.

In this embodiment, a logical intermediate layer is introduced first. The entities of the logical intermediate layer are logical objects. Each logical object may be identified by a 64-bit integer that is globally unique and uniquely identifies a segment of storage space in the data storage area. Each logical object can be divided into a plurality of logical blocks (block) of fixed length, where the block size can be set according to actual conditions. Thus, a logical object consists of n (n ≥ 1) logical blocks.

Each logical object has corresponding id information, denoted object_id, and each logical block in the logical object has an index block_idx, which is the sequence number of the logical block within the logical object. The data of any logical block can be uniquely determined by (object_id, block_idx).

As shown in fig. 3, the position offset = 1572864 is calculated to lie in block_idx = 1572864 / 1048576 = 1 (integer division), so it can be represented by (object_id = 0x1234567876543210, block_idx = 1, offset = 524288), where 524288 is the offset within the block. The block of data containing offset 1572864 in the figure can be indexed by (object_id = 0x1234567876543210, block_idx = 1).
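The arithmetic of this example can be checked with a few lines of code. The sketch assumes a logical block size of 1048576 bytes (1 MiB), as implied by the figures in the example; the helper name `locate` is hypothetical.

```python
BLOCK_SIZE = 1048576  # 1 MiB logical block, as in the example

def locate(offset: int, block_size: int = BLOCK_SIZE) -> tuple:
    """Split an object offset into (block_idx, offset inside that block)."""
    return divmod(offset, block_size)

block_idx, in_block = locate(1572864)
print(block_idx, in_block)  # 1 524288
```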

In this embodiment, when the logical intermediate layer is created, a disk management process in the storage device may receive a request to create an object, carrying information such as EC information and stripe size. The disk management process generates the object_id and returns it to the upper layer, which subsequently reads and writes data through the object_id. Creating an object consists of generating a globally unique 64-bit integer object_id as described above, encapsulating the object_id and the request parameters (EC information, stripe size, etc.) into a fixed structure, appending that structure to the immutable metadata area of the disk, and then returning the object_id to the upper layer.
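The object creation flow might look roughly like the sketch below. The record layout (`OBJECT_META`), the field widths, the function `create_object` and the way uniqueness is enforced (a random 64-bit value checked against the ids already known) are all assumptions made for illustration; the application does not say how the globally unique id is generated or how the fixed structure is laid out.

```python
import secrets
import struct

# Hypothetical fixed layout: object_id (u64), EC n (u8), EC m (u8), stripe size (u32).
OBJECT_META = struct.Struct("<QBBI")

def create_object(immutable_area: bytearray, known_ids: set,
                  ec_n: int, ec_m: int, stripe_size: int) -> int:
    """Generate an object_id, append its immutable metadata, and return the id."""
    object_id = secrets.randbits(64)
    while object_id in known_ids:  # retry on the (unlikely) collision
        object_id = secrets.randbits(64)
    known_ids.add(object_id)
    # Append only: existing bytes in the immutable metadata area are never rewritten.
    immutable_area.extend(OBJECT_META.pack(object_id, ec_n, ec_m, stripe_size))
    return object_id

area, ids = bytearray(), set()
oid = create_object(area, ids, ec_n=4, ec_m=2, stripe_size=65536)
```

The caller only ever sees `oid`; everything else about the object stays in the appended record.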

Since a bare disk cannot manage data by itself, the disk still needs to be organized and managed. As shown in fig. 4, a conventional SMR disk is generally divided into two types of areas: a random read/write area (C ZONE) and sequential write areas (SP/SR ZONE). The entire disk is divided into a number of equally sized zones, each typically 256 MB.

In the present embodiment, the SMR disk has the structure shown in fig. 5: the disk is divided into a variable metadata area (variable meta data), an immutable metadata area (immutable meta data), and a data storage area (data). The variable metadata area corresponds to the C ZONE of the SMR disk and stores metadata that must be changed frequently, such as the CRC, the written length of the data storage area, and the data status. The immutable metadata area is used to sequentially write the immutable portions of metadata, including the immutable metadata of individual logical objects and logical blocks, such as object_id, EC n + m, and stripe size. Once generated, this part of an object's metadata is never modified.

Fig. 5 shows the immutable metadata area as consisting of two sequentially written zones. In practice the immutable metadata area is not limited to two zones and generally stays within 10 zones; fig. 5 is merely an example.

The data storage area supports sequential writing and can be divided into a plurality of storage areas of fixed length, for example 256 MB each. Each storage area is in turn divided into a plurality of storage units (chunk), for example 1 MB each. The storage units on a disk correspond one-to-one to the logical blocks in the logical intermediate layer.

Referring to fig. 6, the first part in fig. 6 is the variable metadata area, which is one zone in size, namely the 256 MB random write area provided by the hard disk manufacturer. The variable metadata area may be numbered in fixed-length slots, each of which records the metadata (variable metadata) of one storage unit, including, for example, object_id, block_idx and writen_size. This area is written by overwriting.

The second part in fig. 6 is the immutable metadata area, which consists of a plurality of sequentially written zones whose number can be set in advance. On a typical 20 TB disk the immutable metadata area may comprise 10 zones. This area stores the immutable part of object metadata, such as object_id and object_size, and is written in append mode.

The third part in fig. 6 is the data storage area, which provides the write space for user data. It is numbered in fixed-length units (one chunk each), each corresponding to a preset block size, and a number of such units make up a zone. When data is written, each zone is written sequentially in append mode.

The numbered storage-unit metadata slots in the first part correspond one-to-one to the numbered storage units in the third part of fig. 6. As shown, chunk1 in the first part (variable metadata) corresponds to the first 1 MB chunk in the first zone of the third part, chunk157 corresponds to the first 1 MB chunk in the second zone of the third part, and so on.

The immutable object metadata in the second part is associated with the variable metadata in the first part through the object_id, so that the metadata of one object consists of its immutable metadata in the second part and the variable metadata in the first part that carries the same object_id.

Once the logical intermediate layer has been constructed and the disk structure divided, data is written at logical block granularity. The write instruction received by the storage device may include the corresponding logical object, identified by its object_id and referred to in this embodiment as the first logical object for ease of distinction. It also includes an offset address offset within the first logical object. From the offset address and the logical block size block_size of the first logical object, the first logical block pointed to by the write instruction is obtained as block_idx = offset / block_size.

Because data is written in each storage area of the data storage area in a sequential writing mode, after the first logic object is determined, the storage area needing to be written is determined, and then the first storage unit is determined to be used for writing the data to be written based on the writing condition of the determined storage area.

After the data to be written has been written into the first storage unit, the variable metadata of the first storage unit is written into the variable metadata area. In this embodiment, the variable metadata of the storage units may be denoted chunk_table. The write position of a storage unit's variable metadata in the variable metadata area corresponds to the position of that storage unit in the data storage area. In other words, the position is the address: the position itself indicates the address of the storage unit in the data storage area, so the amount of metadata is further reduced compared with storing the address as part of the metadata.

In this embodiment, the variable metadata of the storage unit at position i can be indexed through chunk_table. Since storage units correspond one-to-one to logical blocks, chunk_table in effect represents the index association information of the first logical object, the first logical block and the first storage unit. For example, the index may take the form hash(object_id, block_idx) = meta(chunk[i]), meaning that object_id and block_idx form the key and the variable metadata of the corresponding chunk i is the value.
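As an in-memory illustration of this index, the sketch below keeps two structures: a table from chunk number to variable metadata, and a hash map from (object_id, block_idx) to chunk number. The class `ChunkMeta`, its field names, and the helpers `bind` and `lookup` are hypothetical; only the key and value roles come from the description.

```python
from dataclasses import dataclass

@dataclass
class ChunkMeta:
    """Variable metadata of one storage unit (field names are illustrative)."""
    object_id: int
    block_idx: int
    written_size: int = 0
    crc: int = 0

chunk_table = {}  # chunk number i -> ChunkMeta
index = {}        # (object_id, block_idx) -> chunk number i

def bind(object_id: int, block_idx: int, i: int) -> None:
    """Record that logical block (object_id, block_idx) lives in storage unit i."""
    chunk_table[i] = ChunkMeta(object_id, block_idx)
    index[(object_id, block_idx)] = i

def lookup(object_id: int, block_idx: int) -> ChunkMeta:
    """Find the variable metadata of the storage unit holding a logical block."""
    return chunk_table[index[(object_id, block_idx)]]

bind(0x1234567876543210, 1, 42)
meta = lookup(0x1234567876543210, 1)
```

Note that the chunk number 42 is never stored inside `ChunkMeta`; it is implied by the slot, which mirrors the position-as-address idea.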

In this embodiment, index association information is constructed among logical objects, logical blocks and storage units, so that data at any position can be indexed through it. This gives an efficient metadata representation and, compared with the existing approach of using a path or URL string as an identifier, greatly reduces the storage space occupied by metadata. In addition, variable metadata and immutable metadata are stored separately, which avoids metadata confusion. Because the position of a variable metadata record itself serves as the address, the amount of metadata is further reduced compared with prior-art methods that record addresses inside the metadata.

To improve performance when writing, a large piece of data is usually split into several small blocks that are sent to the storage device for writing. The client can split the data according to the logical block size into a number of sub-requests and send them to the storage device. If a storage area holds a logical object that is being written and has not been closed, a new logical object must be written to a different storage area, one that holds no logical object currently being written; writing of a logical object is therefore exclusive within its storage area. This guarantees the continuity of the written data on the disk. For example, fig. 6 shows two objects being written concurrently: the two write lines indicate that object1 and object2 are both being written, object1 to the first zone of the data storage area and object2 to the second zone.
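The client-side splitting described above can be sketched as a generator that cuts a write at logical block boundaries. The function name `split_write` and the tiny block size used in the example are hypothetical; a real client would use the object's configured block size.

```python
def split_write(offset: int, data: bytes, block_size: int):
    """Yield (block_idx, offset_in_block, piece) sub-requests.

    No piece crosses a logical block boundary.
    """
    pos = 0
    while pos < len(data):
        block_idx, in_block = divmod(offset + pos, block_size)
        take = min(block_size - in_block, len(data) - pos)
        yield block_idx, in_block, data[pos:pos + take]
        pos += take

# Toy example with an 8-byte block: a 10-byte write starting at offset 6.
subs = list(split_write(offset=6, data=b"abcdefghij", block_size=8))
```

Here the write is cut into a 2-byte tail of block 0 and a full block 1.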

To ensure the continuity of data on the disk, this embodiment introduces a relationship table writing_zone_map of the form map[object_id] = zone_id, which stores the correspondence between logical objects and storage areas.

When data is written, the relationship table is queried to determine whether the write is for a new object or for an object that is already being written (whose data must be kept contiguous). Therefore, referring to fig. 7, in this embodiment the first storage unit may be determined according to the first logical object through the following steps:

Step S131, searching the relationship table for a storage area corresponding to the first logical object; if none exists, executing step S132, and if one exists, executing step S133.

Step S132, determining a storage area in an idle state from the plurality of storage areas, using the storage unit indicated by the write pointer of the idle storage area as the first storage unit, and writing the correspondence between the first logical object and the idle storage area into the relationship table.

Step S133, determining the storage area corresponding to the first logical object according to the correspondence, and using the storage unit indicated by the write pointer of that storage area as the first storage unit.

The write instruction received by the storage device identifies the first logical object. By querying the correspondence between logical objects and storage areas in the relationship table, it can be determined whether the first logical object has been written before, that is, whether the current write has a data continuity requirement.

If the relation table has a storage area corresponding to the first logical object, indicating that the first logical object is written before, the object _ id of the first logical object can be used as a key to query the relation table to obtain the previously written storage area zone _ id. And then allocates the memory location chunk in the memory area as the first memory location to perform data writing.

Since the storage area is written in data sequence, after the storage area is determined, the storage unit indicated by the write pointer of the storage area can be used as the first storage unit.

In addition, if the relationship table does not have a storage area corresponding to the first logical object, a new object is being written, for example because the object was closed after its previous write, or because the block calculated from the object_id and the offset in the write request is the 0th block of the object. In this case, a storage area in an idle state (i.e., a storage area not occupied by any logical object) may be selected from the plurality of storage areas, either randomly or according to a preset policy, and a storage unit (chunk) is allocated in that storage area for the data write. The storage unit indicated by the write pointer of the storage area may be used as the first storage unit.

On this basis, in order to ensure the continuity of data writing for the object, the correspondence between the first logical object and the determined storage area may be written into the relationship table, that is, writing_zone_map[object_id] = zone_id is updated.
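Steps S131 to S133 can be sketched as below. The class name, the per-zone write pointer expressed as a chunk index, and the "first idle zone" policy are assumptions made for illustration; the sketch also ignores full zones, which a real allocator would have to skip.

```python
class ZoneAllocator:
    """Illustrative allocator that keeps one open object per storage area."""

    def __init__(self, zone_ids):
        self.writing_zone_map = {}                      # object_id -> zone_id
        self.write_pointer = {z: 0 for z in zone_ids}   # next chunk index per zone

    def _idle_zone(self):
        # An idle zone is one that no open object is currently writing to.
        busy = set(self.writing_zone_map.values())
        for zone_id in self.write_pointer:
            if zone_id not in busy:
                return zone_id
        raise RuntimeError("no idle storage area available")

    def first_storage_unit(self, object_id):
        zone_id = self.writing_zone_map.get(object_id)   # S131: look up the table
        if zone_id is None:                              # S132: new object
            zone_id = self._idle_zone()
            self.writing_zone_map[object_id] = zone_id
        chunk = self.write_pointer[zone_id]              # unit at the write pointer
        self.write_pointer[zone_id] += 1                 # sequential append
        return zone_id, chunk
```

Two objects written concurrently land in different zones, while repeated writes of the same object receive consecutive chunks in the same zone.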

In addition, after the data write is completed in either of the above cases, the written storage unit and the index association information of the corresponding logical object and logical block may be written into chunk_table, where object_id and block_idx are the key, chunk_meta[i] is the value, and i is the sequence number of the chunk in the data storage area. The corresponding fields (crc and written length) in chunk_meta[i] are updated, and the content of chunk_meta[i] is then written to the disk. The writing position is the i-th position in the variable metadata area, that is, the address lba = sizeof(chunk_meta) × i.
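The rule lba = sizeof(chunk_meta) × i means that a record's position doubles as the address of its chunk, so no address field has to be stored. A minimal sketch, assuming a hypothetical 20-byte record layout (object_id, block_idx, crc, written length) and an in-memory bytearray standing in for the on-disk region of fixed-size chunk_meta records:

```python
import struct

# Hypothetical record layout: object_id (u64), block_idx (u32),
# crc (u32), written_len (u32). Little-endian, no padding: 20 bytes.
CHUNK_META = struct.Struct("<QIII")

def meta_lba(i):
    """Byte address of chunk_meta[i]: lba = sizeof(chunk_meta) * i."""
    return CHUNK_META.size * i

def write_chunk_meta(region, i, object_id, block_idx, crc, written_len):
    # The slot index i is the chunk's sequence number in the data area.
    CHUNK_META.pack_into(region, meta_lba(i), object_id, block_idx, crc, written_len)

def read_chunk_meta(region, i):
    return CHUNK_META.unpack_from(region, meta_lba(i))
```

Because the slot index is implied by the address, updating the metadata of chunk i is a single fixed-offset overwrite.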

In addition to enabling the writing of data, the client may also perform the reading of data from the storage device. Referring to fig. 8, in the present embodiment, the data reading can be implemented in the following manner:

Step S210, receiving a read instruction, and determining a second logical object and a second logical block based on the read instruction;

Step S220, determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area;

Step S230, reading the required data indicated by the read instruction from the second storage unit.

The read instruction received by the storage device may include an object_id, referred to here as the second logical object for ease of distinction, together with an offset address offset and a length len. After receiving the read instruction, the storage device computes offset / block_size from the offset address and the block length to obtain the block of the second logical object that holds the data to be read, namely block_idx, denoted the second logical block.

Using object_id and block_idx as keys, the index association information chunk_table in the variable metadata area is queried to obtain the corresponding metadata chunk_meta[i], which indicates the second storage unit corresponding to the second logical object and the second logical block. The required data can then be read from the second storage unit.

In detail, in this embodiment, the sequence number of the second storage unit corresponding to the second logical object and the second logical block may be found according to the index information in the variable metadata area, and the address of the second storage unit is determined according to the start address of the data storage area, the length of each storage unit, and the sequence number, so as to locate the second storage unit.

For example, the offset within the block, offset_inner, is calculated as offset % block_size, and the sequence number i of the chunk to be read is obtained from chunk_meta[i]. From the start address data_start_addr of the data storage area, the storage unit length chunk_size and the sequence number i, the address to be read is calculated as lba = offset_inner + data_start_addr + (chunk_size × i). Data of length len is then read starting at lba and returned to the client.

In addition, in this embodiment an object may also be closed and deleted. The index association information includes a variable metadata field of the logical block; referring to fig. 9, closing and deleting an object may be implemented in the following manner.
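The read-address arithmetic can be checked with a short sketch. The function name and the representation of chunk_table as a dictionary mapping (object_id, block_idx) directly to the chunk sequence number i are simplifying assumptions.

```python
def locate_read(object_id, offset, block_size, chunk_size, data_start_addr, chunk_table):
    """Return (block_idx, lba) for a read at byte `offset` of an object."""
    block_idx = offset // block_size        # which logical block holds the data
    offset_inner = offset % block_size      # position inside that block
    i = chunk_table[(object_id, block_idx)]  # chunk sequence number
    lba = offset_inner + data_start_addr + chunk_size * i
    return block_idx, lba
```

For instance, with block_size = chunk_size = 1024, data_start_addr = 4096 and block 2 of object 5 stored in chunk 7, a read at offset 2148 resolves to block 2 and lba = 100 + 4096 + 7168 = 11364.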

Step S310, receiving a control instruction, wherein the control instruction is a closing instruction or a deleting instruction;

Step S320, determining a third logical object based on the control instruction, and marking the variable metadata field of the first logical block included in the third logical object as the state indicated by the control instruction;

Step S330, deleting the correspondence between the third logical object and the storage area from the relationship table.

In this embodiment, when the received control instruction is a close instruction, the close instruction carries an object_id; for ease of distinction, the object it identifies is denoted the third logical object. The first logical block block_idx of the third logical object is determined. chunk_table may then be queried with object_id and block_idx as keys to obtain the corresponding variable metadata chunk_meta[i], the variable metadata field of the logical block in that variable metadata is marked as closed, and the variable metadata chunk_meta[i] at the i-th position in the variable metadata area is updated.

When the received control instruction is a delete instruction, the first logical block block_idx of the third logical object is likewise determined. chunk_table may be queried with object_id and block_idx as keys to obtain the corresponding variable metadata chunk_meta[i], the variable metadata field of the logical block in that variable metadata is marked as deleted, and the variable metadata chunk_meta[i] at the i-th position in the variable metadata area is updated. When an object is deleted, the object is assumed to be in the closed state.

In addition, after the object is closed or deleted, the corresponding relationship between the third logical object and the storage area in the relationship table may be deleted.
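The close and delete handling (steps S310 to S330) can be sketched as follows. The state constants, the instruction strings, and the in-memory dictionaries standing in for chunk_table, chunk_meta and writing_zone_map are assumptions for illustration; the sketch also takes the object's first logical block to have block_idx 0, and persisting chunk_meta[i] back to disk is omitted.

```python
OPEN, CLOSED, DELETED = 0, 1, 2   # hypothetical encodings of the state field

def apply_control(instruction, object_id, chunk_table, chunk_meta, writing_zone_map):
    """Mark the object's first logical block as closed or deleted."""
    state = {"close": CLOSED, "delete": DELETED}[instruction]
    i = chunk_table[(object_id, 0)]          # first logical block (assumed block_idx 0)
    chunk_meta[i]["state"] = state           # S320: update the variable metadata field
    writing_zone_map.pop(object_id, None)    # S330: the zone becomes idle again
    return i
```

Removing the entry from the relationship table is what frees the storage area for the next new object.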

As can be seen from the above, the immutable metadata area is divided into a plurality of metadata storage areas, for example ten 256 MB zones. The immutable metadata of a logical object and its logical blocks is assembled in a cache into one page and appended to the immutable metadata area one page at a time. That is, a single IO writes one page, but the amount of valid data (i.e., immutable metadata) is less than one page; for example, a single immutable metadata record is 20 bytes, whereas a page is typically 4 kB. If the data is not compressed, the addresses written are 0, 4096, 8192, and so on. The written data is therefore sparse and occupies storage space unnecessarily.

Some existing approaches rely on the disk controller to compress and merge data: on each write, the address is specified as the end of the previous valid data, so that, for example, three successive write positions are 0, 20 and 40. With this approach it is difficult to guarantee stable write performance, because only a few disk controllers provide this function.

Referring to fig. 10, in the present embodiment, the following data compression method is adopted to solve the above problem:

Step S410, when a metadata storage area is full, compressing and merging the valid immutable metadata of the logical objects and logical blocks contained in each cache in the metadata storage area.

Step S420, writing the compressed and merged immutable metadata into another metadata storage area in an idle state, and releasing the data in the original metadata storage area.

In this embodiment, immutable metadata is first written into each metadata storage area in the sequential manner described above, that is, at the addresses 0, 4096, 8192, and so on. Since each write is page-aligned, pages (4 kB) are written in order, but a single immutable metadata record is extremely small, only tens of bytes, so only tens of bytes of each written page are valid. Because of the characteristics of the sequential write area, the next write must append from the end of the previously written page, so after many writes the data is in fact very sparse.

A metadata storage area is 256 MB in size. Taking an immutable metadata record as 20 bytes, the actual valid content of a full metadata storage area is about 256 MB / 4 KB × 20 B ≈ 1.3 MB. Therefore, after a metadata storage area is full, the valid immutable metadata in it needs to be compressed and merged and then written into a new metadata storage area in an idle state. After the data is transferred, the data in the original metadata storage area can be released, and that metadata storage area can be used again as a free area for data writing.
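The arithmetic and the merge step can be illustrated with a small sketch. It assumes, hypothetically, that each 4 kB page holds exactly one 20-byte record at its start and that validity is decided by a caller-supplied predicate; neither detail is specified in the text.

```python
PAGE_SIZE = 4096                   # one page per write
ZONE_SIZE = 256 * 1024 * 1024      # one metadata storage area
RECORD_SIZE = 20                   # assumed size of one immutable metadata record

def valid_bytes_in_full_sparse_zone():
    """Valid payload of a full sparse zone: one record per page."""
    return (ZONE_SIZE // PAGE_SIZE) * RECORD_SIZE

def compact(sparse_zone, is_valid):
    """Pack the valid record of every page back to back (compress and merge)."""
    out = bytearray()
    for page_start in range(0, len(sparse_zone), PAGE_SIZE):
        record = sparse_zone[page_start:page_start + RECORD_SIZE]
        if is_valid(record):
            out += record
    return bytes(out)
```

A full sparse zone therefore carries 65536 × 20 = 1310720 bytes of valid metadata, roughly 1.3 MB, which is what the merged zone has to absorb.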

By the method, the sequential writing of data can be guaranteed, and the stability of writing performance is guaranteed.

According to their states, the metadata storage areas may be divided into meta_front_writing_zone and meta_back_merge_zone, as shown in fig. 11. The meta_front_writing_zone is the foreground sequential-write zone: a new writing zone is allocated after the current one is full, and each time an object is created in the foreground a record is written to this zone, which is sparse. The meta_back_merge_zone is the background merge zone: after a foreground sequential-write zone is full, its valid data is compressed and merged and then written into an idle background merge zone, which is compact.

As can be seen from the above, the number of meta_back_merge_zone areas grows as the immutable metadata area continues to be written. In practice, a 20 TB disk needs about 2 such metadata storage areas when it is full of data. In a practical implementation, reserving a fixed number of zones as metadata storage areas, for example 10 zones, is in theory enough to meet the requirement. Specifically, 2 merge zones + 1 writing zone + 1 zone waiting to be merged, 4 zones in total, are sufficient to hold the metadata of one full-capacity write of a 20 TB disk. When data is deleted after the disk is full and new immutable metadata is created, the new object metadata is still appended sequentially to the writing zone of the immutable metadata area. 10 zones are therefore sufficient to hold the metadata generated by writing the disk to full capacity twice. In addition, 2 zones may be reserved for garbage collection (GC).

The GC process is divided into garbage collection of metadata and garbage collection of a data storage area. Metadata garbage collection mainly relocates compact and full metadata storage areas, and in detail, referring to fig. 12, can be achieved by:

Step S510, for a metadata storage area written with compressed and merged immutable metadata, detecting whether the metadata storage area is full and whether the ratio of the valid immutable metadata it contains to its total capacity is lower than a preset threshold; if so, executing the following step S520.

Step S520, migrating the valid immutable metadata in the metadata storage area to another metadata storage area that has remaining storage space, and releasing the original metadata storage area.
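Steps S510 and S520 can be sketched as below. The dictionary layout of a zone (capacity, full flag, list of records with a size and a valid flag) is hypothetical, and the sketch assumes the destination zone has enough free space for the migrated records.

```python
def valid_ratio(zone):
    """Share of the zone's capacity still occupied by valid records."""
    valid = sum(r["size"] for r in zone["records"] if r["valid"])
    return valid / zone["capacity"]

def needs_gc(zone, threshold):
    # S510: only full zones whose valid share fell below the threshold qualify.
    return zone["full"] and valid_ratio(zone) < threshold

def gc_metadata_zone(src, dst, threshold):
    """S520: migrate valid records from src to dst and release src."""
    if not needs_gc(src, threshold):
        return False
    dst["records"].extend(r for r in src["records"] if r["valid"])
    src["records"] = []
    src["full"] = False
    return True
```

The threshold trades reclaimed space against migration cost: a lower value moves less data per collection but lets more stale records accumulate.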

In addition, the garbage collection mechanism of the data storage area is similar to the garbage collection mechanism for the metadata, and the details are not described herein in this embodiment.

During operation, a failure may cause the disk to restart. In this embodiment, referring to fig. 13, data recovery may be performed in the following manner.

Step S610, after the disk is restarted, reading the variable metadata in the variable metadata area, and recovering to obtain index associated information based on the read variable metadata;

Step S620, obtaining a write pointer of the immutable metadata area, reading the immutable metadata into memory according to the length indicated by the write pointer when the write pointer is not 0, and recovering all immutable metadata according to the header information of each immutable metadata.

In this embodiment, after the disk is restarted in the event of a failure in the running process, the system state can be recovered by reading the variable metadata and the immutable metadata of the disk.

The variable metadata area is the first, randomly writable zone; its data can be read directly in its entirety to build chunk_table.

For the immutable metadata area, the write pointer of each metadata storage area is obtained first. If the write pointer is 0, nothing is done. If the write pointer is not 0, the header is read first to determine whether the metadata storage area is compact or sparse. If the zone is sparse and full, it is waiting for compression and merging; if it is sparse and not full, it is the writing zone currently being written sequentially by the foreground. If the zone is compact, all of its immutable metadata is read into memory at once according to the length given by the zone's write pointer, and all immutable metadata of the objects is recovered according to the header information of each immutable metadata.

In this embodiment, GC progress can be recovered by means of checkpoints, which prevents data from being migrated repeatedly.

In this embodiment, the effective length of each object can be determined by obtaining the write pointers of all storage areas of the data storage area together with the chunk_table information. All objects whose writes were interrupted abnormally must be closed and cannot be written further. When data needs to be written, a new object can be requested instead.
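The per-zone decision made during recovery can be summarized in a small sketch. The function name, the string values used for the header kind, and the returned action labels are assumptions; how the header actually encodes compact versus sparse is not specified in the text.

```python
def classify_metadata_zone(write_pointer, zone_size, header_kind):
    """Decide what recovery does with one metadata storage area.

    header_kind is "sparse" or "compact", as read from the zone header
    (hypothetical encoding).
    """
    if write_pointer == 0:
        return "skip"                  # nothing was ever written
    if header_kind == "compact":
        return "load_all"              # read write_pointer bytes into memory
    if write_pointer >= zone_size:
        return "await_merge"           # sparse and full: waiting for compression
    return "foreground_writing"        # sparse and not full: current writing zone
```

Only compact zones are bulk-loaded in one read of write_pointer bytes; sparse zones are classified as either awaiting merge or still being written by the foreground.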

The scheme provided by the embodiment of the application has mainly been described above from the perspective of the method. To implement the above functions, the storage device includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the exemplary method steps described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the storage device may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.

Fig. 14 is a schematic structural diagram of a storage device according to an embodiment of the present application. The storage device may be used to perform the functions performed by the storage device in any of the embodiments described above. The storage device may include a memory 200 and a processor 100, the memory 200 may include an SMR disk, and the memory 200 may have machine-executable instructions disposed therein, the machine-executable instructions including a data management apparatus 300.

Wherein, the memory 200 is electrically connected with the processor 100 directly or indirectly to realize the data transmission or interaction. For example, they may be electrically connected to each other via one or more communication buses or signal lines. The data management apparatus 300 includes at least one software functional module that can be stored in the memory 200 in the form of software or firmware (firmware). The processor 100 is configured to execute executable computer programs stored in the memory 200, such as software functional modules and computer programs included in the data management apparatus 300, so as to implement the data management method provided by the embodiment of the present application.

Alternatively, the Processor 100 may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.

It will be appreciated that the configuration shown in fig. 14 is merely illustrative, and that the storage device may include more or fewer components than shown in fig. 14, or have a different configuration than shown in fig. 14; for example, it may also include a communication unit for information interaction with other devices.

Referring to fig. 15, a data management apparatus 300 according to an embodiment of the present disclosure includes a creating module 310, a receiving module 320, and a writing module 330.

A creating module 310, configured to create a logical intermediate layer, where the logical intermediate layer includes a plurality of logical objects respectively used for identifying a section of storage space in the data storage area, each logical object is divided into a plurality of logical blocks, and the immutable metadata of each logical object and each logical block is written into the immutable metadata area, where the storage units and the logical blocks are in one-to-one correspondence.

In this embodiment, the creating module 310 may be configured to execute step S110 shown in fig. 2, and reference may be made to the foregoing description of step S110 for relevant contents of the creating module 310.

The receiving module 320 is configured to receive a write instruction, and determine data to be written, a first logical object, and a first logical block based on the write instruction.

In this embodiment, the receiving module 320 may be configured to perform step S120 shown in fig. 2, and reference may be made to the foregoing description of step S120 for relevant contents of the receiving module 320.

A writing module 330, configured to determine a first storage unit according to the first logical object, write the data to be written into the first storage unit, and write variable metadata of the first storage unit into a variable metadata area, where a writing position of the variable metadata in the variable metadata area corresponds to a position of the first storage unit in a data storage area, and the variable metadata includes index association information of the first logical object, the first logical block, and the first storage unit.

In this embodiment, the writing module 330 may be configured to execute the steps S130 and S140 shown in fig. 2, and for the relevant content of the writing module 330, reference may be made to the description of the steps S130 and S140.

In a possible implementation, the data management apparatus 300 further includes a reading module, and the reading module may be configured to:

In a possible implementation, the reading module may be specifically configured to:

In a possible implementation manner, the variable metadata area further stores a relationship table containing a correspondence between a logical object and a storage area, and the writing module 330 may be configured to:

In a possible implementation, the index association information includes a variable metadata field of the logical block, and the data management apparatus 300 may further include a marking module, where the marking module may be configured to:

In a possible implementation, the immutable metadata region is divided into a plurality of metadata storage areas, the immutable metadata of the logical object and the logical block is additionally written into the immutable metadata region in a cache assembled to a page size each time, and the data management apparatus 300 further includes a compression module, which may be configured to:

In one possible implementation, the data management apparatus 300 further includes a recycling module, which may be configured to:

In one possible implementation, the data management apparatus 300 further includes a recovery module, which may be configured to:

The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.

In summary, the data management method, the data management apparatus, and the storage device provided in the embodiments of the present application construct index association information among a logical object, a logical block, and a storage unit, so that data at any position can be indexed by the index association information. This efficient metadata representation greatly reduces the storage space occupied by metadata compared with the existing manner of using a path or a URL string as an identifier. In addition, the variable metadata and the immutable metadata are stored separately, which avoids confusion between them, and because the position of the variable metadata serves as the address, the amount of metadata is further reduced compared with the prior-art manner of recording addresses in the metadata.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A data management method is applied to a storage device, a disk contained in the storage device is divided into a variable metadata area, an immutable metadata area and a data storage area, the data storage area is divided into a plurality of storage areas, and each storage area is divided into a plurality of storage units, and the method comprises the following steps:

2. The data management method of claim 1, wherein the method further comprises:

3. The data management method according to claim 2, wherein the step of determining the second storage unit corresponding to the second logical object and the second logical block based on the index association information in the variable metadata area comprises:

4. The data management method according to claim 1, wherein a relationship table containing a correspondence between logical objects and storage areas is further stored in the variable metadata area;

5. The data management method according to claim 4, wherein the index association information contains a variable metadata field of the logical block;

the method further comprises the following steps:

6. The data management method according to claim 1, wherein the immutable metadata area is divided into a plurality of metadata storage areas, and immutable metadata of the logical object and the logical block is additionally written to the immutable metadata area in a cache assembled in a page size at a time;

the method further comprises the following steps:

7. The data management method of claim 6, wherein the method further comprises:

8. The data management method of claim 1, wherein the method further comprises:

9. A data management apparatus, which is applied to a storage device, where a disk included in the storage device is divided into a variable metadata area, an immutable metadata area, and a data storage area, the data storage area is divided into a plurality of storage areas, and each of the storage areas is divided into a plurality of storage units, the apparatus comprising:

10. A storage device comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing processor-executable machine-executable instructions that, when executed by the processor, perform the method steps of any of claims 1-8.