CN113568582B - Data management method, device and storage equipment

Info

Publication number: CN113568582B
Application number: CN202110871528.6A
Authority: CN
Inventors: 吴小雄
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2023-05-26
Anticipated expiration: 2041-07-30
Also published as: CN113568582A

Abstract

The application provides a data management method, a device and a storage device. Index association information linking logical objects, logical blocks and storage units is constructed, so that data at any position can be indexed with it and metadata is represented efficiently. In addition, variable metadata and non-variable metadata are stored separately, which avoids metadata disorder. Whereas the prior art records addresses in the metadata, the variable metadata here uses its position as the address, which further reduces the amount of metadata.

Description

Data management method, device and storage equipment

Technical Field

The present invention relates to the field of storage technologies, and in particular, to a data management method, apparatus, and storage device.

Background

Shingled magnetic recording (SMR) disks employ a track-overlapping technique that greatly increases disk capacity. A conventional SMR disk has a random write area and a sequential write area. The random write area is small, typically one zone of 256 MB, and the remaining zones can only be written by sequential appending.

The random write area is typically used as swap space for overwrites. When the SMR disk is used as a normal HDD (hard disk drive) and direct overwrites are not specially handled, the disk controller itself reads the data of the adjacent tracks that would be overwritten and saves it to the swap space, writes the data requested by the host, and then rewrites the data previously saved to the swap space. This results in very poor write performance for the SMR disk.

Existing improvements include, for example, adding an SSD to store metadata, organizing disk metadata with inodes and bitmaps, or storing metadata mixed with data in the data storage area. These approaches suffer from problems such as increased system cost, metadata disorder, and a large amount of metadata.

Disclosure of Invention

The invention aims at providing a data management method, a data management device and a storage device, which can realize efficient metadata representation and reduce the storage space occupied by metadata.

Embodiments of the invention may be implemented as follows:

In a first aspect, the present invention provides a data management method applied to a storage device, where a disk included in the storage device is divided into a variable metadata area, a non-variable metadata area, and a data storage area, the data storage area is divided into a plurality of storage areas, and each storage area is divided into a plurality of storage units. The method includes:

Creating a logic middle layer, wherein the logic middle layer comprises a plurality of logic objects which are respectively used for identifying a section of storage space in the data storage area, each logic object is divided into a plurality of logic blocks, each logic object and the non-variable metadata of each logic block are written into the non-variable metadata area, and the storage units are in one-to-one correspondence with the logic blocks;

receiving a writing instruction, and determining data to be written, a first logic object and a first logic block based on the writing instruction;

determining a first storage unit according to the first logic object, and writing the data to be written into the first storage unit;

and writing the variable metadata of the first storage unit into a variable metadata area, wherein the writing position of the variable metadata in the variable metadata area corresponds to the position of the first storage unit in a data storage area, and the variable metadata comprises a first logic object, a first logic block and index association information of the first storage unit.

In an alternative embodiment, the method further comprises:

receiving a read instruction, and determining a second logic object and a second logic block based on the read instruction;

Determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area;

and reading the required data indicated by the reading instruction from the second storage unit.

In an alternative embodiment, the step of determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area includes:

searching the sequence numbers of the second storage units corresponding to the second logic objects and the second logic blocks according to the index association information in the variable metadata area;

and determining the address of the second storage unit according to the starting address of the data storage area, the length of each storage unit and the sequence number.
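
The address computation in this embodiment can be illustrated with a minimal sketch. The function name `unit_address`, zero-based sequence numbers, and byte-based addressing are assumptions made for illustration; the text only states that the address is derived from the starting address of the data storage area, the length of each storage unit, and the sequence number.

```python
def unit_address(data_area_start: int, unit_length: int, seq: int) -> int:
    """Return the byte address of the storage unit with sequence number seq.

    Assumes sequence numbers start at 0 and that every storage unit has the
    same length; both are illustrative assumptions, not stated in the text.
    """
    if seq < 0:
        raise ValueError("sequence number must be non-negative")
    return data_area_start + seq * unit_length
```

Because the address is recomputed from the sequence number on every read, no per-unit address needs to be stored.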

In an alternative embodiment, the variable metadata area further stores a relationship table containing correspondence between logical objects and storage areas;

the step of determining a first storage unit according to the first logic object includes:

searching whether the relation table has a storage area corresponding to the first logic object or not;

if not, determining a storage area in the idle state from the plurality of storage areas, taking the storage unit indicated by the write pointer of that storage area as the first storage unit, and writing the correspondence between the first logical object and that storage area into the relation table;

if so, determining the storage area corresponding to the first logical object according to the correspondence, and taking the storage unit indicated by the write pointer of the corresponding storage area as the first storage unit.

In an alternative embodiment, the index association information comprises a variable metadata field of a logical block;

the method further comprises the steps of:

receiving a control instruction, wherein the control instruction is a closing instruction or a deleting instruction;

determining a third logic object based on the control instruction, and marking a variable metadata field of a first logic block contained in the third logic object as a state indicated by the control instruction;

and deleting the corresponding relation between the third logic object and the storage area in the relation table.

In an alternative embodiment, the immutable metadata area is divided into a plurality of metadata storage areas, and the immutable metadata of the logical objects and the logical blocks is assembled into page-sized buffers and appended to the immutable metadata area one buffer at a time;

the method further comprises the steps of:

when a metadata storage area is full, compressing and merging the valid immutable metadata of the logical objects and logical blocks contained in each buffer in that metadata storage area;

writing the compressed and merged immutable metadata into another metadata storage area that is in the idle state, and releasing the data in the full metadata storage area.

In an alternative embodiment, the method further comprises:

for a full metadata storage area into which compressed and merged non-variable metadata has been written, detecting whether the ratio of the valid non-variable metadata in that metadata storage area to its total capacity is lower than a preset threshold;

if so, moving the valid non-variable metadata in that metadata storage area to another metadata storage area that has remaining storage space, and releasing the original metadata storage area.

In an alternative embodiment, the method further comprises:

after restarting the disk, reading the variable metadata in the variable metadata area, and recovering to obtain index association information based on the read variable metadata;

and acquiring the write pointer of the non-variable metadata area and, when the write pointer is not 0, reading the non-variable metadata into memory according to the length indicated by the write pointer, and recovering all the non-variable metadata according to the header information of each non-variable metadata entry.

In a second aspect, the present invention provides a data management apparatus applied to a storage device, where a disk included in the storage device is divided into a variable metadata area, a non-variable metadata area, and a data storage area, the data storage area is divided into a plurality of storage areas, and each of the storage areas is divided into a plurality of storage units, the apparatus comprising:

the creation module is used for creating a logic middle layer, the logic middle layer comprises a plurality of logic objects which are respectively used for identifying a section of storage space in the data storage area, each logic object is divided into a plurality of logic blocks, each logic object and the non-variable metadata of each logic block are written into the non-variable metadata area, and the storage units and the logic blocks are in one-to-one correspondence;

the receiving module is used for receiving a writing instruction, and determining data to be written, a first logic object and a first logic block based on the writing instruction;

the writing module is used for determining a first storage unit according to the first logic object, writing the data to be written into the first storage unit, writing variable metadata of the first storage unit into a variable metadata area, wherein the writing position of the variable metadata in the variable metadata area corresponds to the position of the first storage unit in a data storage area, and the variable metadata comprises index association information of the first logic object, a first logic block and the first storage unit.

In a third aspect, the present invention provides a storage device comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing processor-executable machine-executable instructions that, when the storage device is operating, are executed by the processor to perform the method steps of any of the preceding embodiments.

The beneficial effects of the embodiment of the invention include, for example:

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an application scenario of a data management method provided in an embodiment of the present application;

FIG. 2 is a flowchart of a data management method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a logic middle layer provided in an embodiment of the present application;

FIG. 4 is a block diagram of a conventional magnetic disk;

FIG. 5 is a block diagram of a magnetic disk according to an embodiment of the present application;

FIG. 6 is another block diagram of a magnetic disk according to an embodiment of the present application;

FIG. 7 is a flow chart of sub-steps included in step S130 of FIG. 2;

FIG. 8 is a flowchart of a reading method according to an embodiment of the present disclosure;

FIG. 9 is a flowchart of a control response method provided in an embodiment of the present application;

FIG. 10 is a flowchart of a data compression method according to an embodiment of the present disclosure;

FIG. 11 is a schematic diagram of data compression merging according to an embodiment of the present disclosure;

FIG. 12 is a flowchart of a data migration method according to an embodiment of the present disclosure;

FIG. 13 is a flowchart of a data recovery method according to an embodiment of the present disclosure;

FIG. 14 is a block diagram of a storage device according to an embodiment of the present disclosure;

FIG. 15 is a functional block diagram of a data management device according to an embodiment of the present application.

Reference numerals: 100 - processor; 200 - memory; 300 - data management device; 310 - creation module; 320 - receiving module; 330 - writing module.

Detailed Description

In the prior art, improvements to disk write performance mainly include, for example, additionally configuring an SSD to store the metadata of the SMR disk while data is written sequentially on the SMR disk. Because of the additional disk, the metadata of the SMR disk can be organized freely. There are also schemes that use the random write area to store the disk metadata of the SMR disk while the data storage area is written by sequential appending; the disk metadata is still organized with inode-like structures and bitmaps. Other schemes store metadata mixed with data in the data storage area, or propose a key-value store specific to leveldb.

In these existing improvements, additionally configuring an SSD increases system cost because extra SSD media must be introduced. The scheme that stores SMR disk metadata in the random write area is limited to disks whose random write area has a relatively large preset capacity and whose data storage area is relatively small. Because inode-like and bitmap-like metadata management is used, file continuity is poor, and so is scalability in distributed scenarios. Moreover, in practical implementations, either the random write area is insufficient to hold the disk metadata, or a single metadata entry must manage a larger granularity of capacity, which tends to waste considerable space.

In addition, schemes that mix metadata into data blocks are prone to metadata inconsistency, and after a fault restart the metadata loads slowly because the data storage area must be read. Current key-value storage schemes are limited in their usage scenarios.

Based on the above findings, the application provides a data management scheme in which index association information linking logical objects, logical blocks and storage units is constructed, so that data at any position can be indexed with it and metadata is represented efficiently. In addition, variable metadata and non-variable metadata are stored separately, which avoids metadata disorder. Whereas the prior art records addresses in the metadata, the variable metadata here uses its position as the address, which further reduces the amount of metadata.

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

In the description of the present invention, it should be noted that, if the terms "first," "second," and the like are used merely to distinguish the descriptions, they are not to be construed as indicating or implying relative importance.

It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.

Referring to fig. 1, an application scenario diagram of a data management method provided in an embodiment of the present application is provided, where the application scenario includes a storage device and a plurality of clients, where each client may communicate with the storage device to implement transmission and interaction of data and information.

Each client may communicate with and write data to the storage device, and each client may also read data from the storage device. The storage device may be a storage device that contains shingled SMR disks.

With reference to fig. 2, an embodiment of the present application further provides a data management method applicable to the above storage device. Wherein the method steps defined by the flow related to the data management method can be implemented by the storage device. In this embodiment, a disk included in the storage device may be divided into a variable metadata area, a non-variable metadata area, and a data storage area, where the data storage area is divided into a plurality of storage areas, and each storage area is divided into a plurality of storage units. The specific flow shown in fig. 2 will be described in detail.

Step S110, a logic middle layer is created, wherein the logic middle layer comprises a plurality of logic objects which are respectively used for identifying a section of storage space in the data storage area, each logic object is divided into a plurality of logic blocks, each logic object and the non-variable metadata of each logic block are written into the non-variable metadata area, and the storage units and the logic blocks are in one-to-one correspondence.

Step S120, receiving a write instruction, and determining data to be written, a first logic object and a first logic block based on the write instruction.

And step S130, determining a first storage unit according to the first logic object, and writing the data to be written into the first storage unit.

And step S140, writing the variable metadata of the first storage unit into a variable metadata area, wherein the writing position of the variable metadata in the variable metadata area corresponds to the position of the first storage unit in a data storage area, and the variable metadata comprises a first logic object, a first logic block and index association information of the first storage unit.

In this embodiment, a logical middle layer is first introduced. The entities of the logical middle layer are logical objects. Each logical object is identified by a globally unique 64-bit integer, which uniquely identifies a section of storage space in the data storage area. Each logical object can be divided into a plurality of logical blocks of fixed length, where the size of a logical block can be set according to actual conditions. Thus, a logical object is composed of n logical blocks (n is 1 or more).

Each logical object has corresponding id information, which is marked as object_id, and each logical block in the logical object corresponds to an index block_idx, wherein the block_idx represents the sequence number of the logical block in the logical object. The data of any one logical block can be uniquely determined by (object_id, block_idx).

As shown in fig. 3, with a logical block size of 1048576 bytes, the position offset=1572864 falls in block_idx = 1572864/1048576 = 1 (integer division), at an in-block offset of 1572864 - 1048576 = 524288. The position offset=1572864 can therefore be represented by (object_id=0x12345678776543210, block_idx=1, offset=524288), and the piece of data at offset=1572864 in the drawing can be indexed by (object_id=0x123456787543210, block_idx=1).
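
The arithmetic in this example can be reproduced with a short sketch. The function name `locate` and the constant `BLOCK_SIZE` are illustrative names, not taken from the source; the block size of 1048576 bytes is the divisor used in the example.

```python
from typing import Tuple

BLOCK_SIZE = 1048576  # logical block size in bytes, as in the example above


def locate(offset: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Split an object-relative offset into (block_idx, offset inside the block)."""
    return divmod(offset, block_size)


block_idx, in_block_offset = locate(1572864)
print(block_idx, in_block_offset)  # 1 524288
```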

In this embodiment, when creating the logical middle layer, the disk management process in the disk device may receive an object-creation request carrying EC information, stripe size, and other parameters. The object_id is generated by the disk management process and returned to the upper layer, which subsequently reads and writes data through the object_id. Creating an object consists of generating a globally unique 64-bit integer object_id, packing the object_id and the request parameters (including EC information, stripe size, etc.) into a fixed structure, appending that structure to the immutable metadata area on the disk, and returning the object_id to the upper layer.

Since a bare disk cannot manage data by itself, the disk still needs to be organized and managed. An existing typical SMR disk is generally divided into two types of zones, a random read-write zone (C ZONE) and sequential write zones (SP/SR ZONE), as shown in fig. 4. Each zone is typically 256 MB, and the entire disk is divided into a plurality of equally sized zones.

In the present embodiment, the SMR disk is divided into a variable metadata area (variable meta data), an invariable metadata area (invariable meta data), and a data storage area (data), as shown in fig. 5. The variable metadata area corresponds to the C ZONE of the SMR disk and may be used to hold metadata that requires frequent modification, such as the CRC, the written length of the data, the data status, etc. The immutable metadata area is used for sequentially writing the immutable part of the metadata, including the immutable metadata of each logical object and each logical block, such as object_id, EC n+m, stripe size, and the like. This part of an object's metadata is not modified once generated.

The non-variable metadata area shown in fig. 5 is composed of two sequentially written zones, and in practice, the non-variable metadata area contains zones not limited to two, but may be generally within 10 zones, and is merely an example in fig. 5.

The data storage area supports sequential writing and may be divided into fixed-length storage areas, for example 256 MB each. Each storage area is divided into a plurality of storage units (chunks), for example 1 MB each. The storage units on one disk are in one-to-one correspondence with the logical blocks in the logical middle layer.

Referring to fig. 6 in combination, the first part in fig. 6 is the variable metadata area, which is one zone in size (256 MB), namely the random write area provided by the hard disk manufacturer. The variable metadata area may be numbered in fixed-length slots, each recording the metadata (variable metadata) of one storage unit, for example object_id, block_idx, write_size, etc. The slots may be written in an overwrite manner.

The second part in fig. 6 is the immutable metadata area, consisting of a plurality of sequentially written zones whose number can be preset. In a typical 20 TB disk, the immutable metadata area may be divided into 10 zones. This area stores the immutable part of object metadata, such as object_id, object_size, etc., and is written only by appending.

The third part in fig. 6 is the data storage area, which provides write space for user data. This part is numbered in fixed-length units (chunks), each corresponding to a preset block size, and a plurality of chunks form a zone. When writing data, each zone can only be written by sequential appending.

The first part number of memory location metadata in fig. 6 corresponds one-to-one with the third part number of memory locations, as shown, chunk1 in the first part (variable metadata) corresponds to chunk of the first 1MB in the first zone of the third part, chunk157 corresponds to chunk of the first 1MB in the second zone of the third part, and so on.

The non-variable metadata of an object in the second part is associated with the variable metadata in the first part through the object_id, so the metadata of one object consists of the object's non-variable metadata in the second part together with the variable metadata in the first part that carries the same object_id.

Once the logical middle layer has been constructed and the disk structure divided, data is written at logical block granularity. The write instruction received by the storage device may include the corresponding logical object, such as an object_id, which is referred to in this embodiment as the first logical object for ease of distinction. The write instruction also includes an offset address, offset, within the first logical object. The first logical block block_idx pointed to by the write instruction may be calculated as offset/block_size from the offset address and the size block_size of a logical block in the first logical object.

Because data is written sequentially within each storage area of the data storage area, once the first logical object is determined, the storage area to be written is determined first, and the first storage unit that will receive the data to be written is then determined from the write progress of that storage area.

After the data to be written has been written into the first storage unit, the variable metadata of the first storage unit is written into the variable metadata area. In this embodiment, the variable metadata of the storage units may be denoted as chunk_table. The write position of a storage unit's variable metadata in the variable metadata area corresponds to the position of that storage unit in the data storage area; that is, the position itself serves as the address of the storage unit. Compared with storing the address as part of the metadata, this further reduces the amount of metadata.

In this embodiment, the variable metadata of the storage unit at position i may be indexed as chunk_table[i]. Because the storage units are in one-to-one correspondence with the logical blocks, chunk_table in essence represents the index association information of the first logical object, the first logical block and the first storage unit. For example, the index may take the form hash(object_id, block_idx) = meta(chunk[i]), meaning that the index association information uses object_id and block_idx as the key and the variable metadata of chunk i as the value.

In this embodiment, index association information linking logical objects, logical blocks and storage units is constructed, and data at any position can be indexed with it, so that metadata is represented efficiently. Compared with existing methods that use a path or URL string as an identifier, the storage space occupied by metadata is greatly reduced. In addition, variable metadata and non-variable metadata are stored separately, which avoids metadata disorder. Whereas the prior art records addresses in the metadata, the variable metadata here uses its position as the address, which further reduces the amount of metadata.

To improve write performance, a large piece of data content is typically split into multiple smaller blocks that are sent to the storage device for writing. The client can split the data content according to the logical block size into a plurality of sub-requests and send them to the storage device. If a storage area has a logical object that is being written and has not been closed, a new logical object must be written to a different storage area, one that has no logical object currently being written, so that each logical object writes to its storage area exclusively. This ensures the continuity of the written data on the disk. For example, fig. 6 shows a scenario in which two objects are written concurrently: the two write lines indicate that object1 and object2 are both being written, object1 on the first zone of the data storage area and object2 on the second zone.

To ensure the continuity of data on the disk, this embodiment also introduces a relation table writing_zone_map of the form map[object_id] = zone_id, which stores the correspondence between logical objects and storage areas.

At the time of data writing, by referring to the above-described relation table, it can be determined whether writing of data of one new object or writing of data of the same object (data required to maintain data continuity) is currently performed. Thus, referring to fig. 7, in this embodiment, when determining the first storage unit according to the first logical object, the following may be implemented:

step S131, searching whether the relation table has a storage area corresponding to the first logical object, if not, executing the following step S132, and if so, executing the following step S133.

Step S132, a storage area in an idle state is determined from a plurality of storage areas, a storage unit indicated by a write pointer of the storage area in the idle state is taken as a first storage unit, and a corresponding relation between a first logic object and the storage area in the idle state is written into the relation table.

Step S133, determining a storage area corresponding to the first logic object according to the corresponding relation, and taking a storage unit indicated by a write pointer of the corresponding storage area as a first storage unit.

The write instruction received by the storage device identifies the first logical object. By querying the correspondence between logical objects and storage areas in the relation table, it can be determined whether the first logical object has been written before, that is, whether the current write has a data-continuity requirement.

If the relation table has a storage area corresponding to the first logical object, the first logical object has already been written before, and the storage area zone_id of the previous write can be queried from the relation table using the object_id of the first logical object as the key. A storage unit (chunk) is then allocated in that storage area as the first storage unit for the data write.

Since the storage area is written sequentially with data, after the storage area is determined, the storage unit indicated by the write pointer of the storage area may be taken as the first storage unit.

If the relation table does not contain a storage area corresponding to the first logical object, a new object is being written; for example, the object was closed after its previous write, or the block calculated from the object_id and the offset in the write request is block 0 of the object. In this case, a storage area in an idle state (i.e., a storage area not occupied by any logical object) may be selected from the plurality of storage areas, either randomly or according to a preset policy, and a storage unit (chunk) is allocated in that storage area for data writing. The storage unit indicated by the write pointer of that storage area may be used as the first storage unit.

On this basis, to ensure the continuity of the object's data writes, the correspondence between the first logical object and the determined storage area may be written into the relation table, that is, writing_zone_map[object_id] = zone_id is updated.
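The selection logic of steps S131 to S133 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Zone` class, the `pick_first_chunk` function, the `in_use` flag, and the "first idle zone" policy are all assumptions introduced here, and `zones` is assumed to be a list indexed by zone_id.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    zone_id: int
    write_pointer: int = 0   # index of the next chunk to be written in this zone
    in_use: bool = False     # hypothetical flag: True once an object occupies the zone


def pick_first_chunk(writing_zone_map, zones, object_id):
    """Return (zone_id, chunk index inside the zone) for the next write of object_id."""
    zone_id = writing_zone_map.get(object_id)            # S131: look up the relation table
    if zone_id is None:
        # S132: new object, so take an idle zone (assumed policy: first idle one)
        zone = next(z for z in zones if not z.in_use)
        zone.in_use = True
        writing_zone_map[object_id] = zone.zone_id       # writing_zone_map[object_id] = zone_id
    else:
        # S133: same object, so keep writing in the zone used before
        zone = zones[zone_id]
    chunk = zone.write_pointer                           # chunk indicated by the write pointer
    zone.write_pointer += 1                              # zones are append-only
    return zone.zone_id, chunk
```

In this sketch an object keeps landing in the same zone until its entry is removed from `writing_zone_map`, which is what preserves data continuity. If no idle zone remains, `next` raises `StopIteration`; a real device would need an explicit policy for that case.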

In addition, after the data writing is completed in either of the above cases, the index association information of the written storage unit, the corresponding logical object, and the logical block may be written into the chunk_table, with object_id and block_idx as the key and chunk_meta[i] as the value, where i is the sequence number of the chunk in the data storage area. The corresponding fields (crc, written length) in chunk_meta[i] are updated, and the content of chunk_meta[i] is then written to the disk at the i-th position in the variable metadata area, namely at the address lba = sizeof(chunk_meta) × i.
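The position-as-address rule lba = sizeof(chunk_meta) × i can be illustrated with a fixed-size record. The sketch below is only an assumption-laden example: the field layout of `CHUNK_META` (object_id, block_idx, crc, written length, state) and the helper names are hypothetical, the metadata area is modeled as an in-memory `bytearray`, and the in-memory table stores the slot number i rather than the full record.

```python
import struct

# Hypothetical fixed-size layout: object_id, block_idx, crc, written_len, state (+3 pad bytes)
CHUNK_META = struct.Struct("<QIIIB3x")   # 24 bytes per record, little-endian


def meta_offset(i):
    """Offset of chunk_meta[i] in the metadata area: sizeof(chunk_meta) * i."""
    return CHUNK_META.size * i


def store_meta(area, chunk_table, i, object_id, block_idx, crc, written_len, state=0):
    """Update the in-memory index, then overwrite slot i of the metadata area in place."""
    chunk_table[(object_id, block_idx)] = i
    CHUNK_META.pack_into(area, meta_offset(i), object_id, block_idx, crc, written_len, state)


def load_meta(area, i):
    """Read slot i back as (object_id, block_idx, crc, written_len, state)."""
    return CHUNK_META.unpack_from(area, meta_offset(i))
```

Because slot i always describes chunk i, no chunk address has to be stored inside the record itself, which is the saving the description attributes to this layout.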

In addition to enabling writing of data, the client may also read data from the storage device. Referring to fig. 8, in this embodiment, the data reading can be implemented by the following ways:

Step S210, receiving a read instruction, and determining a second logical object and a second logical block based on the read instruction;

Step S220, determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area;

Step S230, reading the required data indicated by the read instruction from the second storage unit.

The read instruction received by the storage device may include an object_id, referred to herein as the second logical object for ease of distinction, together with an offset address offset and a length len. Data is read at block granularity: after receiving the read instruction, the storage device calculates, from the offset address and the block length, the block of the second logical object that holds the data to be read, namely block_idx, and records it as the second logical block.

The index association information chunk_table in the variable metadata area is queried with object_id and block_idx as the key to obtain the corresponding metadata chunk_meta[i]. The metadata chunk_meta[i] indicates the second storage unit corresponding to the second logical object and the second logical block. The required data may then be read from the second storage unit.

In detail, in this embodiment, the sequence number of the second storage unit corresponding to the second logical object and the second logical block may be found according to the index information in the variable metadata area, and the address of the second storage unit may be determined according to the start address of the data storage area, the length of each storage unit, and the sequence number, so as to locate the second storage unit.

For example, the offset inside the block is calculated as offset_inner = offset % block_size, the sequence number i of the chunk to be read is obtained from chunk_meta[i], and the address to be read is calculated from the start address data_start_addr of the data storage area, the storage unit length chunk_size, and the sequence number i as lba = offset_inner + data_start_addr + (chunk_size × i). Data of length len is then read starting at lba and returned to the client.
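The address calculation described above can be sketched as a small function. The function name `locate_read` is hypothetical, the table is assumed to map (object_id, block_idx) directly to the chunk sequence number i, and computing block_idx as offset // block_size is an assumption, since the text only says it is derived from the offset and the block length.

```python
def locate_read(chunk_table, object_id, offset, block_size, chunk_size, data_start_addr):
    """Return the address at which a read of the given object offset should start."""
    block_idx = offset // block_size            # assumed: logical block that holds the offset
    offset_inner = offset % block_size          # offset inside that logical block
    i = chunk_table[(object_id, block_idx)]     # sequence number of the chunk in the data area
    return offset_inner + data_start_addr + chunk_size * i
```

For instance, with 1024-byte blocks and chunks, a data area starting at 1,000,000, and block 2 of an object stored in chunk 7, a read at object offset 2148 starts at 100 + 1,000,000 + 7 × 1024.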

In addition, in this embodiment, an object may be closed or deleted. The index association information includes a variable metadata field of the logical block. In detail, referring to fig. 9, closing and deleting an object may be implemented in the following manner.

Step S310, receiving a control instruction, wherein the control instruction is a close instruction or a delete instruction;

Step S320, determining a third logical object based on the control instruction, and marking the variable metadata field of the first logical block contained in the third logical object as the state indicated by the control instruction;

Step S330, deleting the correspondence between the third logical object and the storage area in the relation table.

In this embodiment, when the received control instruction is a close instruction, the close instruction carries an object_id, which is denoted as the third logical object for ease of distinction. The first logical block block_idx contained in the third logical object is determined. The chunk_table is queried with object_id and block_idx as the key to obtain the corresponding variable metadata chunk_meta[i], the variable metadata field of the logical block in that variable metadata is marked as the closed state, and the variable metadata chunk_meta[i] at the i-th position in the variable metadata area is then updated.

When the received control instruction is a delete instruction, the first logical block block_idx contained in the third logical object is likewise determined. The chunk_table is queried with object_id and block_idx as the key to obtain the corresponding variable metadata chunk_meta[i], the variable metadata field of the logical block in that variable metadata is marked as the deleted state, and the variable metadata chunk_meta[i] at the i-th position in the variable metadata area is then updated. When an object is deleted, it is assumed that the object is in a closed state.

In addition, after the object is closed or deleted, the correspondence between the third logical object and the storage area in the relationship table may be deleted.
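Steps S310 to S330 can be sketched as below. Everything named here is illustrative: the `State` enum, the dictionary-based `chunk_meta`, the command strings, and the assumption that the first logical block of an object has block_idx 0.

```python
from enum import Enum


class State(Enum):
    OPEN = 0
    CLOSED = 1
    DELETED = 2


def apply_control(chunk_table, chunk_meta, writing_zone_map, object_id, command):
    """Mark the object's first block as closed or deleted and drop its zone mapping."""
    new_state = {"close": State.CLOSED, "delete": State.DELETED}[command]
    i = chunk_table[(object_id, 0)]          # S320: first logical block (assumed block_idx 0)
    chunk_meta[i]["state"] = new_state       # update the variable metadata field
    writing_zone_map.pop(object_id, None)    # S330: remove the object-to-zone correspondence
    return i                                 # slot i must then be rewritten on disk
```

Returning the slot number keeps the sketch free of I/O; the caller would persist chunk_meta[i] at the i-th position of the variable metadata area.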

As can be seen from the above, the immutable metadata area is divided into a plurality of metadata storage areas, for example, 10 zones of 256 MB each. The immutable metadata of logical objects and logical blocks is appended to the immutable metadata area each time through a buffer assembled to one page in size. That is, a single IO writes one page, but the amount of valid data (i.e., immutable metadata) is smaller than one page; for example, a single immutable metadata record is 20 bytes, while a page is typically 4 kB. If the data is not compressed, the successive write addresses are 0, 4096, 8192, and so on. The written data is therefore sparse and occupies storage space unnecessarily.

Some existing approaches rely on the disk controller to compress and merge data: on each write, the address is designated as the end of the previous valid data, so that, for example, three successive write positions are 0, 20, and 40. However, only a few disk controllers have this function, and it is difficult to ensure stable write performance in this way.

Referring to fig. 10, in this embodiment, for the above-mentioned problems, the following data compression method is adopted to solve:

Step S410, when a metadata storage area is full, compressing and merging the valid immutable metadata of the logical objects and logical blocks contained in each buffer in the metadata storage area.

Step S420, writing the compressed and merged immutable metadata into another metadata storage area in an idle state, and releasing the data in the original metadata storage area.

In this embodiment, immutable metadata is first written to each metadata storage area in the sequential manner described above, that is, the successive write addresses are 0, 4096, 8192, and so on. Because each write is page-aligned, the immutable metadata is written one page (4 kB) at a time, but a single immutable metadata record is only tens of bytes, so only tens of bytes of each written page are valid. Owing to the characteristics of the sequential write area, the next write must be appended from the end of the previously written page, so the data becomes very sparse after many writes.

Taking the size of one metadata storage area as 256 MB and the size of one immutable metadata record as 20 bytes, the actual valid content when one metadata storage area is full is about 256 MB / 4 kB × 20 bytes ≈ 1.3 MB. Therefore, after a metadata storage area is filled, the valid immutable metadata in it needs to be compressed and merged, and then written into a new metadata storage area in an idle state. After the data has been transferred, the data in the original metadata storage area can be released, and that metadata storage area can then be used as an idle area for writing data.
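The arithmetic and the merge step can be checked with a short sketch. It assumes, purely for illustration, that every 4 kB page carries exactly one valid record at its start; the function names are hypothetical and the zone is modeled as a bytes object.

```python
PAGE = 4096  # page size in bytes


def effective_bytes(zone_size, record_size, page=PAGE):
    """Valid bytes in a full sparse zone that holds one record per page."""
    return (zone_size // page) * record_size


def merge_sparse_zone(zone_bytes, record_size, page=PAGE):
    """Concatenate the record at the start of every page into one compact buffer."""
    compact = bytearray()
    for page_start in range(0, len(zone_bytes), page):
        compact += zone_bytes[page_start:page_start + record_size]
    return bytes(compact)
```

With a 256 MB zone and 20-byte records, `effective_bytes` gives 65,536 pages × 20 bytes = 1,310,720 bytes, which matches the figure of about 1.3 MB stated above. A real merge would also have to skip records whose objects have been deleted, which this sketch does not model.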

By the method, sequential writing of data can be guaranteed, and stability of writing performance is guaranteed.

If the metadata storage areas are classified by state, they may be divided into meta_front_writing_zone and meta_back_merge_zone, as shown in fig. 11. The meta_front_writing_zone is the writing zone that the foreground writes sequentially; it is a sparse area into which one record is written each time the foreground creates an object, and once it is full a new writing zone is allocated. The meta_back_merge_zone is the background merge zone: after a foreground sequential writing zone is full, its valid data is compressed and merged and then written into an idle background merge zone, which is a compact area.

As can be seen from the above, the number of meta_back_merge_zone areas grows as immutable metadata continues to be written; in practice, a 20 TB disk requires about 2 metadata storage areas when the disk is full of data. In an actual implementation, a certain number of zones, for example 10, may be reserved as metadata storage areas, which in theory fully meets the requirement. Specifically, 2 merge zones + 1 writing zone + 1 zone awaiting merging (reserved), that is, 4 zones, are sufficient to hold the metadata of one full write of a 20 TB disk. When data is deleted after the disk is full and new immutable metadata is created, the new object metadata is in essence still appended sequentially to the writing zone of the immutable metadata area. Therefore, 10 zones are sufficient to hold the metadata generated by writing one disk to full capacity twice. In addition, 2 zones may be reserved for performing GC (Garbage Collection, a garbage collection mechanism).

The GC flow is divided into metadata garbage collection and data storage area garbage collection. Metadata garbage collection mainly moves compact and full metadata storage areas, and in detail, referring to fig. 12, this can be achieved by:

In step S510, for the metadata storage area where the compressed and combined immutable metadata is written, it is detected whether the metadata storage area is full and the ratio of the included valid immutable metadata to the total capacity of the metadata storage area is lower than a preset threshold, and if yes, the following step S520 is executed.

Step S520, the effective non-variable metadata in the metadata storage area is moved to the rest metadata storage area with the rest storage space, and the original metadata storage area is released.
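The check in step S510 and the move in step S520 can be sketched as follows. The `MetaZone` class, its record representation as (payload, valid) pairs, and the capacity check on the target are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class MetaZone:
    capacity: int                                   # zone size in bytes
    records: list = field(default_factory=list)     # (payload bytes, still_valid) pairs

    @property
    def used_bytes(self):
        return sum(len(p) for p, _ in self.records)

    @property
    def valid_bytes(self):
        return sum(len(p) for p, valid in self.records if valid)


def gc_step(zone, target, threshold):
    """Move valid records to target if zone is full and its valid ratio is below threshold."""
    if zone.used_bytes < zone.capacity:                        # S510: zone must be full
        return False
    if zone.valid_bytes / zone.capacity >= threshold:          # S510: valid ratio too high
        return False
    if target.used_bytes + zone.valid_bytes > target.capacity: # assumed: target must have room
        return False
    target.records.extend((p, True) for p, valid in zone.records if valid)  # S520: relocate
    zone.records.clear()                                       # S520: release the original zone
    return True
```

Only zones that are both full and mostly invalid are moved, so relocation cost is paid only where it frees a meaningful amount of space.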

In addition, the garbage collection mechanism of the data storage area is similar to that of metadata, and this embodiment is not described here again.

During operation, a failure may occur that causes the disk to restart. Referring to fig. 13, in this embodiment, data may be recovered in this case in the following manner.

Step S610, after restarting the disk, reading the variable metadata in the variable metadata area, and recovering to obtain index association information based on the read variable metadata;

Step S620, obtaining the write pointer of the immutable metadata area, and when the write pointer is not 0, reading the immutable metadata into memory based on the length indicated by the write pointer, and recovering all immutable metadata according to the header information of each immutable metadata.

In this embodiment, after the disk is restarted due to a failure in the operation process, the system state may be recovered by reading the variable metadata and the non-variable metadata of the disk.

The variable metadata area is the first random writing zone, and the data can be directly read in full and a chunk_table is built.

For the immutable metadata area, the write pointer of each metadata storage area is acquired first. If the write pointer is 0, no operation is needed. If the write pointer is not 0, the header is read first to determine whether the metadata storage area is compact or sparse. If the zone is sparse and full, it is waiting for compression and merging; if it is sparse and not full, it is the writing zone currently being written sequentially by the foreground. If the zone is compact, all immutable metadata is read into memory at one time according to the write pointer length of the zone, and the immutable metadata of all objects is restored based on the header information of each immutable metadata.
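The per-zone decision made during recovery can be summarized in a small function. The function name, its return labels, and the `is_compact` flag (assumed to have been parsed from the zone header, whose format is not specified here) are illustrative assumptions.

```python
def classify_metadata_zone(write_pointer, zone_size, is_compact):
    """Decide what recovery should do with one metadata storage area."""
    if write_pointer == 0:
        return "skip"                 # nothing was written: no operation
    if is_compact:
        return "load"                 # read write_pointer bytes and rebuild the metadata
    if write_pointer >= zone_size:
        return "await_merge"          # sparse and full: waiting for compression and merging
    return "foreground_writing"       # sparse and not full: current foreground writing zone
```

The branches mirror the cases in the paragraph above, in the same order in which they are tested there.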

In this embodiment, GC progress may be recovered by means of checkpoints, which avoids repeating relocation that has already been completed.

In this embodiment, the effective length of each object may be determined by acquiring the write pointers of all storage areas of the data storage area together with the chunk_table information. All objects whose writing was interrupted need to be closed and cannot be written to directly afterwards. When data needs to be written, a new object can be requested for this purpose.

The foregoing description of the solution provided in the embodiments of the present application has been mainly presented in terms of a method. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative method steps described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The embodiment of the application may divide the functional modules of the storage device according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.

Fig. 14 is a schematic structural diagram of a storage device according to an embodiment of the present application. The storage device may be adapted to perform the functions performed by the storage device in any of the embodiments above. The storage device may include a memory 200 and a processor 100; the memory 200 may include an SMR disk, and machine-executable instructions may be stored in the memory 200, the machine-executable instructions including the data management apparatus 300.

The memory 200 and the processor 100 are electrically connected, directly or indirectly, to enable transmission or interaction of data. For example, they may be electrically connected to each other via one or more communication buses or signal lines. The data management apparatus 300 includes at least one software functional module that may be stored in the memory 200 in the form of software or firmware. The processor 100 is configured to execute executable computer programs stored in the memory 200, for example, the software functional modules and computer programs included in the data management apparatus 300, so as to implement the data management method provided in the embodiment of the present application.

Optionally, the processor 100 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a System on Chip (SoC), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

It will be appreciated that the configuration shown in fig. 14 is merely illustrative, and that the storage device may also include more or fewer components than shown in fig. 14, or have a different configuration than shown in fig. 14, for example, may also include a communication unit for information interaction with other devices.

Referring to fig. 15, a data management apparatus 300 provided in an embodiment of the present application includes a creating module 310, a receiving module 320, and a writing module 330.

The creating module 310 is configured to create a logical middle layer, where the logical middle layer includes a plurality of logical objects that are respectively used to identify a section of storage space in the data storage area, each logical object is divided into a plurality of logical blocks, each logical object and the immutable metadata of each logical block are written into the immutable metadata area, where the storage units and the logical blocks are in one-to-one correspondence.

In this embodiment, the creating module 310 may be configured to perform step S110 shown in fig. 2; for details of the creating module 310, refer to the description of step S110 above.

The receiving module 320 is configured to receive a write instruction, and determine data to be written, a first logic object, and a first logic block based on the write instruction.

In this embodiment, the receiving module 320 may be configured to perform step S120 shown in fig. 2; for details of the receiving module 320, refer to the description of step S120 above.

The writing module 330 is configured to determine a first storage unit according to the first logical object, write the data to be written into the first storage unit, write variable metadata of the first storage unit into a variable metadata area, where a writing position of the variable metadata in the variable metadata area corresponds to a position of the first storage unit in a data storage area, and the variable metadata includes index association information of the first logical object, the first logical block, and the first storage unit.

In this embodiment, the writing module 330 may be configured to perform steps S130 and S140 shown in fig. 2; for details of the writing module 330, refer to the description of steps S130 and S140 above.

In one possible implementation, the data management apparatus 300 further includes a reading module that can be used to:

In one possible implementation, the reading module may be specifically configured to:

In one possible implementation manner, the variable metadata area further stores a relationship table including a correspondence between logical objects and storage areas, and the writing module 330 may be configured to:

In one possible implementation, the index association information includes a variable metadata field of a logical block, and the data management apparatus 300 may further include a marking module, where the marking module may be configured to:

In one possible implementation, the immutable metadata area is divided into a plurality of metadata storage areas, and the immutable metadata of the logical objects and the logical blocks is additionally written to the immutable metadata area each time with a buffer assembled into a page size, and the data management apparatus 300 further comprises a compression module operable to:

In one possible implementation, the data management apparatus 300 further includes a reclamation module that may be used to:

In one possible implementation, the data management apparatus 300 further includes a recovery module that can be used to:

The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.

In summary, the data management method, apparatus, and storage device provided in the embodiments of the present application construct index association information of the logical object, the logical block, and the storage unit, so that data at any position can be indexed by using the index association information and the metadata is represented efficiently. In addition, the variable metadata and the immutable metadata are stored separately, which avoids disorder of the metadata. Compared with the prior-art approach of recording addresses in the metadata, the variable metadata uses its position as the address, which further reduces the amount of metadata.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data management method, characterized in that it is applied to a storage device, where a disk included in the storage device is divided into a variable metadata area, a non-variable metadata area, and a data storage area, where the data storage area is divided into a plurality of storage areas, and where each storage area is divided into a plurality of storage units, the method includes:

2. The method of data management according to claim 1, wherein the method further comprises:

3. The data management method according to claim 2, wherein the step of determining a second storage unit corresponding to the second logical object and the second logical block according to the index association information in the variable metadata area comprises:

4. The data management method according to claim 1, wherein a relationship table containing correspondence between logical objects and storage areas is also stored in the variable metadata area;

5. The data management method according to claim 4, wherein the index association information includes a variable metadata field of a logical block;

the method further comprises the steps of:

6. The data management method according to claim 1, wherein the immutable metadata area is divided into a plurality of metadata storage areas, and the immutable metadata of the logical objects and logical blocks is additionally written to the immutable metadata area each time with a buffer assembled into a page size;

the method further comprises the steps of:

7. The method of data management according to claim 6, wherein the method further comprises:

8. The method of data management according to claim 1, wherein the method further comprises:

9. A data management apparatus applied to a storage device, the storage device comprising a disk divided into a variable metadata area, a non-variable metadata area, and a data storage area, the data storage area divided into a plurality of storage areas, each of the storage areas divided into a plurality of storage units, the apparatus comprising:

10. A storage device comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing processor-executable machine-executable instructions that, when the storage device is run, are executed by the processor to perform the method steps recited in any of claims 1-8.