CN112596673A

CN112596673A - Multi-active multi-control storage system with dual RAID data protection

Info

Publication number: CN112596673A
Application number: CN202011508298.9A
Authority: CN
Inventors: 胡晓宇
Original assignee: Nanjing Daoshang Information Technology Co ltd
Current assignee: Nanjing Daoshang Information Technology Co ltd
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2021-04-02
Anticipated expiration: 2040-12-18
Also published as: CN112596673B

Abstract

The invention discloses a multi-active multi-control storage system with double RAID data protection, which organically integrates the RAID technology of a controller and the RAID technology of a network, and comprises at least two storage controllers which are connected through the network, wherein the storage controllers have the local RAID protection function of respective independent operation, and each user data block is stored at least in two parts and respectively stored on different controllers, so that when any storage controller is down or damaged, the service of the storage system is not interrupted and the data is not lost. When the hard disk is damaged, the data is preferentially applied to a local RAID to realize data repair; and when the local RAID can not recover the data from the node, adopting protection between the storage controllers to recover and rebuild the data. The invention further improves the metadata management mode, greatly improves the storage utilization efficiency and the storage performance by combining the technologies of data compression, deduplication, integrity check, self repair and the like, improves the GC strategy, and can obviously reduce the write amplification caused by GC and improve the storage IO performance.

Description

Multi-active multi-control storage system with dual RAID data protection

Technical Field

The invention belongs to the technical field of disk Arrays and external storage systems, and relates to a novel dual RAID (redundant Arrays of Inexplicit disks) technology, namely a network RAID implementation method combining an RAID inside a controller and at least two storage controllers, in particular to a multi-active multi-control storage system further combining technologies such as data compression, data deduplication, data integrity verification and self-repair on the basis of dual RAID data protection.

Background

With the further development of digital transformation, mass data puts new requirements on storage. Although the traditional disk array has the advantages of mature technology, good performance, high availability and the like, the defects of the traditional disk array are more and more obvious in the face of mass data, especially with the continuous increase of the capacity of a hard disk, the reliability of the hard disk and the error rate parameters of data are not obviously improved, and the potential safety hazard of the RAID disk group is more and more serious. In addition, the scalability of the disk array is severely limited by the architecture of the shared hardware RAID controller, and it is difficult to extend the scalability from the conventional dual controller to multiple controllers.

The working principle of the traditional disk array is based on the RAID principle, namely, an array mode is used as a disk group, and the data security is improved by matching with the design of data scattered arrangement. The disk array is a large disk set composed of a plurality of inexpensive, small-capacity, high-stability and slow-speed disks, and the performance of the whole disk system is improved by utilizing the cumulative effect generated by providing data by individual disks. Meanwhile, by using the technology, the data is cut into a plurality of sections which are respectively stored on each hard disk. The disk array can also utilize the concept of Parity Check (Parity Check), when any hard disk in the array fails, data can still be read out, and when the data is reconstructed, the data is calculated and then is placed into a new hard disk again.

Currently, RAID modes are frequently used, such as RAID 0, RAID 1, RAID 10, RAID 2, RAID 3, RAID 4, RAID5, RAID6, RAID 7, and combination RAID, such as RAID 50 and RAID 60. The purpose of RAID is to protect data in the event of a failure of a mechanical hard disk (HDD), and when the HDD fails, the data within it can be recreated by parity or mirrored copies according to different types of RAID.

Each RAID has advantages and disadvantages in terms of write performance, read performance, data protection level, data reconstruction speed and available storage space on each hard disk, for example, if data availability is guaranteed to be the highest priority, mirroring or multiple mirroring (RAID 1, RAID 10, triple mirroring, etc.) is the best option, data has a complete copy on other HDDs or RAIDs, simplifying the data protection and recovery process, but the cost is really a serious challenge. In reality, few organizations adopt this pure mirroring approach, and most prefer to accept RAID5 or RAID 6.

When one hard disk in RAID5 is damaged, the system will rebuild the data on the failed disk on the check disk, but as the capacity of the hard disk reaches 10TB level, the rebuilding time is often several hours, days or even weeks, the system performance will decline during the rebuilding, if the performance is not allowed to decline by the application system user, the rebuilding operation can only be run at a low priority, so the rebuilding time will increase significantly, and longer rebuilding time means that the risk of data loss is greater, and for this reason, many companies have been directly connected with RAID 6.

RAID6 provides a second block of parity or stripe disk protection data, and even if two hard disks are corrupted or unrecoverable read errors occur, the risk of data loss is significantly reduced, but if data on the two hard disks needs to be reconstructed at the same time, the impact on system performance is very large. In addition, RAID6 also runs the risk of mechanical wear and dust damage, and most storage systems include automatic error correction, but as the capacity of hard disks increases, the time required for these operations increases exponentially.

Some storage manufacturers solve some traditional RAID problems by improving reliability and by technical innovation, for example, RAID-DP of IBM's EVENODD and NetApp enhances RAID6 performance by reducing algorithm overhead. The RAID-TM of NEC aims to reduce the risk of data loss for RAID 1, it writes data to three independent hard disks simultaneously, even if two hard disks fail or an unrecoverable read error occurs, the application can continue to access the data it needs without degrading performance, even during reconstruction, performance is not affected, although the disadvantage of RAID-TM is obvious, that is, the disk space utilization is only 1/3.

RAID-X is an innovation in the IBM XIV storage system that uses a large number of stripes to reduce the loss of RAID performance and the risk of data loss, and can be viewed as a variant of RAID 10 that uses an intelligent risk algorithm to randomly mirror data blocks across the entire array, which allows XIV to reconstruct data on a 2TB HDD in less than 30 minutes, and like other mirroring techniques, RAID-X has the disadvantage of a less efficient use of disk space and no controller specific RAID.

Finally, it is worth pointing out that hewlett-packard Networks and Pivot provide similar Network (Network) RAID variants for their x 86-based cluster iSCSI storage, Network RAID utilizes the principle of RAID, but uses storage nodes instead of disks as the lowest component level, so that it can allocate data blocks of logical volumes across the cluster, and can provide 1-4 data images according to the difference in Network RAID levels, with very high scalability. Meanwhile, it has self-healing capability, and when one node fails, it can correct the data and then copy it to another node. Although the risk of data loss is reduced by the network RAID, the delay is large through network data reconstruction, and the consumption of network bandwidth is severe; how to reduce the dependence of a storage system on a network RAID while maintaining the expansibility of the network RAID is always a technical problem in the field of storage. In addition, as with other mirroring techniques, the low utilization of disk space is a significant disadvantage.

Disclosure of Invention

The purpose of the invention is as follows: aiming at the problems that in a traditional disk array, the same RAID group is shared or managed by a plurality of controllers at the same time, and the failure of the RAID group becomes a key failure source of the traditional disk array and the performance bottleneck of a storage system, the invention aims to provide a novel multi-active multi-control storage system based on dual RAID data protection, wherein the RAID group independently managed in a storage controller and cross-node mirror image protection (network RAID protection) are organically combined to realize dual (two-layer) RAID data protection, and the performance and capacity of the storage system can be synchronously expanded. And further adopting novel metadata management, dynamic composition of RAID data packets and address allocation and a self-adaptive garbage recovery strategy on the basis so as to improve the overall IO performance of the system and save the storage space.

The technical scheme is as follows: in order to achieve the purpose, the invention adopts the following technical scheme:

a multi-active multi-control storage system with double RAID data protection comprises at least two storage controllers, wherein each storage controller has an independent RAID function, different storage controllers are connected through a network to realize cross-node mirror image copy or network RAID of user data blocks to form a double data redundancy structure, and RAID data protection in the storage controllers is combined with cross-node mirror image copy or RAID to realize two layers of RAID data protection; the logical address space of the storage system is divided into addressing spaces with fixed sizes, and data blocks of the address spaces are stored in at least two parts and are distributed to different storage controllers for storage in local RAID; when the hard disk is damaged, the storage controller preferentially utilizes the local RAID function to recover and reconstruct data, and if the local RAID group cannot recover the data, a protection mechanism among the storage controllers is adopted to recover and reconstruct the data. .

As an optional implementation manner, the storage system adopts two-level metadata management, including two-level address mapping, where the first-level address mapping allocates at least two storage controllers to each user logical address space according to the number of copies of the cross-controller and a pre-fixed address mapping method, and forms an address mapping relationship with the logical address space of the local RAID group of the storage controller; the second level address mapping is implemented by the local RAID of each storage controller to complete the translation from logical addresses to physical addresses of the local RAID groups.

The two-stage metadata management adopts a pre-fixed address mapping method, the management of the metadata is simple, and the metadata does not need to be stored in a large capacity. The metadata management of the RAID group can be realized by a traditional hardware RAID control card or soft RAID functions such as mdrain or LVM in a Linux kernel, but the method has the defects of poor performance, and the write operation of each data block can cause data updating of at least two RAID groups to cause the defect of IO amplification; in addition, the dual RAID has a great reliability improvement due to the redundancy of mirroring between nodes, but this is at the cost of lower storage use efficiency, and there is room for further improvement.

As a preferred embodiment, the storage system adopts global single-level metadata management and dynamically allocates addresses to implement data writing, when a user writes data, at least two storage controllers for storing user data blocks are determined by a pseudo-random algorithm, the user data blocks are respectively sent to a first cache region of a corresponding storage controller, and sent to a second cache region after being de-duplicated and compressed, and a plurality of compressed data blocks are combined in the second cache region and form an RAID stripe after adding address mapping information, compression algorithm and check information of the data blocks; the RAID stripe is written to a dynamically allocated address space, and the logical and actual write addresses of all user data blocks within the RAID stripe are saved in a global metadata table of the storage controller.

In a preferred embodiment, a storage pool of wide stripes is formed by a plurality of disks in the storage controller, and the logical address space of the storage pool starts from the first address space of the first disk, transitions to the first address space of the next disk according to the order of the disks, and sequentially moves to the next disk; then the data is recycled to the second address space of the first disk, and the data is sequentially recycled; the storage controller dynamically allocates a continuous storage address space for each RAID stripe in front-to-back order according to the logical address space of the storage pool, and writes to the RAID stripe.

In a preferred embodiment, the global metadata table of the storage controller includes an address mapping table for expressing actual write addresses corresponding to logical addresses of user data blocks, and a data Hash table for a data deduplication function.

Further, a garbage collection GC strategy is adopted in the storage system, and the storage space occupied by the invalid data blocks is reused. As a preferred embodiment, the GC process determines whether the data block on the RAID stripe is valid by combining the address mapping information in the RAID stripe and the address mapping information stored in the global metadata table; if the two are consistent, the data block is valid, and if the two are not consistent, the data block is invalid.

As an optional implementation manner, a FIFO garbage collection policy is adopted in the storage system, before the free space in each storage controller pool is exhausted, a GC process is started, the oldest written RAID stripes are sequentially checked, whether valid data are stored in the RAID stripes, if valid data exist, the valid data blocks are read again and sent into the second cache area to be packaged with new write data blocks to be written into new RAID stripes, and the recovered RAID stripes enter the free stripe pool to be reused for writing data.

In the above FIFO GC reclamation strategy, if there is valid data in the RAID stripe to be reclaimed, the data will be packaged together again as new data, written again, resulting in an additional Write operation, colloquially called Write Amplification (WA). Although the dynamic allocation, sequential write RAID groups described above can significantly improve system performance, excessive WA can offset some of the performance improvements.

In order to reduce WA, as a preferred embodiment, the storage system employs an adaptive garbage collection policy, wherein data blocks collected by the GC are separated from newly written data blocks to form independent RAID stripes, and digital tags corresponding to the data blocks on the RAID stripes and belonging to the recovery of the second time are attached, and the storage system determines the relative recovery frequency of the stripes according to the digital tags.

Has the advantages that: compared with the prior art, the invention has the following advantages:

1. the invention provides double RAID data protection by combining the local RAID of each storage controller with the copy of the cross controller (a network RAID); when the hard disk fails or is damaged, data recovery is preferentially carried out through the RAID local to the controller, and the work of other controllers is not influenced. When the hard disk failure exceeds the protection capability of the local RAID, data repair can be carried out through the network RAID. Compared with the traditional disk array, the invention can greatly improve the durability of data and the reliability of a storage system.

2. The invention provides a pre-fixed two-stage fixed address mapping method for managing and realizing dual RAID protection; the first level address mapping is mapping from the user logic address space to the logic address space of the RAID group in each controller; the second level of address mapping is the logical to physical address translation by each controller RAID group, which involves computing and maintaining redundant data blocks for the RAID. The two-level mapping method is simple to implement, and the second-level mapping can be implemented by using the existing hardware RAID controller or soft RAID method.

3. The invention provides a novel metadata management and data dynamic write address allocation method to realize dual RAID protection, which integrates the functions of online data compression, data deduplication, dynamic RAID group construction and sequential write and the like, and can greatly improve the performance and the utilization rate of storage resources.

4. The invention provides a new method for realizing GC, which realizes that data with similar updating frequency is placed in the same RAID stripe by setting a digital label based on the recovery times for the RAID stripe and classifying the recovered data according to the digital label, thereby effectively reducing the write amplification in the GC process and improving the performance of the system.

Drawings

Fig. 1 is a diagram of a total system architecture for dual RAID implementation according to an embodiment of the present invention, in which a user logical address space is mapped to two different controllers according to a Module policy to form global fine-grained data mirror protection, and meanwhile, data is further protected by a RAID group in each controller.

FIG. 2 is a diagram of a general system architecture for a dual RAID implementation according to another embodiment of the present invention, wherein the user logical address space is mapped to two different controllers according to a policy, and thereafter, the data is protected at each node using RAID data with dynamically allocated RAID stripe addresses.

Fig. 3 is a flowchart of deduplication and compression processes in a dynamic data write address allocation method according to an embodiment of the present invention, where in a dual RAID, data is first subjected to deduplication processing, then subjected to online data compression, and then reassembled into a data packet in a cache.

Fig. 4 is a flow chart of a group packet in a dynamic data write address allocation method according to an embodiment of the present invention, where in a dual RAID, after data is subjected to deduplication and compression, metadata and a redundancy check code required for RAID protection are added to a cache to form a complete RAID stripe with local data protection capability.

Fig. 5 is a flowchart of sequential allocation in a dynamic data write address allocation method according to an embodiment of the present invention, where in a dual RAID, idle RAID stripes are sequentially allocated, and meanwhile, GC space is sequentially recovered.

Fig. 6 is a schematic diagram of a novel GC method based on digital label classification in an embodiment of the present invention, and by setting a digital label based on recovery times for a RAID stripe and a strategy for classifying recovery data according to the digital label, data with similar update frequency can be placed in the same RAID stripe, so that write amplification in a GC process is effectively reduced, and system performance is improved.

Detailed Description

The invention is further described with reference to the following figures and specific examples.

In the existing disk array technology, a RAID group is shared by two or more controllers (active mode), or switched between controllers (active/standby mode); however, when the failure of a disk in a RAID group exceeds the protection capability of the RAID group, data loss is inevitable. Specifically, for RAID5, when two hard disks are damaged in one RAID group or one hard disk is damaged and another hard disk fails to read out the individual data block correctly (commonly referred to as Media Errors), data loss may result. As the capacity of the hard disk is increased, but the reliability index of the hard disk is not limited to be improved, the above problem needs a better data protection strategy to cope with.

Aiming at the defects of the prior RAID technology, the embodiment of the invention discloses a double-RAID data protection-based multi-active multi-control storage system, which comprises at least two storage controllers, wherein the storage controllers are connected by a high-speed switching network TCP or Infiniband. Each controller (storage controller) has a RAID function which works independently, a local logical address space of the controller with RAID data protection is formed, and the local logical address space is formed by logical address spaces of one or more RAID groups; when any hard disk on the controller is damaged, local (non-network) data recovery and reconstruction can be realized through the RAID function in the controller, and the normal work of other controllers is not influenced by the data reconstruction service; the logical address space of the storage system, i.e. the user address space, is firstly divided into addressing spaces of fixed size (corresponding to a certain data block, the granularity of which may be 4KB, 8KB, 16KB, or even 1MB or 4MB, and the size of which is selectable), the data blocks of the address space are allocated to the logical addresses of different RAID groups in two different controllers according to a certain pseudo-random algorithm for storage, so that the user data is randomly allocated with smaller granularity to form a multi-copy protection mechanism across the controllers. And then, the data blocks distributed to the same RAID group of the controller are repackaged, RAID redundant data blocks are calculated to form a local RAID group, and then the local RAID group is written into a physical disk corresponding to the RAID group. It is worth noting that a dual RAID architecture storage system can be extended to hundreds, or even thousands, of storage controllers due to the use of a pseudo-random algorithm.

The embodiment of the invention adopts a double RAID disk array technology to construct RAID groups with local exclusive functions of each controller, so as to realize local (local) data recovery of data without consuming network resources of a storage system and calculation or storage resources of other controllers; and when the local RAID group can not be recovered, the data recovery is realized through the data copying technology of the data blocks between the controllers. The data copying technology between the RAID control function which is exclusive to the local controller and the controller can determine the corresponding relation of data from a logic address to a physical address by two-stage metadata management and adopting a pseudo-random algorithm and a fixed address mapping table in advance. The first-level address mapping is mapping from a user logic address space to logic address spaces of different RAID groups in each controller, and according to the number of copies of the cross-controller, each user logic address space is distributed to two or more controllers and forms an address mapping relation with a certain logic address space of a certain RAID group in the controller, so as to determine read-write address query of user data. The second level address mapping is implemented by each controller's local RAID to accomplish the translation from logical addresses to physical addresses for each RAID group, including computing and maintaining redundant data blocks for the RAID. Furthermore, the invention provides a novel global single-stage metadata management and data dynamic write address allocation method to realize dual RAID protection, integrates functions of online data compression, data deduplication, dynamic RAID group construction and sequential write and the like, and can greatly improve performance and storage resource utilization rate.

First, a detailed description will be given of two-level metadata management dual RAID: in each controller, one or more local RAID groups are managed by a dedicated hardware RAID control card or Linux soft RAID software (or other similar software), and each controller includes a plurality of physical disks (which may be formed by a mechanical hard disk HDD or a solid state disk SSD), which is called local RAID management (underlying RAID). Metadata management of a local RAID, wherein a local readable and writable logical address space protected by the RAID is provided for an upper layer by taking each controller as a unit; and the system is responsible for mapping the logical address space of the local RAID group to the addresses of each HDD and SSD, providing easy generation and management of data blocks of the RAID group, and realizing data repair tasks when the hard disks are damaged. When the hard disk fails, the controller realizes local repair on the data through the local RAID without influencing the operation of the RAID of the upper layer.

The cross-controller RAID management, namely the upper RAID, is realized through a network RAID, namely a data redundancy structure is generated by data mirror image copy or RAID groups of data blocks constructed on different controllers, so that the data is protected; the controller is essentially a stand-alone computer system connected by a high-speed network and is therefore referred to as a network RAID. Network RAID may also become a distributed RAID, typically employing some distributed algorithm to determine how to construct a mirrored copy of data or RAID groups on multiple controllers. Common distributed storage recommends the use of three copies to realize data protection and service continuity in a production system, but because the dual RAID of the invention and the local controller already have one layer of independent RAID protection, only two copies of mirror images are generally needed to realize extremely high data durability protection in a network RAID layer. Of course, the upper RAID level in the present invention does not exclude triple copy or other erasure code protection.

Specifically, as shown in fig. 1, a general structure diagram of a method for implementing a dual RAID with two-level metadata management according to the present invention is shown, wherein four controllers each have two RAID groups and form a local logical address space. When the data of the local logic address space is written into the RAID group, the controller calculates a redundant data block (P) according to the rule of the RAID group and automatically writes the redundant data block into the RAID group, and the consistency of the redundant data block and the data of the local logic address space is kept to form local RAID protection. In the figure, the user logical address space maps the address of the user data to the local logical address spaces of two different controllers respectively through a fixed mapping relation, that is, the user data is kept in the local logical address spaces of the two different controllers, so as to form data protection across the controller layer. In this illustration, copy 1 of data block 1 maps to the first data block of controller 1 and copy 2 maps to the first data block of controller 2; copy 1 of data block 2 maps to the 2 nd data block of controller 2, and copy 2 maps to the first data block of controller 3; copy 1 of data block 3 is mapped to the 2 nd data block of controller 3, and copy 2 is mapped to the first data block of controller 4; copy 1 of data block 4 maps to the 2 nd data block of controller 4 and copy 2 maps to the second data block of controller 1.

By analogy, copy 1 of data block 5 is mapped to the 3 rd data block of controller 1, and copy 2 is mapped to the 3 rd data block of controller 2; copy 1 of data block 6 maps to the 4 th data block of controller 2, and copy 2 maps to the 3 rd data block of controller 3; copy 1 of data block 7 maps to the 4 th data block of controller 3 and copy 2 maps to the 3 rd data block of controller 4; copy 1 of data block 8 maps to the 4 th data block of controller 4 and copy 2 maps to the 4 th data block of controller 1.

In the above example, a pre-fixed mapping from the user logical address space to the local logical address space of each controller can be constructed by a simple Module operation. One advantage of this approach is that even in large storage systems, the simple Module operation can still be used to calculate the sequence number of each data block and its copy held in a controller in the storage system and the specific logical address therein, without consuming too much, non-volatile cache to hold the address mapping table.

It should be noted that the number of controllers shown in the above figures is 4, but the number is not limited to 4 controllers. The number of controllers may be 2, 3, 4, or any number above. The RAID internal to each controller is also not limited to 2 (2+1) RAID5 groups, and may be a multi-group RAID, or a RAID model may be another type.

The dual RAID described above, while improving data durability and high availability of the system, is suitable for some application scenarios where extreme reliability is sought, but still needs to pay a price in performance and storage efficiency. In particular, due to the duplication across controllers or other levels of RAID protection, there is more data redundancy in the system than in a conventional disk array, not only consuming more storage space, but also resulting in more RAID-specific Read-Modify-Write operations with performance degradation.

In order to further improve the storage efficiency and performance of the double RAID, the invention provides a novel global single-level metadata management and data dynamic write address allocation method to realize double RAID protection, and the performance of a storage system is improved while the utilization rate of storage resources is improved by integrating the functions of online data compression, data deduplication, dynamic RAID group construction and sequential write and the like. When a data block is written in each time, two storage controllers are selected to be written in simultaneously according to a certain Hash rule; at each storage controller, the data blocks are subjected to data deduplication, data compression and repackaging, data packets with RAID protection are generated, newly allocated addresses are written in a sequential mode, a metadata table is updated, and mapping between the logical addresses and actual written addresses of the data blocks is recorded. Fig. 2 is a diagram of a dual RAID system architecture according to another embodiment of the present invention, wherein a user data block corresponding to a certain lbn (local block number) is first determined according to a certain calculation method, a Module algorithm, or another Hash algorithm, and target controllers of two or three copies of the user data block are determined, and data is transmitted up through a network and stored through a write operation. If the data block is a read request, the data block is determined to exist in a certain controller, then the controller inquires the storage address of the data block corresponding to the LBN through a metadata table, and then the data is read from the address.

As shown in fig. 3, in each controller, the new data is first placed in the buffer 1. Each block is first passed through one or more Hash engines dedicated to data deduplication, such as the well-known cryptographic functions SHA256, SHA512, etc., to compute a characteristic value characterizing the block. An important feature of these cryptographic functions is that when two data blocks have the same feature value (SHA256, SHA512), the two data blocks can be considered identical, so that the data blocks that can be deduplicated can be quickly queried and determined by detecting their feature values without having to perform a direct comparison between the more burdensome data blocks and the data blocks. If the characteristic value is found to exist in the metadata table of the controller, the current data block is indicated to be a repeated data, the logical address of the data block can be directly pointed to the address of the existing data block with the same characteristic value, and the data block does not need to be written again.

When the characteristic value query indicates that the current data block is a new data block, the data block is firstly sent to a data compression engine for online data compression. Commonly used data compression algorithms are LZ4, GZIP1-9, and the like. After data compression, the data is sent to the buffer 2.

In the cache 2, a plurality of compressed data blocks are combined together, and address mapping information of the data blocks and metadata information such as compression algorithm selection and integrity check codes of the data blocks are added to form a data packet with a fixed length. The length of the data packet is determined by the RAID configuration of the controller, for example, the RAID configuration is 2+1 of RAID5 type, that is, each RAID group is composed of two data blocks and one check data block, each data block is S bytes in size, and the cache 2 will generate two data packets of S bytes in size each time. Then, a matching RAID redundancy data block is generated for the data packet by soft RAID calculation, thus generating a RAID data packet with RAID protection to be written. The RAID data packet includes a plurality of compressed data blocks, corresponding address mapping information, and RAID redundancy data. These RAID data blocks with self-protection capability are abbreviated RAID stripes. Continuing below, it is described how these RAID stripes are dynamically assigned addresses.

Firstly, inside each controller, a storage pool with wide strips is formed by a plurality of disks, and the logical address space of the storage pool is transited to the first address space of the next disk from the first (logical) address space of the first disk according to the sequence of the disks and then sequentially enters the next disk; and then the data is recycled to the second address space of the first disk, and the data is sequentially recycled, so that the concurrent writing operation of the disks can be maximally utilized by the address space, and the utilization rate of the bandwidth is improved. When the cache 2 continues to generate the RAID stripe, the controller dynamically allocates continuous storage address space for each RAID group data packet in the order from front to back according to the logical address space of the storage pool, and writes the RAID stripe. When the RAID stripe is written according to the linear address space, each sub-block with fixed length can be automatically distributed to different disks, so that the data packet has a dynamic RAID protection function.

It is noted that, in the RAID stripe, in addition to the user data, the stripe also stores metadata information related to the RAID stripe, which is used to determine whether the data block on the stripe is valid; more specifically, a stripe will hold the logical address of the data on it, as well as the actual write address assigned to the stripe; the same address mapping relationship is stored in the global metadata table of each storage controller at the same time. When a certain data needs to be read out, the controller firstly queries the table to obtain a specific address saved by the user data, and then reads the data from the address.

When a certain data block is written again, the data block is written into a new stripe together with other data blocks; its mapping in the global metadata table points to the new stripe address. When the old stripe is recycled, reading the metadata on the stripe, and checking whether the mapping address of the data block is consistent with the mapping in the universal metadata table; if so, indicating that the data block is valid data (the data block is not overwritten); if not, indicating that the block has been overwritten with a new address, the corresponding data on the old stripe is invalid. )

As shown in fig. 4, the data is passed through the deduplication and compression processes in the respective controllers and stored in the cache 2. Next, the data blocks are recombined, a series of RAID stripes with a fixed size are formed by a plurality of data blocks, corresponding metadata, RAID protection, and the like, and sent to the cache 3; the size of the RAID stripes matches the physical disks on the controller node, such that the RAID stripes may be stored spread across multiple physical disks, and ensure that upon encountering one or more disk corruptions, the RAID stripes can still reconstruct the data without causing data loss.

FIG. 5 illustrates a write mode for dynamically allocating spare RAID stripes in order according to the present invention. Firstly, in each controller, the logic address space of all hard disks forms a uniform idle RAID stripe which is addressed according to the

sequence

0, 1, 2 and …; whenever 3 types of cache generate a RAID stripe to be written (where multiple compressed data blocks are protected), the controller dynamically and sequentially allocates free address space to the current data RAID stripe. Thus, each time data is written, the data is not overwritten at the previously assigned address, but at a new address, commonly referred to as a write-out-of-place write strategy. Because data is written in according to a complete RAID stripe each time, the bandwidth of a disk can be utilized to the maximum extent, the IO performance of a system is improved, and meanwhile, local RAID data protection is provided. Since the data block is written to a new data RAID stripe each time, the corresponding data block becomes an invalid data block at the last writing. In this way, during the continuous writing process, some data blocks in a plurality of data blocks of a data RAID stripe correspond to the latest written data, which is called valid data, and other data blocks correspond to the updated data, which is called invalid data, and it is required to try to find back the storage space occupied by the invalid data through a Garbage Collection (Garbage Collection) policy and reuse the storage space. FIG. 5 illustrates a First-in-First-out garbage collection based strategy in which idle RAID stripes are gradually depleted as the sequential allocation of idle RAID stripes progresses; before exhaustion, the controller sequentially checks the oldest written stripe by the GC process, if valid data is still stored in the oldest written stripe, if so, the data is read again and sent to the buffer 2, and together with new write data, a new stripe is formed and written into a new RAID stripe, which is called Relocation. When valid data in the GC stripes are completely relocated, the stripes enter a free stripe pool and are reused for writing data.

Shown in tables 1 and 2 is a metadata management corresponding to the above operation. The metadata is usually stored in a high-speed DRAM cache, and is assisted by a suitable power-down protection mechanism, such as using NVDIMM, battery protection, or Flash protection, to ensure that the critical metadata is not lost in case of power-down or system Crash. The metadata management includes at least two table information, one is an address mapping table, which is used to express the address of the data actually stored corresponding to the logical address of the data block, i.e. the Local Block Address (LBA), and the size of the data block. In general, an LBA can represent or be converted to a sequence number of a RAID stripe and an offset (offset) located at the RAID stripe. After a data block is written, the LBA address and the data block size corresponding to the data block LBN need to be updated. Thus, when a certain LBN data block needs to be read, the table is first looked up to obtain the address actually stored by the data block, and then a read request is issued. The other metadata table is a data Hash table used for a data deduplication function, and the data Hash table records characteristic values of all valid data blocks on the storage controller. When a new data block is written in, firstly, the characteristic value of the new data block is calculated through the Hash function, then, the characteristic value table is inquired, if the same characteristic value exists in the controller, the data block can be deduplicated, only the LBN of the data block needs to point to the existing data block address, and the data block does not need to be written again, so that the purpose of data deduplication is achieved.

Table 1 address mapping table

LBN	(LBA，size)
		0	(xx…，8)
1	(yy…，4)

Table 2 data Hash table

LBA	Characteristic value	Count
			xx…	Hash value	1
yy…	Hash value	5

It is noted that a portion of the metadata is stored directly in the metadata area of each RAID stripe. The metadata is used to record the LBN of the data block stored in the RAID stripe, and information such as LBA, size, characteristic value, and CRC check corresponding to the LBN. When the metadata is used to assist in GC data recovery of the RAID stripe, in combination with the metadata information described above, it can be determined which data blocks are invalid and which data blocks are valid, and then the valid data blocks are rewritten to new addresses for the purpose of recovering the RAID stripe.

The RAID stripe may cause extra Write overhead when performing GC data recovery, commonly referred to as Write Amplification (WA). The root cause of WA is caused by inconsistency of the frequency of rewriting data blocks in the same RAID stripe, that is, inconsistency of the life cycle of each data block. If it can be possible to place data blocks with the same or similar life cycle in the same RAID stripe, effective data that needs to be migrated when the RAID stripe undergoes GC data recovery can be greatly reduced by adjusting the recovery frequency of the RAID stripe, so the WA can be reduced, thereby improving the performance of the system.

For the above principle, fig. 6 shows that the present invention proposes a new adaptive GC strategy, i.e. separately separating the data recovered by the GC and the newly written data, placing them in respective independent RAID groups, and attaching a digital label (Count field in the data Hash table can be written) to indicate that the data on the RAID group belongs to the recovery of the second time. For example, when the RAID groups are packed, four RAID groups receive data simultaneously, which correspond to the

digital tags

0, 1, 2, and 3, new data is put into the RAID group with tag 0, and data recovered from tag 0 is allocated to the RAID group with tag 1, data recovered from tag 1 is allocated to the RAID group with tag 2, and so on, and data recovered from

Claims

1. A multi-active multi-control storage system with double RAID data protection is characterized in that the storage system comprises at least two storage controllers, each storage controller has an independent RAID function, different storage controllers are connected through a network to realize cross-node mirror image copy or network RAID of user data blocks to form a double data redundancy structure, and RAID data protection in the storage controllers is combined with cross-node mirror image copy or RAID to realize two-layer RAID data protection; the logical address space of the storage system is divided into addressing spaces with fixed sizes, and data blocks of the address spaces are stored in at least two parts and are distributed to different storage controllers for storage in local RAID; when the hard disk is damaged, the storage controller preferentially utilizes the local RAID function to recover and reconstruct data, and if the local RAID group cannot recover the data, a protection mechanism among the storage controllers is adopted to recover and reconstruct the data.

2. The multi-active multi-control storage system with dual RAID data protection according to claim 1, wherein the storage system employs two levels of metadata management, including two levels of address mapping, the first level of address mapping allocates at least two storage controllers for each user logical address space according to the number of copies of the cross controller and a pre-fixed address mapping method, and forms an address mapping relationship with a logical address space of a local RAID group of the storage controllers; the second level address mapping is implemented by the local RAID of each storage controller to complete the translation from logical addresses to physical addresses of the local RAID groups.

3. The multi-active multi-control storage system with dual RAID data protection according to claim 1, wherein a global single-level metadata management and a dynamic address allocation are adopted in the storage system to implement data writing, when a user writes data, at least two storage controllers that store user data blocks are determined by a pseudo random algorithm, the user data blocks are respectively sent to a first cache area of a corresponding storage controller, and sent to a second cache area after being deduplicated and compressed, and a plurality of compressed data blocks are combined in the second cache area and an address mapping information, a compression algorithm, and a check information of the data blocks are added to form a RAID stripe; the RAID stripe is written to a dynamically allocated address space, and the logical and actual write addresses of all user data blocks within the RAID stripe are saved in a global metadata table of the storage controller.

4. The multi-active multi-control storage system with dual RAID data protection according to claim 3 wherein the storage controller is configured with a plurality of disks to form a wide stripe storage pool, and logical address spaces of the storage controller are transitioned from a first address space of a first disk to a first address space of a next disk in a disk order and sequentially moved to the next disk; then the data is recycled to the second address space of the first disk, and the data is sequentially recycled; the storage controller dynamically allocates a continuous storage address space for each RAID stripe in front-to-back order according to the logical address space of the storage pool, and writes to the RAID stripe.

5. The system of claim 3, wherein the global metadata tables of the storage controller comprise an address mapping table for expressing actual write addresses corresponding to logical addresses of user data blocks, and a data Hash table for data deduplication.

6. The multi-active multi-control storage system with dual RAID data protection according to claim 3 wherein a garbage collection GC policy is employed in the storage system to reuse storage space occupied by invalid data blocks.

7. The multi-active multi-control storage system with dual RAID data protection according to claim 6, wherein the GC process combines the address mapping information in the RAID stripe and the address mapping information stored in the global metadata table to determine whether the data block on the RAID stripe is valid; if the two are consistent, the data block is valid, and if the two are not consistent, the data block is invalid.

8. The multi-active multi-control storage system with dual RAID data protection according to claim 6, wherein a FIFO garbage collection policy is adopted in the storage system, before the free space in each storage controller storage pool is exhausted, a GC process is started, the oldest written RAID stripe is sequentially checked, whether valid data is still stored therein, if yes, the valid data blocks are read again and sent to the second cache area to be packaged with a new write data block to be written into a new RAID stripe, and the recovered RAID stripe is entered into the free stripe pool to be reused for writing data.

9. The multi-active multi-control storage system with dual RAID data protection according to claim 6, wherein an adaptive garbage collection policy is adopted in the storage system, the GC collected data blocks are separated from the newly written data blocks to form respective independent RAID stripes, and digital tags corresponding to the data blocks on the RAID stripes belonging to the recovery of the number of times are attached, and the storage system determines the relative recovery frequency of the stripes according to the digital tags.