CN114840448B - Method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism - Google Patents


Info

Publication number
CN114840448B
Authority
CN
China
Prior art keywords
channel
redirection
channels
read
garbage collection
Prior art date
Legal status
Active
Application number
CN202210524346.6A
Other languages
Chinese (zh)
Other versions
CN114840448A (en)
Inventor
沈志荣
舒继武
龚红彬
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202210524346.6A
Publication of CN114840448A
Application granted
Publication of CN114840448B


Abstract

A method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism, relating to the technical field of solid-state drive (SSD) storage, comprises the following steps: 1) garbage collection redirection scheduling: calculating the channel service rate, calculating the redirection traffic, calculating the read performance loss caused by garbage collection, initially allocating redirection traffic across channels, and rescheduling redirection traffic across channels; 2) hotness-aware page allocation: page access hotness partitioning, channel service rate sorting, and hotness-based cross-channel page allocation. The method solves the severe I/O blocking caused by long garbage collection latency on a single channel. Redirection scheduling alleviates the request blocking caused by garbage collection as a whole and improves the overall performance of the flash memory. Redirected pages of different hotness are allocated according to the read load of each channel, so as to balance the load across channels and maximize channel resource utilization.

Description

Method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism
Technical Field
The invention relates to the technical field of solid-state drive storage, and in particular to a method that uses inter-channel parallel garbage collection to accelerate 3D flash memory garbage collection and reduce its performance loss.
Background
2D planar flash memory is difficult to scale further in capacity because of the limits of process scaling, and the conventional approach of storing more bits per flash cell (for example, SLC flash stores one bit per cell and MLC stores two) can hardly compensate for the negative overheads caused by increased bit error rates and severe access interference [1][2]. Accordingly, current 3D flash technology, which vertically stacks multiple flash layers (e.g., 24-96 layers [3]) within a flash chip, has received extensive attention from academia and industry: it requires no process-scaling support, significantly increases capacity compared with planar flash, and provides a new direction for expanding flash memory capacity [3].
However, while achieving high capacity, 3D flash memory also exacerbates the garbage collection interference problem. The problem is mainly caused by the "out-of-place update" mechanism inside flash memory: data carried by an upper-layer update request cannot be written directly at the physical address of the old data; the new data must be written into a fresh free page, and the old data is marked "invalid". Therefore, as free pages decrease and invalid pages accumulate, the flash memory triggers garbage collection: a suitable flash block is selected, its remaining valid pages are relocated to other blocks containing free pages, and the block is then erased to obtain a new free block. However, the relocation operations during garbage collection introduce extra reads and writes, which increase write amplification and shorten flash lifetime; moreover, a flash block cannot be erased until its data relocation completes, which lengthens the time garbage collection takes. In 3D flash memory, vertical stacking increases flash capacity but inevitably enlarges the flash block capacity [4], which significantly increases the relocation traffic (i.e., the amount of data relocated during garbage collection) and thus further increases the garbage collection latency.
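The mechanism can be illustrated with a minimal flash translation layer sketch; all structures, thresholds, and names here are illustrative assumptions rather than the patent's design:

```python
PAGES_PER_BLOCK = 4

class SimpleFTL:
    """Toy FTL showing out-of-place updates and the GC they force."""

    def __init__(self, num_blocks=8):
        # Page states: "free" | "valid" | "invalid".
        self.blocks = [["free"] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.l2p = {}  # logical page number -> (block, page)
        self.free = [(b, p) for b in range(num_blocks)
                     for p in range(PAGES_PER_BLOCK)]

    def write(self, lpn):
        # Trigger GC when free pages run low (threshold is illustrative).
        if len(self.free) <= PAGES_PER_BLOCK:
            self.collect()
        # Out-of-place update: the old physical page cannot be overwritten
        # in place; it is marked invalid and the data goes to a free page.
        if lpn in self.l2p:
            b, p = self.l2p[lpn]
            self.blocks[b][p] = "invalid"
        b, p = self.free.pop()
        self.blocks[b][p] = "valid"
        self.l2p[lpn] = (b, p)

    def collect(self):
        # Victim: the block with the most invalid pages (most reclaimable).
        victim = max(range(len(self.blocks)),
                     key=lambda b: self.blocks[b].count("invalid"))
        if self.blocks[victim].count("invalid") == 0:
            raise RuntimeError("device is full of valid data")
        self.free = [(b, p) for (b, p) in self.free if b != victim]
        # Relocating the victim's valid pages is the extra read/write work
        # (write amplification) that also delays the erase.
        for lpn, (b, p) in list(self.l2p.items()):
            if b == victim:
                nb, np_ = self.free.pop()
                self.blocks[nb][np_] = "valid"
                self.l2p[lpn] = (nb, np_)
        # Only after all relocations complete can the block be erased.
        self.blocks[victim] = ["free"] * PAGES_PER_BLOCK
        self.free += [(victim, p) for p in range(PAGES_PER_BLOCK)]

# Example: repeatedly updating a small working set invalidates old pages
# and forces GC to reclaim them:
#   ftl = SimpleFTL()
#   for i in range(100):
#       ftl.write(i % 10)
```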
A number of studies have proposed effective ways to accelerate flash garbage collection. Based on whether the relocation traffic changes, these methods fall into two categories: one reduces the relocation traffic [2][4-9]; the other accelerates the redirection process without changing the relocation traffic [10-12]. To reduce relocation traffic, some methods propose sub-block erase, shrinking the GC unit to sub-block size; others exploit workload characteristics, scheduling data write locations according to the data update frequency to reduce the number of relocated pages; still others cache the redirected data and write it back to flash when the system is idle, reducing the write operations that occur during garbage collection. However, existing studies ignore the parallelism between channels: existing methods perform garbage collection on a single channel, so upper-layer accesses to the data on that channel are blocked for a long time. How to exploit inter-channel parallelism to shorten garbage collection has remained largely unexplored.
The invention builds on the multi-channel parallel capability of 3D-flash-based SSDs: it accelerates garbage collection by redirecting data to multiple channels instead of the traditional single channel. Because the method is designed around the SSD's multi-channel parallelism, it is naturally compatible with conventional methods based on flash characteristics, workload characteristics, and the like.
References:
[1] Y. Cai, S. Ghose, E. Haratsch, Y. Luo, and O. Mutlu. 2017. Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives. Proc. IEEE 105, 9 (2017), 1666-1704.
[2] H. Gong, Z. Shen, and J. Shu. 2021. Accelerating Sub-Block Erase in 3D NAND Flash Memory. In Proc. of IEEE ICCD.
[3] Y. Luo, S. Ghose, Y. Cai, E. Haratsch, and O. Mutlu. 2018. Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation. Proceedings of the ACM on Measurement and Analysis of Computing Systems 2, 3 (2018), 1-48.
[4] T. Chen, Y. Chang, C. Ho, and S. Chen. 2016. Enabling Sub-Blocks Erase Management to Boost the Performance of 3D NAND Flash Memory. In Proc. of DAC.
[5] H. Chang, C. Ho, Y. Chang, Y. Chang, and T. Kuo. 2016. How to Enable Software Isolation and Boost System Performance with Sub-block Erase over 3D Flash Memory. In Proc. of CODES+ISSS.
[6] S. Chen, Y. Chen, H. Wei, and W. Shih. 2017. Boosting the Performance of 3D Charge Trap NAND Flash with Asymmetric Feature Process Size Characteristic. In Proc. of DAC.
[7] J. Cui, Y. Zhang, J. Huang, W. Wu, and J. Yang. 2018. ShadowGC: Cooperative Garbage Collection with Multi-Level Buffer for Performance Improvement in NAND Flash-based SSDs. In Proc. of DATE.
[8] C. Liu, J. Kotra, M. Jung, and M. Kandemir. 2018. PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High Density SSDs. In Proc. of USENIX FAST.
[9] P. Yang, N. Xue, Y. Zhang, Y. Zhou, L. Sun, W. Chen, Z. Chen, W. Xia, J. Li, and K. Kwon. 2019. Reducing Garbage Collection Overhead in SSD Based on Workload Prediction. In Proc. of USENIX HotStorage.
[10] S. Li, W. Tong, J. Liu, B. Wu, and Y. Feng. 2019. Accelerating Garbage Collection for 3D MLC Flash Memory with SLC Blocks. In Proc. of ICCAD.
[11] F. Wu, J. Zhou, S. Wang, Y. Du, C. Yang, and C. Xie. 2018. FastGC: Accelerate Garbage Collection Via an Efficient Copyback-Based Data Migration in SSDs. In Proc. of DAC.
[12] S. Yan, H. Li, M. Hao, M. Tong, S. Sundararaman, A. Chien, and H. Gunawi. 2017. Tiny-Tail Flash: Near-Perfect Elimination of Garbage Collection Tail Latencies in NAND SSDs. In Proc. of USENIX FAST.
Disclosure of Invention
The invention aims to provide a method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism. By using the ability of channels to process requests in parallel, redirection that traditionally targets a single channel is redesigned to target multiple channels, which effectively accelerates the execution of garbage collection and reduces I/O blocking. In practice, the invention analyzes and predicts the number of data read accesses on each channel, reasonably schedules the redirection traffic among channels, and alleviates request blocking for the SSD as a whole by minimizing the performance loss.
The invention comprises the following steps:
1) Garbage collection redirection scheduling: scheduling redirection traffic over a plurality of channels and allocating appropriate redirection traffic according to the blocking degree of each channel, including calculating the channel service rate, calculating the redirection traffic, calculating the read performance loss caused by garbage collection, initially allocating redirection traffic across channels, and rescheduling redirection traffic across channels;
2) Hotness-aware page allocation: allocating redirected pages of different hotness according to the read load of each channel, so as to balance the load across channels and maximize channel resource utilization, including page access hotness partitioning, channel service rate sorting, and hotness-based cross-channel page allocation.
In step 1), the specific steps of the garbage collection redirection scheduling may be:
1.1 Calculating the channel service rate: a ring counter records, for each channel, the number of host read operations occurring within a given time period; this count serves as a prediction of the channel's future read requests and is called the service rate;
1.2 Calculating the redirection traffic: the redirection traffic is approximately proportional to the number of valid pages in the block, and the time taken by read and write operations in flash memory is predictable; the redirection traffic is calculated by counting the valid pages in the block, from which the garbage collection latency on each channel is calculated;
1.3 Calculating the read performance loss caused by garbage collection: for a single channel, the read performance loss increases as the redirection traffic increases, and the more read requests target the channel, the greater the resulting loss; in multi-channel parallel redirection, the overall performance loss is calculated as the sum of the read performance losses of the channels;
1.4 Initially allocating redirection traffic across channels: when the total redirection traffic is known, it is initially allocated to the channels at random; alternatively, to reduce subsequent scheduling time, it can be initially allocated according to the channel service rates (the higher the service rate, the less redirection traffic allocated);
1.5 Rescheduling redirection traffic across channels: using the read performance loss calculation of step 1.3, the rescheduling strategy follows the principle of minimizing performance loss; the redirection traffic on each channel is scheduled iteratively, each iteration reducing the overall performance loss (the sum of the per-channel losses, updated after every iteration) until the overall loss no longer decreases or the number of iterations exceeds a threshold.
In step 2), the hotness-aware page allocation comprises the following steps:
2.1 Page access hotness partitioning: when pages are redirected, their hotness is distinguished by a hotness identifier using the hotness partitioning method of prior work [2]; when the pages are read into the cache (the SSD's internal RAM), all redirected pages are sorted;
2.2 Channel service rate sorting: all channels are sorted according to the channel service rate calculated in step 1.1, where the higher the service rate, the more frequent the predicted read accesses;
2.3 Hotness-based cross-channel page allocation: after step 1) completes, the redirection traffic of all channels is calculated, the specified number of redirected pages is allocated according to the redirection traffic, and hotter data is preferentially allocated to channels with lower service rates.
Compared with the prior art, the invention has the following outstanding advantages:
1. It is the first to exploit multi-channel parallelism during garbage collection to accelerate flash garbage collection, and the approach is generally applicable to flash-based SSDs.
2. It provides a method for allocating redirection traffic based on the service rate of each channel, scheduling and distributing each channel's redirection traffic from the perspective of minimizing performance loss. It is highly compatible with conventional methods based on flash characteristics, workload characteristics, and the like.
3. Pages are allocated according to the access hotness of the redirected data and the service rate of each channel, balancing the number of read requests served by each channel and fully utilizing the SSD's multi-channel parallelism.
Drawings
FIG. 1 is an exemplary diagram of a single iteration of scheduling redirection traffic among multiple channels according to the present invention.
FIG. 2 is an exemplary diagram of balancing the read load across channels by using data hotness when distributing redirection traffic.
FIG. 3 is a block diagram of a system design according to the present invention.
FIG. 4 is a graph of results of testing the read performance of a system in accordance with the present invention.
FIG. 5 is a graph of the results of testing the write performance of the system under the present invention.
FIG. 6 is a graph showing the results of testing the garbage collection performance of the system according to the present invention. The figure also illustrates the source of the performance improvement provided by the present invention.
FIG. 7 is a graph of sensitivity test results of the method of the present invention when varying the garbage collection threshold.
FIG. 8 is a graph of sensitivity test results of the method of the present invention when varying the number of channels.
Detailed Description
The invention will be further illustrated by the following examples in conjunction with the accompanying drawings.
The core of the invention is to exploit the inter-channel parallelism inside the SSD to convert garbage collection on a traditional single channel into multi-channel parallel garbage collection, thereby reducing the load on any single channel. At the same time, taking the read access load of each channel into account, the invention designs and implements a service-rate-based redirection traffic distribution method and a data-hotness-based channel read load balancing method.
The embodiment of the invention comprises the following steps:
step 1: garbage collection redirection scheduling
1.1 Calculating the channel service rate. The number of read operations from the host that occur over a period of time for each channel is recorded by a ring counter and used as a predictor of future read request conditions for that channel, which is referred to herein as the service rate.
1.2 Calculating the redirection traffic. The redirection traffic is approximately proportional to the number of valid pages in the block, and the time taken for read and write operations in flash memory is predictable. The redirection traffic is calculated by counting the valid pages in the block, from which the garbage collection latency on each channel is calculated.
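A sketch of this latency estimate (the cost model, one read plus one write per relocated page followed by a single block erase, is an assumption for illustration):

```python
def gc_latency_single_channel(valid_pages, t_read, t_write, t_erase):
    # Redirection traffic is approximated by the victim block's valid-page
    # count; each relocated page costs one read plus one write, and the
    # block is erased once after all relocations complete.
    return valid_pages * (t_read + t_write) + t_erase

# With the flash latencies used later in the evaluation (66 us read,
# 3 ms write, 10 ms erase) and a block holding 512 valid pages:
#   gc_latency_single_channel(512, 66e-6, 3e-3, 10e-3)  ->  about 1.58 s
```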
1.3 Calculating the read performance loss caused by garbage collection. For a single channel, the read performance loss increases as the redirection traffic increases; at the same time, the more read requests target the channel, the greater the resulting read performance loss. In multi-channel parallel redirection, the overall performance loss is calculated as the sum of the read performance losses of the multiple channels.
1.4 Initially allocating redirection traffic across channels. When the total redirection traffic is known, it is initially allocated to each channel at random; alternatively, to reduce subsequent scheduling time, it can be initially allocated according to the channel service rates (the higher the service rate, the less redirection traffic allocated).
1.5 Rescheduling redirection traffic across channels. Using the read performance loss calculation of step 1.3, the rescheduling strategy follows the principle of minimizing performance loss: the redirection traffic on each channel is scheduled iteratively, each iteration reducing the overall performance loss (the sum of the per-channel losses, updated after every iteration) until the overall loss no longer decreases or the number of iterations exceeds a threshold.
Step 2: page allocation based on hot sensing
2.1 Page access hotness partitioning. The invention uses the hotness partitioning method of prior work [2]: when pages are redirected, their hotness is distinguished by a hotness identifier, and when the pages are read into the cache (the SSD's internal RAM), all redirected pages are sorted.
2.2 Channel service rate sorting. All channels are sorted according to the channel service rate calculated in step 1.1; the higher the service rate, the more frequent the predicted read accesses.
2.3 Hotness-based cross-channel page allocation. After step 1 completes, the redirection traffic of all channels is calculated; the specified number of redirected pages is allocated to each channel according to its redirection traffic, and hotter data is preferentially allocated to channels with lower service rates.
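A minimal sketch of this allocation, assuming pages arrive as (page_id, hotness) pairs and channels as (channel_id, service_rate, quota) tuples, where each quota is the per-channel redirection traffic computed in step 1:

```python
def allocate_by_hotness(pages, channels):
    # Sort redirected pages from hottest to coldest ...
    pages = sorted(pages, key=lambda p: p[1], reverse=True)
    # ... and channels from lowest to highest service rate, so the hottest
    # pages land on the channels expected to serve the fewest reads.
    channels = sorted(channels, key=lambda c: c[1])
    placement, cursor = {}, 0
    for channel_id, _rate, quota in channels:
        # Fill each channel's quota with the hottest remaining pages.
        for page_id, _hot in pages[cursor:cursor + quota]:
            placement[page_id] = channel_id
        cursor += quota
    return placement
```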
The implementation of the invention mainly comprises the following modules:
1. Service rate calculation module: this module calculates the service rate of each channel over a period of time. For each channel, a ring structure records the number of read request accesses in the most recent n time slots; when a new time slot arrives, the slot that has stayed in the ring the longest is replaced with the read access count of the newest slot. The sum of the values recorded by all slots in the ring is the channel's current service rate.
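A minimal sketch of such a ring counter (the slot count and the fixed-length time window are illustrative assumptions):

```python
from collections import deque

class ServiceRateCounter:
    def __init__(self, num_slots=8):
        # One counter per time slot; the deque acts as the ring structure.
        self.slots = deque([0] * num_slots, maxlen=num_slots)

    def record_read(self):
        # Count one host read operation in the newest time slot.
        self.slots[-1] += 1

    def advance_window(self):
        # A new time slot arrives: appending evicts the oldest slot.
        self.slots.append(0)

    def service_rate(self):
        # Sum over all slots in the ring = the channel's current service rate.
        return sum(self.slots)
```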
2. Performance loss calculation module: this module calculates the estimated read performance loss, i.e., the number of read requests expected to be blocked by garbage collection. Assume the SSD consists of n channels {C_1, C_2, ..., C_n} with service rates {s_1, s_2, ..., s_n}, satisfying s_i >= 0 for 1 <= i <= n. Let the n-th channel (i.e., C_n) be the one performing garbage collection, and assume the collection has v (v >= 0) pages to redirect; the traffic redirected in parallel to each channel is {v_1, v_2, ..., v_n}, satisfying v_1 + v_2 + ... + v_n = v and v_i >= 0 for 1 <= i <= n. The time each channel spends executing redirection is t_i = v_i x m, where m is the time taken by one write operation. To avoid data loss on power failure, the block can be erased only after all page migration completes, so channel C_n must perform the erase operation after the cross-channel migration finishes; the blocking time of read requests under channel C_n is therefore max(t_1, ..., t_n) + t_e, where t_e is the erase latency. From these quantities, the read performance loss of each channel, and their sum D, can be calculated.
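The calculation can be sketched as follows; because the closed form of D is not fully reproduced above, this assumes the natural reading that each channel blocks reads for its migration time, that the garbage-collecting channel additionally waits for the slowest migration plus the erase, and that D weights each blocking time by the channel's service rate:

```python
def read_performance_loss(service_rates, alloc, m, t_erase, gc_channel=-1):
    # t_i = v_i * m: time channel i spends writing its redirected pages.
    t = [v * m for v in alloc]
    block_time = list(t)
    # The GC channel erases only after the slowest migration finishes.
    block_time[gc_channel] = max(t) + t_erase
    # D = sum over channels of (service rate * blocking time).
    return sum(s * bt for s, bt in zip(service_rates, block_time))
```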
3. Sorting module: this module mainly provides data support for balancing the read load among the channels. During redirection, the valid pages in the block are read into the cache in order, while the hotness of each redirected page is judged and sorted through the hotness identifier designed in the flash translation layer; in addition, all channels need to be sorted by their service rates, which happens only when garbage collection is triggered. By ordering the data hotness and the channel service rates, the module provides the basis for the subsequent read load balancing operation. As in Fig. 2, among the redirected (relocated) pages, all pages are ordered from high to low hotness, and the channels are ordered from low to high service rate (e.g., s_1 = 1 <= s_2 = 2 <= s_3 = 5).
4. Redirection traffic distribution module: this module actually allocates the redirection traffic to each channel. The basis on which the module allocates redirection traffic is minimizing the performance loss, i.e., minimizing the total read performance loss D defined above, subject to the conditions:
t_i = v_i x m, 1 <= i <= n
v_1 + v_2 + ... + v_n = v, v_i >= 0, 1 <= i <= n
From this formulation, it can be seen that minimizing the performance loss only requires scheduling the allocation of each channel's redirection traffic (v_i), given the channel service rates (s_i) and the page write latency (m). However, because a block may contain many valid pages (i.e., the redirection traffic is large), computing the minimum performance loss by brute force would incur a large computational overhead. The invention therefore provides a simpler redirection traffic rescheduling method. First, the module makes a random initial division of the redirection traffic among the channels (the per-channel redirection traffic sums to the total redirection traffic of the garbage collection) and calculates its performance loss; then it calculates the performance loss after each possible inter-channel traffic move (for example, channel C_1 moving one page of redirection traffic to channel C_2) and actually executes the move whose loss reduction relative to the pre-move state is largest; this process repeats until no further move is performed, at which point rescheduling completes, and a maximum-iteration threshold can be set to prevent excessive iterative computation. The redirection traffic is then allocated according to the module's result. As shown in Fig. 1, the initial traffic allocation of the channels is {v_1 = 2, v_2 = 1, v_3 = 2}; during one iteration, the performance loss reduction of every inter-channel move (u_{i,j}, 1 <= i, j <= n) is calculated, and since u_{3,2} = 4 is the largest reduction, traffic is moved from channel C_3 to channel C_2, after which the redirection traffic allocation is {v_1 = 2, v_2 = 2, v_3 = 1}. In addition, the invention balances the read load among the channels by using data hotness, and this module actually performs that balancing: writing hotter data to channels with lower service rates and colder data to channels with higher service rates balances the load among the channels, using the data hotness ordering and channel service rate ordering computed by the sorting module. As shown in Fig. 2, among the relocated pages, all pages are ordered from high to low hotness and the channels from low to high service rate; assuming the module has calculated the redirection traffic of each channel, the redirected pages are then distributed from hottest to coldest to C_1, C_2, C_3 in turn (corresponding to A_1, A_2, A_3 in the figure, respectively).
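The greedy rescheduling loop can be sketched as follows, building on the read_performance_loss() sketch above; the one-page move granularity follows the example in the text, and the iteration cap is an illustrative parameter:

```python
def reschedule(service_rates, alloc, m, t_erase, max_iters=1000):
    n = len(alloc)
    best = read_performance_loss(service_rates, alloc, m, t_erase)
    for _ in range(max_iters):
        best_move, best_gain = None, 0
        for i in range(n):                    # source channel
            if alloc[i] == 0:
                continue
            for j in range(n):                # destination channel
                if i == j:
                    continue
                trial = list(alloc)
                trial[i] -= 1
                trial[j] += 1
                loss = read_performance_loss(service_rates, trial, m, t_erase)
                gain = best - loss            # u_{i,j}: loss reduction
                if gain > best_gain:
                    best_move, best_gain = (i, j), gain
        if best_move is None:                 # no move helps: stop
            break
        i, j = best_move                      # execute the best move
        alloc[i] -= 1
        alloc[j] += 1
        best -= best_gain
    return alloc

# Example mirroring Fig. 1's setup (initial allocation {2, 1, 2}):
#   reschedule(service_rates=[1, 2, 5], alloc=[2, 1, 2], m=3e-3, t_erase=10e-3)
```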
The system structure model implemented by the invention is shown in Fig. 3. The design spans two layers: the flash translation layer (controller) and the flash storage medium. ParaGC at the controller layer is the composition of the modules designed by the invention, and the operation flow of the four modules can be summarized as follows: every time a period passes, the system recalculates the service rate of each channel (service rate calculation module); when garbage collection is triggered, the channels are sorted by service rate while the valid pages in the block are read into the cache and sorted by hotness (sorting module); redirection traffic is initially allocated to each channel and the performance loss is calculated, after which the inter-channel redirection traffic is rescheduled iteratively to minimize the performance loss (performance loss calculation module); finally, the traffic is actually allocated to each channel by combining the data hotness and the channel service rates (redirection traffic distribution module).
The performance test of the present invention is given below:
The workloads used in the tests are obtained from Microsoft production servers (MSR) and from a public repository that collects enterprise virtual desktop infrastructure I/O records. Each trace file records the access order of the I/O requests and the access-related parameters, including the request issue time (timestamp), the type of access request (read or write), the logical sector address of the accessed data, and the size of the accessed data. The experiments are analyzed on a widely used SSD simulator (SSDsim). They evaluate the invention on two planes: overall performance evaluation, which tests the average read/write response latency of host requests and the average garbage collection latency; and sensitivity analysis, which measures the invention's performance by repeatedly varying key parameters, mainly testing performance under different garbage collection thresholds and different numbers of channels. Performance is evaluated against control experiments: a baseline method and a method using the Zipf distribution (GC-Zipf). The baseline adopts the traditional single-channel garbage collection method, in which the data redirection generated by one garbage collection occurs on only one channel; GC-Zipf adopts multi-channel parallel garbage collection, but distributes the cross-channel redirection traffic following a Zipf distribution without considering the load on each channel.
A. Overall performance testing
The main experimental parameters follow the configuration of a 64-layer 3D flash memory: eight channels, each channel shared by two flash chips; each chip consists of 1536 blocks, each block contains 768 pages, and the page size is 16 KB. The basic flash operation latencies are configured as a 3 millisecond write latency, a 66 microsecond read latency, and a 10 millisecond erase latency.
A.1 Average read response latency:
FIG. 4 shows that the inventive method reduces the average read latency by 41.3% and 25.3% compared with the baseline and GC-Zipf methods, respectively. The underlying reason is that the baseline simply performs the data redirection of garbage collection on a single channel, which leaves read requests from upper layers severely blocked; GC-Zipf distributes data redirection over multiple channels, but its simple traffic allocation ignores the competition between read requests and data migration, causing serious interference. The inventive method allocates redirection traffic to each channel according to its service rate, greatly reducing this interference. Under some traces (such as proj_4 and src1_1), the method reduces read latency especially significantly, because these traces exhibit a large amount of redirection traffic during replay; this indicates that the method achieves better results under workloads where the write amplification caused by garbage collection is more severe.
A.2 Average write response latency:
FIG. 5 shows that the inventive method reduces the average write latency by 38.8% and 24.3% compared with the baseline and GC-Zipf methods, respectively. The root cause is that the method relieves system congestion through redirection traffic scheduling, so that front-end read/write requests can fully utilize idle I/O resources.
A.3 Average garbage collection latency:
FIG. 6 shows that the inventive method reduces the average garbage collection latency by 73.8% and 51.1% compared with the baseline and GC-Zipf methods, respectively. The main reason is that cross-channel parallel garbage collection spreads the traditionally long latency across multiple channels as several shorter latencies, greatly reducing the time consumed by garbage collection and alleviating the blocking it imposes on the system.
B. Sensitivity test
The garbage collection threshold has a considerable influence on garbage collection performance, and the number of channels is directly related to the design of the invention. The experiments therefore evaluate the sensitivity of the designed system by varying the garbage collection threshold and the number of channels, and analyze the causes.
B.1 Garbage collection threshold impact
The experiments tested the effect of the garbage collection threshold on the invention's performance by varying the threshold from 10% to 30%. FIG. 7 shows the read/write latencies and garbage collection latencies of the inventive method and the control experiments under different garbage collection thresholds. The results show that for all three methods every latency increases as the garbage collection threshold increases, since a larger threshold causes more page redirection, further exacerbating the interference between front-end data access and back-end page redirection. Across the different garbage collection thresholds, the inventive method still reduces read latency by 29.4%, write latency by 31.6%, and garbage collection latency by 63.5% on average.
B.2 Channel count impact
The experiments tested the effect of the number of channels on the proposed design by varying the channel count from 4 to 16. FIG. 8 evaluates this effect in terms of read/write latency and garbage collection latency.
As can be seen from FIG. 8, first, as the number of channels increases, the read/write latency of all methods decreases. This is because with more channels the controller can execute more requests in parallel and greatly reduce the time consumed, thus reducing the time requests spend blocked. Meanwhile, compared with the baseline and GC-Zipf methods, the invention still reduces read latency by 20.2% and write latency by 29.4% on average.
Second, as the number of channels increases, the garbage collection latency of the baseline method remains unchanged, while both the inventive method and GC-Zipf decrease. This is because the baseline performs garbage collection on a single channel and is therefore essentially unaffected by the number of channels; in contrast, the inventive method and GC-Zipf can exploit the richer I/O resources to accelerate garbage collection in parallel, shortening its latency. Overall, across the different channel counts, the invention reduces the average garbage collection latency by 75.8% and 52.5% compared with the baseline and GC-Zipf methods, respectively.

Claims (2)

1. A method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism, characterized by comprising the following steps:
1) Garbage collection redirection scheduling: scheduling redirection traffic over a plurality of channels and allocating appropriate redirection traffic according to the blocking degree of each channel, including calculating the channel service rate, calculating the redirection traffic, calculating the read performance loss caused by garbage collection, initially allocating redirection traffic across channels, and rescheduling redirection traffic across channels;
The specific steps of the garbage collection redirection scheduling are as follows:
1.1 Calculating the channel service rate: a ring counter records, for each channel, the number of host read operations occurring within a given time period; this count serves as a prediction of the channel's future read requests and is called the service rate;
1.2 Calculating the redirection traffic: the redirection traffic is approximately proportional to the number of valid pages in the block, and the time taken by read and write operations in flash memory is predictable; the redirection traffic is calculated by counting the valid pages in the block, from which the garbage collection latency on each channel is calculated;
1.3 Calculating the read performance loss caused by garbage collection: for a single channel, the read performance loss increases as the redirection traffic increases, and the more read requests target the channel, the greater the resulting loss; in multi-channel parallel redirection, the overall performance loss is calculated as the sum of the read performance losses of the channels;
1.4 Initially allocating redirection traffic across channels: when the total redirection traffic is known, it is initially allocated to each channel at random, or, to reduce subsequent scheduling time, initially allocated according to the channel service rate, where the higher the service rate, the less redirection traffic is allocated;
1.5 Rescheduling redirection traffic across channels: using the read performance loss calculation of step 1.3, the rescheduling strategy follows the principle of minimizing performance loss, iteratively scheduling the redirection traffic on each channel and reducing the overall performance loss with each iteration until the overall performance loss no longer decreases or the number of iterations exceeds a threshold;
2) Hotness-aware page allocation: allocating redirected pages of different hotness according to the read load of each channel, so as to balance the load across channels and maximize channel resource utilization, including page access hotness partitioning, channel service rate sorting, and hotness-based cross-channel page allocation.
2. The method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism according to claim 1, wherein in step 2), the hotness-aware page allocation comprises the following steps:
2.1 Page access hotness partitioning: when pages are redirected, their hotness is distinguished by a hotness identifier, and when the pages are read into the cache, all redirected pages are sorted;
2.2 Channel service rate sorting: all channels are sorted according to the channel service rate calculated in step 1.1, where the higher the service rate, the more frequent the predicted read accesses;
2.3 Hotness-based cross-channel page allocation: after step 1) completes, the redirection traffic of all channels is calculated, the specified number of redirected pages is allocated according to the redirection traffic, and hotter data is preferentially allocated to channels with lower service rates.
CN202210524346.6A 2022-05-13 Method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism Active CN114840448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524346.6A CN114840448B (en) 2022-05-13 Method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallelism


Publications (2)

Publication Number Publication Date
CN114840448A CN114840448A (en) 2022-08-02
CN114840448B true CN114840448B (en) 2024-06-04


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103140851A (en) * 2010-09-15 2013-06-05 甲骨文国际公司 System including a middleware machine environment
CN103488583A (en) * 2013-09-09 2014-01-01 华中科技大学 High-performance reliable solid-state disk realizing method
CN103530237A (en) * 2013-10-31 2014-01-22 厦门大学 Solid-state disc array garbage collecting method
DE102013106242A1 (en) * 2012-07-02 2014-04-17 Infomicro Electronics Industrial Park (Shenzhen) Ltd. Semiconductor drive with conversion layer (ETL) and redirection of temporary files to reduce the wear of flash memory
CN110515859A (en) * 2019-07-09 2019-11-29 杭州电子科技大学 A kind of solid state disk read-write request method for parallel processing
CN114138189A (en) * 2021-11-17 2022-03-04 厦门大学 Method for accelerating sub-block erasure in 3D NAND flash memory


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FC-SAN storage system based on a RAM/Disk hybrid device model; Fu Changdong, Shu Jiwu, Shen Meiming, Zheng Weimin; Journal of Computer Research and Development; 2004-03-16 (No. 03); full text *
A survey of flash memory storage systems; Lu Youyou, Shu Jiwu; Journal of Computer Research and Development; 2013-01-15 (No. 01); full text *

Similar Documents

Publication Publication Date Title
US8171239B2 (en) Storage management method and system using the same
Kim et al. Alleviating garbage collection interference through spatial separation in all flash arrays
US20180150242A1 (en) Controller and storage device for efficient buffer allocation, and operating method of the storage device
US20080147994A1 (en) Command scheduling method and apparatus of virtual file system embodied in nonvolatile data storage device
US20110252215A1 (en) Computer memory with dynamic cell density
Liu et al. SOML read: Rethinking the read operation granularity of 3D NAND SSDs
CN107515728B (en) Data management method and device for developing internal concurrency characteristics of flash memory device
Chang et al. VSSD: Performance isolation in a solid-state drive
Cui et al. DLV: Exploiting device level latency variations for performance improvement on flash memory storage systems
Li et al. Pattern-based write scheduling and read balance-oriented wear-leveling for solid state drivers
Zhang et al. PA-SSD: A page-type aware TLC SSD for improved write/read performance and storage efficiency
Liu et al. GSSA: A resource allocation scheme customized for 3D NAND SSDs
Huang et al. SplitZNS: Towards an efficient LSM-tree on zoned namespace SSDs
Lv et al. Latency variation aware read performance optimization on 3D high density NAND flash memory
CN114840448B (en) Method for accelerating garbage collection of 3D flash memory by utilizing inter-channel parallel
Chen et al. Towards efficient NVDIMM-based heterogeneous storage hierarchy management for big data workloads
US20210279188A1 (en) Client input/output (i/o) access rate variation compensation
Du et al. SSW: A strictly sequential writing method for open-channel SSD
CN114840448A (en) Method for accelerating garbage recovery of 3D flash memory by utilizing inter-channel parallel
Ge et al. Chewanalyzer: Workload-aware data management across differentiated storage pools
Abud et al. Experience and performance of persistent memory for the DUNE data acquisition system
CN114138189A (en) Method for accelerating sub-block erasure in 3D NAND flash memory
Zhang et al. Optimizing performance for open-channel ssds in cloud storage system
Long et al. ADAR: Application-specific Data Allocation and Reprogramming Optimization for 3D TLC Flash Memory
Wang et al. Assimilating cleaning operations with flash-level parallelism for NAND flash-based devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant