CN109284233B - Garbage recovery method of storage system and related device - Google Patents

Garbage recovery method of storage system and related device

Info

Publication number
CN109284233B
CN109284233B (application CN201811087264.XA)
Authority
CN
China
Prior art keywords
coverage
large block
block space
probability
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811087264.XA
Other languages
Chinese (zh)
Other versions
CN109284233A (en)
Inventor
He Xiaojin (何孝金)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201811087264.XA
Publication of CN109284233A
Application granted
Publication of CN109284233B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0253Garbage collection, i.e. reclamation of unreferenced memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0616Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7205Cleaning, compaction, garbage collection, erase control

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a garbage collection method for a storage system, comprising the following steps: obtaining IO (input/output) characteristics and the IO coverage states corresponding to the IO characteristics, and performing machine-learning training according to them to obtain a coverage probability prediction model; performing prediction on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities; marking each large block space whose coverage probability is smaller than a preset coverage probability as a large block space to be reclaimed; and performing garbage collection on all the large block spaces to be reclaimed. By judging through the machine-learning-trained prediction model whether the valid data in a large block space is about to become garbage data, needless garbage collection of that space is avoided, the IO performance of the storage system is improved, and waste of IO is prevented. The application also discloses a garbage collection system, a server and a computer-readable storage medium, which have the same beneficial effects.

Description

Garbage recovery method of storage system and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a garbage collection method for a storage system, a garbage collection system, a server, and a computer-readable storage medium.
Background
With the continuous development of information technology, the amount of data stored on the internet keeps growing, and the AFA (All-Flash Array) has emerged to improve the efficiency of data storage. An AFA stores data on SSDs (solid-state drives); because of the SSD's write characteristics and limited erase/write cycles, discrete data is usually aggregated and rewritten so that garbage can be collected at the granularity of a large block space, thereby using the SSD efficiently.
Generally, the garbage collection method provided in the prior art counts the total amount of garbage data in each large block space, selects the large block space with the most garbage data as the space to be reclaimed, and migrates the valid data in that space to a new space so as to release its storage.
However, in the prior art the valid data in the large block space may itself become garbage data shortly after being migrated to the new space. This not only defeats the purpose of garbage collection, but also wastes the storage system IO (read/write operations) spent migrating the valid data, which hurts host performance and shortens the service life of the SSD.
Therefore, how to improve the effectiveness of garbage collection technology is a key issue for those skilled in the art.
Disclosure of Invention
The application aims to provide a garbage collection method for a storage system, a garbage collection system, a server and a computer-readable storage medium, which judge through a machine-learning-trained prediction model whether the valid data in a large block space is about to become garbage data, so that needless garbage collection of that space is avoided, the IO performance of the storage system is improved, and waste of IO is prevented.
In order to solve the above technical problem, the present application provides a garbage recycling method for a storage system, including:
obtaining IO characteristics and IO coverage states corresponding to the IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and performing garbage collection on all the large block spaces to be reclaimed.
Optionally, the performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability includes:
performing probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, the performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability includes:
selecting a data block to be predicted in the large block space according to a data block selection rule;
performing probability prediction processing on all the data blocks to be predicted in the large block space according to the coverage probability prediction model to obtain the coverage probability of a plurality of data blocks corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, obtaining the IO characteristics and the IO coverage states corresponding to the IO characteristics, and performing machine-learning training according to them to obtain the coverage probability prediction model, includes:
obtaining an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
and performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain the coverage probability prediction model.
Optionally, performing garbage collection on all the large block spaces to be reclaimed includes:
reclaiming the valid data of those to-be-reclaimed large block spaces whose coverage probabilities differ by less than a preset variation into the same new large block space, so as to complete the garbage collection.
The present application further provides a garbage recycling system of a storage system, including:
the machine learning training module is used for acquiring IO characteristics and IO coverage states corresponding to the IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
the coverage probability prediction module is used for respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
the to-be-recovered marking module is used for marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and the garbage collection module, configured to perform garbage collection on all the large block spaces to be reclaimed.
Optionally, the coverage probability prediction module includes:
the data block probability prediction unit is used for carrying out probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and the coverage probability adding unit is used for adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, the machine learning training module includes:
the IO characteristic acquisition unit is used for acquiring an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
and the training unit is used for performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain the coverage probability prediction model.
The present application further provides a server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the garbage collection method as described above when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the garbage collection method as described above.
The application provides a garbage collection method for a storage system, comprising: obtaining IO characteristics and the IO coverage states corresponding to the IO characteristics, and performing machine-learning training according to them to obtain a coverage probability prediction model; performing prediction on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities; marking each large block space whose coverage probability is smaller than a preset coverage probability as a large block space to be reclaimed; and performing garbage collection on all the large block spaces to be reclaimed.
Machine-learning training is performed on the IO characteristics obtained from the storage system and their corresponding IO coverage states, yielding a coverage probability prediction model that can predict how likely a given block of IO data is to be overwritten. With this model, the probability that the data in a large block space will be overwritten by writes, that is, will turn from valid data into garbage data, can be predicted. Garbage collection is then skipped for large block spaces that are highly likely to turn into garbage data and performed on those that are unlikely to, i.e. on the spaces whose valid data will largely stay valid. This improves the IO performance of the storage system, avoids the useless reads and writes caused by futile garbage collection, prevents waste of IO, and extends the service life of the SSD.
The application also provides a garbage recycling system, a server and a computer readable storage medium of the storage system, which have the above beneficial effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a garbage collection method of a storage system according to an embodiment of the present application;
Fig. 2 is a flowchart of a prediction processing method of a garbage collection method according to an embodiment of the present application;
Fig. 3 is a flowchart of another prediction processing method of the garbage collection method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a garbage collection system of a storage system according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a garbage collection method for a storage system, a garbage collection system, a server and a computer-readable storage medium, which judge through a machine-learning-trained prediction model whether the valid data in a large block space is about to become garbage data, so that needless garbage collection of that space is avoided, the IO performance of the storage system is improved, and waste of IO is prevented.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The garbage collection method provided in the prior art is to count the total amount of garbage data in each large block space, select the large block space with the most garbage data as the large block space to be collected, and migrate the valid data in the large block space to a new space to release the storage space of the large block space. However, in the prior art, the valid data in the large block space becomes garbage data after being migrated to a new space, which not only fails to achieve the effect of the garbage recovery method, but also wastes IO (read/write operation) of the storage system that migrates the valid data, which affects the performance of the host and also affects the service life of the SSD hard disk.
Therefore, an embodiment of the present application provides a garbage collection method for a storage system. Machine-learning training is performed on the IO characteristics obtained from the storage system and their corresponding IO coverage states to obtain a coverage probability prediction model capable of predicting the coverage probability of a given block of IO data. The probability that the data in a large block space will be overwritten by writes, i.e. will turn from valid data into garbage data, can then be predicted with this model. Garbage collection is skipped for large block spaces that are highly likely to turn into garbage data and performed on those that are unlikely to, i.e. on the spaces whose data will largely remain valid. This improves the IO performance of the storage system, avoids the useless reads and writes caused by futile garbage collection, prevents waste of IO, and extends the service life of the SSD.
Referring to fig. 1, fig. 1 is a flowchart illustrating a garbage collection method of a storage system according to an embodiment of the present disclosure.
The method can comprise the following steps:
s101, obtaining IO coverage states corresponding to IO characteristics and IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
the method mainly comprises the steps of obtaining characteristic data of machine learning, namely IO characteristics and IO coverage states corresponding to the IO characteristics in the step, and then training the machine learning according to the characteristic data to obtain a coverage probability prediction model. The probability of being written and covered on the corresponding storage address can be predicted through the covering probability prediction model and the obtained IO characteristics.
The acquired feature data may be the logical addresses that are frequently overwritten in each time period; the specific time period and logical address are the acquired IO characteristics, and the coverage state of that logical address under those characteristics is "overwritten". The feature data may also describe a hot-spot data area: for example, if data in a certain address range is observed to be frequently overwritten, that observation can be used as training feature data.
The algorithm used for machine learning in this embodiment may be a Bayesian algorithm, a K-nearest-neighbor algorithm, or any other machine-learning algorithm provided in the prior art. It should be understood that the machine-learning algorithm in this step is not unique and is not limited herein.
Optionally, this step may include:
the method comprises the steps of firstly, obtaining an IO logical address and an IO coverage state corresponding to the IO logical address in a preset time period;
and secondly, performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain a coverage probability prediction model.
In this alternative, machine-learning training is performed on the obtained IO logical addresses, IO coverage states and the corresponding preset time periods to obtain the coverage probability prediction model. With such a model, the time period and IO logical address can be used to judge the probability that the data at that logical address will be overwritten, i.e. become garbage data.
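The training and prediction loop of this alternative can be sketched as follows. This is a minimal frequency-based estimator, assuming a feature schema of (time-period, logical-address bucket); the class name, bucket size and fallback prior are illustrative assumptions, not the patent's mandated implementation (a Bayesian or K-nearest-neighbor learner, as mentioned above, could equally be used):

```python
from collections import defaultdict

class CoverageModel:
    """Hypothetical frequency-based sketch of the 'coverage probability
    prediction model': estimates P(overwritten) per (period, address bucket)."""

    def __init__(self, addr_bucket=1024):
        self.addr_bucket = addr_bucket
        # key -> [times overwritten, times observed]
        self.counts = defaultdict(lambda: [0, 0])

    def _key(self, period, lba):
        return (period, lba // self.addr_bucket)

    def train(self, samples):
        # samples: iterable of (period, logical address, was_overwritten)
        for period, lba, covered in samples:
            k = self._key(period, lba)
            self.counts[k][1] += 1
            if covered:
                self.counts[k][0] += 1

    def predict(self, period, lba):
        covered, total = self.counts[self._key(period, lba)]
        return covered / total if total else 0.5  # uninformed prior (assumed)

model = CoverageModel()
model.train([(0, 100, True), (0, 200, True), (0, 300, False), (1, 5000, False)])
print(model.predict(0, 150))  # shares a (period, bucket) with two hot writes
```

Calling train again with newer write-request samples refines the same counts, matching the incremental-update idea described later in this embodiment.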
S102, respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
On the basis of step S101, this step performs prediction on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities, i.e. the probability that the valid data in each large block space will be overwritten by writes and thus become garbage data. Based on these coverage probabilities, a subset of large block spaces can be selected for garbage collection in a targeted way, rather than treating every large block space indiscriminately, which avoids wasting IO and improves IO utilization.
Because coverage probability prediction models differ, the large block space can be predicted from different angles. For example, if the model was trained on the overall valid data of large block spaces, this step predicts the large block space as a whole: the feature data of the space is obtained directly and its coverage probability is computed by the model. If the model was instead trained on the IO characteristics of each data block of valid data, this step first predicts a plurality of data blocks within the space to obtain their data-block coverage probabilities and then combines those into the coverage probability of the space. The prediction method for the large block space is therefore not unique and is not specifically limited herein.
Specifically, the per-block prediction may cover all data blocks in the large block space or only part of them; in either case a plurality of data-block coverage probabilities is obtained. The coverage probability of the large block space is then calculated from them, for example by summing the data-block coverage probabilities, by averaging them and taking the mean as the space's coverage probability, or by computing a weighted average and taking that as the space's coverage probability. The manner of predicting the data blocks is likewise not unique and is not specifically limited herein.
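The three aggregation options just described (summing, averaging, weighted averaging) can be sketched in one hypothetical helper; the function name and mode strings are illustrative only:

```python
def chunk_coverage(block_probs, weights=None, mode="mean"):
    """Aggregate per-data-block coverage probabilities into one coverage
    probability for the large block space. 'mode' selects among the three
    options the text describes: sum, average, or weighted average."""
    if mode == "sum":
        return sum(block_probs)
    if mode == "mean":
        return sum(block_probs) / len(block_probs)
    if mode == "weighted":
        total_w = sum(weights)
        return sum(p * w for p, w in zip(block_probs, weights)) / total_w
    raise ValueError(mode)

print(chunk_coverage([0.9, 0.5, 0.1]))                  # mean of three blocks
print(chunk_coverage([0.9, 0.5], weights=[3, 1], mode="weighted"))
```

Weights could, for instance, reflect block size or recency, but the patent does not fix a weighting scheme.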
S103, marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
On the basis of step S102, this step marks each large block space whose coverage probability is smaller than the preset coverage probability as a large block space to be reclaimed, i.e. treats it as a space available for garbage collection. The preset coverage probability may be derived from the coverage probabilities of all large block spaces, for example the median of all coverage probabilities or a value below 30%; it may be a received fixed value such as 35%; or it may vary with IO performance. The way the preset coverage probability is set is therefore not unique and is not specifically limited herein.
By marking the qualifying large block spaces as spaces to be reclaimed, this step may yield several to-be-reclaimed spaces or just one; the number is not fixed and varies with the actual situation.
Assume there are currently four large block spaces A, B, C and D, and prediction yields coverage probabilities of 70%, 50%, 90% and 20% respectively. The larger the probability, the more easily the valid data in the space is overwritten by new data, so garbage collection should target the spaces with smaller coverage probabilities in order to make reasonable use of their valid data. With a preset coverage probability of 60%, garbage collection is therefore performed on spaces B and D.
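Under the hypothetical four-space scenario above, the marking rule of S103 reduces to a simple threshold filter (function and variable names are illustrative):

```python
def mark_for_reclaim(coverage, threshold):
    """Mark the large block spaces whose coverage probability is below the
    preset threshold as spaces to be reclaimed (step S103)."""
    return [name for name, p in coverage.items() if p < threshold]

coverage = {"A": 0.70, "B": 0.50, "C": 0.90, "D": 0.20}
print(mark_for_reclaim(coverage, 0.60))  # ['B', 'D']
```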
And S104, performing garbage collection on all the large block spaces to be reclaimed.
On the basis of step S103, this step performs garbage collection on the large block spaces to be reclaimed. Any garbage collection process provided by the prior art may be adopted here, i.e. all valid data in the designated large block spaces is migrated to a new space for centralized storage.
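A simplified sketch of that relocation step, modeling each large block space as a dict of block-id to (data, valid) pairs — an illustrative in-memory model, not the patent's on-disk layout:

```python
def reclaim(chunks, to_reclaim):
    """Migrate valid data out of the marked large block spaces into a new
    space and release the old ones (step S104). Garbage (invalid) blocks
    are simply dropped."""
    new_chunk = {}
    for name in to_reclaim:
        for block_id, (data, valid) in chunks[name].items():
            if valid:
                new_chunk[block_id] = (data, True)
        chunks[name] = {}  # old large block space released for reuse
    return new_chunk

chunks = {"B": {1: ("x", True), 2: ("y", False)}, "D": {3: ("z", True)}}
survivors = reclaim(chunks, ["B", "D"])
print(survivors)  # {1: ('x', True), 3: ('z', True)}
```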
Once the coverage probability prediction model has been obtained, steps S102 to S104 may also be performed on their own as a processing method.
In order to improve the utilization rate and utilization efficiency of the data, S104 may further include:
reclaiming the valid data of those to-be-reclaimed large block spaces whose coverage probabilities differ by less than a preset variation into the same new large block space, so as to complete the garbage collection.
That is, when valid data is migrated, data with similar coverage probabilities can be migrated into the same space, separating cold data from hot data and improving data use efficiency.
Here, "the coverage-probability difference is smaller than the preset variation" means that the difference between the coverage probabilities of any two to-be-reclaimed large block spaces is smaller than a preset threshold; for example, if the two differ by less than 5%, their valid data can be stored in the same new large block space.
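A greedy sketch of grouping to-be-reclaimed spaces by similar coverage probability; the 5% delta follows the example above, and anchoring each group at its lowest-probability member is an implementation assumption the patent leaves open:

```python
def group_by_probability(chunks, max_delta=0.05):
    """Group (name, coverage probability) pairs so that spaces with similar
    probabilities land in the same new large block space, separating cold
    data from hot data. Greedy pass over the sorted probabilities."""
    ordered = sorted(chunks, key=lambda c: c[1])
    groups, current = [], [ordered[0]]
    for item in ordered[1:]:
        if item[1] - current[0][1] < max_delta:  # close to the group anchor
            current.append(item)
        else:
            groups.append(current)
            current = [item]
    groups.append(current)
    return groups

print(group_by_probability([("B", 0.50), ("D", 0.20), ("E", 0.52), ("F", 0.22)]))
```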
It should be noted that, in this embodiment, machine learning may continue as everyday write requests are processed, so that the coverage probability prediction model is updated and the accuracy of subsequent predictions improves.
In summary, in this embodiment machine-learning training is performed on the IO characteristics obtained from the storage system and their corresponding IO coverage states to obtain a coverage probability prediction model capable of predicting the coverage probability of a given block of IO data. The probability that the data in a large block space will be overwritten, i.e. will turn from valid data into garbage data, can then be predicted with this model. Garbage collection is skipped for spaces highly likely to turn into garbage data and performed on spaces unlikely to, i.e. on spaces whose data will largely remain valid. This improves the IO performance of the storage system, avoids the useless reads and writes caused by futile garbage collection, prevents waste of IO, and extends the operating life of the SSD.
In the previous embodiment, the prediction processing performed on the plurality of large block spaces according to the coverage probability prediction model may use any prediction method provided in the prior art. To improve prediction accuracy, the following prediction processing method is adopted on the basis of the previous embodiment.
Referring to fig. 2, fig. 2 is a flowchart illustrating a prediction processing method of a garbage collection method according to an embodiment of the present disclosure.
The method can comprise the following steps:
S201, performing probability prediction processing on all data blocks in a large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
This step aims to perform probability prediction processing on each valid data block in a large block space according to the coverage probability prediction model; for each large block space, the coverage probabilities of all of its data blocks are obtained accordingly.
Specifically, in this step a data block is predicted according to the coverage probability prediction model by obtaining the IO characteristics of the data block and matching them in the model. The matching may search for the closest IO characteristics recorded in the model, obtain the corresponding IO coverage state, and calculate the coverage probability of the data block from the proximity of the IO characteristics. Alternatively, the matching may compute on the IO characteristics within the model to obtain the likelihood of the IO coverage state, that is, the coverage probability. When the coverage probability prediction model is a curve model, the matching may also look up the point corresponding to the obtained IO characteristics in the curve to obtain the coverage probability. The prediction processing in this step is therefore not limited to a single method and is not specifically limited herein.
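The first matching mode above (find the closest recorded IO characteristics and weight the recorded coverage state by proximity) can be sketched as a nearest-neighbour lookup. The feature encoding (a single numeric value per IO, e.g. a normalized logical address) and the proximity weighting are assumptions made for illustration only.

```python
def predict_coverage(model, feature):
    """model: list of (io_feature, covered) pairs learned from history,
    where covered is 1 if that IO's data was later overwritten, else 0.

    Returns a coverage probability for the query feature by finding the
    nearest recorded feature and trusting its recorded state according
    to how close the match is.
    """
    nearest = min(model, key=lambda rec: abs(rec[0] - feature))
    distance = abs(nearest[0] - feature)
    proximity = 1.0 / (1.0 + distance)   # closer record -> more trust
    # Pull the recorded state toward an uninformative 0.5 as the
    # match gets weaker.
    return nearest[1] * proximity + 0.5 * (1.0 - proximity)

p = predict_coverage([(0.1, 1), (0.8, 0)], 0.1)
# exact match with a covered IO -> probability 1.0
```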
S202, adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
On the basis of step S201, this step aims to sum the coverage probabilities of all data blocks of each large block space to obtain the coverage probability of that large block space.
It should be noted that the present embodiment is a method for calculating the coverage probability of one large block space, and when calculating the coverage probabilities of a plurality of large block spaces, the steps of the present embodiment need to be repeated many times to complete the prediction process.
Suppose there are 4 large block spaces, A, B, C and D, each containing 4 data blocks: A1 to A4, B1 to B4, C1 to C4 and D1 to D4.

All data blocks in each large block space are subjected to prediction processing through the coverage probability prediction model, that is, data blocks A1 through D4 are processed, obtaining in turn the coverage probabilities of all data blocks in the A, B, C and D large block spaces: the coverage probabilities of A1, A2, A3 and A4 in A; of B1, B2, B3 and B4 in B; of C1, C2, C3 and C4 in C; and of D1, D2, D3 and D4 in D.

The coverage probability of A is then obtained by adding the coverage probabilities of A1, A2, A3 and A4; the coverage probability of B by adding those of B1, B2, B3 and B4; the coverage probability of C by adding those of C1, C2, C3 and C4; and the coverage probability of D by adding those of D1, D2, D3 and D4.
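The per-space summation in the example can be written directly. The function name and the probability values are illustrative assumptions.

```python
def space_coverage(block_probs):
    """block_probs: dict mapping data-block id -> predicted coverage
    probability. As in step S202, the coverage probability of the
    large block space is the sum of its data blocks' probabilities.
    """
    return sum(block_probs.values())

A = {"A1": 0.1, "A2": 0.2, "A3": 0.1, "A4": 0.1}
space_coverage(A)  # ≈ 0.5
```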
In this embodiment, the coverage probability of a large block space is obtained from the coverage probabilities of all of its data blocks; the unit at which the coverage probability is calculated is reduced, which improves the accuracy of the calculation.
In the previous embodiment, the coverage probabilities of all data blocks are used to calculate the coverage probability of the large block space. To speed up this calculation, the present embodiment, on the basis of the previous one, calculates the coverage probability of the large block space from the coverage probabilities of only a part of the data blocks.
Referring to fig. 3, fig. 3 is a flowchart of another prediction processing method of the garbage collection method according to the embodiment of the present application.
The method can comprise the following steps:
S301, selecting data blocks to be predicted in a large block space according to a data block selection rule;
the data block selection rule in this step is mainly to select a part of data blocks in the large block space as data blocks to be predicted. The number of data blocks is reduced, the time for calculating the coverage probability is reduced, and the speed of prediction processing is improved.
The data block selection rule may randomly select a preset number of data blocks from all the data blocks, randomly select a preset proportion of the data blocks, or select data blocks at intervals of a preset unit as the data blocks to be predicted. The rule for selecting data blocks in this step is therefore not unique and is not specifically limited herein, as long as a part of the data blocks is selected from all the data blocks as the data blocks to be predicted, so as to reduce the number of data blocks for which the coverage probability is calculated.
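The three selection rules named above can be sketched in one helper. The function name, parameter names, and the seeded RNG are illustrative assumptions; the patent does not prescribe any particular values.

```python
import random

def select_blocks(blocks, rule="interval", n=2, ratio=0.5, step=2, seed=0):
    """Three illustrative data block selection rules from the text:
    - "count":    randomly pick a preset number of blocks;
    - "ratio":    randomly pick a preset proportion of blocks;
    - "interval": pick every `step`-th block (preset-unit intervals).
    """
    if rule == "count":
        return random.Random(seed).sample(blocks, n)
    if rule == "ratio":
        k = max(1, int(len(blocks) * ratio))
        return random.Random(seed).sample(blocks, k)
    # default: blocks at fixed intervals of a preset unit
    return blocks[::step]

blocks = ["B1", "B2", "B3", "B4"]
select_blocks(blocks)  # interval rule → ["B1", "B3"]
```

Only the selected blocks are then run through the coverage probability prediction model, reducing computation as described.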
The prediction processing performed on each data block according to the coverage probability prediction model is the same as that described in step S201 of the previous embodiment and is not repeated here.
S302, performing probability prediction processing on the data block to be predicted in the large block space according to a coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
on the basis of step S301, this step aims to perform probability prediction processing on all data blocks to be predicted in a large block space, so as to obtain a plurality of data block coverage probabilities.
S303, adding the coverage probabilities of all the data blocks to be predicted in each large block space to obtain the corresponding coverage probability.
On the basis of step S302, this step is intended to add the obtained coverage probabilities of all data blocks to obtain the coverage probability corresponding to the large block space.
It should be noted that the present embodiment is a method for calculating the coverage probability of one large block space, and when calculating the coverage probabilities of a plurality of large block spaces, the steps of the present embodiment need to be repeated many times to complete the prediction process.
Suppose there are 4 large block spaces, A, B, C and D, each containing 4 data blocks: A1 to A4, B1 to B4, C1 to C4 and D1 to D4. By the selection rule, the first two data blocks of each large block space are chosen as the data blocks to be predicted, namely A1, A2, B1, B2, C1, C2, D1 and D2.
The data blocks to be predicted in each large block space are subjected to prediction processing through the coverage probability prediction model, that is, A1, A2, B1, B2, C1, C2, D1 and D2 are processed, obtaining in turn the coverage probabilities of all data blocks to be predicted in the A, B, C and D large block spaces: the coverage probabilities of A1 and A2 in A, of B1 and B2 in B, of C1 and C2 in C, and of D1 and D2 in D.
The coverage probability of A can be obtained by adding the coverage probabilities of the two data blocks A1 and A2, the coverage probability of B can be obtained by adding the coverage probabilities of the two data blocks B1 and B2, the coverage probability of C can be obtained by adding the coverage probabilities of the two data blocks C1 and C2, and the coverage probability of D can be obtained by adding the coverage probabilities of the two data blocks D1 and D2.
In this embodiment, since coverage probabilities are calculated only for part of the data blocks, the number of data blocks to be computed is reduced and the speed of the prediction processing is correspondingly improved.
In the following, a garbage collection system provided by an embodiment of the present application is introduced; the garbage collection system described below and the garbage collection method described above may be mutually referenced.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a garbage collection system of a storage system according to an embodiment of the present disclosure.
The system may include:
the machine learning training module 100 is configured to acquire an IO characteristic and an IO coverage state corresponding to the IO characteristic, and perform machine learning training processing according to the IO characteristic and the IO coverage state to obtain a coverage probability prediction model;
a coverage probability prediction module 200, configured to perform prediction processing on each large block space according to the coverage probability prediction model to obtain multiple coverage probabilities;
the to-be-recovered marking module 300 is configured to mark the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and the garbage collection module 400 is used for performing garbage collection processing on all the large block spaces to be recovered.
Optionally, the coverage probability prediction module 200 may include:
the data block probability prediction unit is used for carrying out probability prediction processing on all data blocks in the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and the coverage probability adding unit is used for adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, the machine learning training module 100 may include:
the IO characteristic acquisition unit is used for acquiring an IO logical address and an IO coverage state corresponding to the IO logical address in a preset time period;
and the training unit is used for performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain a coverage probability prediction model.
An embodiment of the present application further provides a server, including:
a memory for storing a computer program;
a processor for implementing the steps of the garbage collection method according to the above embodiments when executing the computer program.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the garbage collection method according to the above embodiment are implemented.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The garbage collection method, the garbage collection system, the server and the computer-readable storage medium of the storage system provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (8)

1. A garbage collection method for a storage system, comprising:
obtaining an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain a coverage probability prediction model;
respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and performing garbage collection processing on all the large block spaces to be recovered.
2. The garbage collection method according to claim 1, wherein the step of performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability comprises:
performing probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
3. The garbage collection method according to claim 1, wherein the step of performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability comprises:
selecting a data block to be predicted in the large block space according to a data block selection rule;
performing probability prediction processing on all the data blocks to be predicted in the large block space according to the coverage probability prediction model to obtain the coverage probability of a plurality of data blocks corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
4. The garbage collection method according to any one of claims 1 to 3, wherein the garbage collection processing of all the large block spaces to be recovered comprises:
migrating the valid data of the large block spaces to be recovered whose coverage probability difference is smaller than a preset variation into the same new large block space, so as to complete the garbage collection processing.
5. A garbage collection system for a storage system, comprising:
the machine learning training module is used for acquiring IO characteristics and IO coverage states corresponding to the IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
the coverage probability prediction module is used for respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
the to-be-recovered marking module is used for marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
the garbage collection module is used for performing garbage collection processing on all the large block spaces to be recovered;
wherein the machine learning training module comprises:
the IO characteristic acquisition unit is used for acquiring an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
and the training unit is used for performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain the coverage probability prediction model.
6. The garbage collection system of claim 5, wherein the coverage probability prediction module comprises:
the data block probability prediction unit is used for carrying out probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and the coverage probability adding unit is used for adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
7. A server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the garbage collection method according to any one of claims 1 to 4 when executing said computer program.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the garbage collection method according to any one of claims 1 to 4.
CN201811087264.XA 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device Active CN109284233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811087264.XA CN109284233B (en) 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811087264.XA CN109284233B (en) 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device

Publications (2)

Publication Number Publication Date
CN109284233A CN109284233A (en) 2019-01-29
CN109284233B true CN109284233B (en) 2022-02-18

Family

ID=65181006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811087264.XA Active CN109284233B (en) 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device

Country Status (1)

Country Link
CN (1) CN109284233B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913649B (en) * 2019-05-09 2022-05-06 深圳大普微电子科技有限公司 Data processing method and device for solid state disk
KR20210017401A (en) * 2019-08-08 2021-02-17 에스케이하이닉스 주식회사 Data Storage Apparatus,Operating Method Thereof and Controller Therefor
CN111158598B (en) * 2019-12-29 2022-03-22 北京浪潮数据技术有限公司 Garbage recycling method, device, equipment and medium for full-flash disk array
CN113971137A (en) * 2020-07-22 2022-01-25 华为技术有限公司 Garbage recovery method and device
CN112860593A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 GC performance prediction method, system, medium and equipment of storage system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412826A (en) * 2013-07-18 2013-11-27 记忆科技(深圳)有限公司 Garbage collection method and system of solid state disk
CN103577338A (en) * 2013-11-14 2014-02-12 华为技术有限公司 Junk data recycling method and storage device
CN104216665A (en) * 2014-09-01 2014-12-17 上海新储集成电路有限公司 Storage management method of multi-layer unit solid state disk
US9141457B1 (en) * 2013-09-25 2015-09-22 Emc Corporation System and method for predicting multiple-disk failures
CN105204783A (en) * 2015-10-13 2015-12-30 华中科技大学 Solid-state disk garbage recycling method based on data life cycle
CN106874213A (en) * 2017-01-12 2017-06-20 杭州电子科技大学 A kind of solid state hard disc dsc data recognition methods for merging various machine learning algorithms
CN107102954A (en) * 2017-04-27 2017-08-29 华中科技大学 A kind of solid-state storage grading management method and system based on failure probability
CN107479825A (en) * 2017-06-30 2017-12-15 华为技术有限公司 A kind of storage system, solid state hard disc and date storage method
CN108241471A (en) * 2017-11-29 2018-07-03 深圳忆联信息***有限公司 A kind of method for promoting solid state disk performance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527544B1 (en) * 2011-08-11 2013-09-03 Pure Storage Inc. Garbage collection in a storage system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A flash translation layer strategy based on page-write correlation; Chen Jinzhong; Journal on Communications; 2013-06-30; Vol. 34, No. 6; pp. 76-84 *

Also Published As

Publication number Publication date
CN109284233A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN109284233B (en) Garbage recovery method of storage system and related device
CN111045956B (en) Solid state disk garbage recycling method and device based on multi-core CPU
CN1645516B (en) Data recovery apparatus and method used for flash memory
CN110673789B (en) Metadata storage management method, device, equipment and storage medium of solid state disk
CN108776614B (en) Recovery method and device of storage block
CN112328169B (en) Wear leveling method and device for solid state disk and computer readable storage medium
CN110134215B (en) Data processing method and device, electronic equipment and readable storage medium
US20220327018A1 (en) Behavior-driven die management on solid-state drives
CN115756312A (en) Data access system, data access method, and storage medium
CN111813347B (en) Garbage recycling space management method and device and computer readable storage medium
CN104156321A (en) Data pre-fetching method and device
CN114968839A (en) Hard disk garbage recycling method, device and equipment and computer readable storage medium
US9785374B2 (en) Storage device management in computing systems
CN104021226A (en) Method and device for updating prefetch rule
CN111221468A (en) Storage block data deleting method and device, electronic equipment and cloud storage system
CN110647294B (en) Storage block recovery method and device, storage medium and electronic equipment
Tsai et al. Learning-assisted write latency optimization for mobile storage
CN107346288B (en) Data writing method and device
CN105095197A (en) Method and device for processing data
JP6429197B2 (en) Logical physical address conversion table control method and memory device
CN114518849B (en) Data storage method and device and electronic equipment
CN117806837B (en) Method, device, storage medium and system for managing hard disk tasks
CN116795298B (en) IO optimization method and system for NVME memory under Linux
US11385798B1 (en) Method and system for application aware, management of write operations on non-volatile storage
CN113835638B (en) Method, device and equipment for determining garbage recycling destination block in storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant