CN109284233B - Garbage recovery method of storage system and related device - Google Patents

Garbage recovery method of storage system and related device

Info

Publication number
CN109284233B
CN109284233B (application CN201811087264.XA)
Authority
CN
China
Prior art keywords
coverage
large block
block space
probability
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811087264.XA
Other languages
Chinese (zh)
Other versions
CN109284233A (en)
Inventor
He Xiaojin (何孝金)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201811087264.XA
Publication of CN109284233A
Application granted
Publication of CN109284233B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0253Garbage collection, i.e. reclamation of unreferenced memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0616Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7205Cleaning, compaction, garbage collection, erase control

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a garbage collection method for a storage system, comprising the following steps: obtaining IO (input/output) characteristics and the IO coverage states corresponding to the IO characteristics, and performing machine-learning training according to them to obtain a coverage probability prediction model; performing prediction on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities; marking each large block space whose coverage probability is smaller than a preset coverage probability as a large block space to be reclaimed; and performing garbage collection on all the large block spaces to be reclaimed. By judging through the machine-learning-trained prediction model whether the valid data in a large block space is about to become garbage data, needless garbage collection of that space is avoided, the IO performance of the storage system is improved, and waste of IO is prevented. The application also discloses a garbage collection system, a server and a computer-readable storage medium, which have the same beneficial effects.

Description

Garbage recovery method of storage system and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a garbage collection method for a storage system, a garbage collection system, a server, and a computer-readable storage medium.
Background
With the continuous development of information technology, the amount of data stored on the internet keeps growing, and the AFA (All-Flash Array) has emerged to improve the efficiency of data storage. An AFA stores data on SSDs (solid-state drives); because of the SSD's write characteristics and limited erase/write cycles, discrete data is usually aggregated and rewritten so that garbage can be collected at the granularity of a large block space, thereby using the SSD efficiently.
Generally, the garbage collection method provided in the prior art counts the total amount of garbage data in each large block space, selects the large block space with the most garbage data as the space to be reclaimed, and migrates the valid data in that space to a new space so as to release its storage.
However, in the prior art the valid data in the large block space may itself become garbage data shortly after being migrated to the new space. This not only defeats the purpose of garbage collection, but also wastes the storage system IO (read/write operations) spent migrating the valid data, which hurts host performance and shortens the service life of the SSD.
Therefore, how to improve the effectiveness of garbage collection technology is a key issue for those skilled in the art.
Disclosure of Invention
The application aims to provide a garbage collection method for a storage system, a garbage collection system, a server and a computer-readable storage medium, which judge through a machine-learning-trained prediction model whether the valid data in a large block space is about to become garbage data, so that needless garbage collection of that space is avoided, the IO performance of the storage system is improved, and waste of IO is prevented.
In order to solve the above technical problem, the present application provides a garbage recycling method for a storage system, including:
obtaining IO characteristics and IO coverage states corresponding to the IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and performing garbage collection on all the large block spaces to be reclaimed.
Optionally, the performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability includes:
performing probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, the performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability includes:
selecting a data block to be predicted in the large block space according to a data block selection rule;
performing probability prediction processing on all the data blocks to be predicted in the large block space according to the coverage probability prediction model to obtain the coverage probability of a plurality of data blocks corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, obtaining the IO characteristics and the IO coverage states corresponding to the IO characteristics, and performing machine-learning training according to them to obtain the coverage probability prediction model, includes:
obtaining an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
and performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain the coverage probability prediction model.
Optionally, performing garbage collection on all the large block spaces to be reclaimed includes:
reclaiming the valid data of those to-be-reclaimed large block spaces whose coverage probabilities differ by less than a preset variation into the same new large block space, so as to complete the garbage collection.
The present application further provides a garbage recycling system of a storage system, including:
the machine learning training module is used for acquiring IO characteristics and IO coverage states corresponding to the IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
the coverage probability prediction module is used for respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
the to-be-recovered marking module is used for marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and the garbage collection module, configured to perform garbage collection on all the large block spaces to be reclaimed.
Optionally, the coverage probability prediction module includes:
the data block probability prediction unit is used for carrying out probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and the coverage probability adding unit is used for adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, the machine learning training module includes:
the IO characteristic acquisition unit is used for acquiring an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
and the training unit is used for performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain the coverage probability prediction model.
The present application further provides a server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the garbage collection method as described above when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the garbage collection method as described above.
The application provides a garbage collection method for a storage system, comprising: obtaining IO characteristics and the IO coverage states corresponding to the IO characteristics, and performing machine-learning training according to them to obtain a coverage probability prediction model; performing prediction on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities; marking each large block space whose coverage probability is smaller than a preset coverage probability as a large block space to be reclaimed; and performing garbage collection on all the large block spaces to be reclaimed.
Machine-learning training is performed on the IO characteristics obtained from the storage system and their corresponding IO coverage states, yielding a coverage probability prediction model that can predict how likely a given block of IO data is to be overwritten. With this model, the probability that the data in a large block space will be overwritten by writes, that is, will turn from valid data into garbage data, can be predicted. Garbage collection is then skipped for large block spaces that are highly likely to turn into garbage data and performed on those that are unlikely to, i.e. on the spaces whose valid data will largely stay valid. This improves the IO performance of the storage system, avoids the useless reads and writes caused by futile garbage collection, prevents waste of IO, and extends the service life of the SSD.
The application also provides a garbage recycling system, a server and a computer readable storage medium of the storage system, which have the above beneficial effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a garbage collection method of a storage system according to an embodiment of the present application;
Fig. 2 is a flowchart of a prediction processing method of a garbage collection method according to an embodiment of the present application;
Fig. 3 is a flowchart of another prediction processing method of the garbage collection method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a garbage collection system of a storage system according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a garbage collection method for a storage system, a garbage collection system, a server and a computer-readable storage medium, which judge through a machine-learning-trained prediction model whether the valid data in a large block space is about to become garbage data, so that needless garbage collection of that space is avoided, the IO performance of the storage system is improved, and waste of IO is prevented.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The garbage collection method provided in the prior art is to count the total amount of garbage data in each large block space, select the large block space with the most garbage data as the large block space to be collected, and migrate the valid data in the large block space to a new space to release the storage space of the large block space. However, in the prior art, the valid data in the large block space becomes garbage data after being migrated to a new space, which not only fails to achieve the effect of the garbage recovery method, but also wastes IO (read/write operation) of the storage system that migrates the valid data, which affects the performance of the host and also affects the service life of the SSD hard disk.
Therefore, an embodiment of the present application provides a garbage collection method for a storage system. Machine-learning training is performed on the IO characteristics obtained from the storage system and their corresponding IO coverage states to obtain a coverage probability prediction model capable of predicting the coverage probability of a given block of IO data. The probability that the data in a large block space will be overwritten by writes, i.e. will turn from valid data into garbage data, can then be predicted with this model. Garbage collection is skipped for large block spaces that are highly likely to turn into garbage data and performed on those that are unlikely to, i.e. on the spaces whose data will largely remain valid. This improves the IO performance of the storage system, avoids the useless reads and writes caused by futile garbage collection, prevents waste of IO, and extends the service life of the SSD.
Referring to fig. 1, fig. 1 is a flowchart illustrating a garbage collection method of a storage system according to an embodiment of the present disclosure.
The method can comprise the following steps:
s101, obtaining IO coverage states corresponding to IO characteristics and IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
the method mainly comprises the steps of obtaining characteristic data of machine learning, namely IO characteristics and IO coverage states corresponding to the IO characteristics in the step, and then training the machine learning according to the characteristic data to obtain a coverage probability prediction model. The probability of being written and covered on the corresponding storage address can be predicted through the covering probability prediction model and the obtained IO characteristics.
The acquired feature data may be the logical addresses that are frequently overwritten in each time period; the specific time period and logical address are the acquired IO characteristics, and the coverage state of that logical address under those characteristics is "overwritten". The feature data may also describe a hot-spot data area: for example, if data in a certain address range is observed to be frequently overwritten, that observation can be used as training feature data.
The algorithm used for machine learning in this embodiment may be a Bayesian algorithm, a K-nearest-neighbor algorithm, or any other machine-learning algorithm provided in the prior art. It should be understood that the machine-learning algorithm in this step is not unique and is not limited herein.
Optionally, this step may include:
the method comprises the steps of firstly, obtaining an IO logical address and an IO coverage state corresponding to the IO logical address in a preset time period;
and secondly, performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain a coverage probability prediction model.
In this alternative, machine-learning training is performed on the obtained IO logical addresses, IO coverage states and the corresponding preset time periods to obtain the coverage probability prediction model. With such a model, the time period and IO logical address can be used to judge the probability that the data at that logical address will be overwritten, i.e. become garbage data.
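The training and prediction loop of this alternative can be sketched as follows. This is a minimal frequency-based estimator, assuming a feature schema of (time-period, logical-address bucket); the class name, bucket size and fallback prior are illustrative assumptions, not the patent's mandated implementation (a Bayesian or K-nearest-neighbor learner, as mentioned above, could equally be used):

```python
from collections import defaultdict

class CoverageModel:
    """Hypothetical frequency-based sketch of the 'coverage probability
    prediction model': estimates P(overwritten) per (period, address bucket)."""

    def __init__(self, addr_bucket=1024):
        self.addr_bucket = addr_bucket
        # key -> [times overwritten, times observed]
        self.counts = defaultdict(lambda: [0, 0])

    def _key(self, period, lba):
        return (period, lba // self.addr_bucket)

    def train(self, samples):
        # samples: iterable of (period, logical address, was_overwritten)
        for period, lba, covered in samples:
            k = self._key(period, lba)
            self.counts[k][1] += 1
            if covered:
                self.counts[k][0] += 1

    def predict(self, period, lba):
        covered, total = self.counts[self._key(period, lba)]
        return covered / total if total else 0.5  # uninformed prior (assumed)

model = CoverageModel()
model.train([(0, 100, True), (0, 200, True), (0, 300, False), (1, 5000, False)])
print(model.predict(0, 150))  # shares a (period, bucket) with two hot writes
```

Calling train again with newer write-request samples refines the same counts, matching the incremental-update idea described later in this embodiment.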
S102, respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
On the basis of step S101, this step performs prediction on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities, i.e. the probability that the valid data in each large block space will be overwritten by writes and thus become garbage data. Based on these coverage probabilities, a subset of large block spaces can be selected for garbage collection in a targeted way, rather than treating every large block space indiscriminately, which avoids wasting IO and improves IO utilization.
Because coverage probability prediction models differ, the large block space can be predicted from different angles. For example, if the model was trained on the overall valid data of large block spaces, this step predicts the large block space as a whole: the feature data of the space is obtained directly and its coverage probability is computed by the model. If the model was instead trained on the IO characteristics of each data block of valid data, this step first predicts a plurality of data blocks within the space to obtain their data-block coverage probabilities and then combines those into the coverage probability of the space. The prediction method for the large block space is therefore not unique and is not specifically limited herein.
Specifically, the per-block prediction may cover all data blocks in the large block space or only part of them; in either case a plurality of data-block coverage probabilities is obtained. The coverage probability of the large block space is then calculated from them, for example by summing the data-block coverage probabilities, by averaging them and taking the mean as the space's coverage probability, or by computing a weighted average and taking that as the space's coverage probability. The manner of predicting the data blocks is likewise not unique and is not specifically limited herein.
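The three aggregation options just described (summing, averaging, weighted averaging) can be sketched in one hypothetical helper; the function name and mode strings are illustrative only:

```python
def chunk_coverage(block_probs, weights=None, mode="mean"):
    """Aggregate per-data-block coverage probabilities into one coverage
    probability for the large block space. 'mode' selects among the three
    options the text describes: sum, average, or weighted average."""
    if mode == "sum":
        return sum(block_probs)
    if mode == "mean":
        return sum(block_probs) / len(block_probs)
    if mode == "weighted":
        total_w = sum(weights)
        return sum(p * w for p, w in zip(block_probs, weights)) / total_w
    raise ValueError(mode)

print(chunk_coverage([0.9, 0.5, 0.1]))                  # mean of three blocks
print(chunk_coverage([0.9, 0.5], weights=[3, 1], mode="weighted"))
```

Weights could, for instance, reflect block size or recency, but the patent does not fix a weighting scheme.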
S103, marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
On the basis of step S102, this step marks each large block space whose coverage probability is smaller than the preset coverage probability as a large block space to be reclaimed, i.e. treats it as a space available for garbage collection. The preset coverage probability may be derived from the coverage probabilities of all large block spaces, for example the median of all coverage probabilities or a value below 30%; it may be a received fixed value such as 35%; or it may vary with IO performance. The way the preset coverage probability is set is therefore not unique and is not specifically limited herein.
By marking the qualifying large block spaces as spaces to be reclaimed, this step may yield several to-be-reclaimed spaces or just one; the number is not fixed and varies with the actual situation.
Assume there are currently four large block spaces A, B, C and D, and prediction yields coverage probabilities of 70%, 50%, 90% and 20% respectively. The larger the probability, the more easily the valid data in the space is overwritten by new data, so garbage collection should target the spaces with smaller coverage probabilities in order to make reasonable use of their valid data. With a preset coverage probability of 60%, garbage collection is therefore performed on spaces B and D.
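Under the hypothetical four-space scenario above, the marking rule of S103 reduces to a simple threshold filter (function and variable names are illustrative):

```python
def mark_for_reclaim(coverage, threshold):
    """Mark the large block spaces whose coverage probability is below the
    preset threshold as spaces to be reclaimed (step S103)."""
    return [name for name, p in coverage.items() if p < threshold]

coverage = {"A": 0.70, "B": 0.50, "C": 0.90, "D": 0.20}
print(mark_for_reclaim(coverage, 0.60))  # ['B', 'D']
```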
And S104, performing garbage collection on all the large block spaces to be reclaimed.
On the basis of step S103, this step performs garbage collection on the large block spaces to be reclaimed. Any garbage collection process provided by the prior art may be adopted here, i.e. all valid data in the designated large block spaces is migrated to a new space for centralized storage.
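A simplified sketch of that relocation step, modeling each large block space as a dict of block-id to (data, valid) pairs — an illustrative in-memory model, not the patent's on-disk layout:

```python
def reclaim(chunks, to_reclaim):
    """Migrate valid data out of the marked large block spaces into a new
    space and release the old ones (step S104). Garbage (invalid) blocks
    are simply dropped."""
    new_chunk = {}
    for name in to_reclaim:
        for block_id, (data, valid) in chunks[name].items():
            if valid:
                new_chunk[block_id] = (data, True)
        chunks[name] = {}  # old large block space released for reuse
    return new_chunk

chunks = {"B": {1: ("x", True), 2: ("y", False)}, "D": {3: ("z", True)}}
survivors = reclaim(chunks, ["B", "D"])
print(survivors)  # {1: ('x', True), 3: ('z', True)}
```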
Once the coverage probability prediction model has been obtained, steps S102 to S104 may also be performed on their own as a processing method.
In order to improve the utilization rate and utilization efficiency of the data, S104 may further include:
reclaiming the valid data of those to-be-reclaimed large block spaces whose coverage probabilities differ by less than a preset variation into the same new large block space, so as to complete the garbage collection.
That is, when valid data is migrated, data with similar coverage probabilities can be migrated into the same space, separating cold data from hot data and improving data use efficiency.
Here, "the coverage-probability difference is smaller than the preset variation" means that the difference between the coverage probabilities of any two to-be-reclaimed large block spaces is smaller than a preset threshold; for example, if the two differ by less than 5%, their valid data can be stored in the same new large block space.
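A greedy sketch of grouping to-be-reclaimed spaces by similar coverage probability; the 5% delta follows the example above, and anchoring each group at its lowest-probability member is an implementation assumption the patent leaves open:

```python
def group_by_probability(chunks, max_delta=0.05):
    """Group (name, coverage probability) pairs so that spaces with similar
    probabilities land in the same new large block space, separating cold
    data from hot data. Greedy pass over the sorted probabilities."""
    ordered = sorted(chunks, key=lambda c: c[1])
    groups, current = [], [ordered[0]]
    for item in ordered[1:]:
        if item[1] - current[0][1] < max_delta:  # close to the group anchor
            current.append(item)
        else:
            groups.append(current)
            current = [item]
    groups.append(current)
    return groups

print(group_by_probability([("B", 0.50), ("D", 0.20), ("E", 0.52), ("F", 0.22)]))
```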
It should be noted that, in this embodiment, machine learning may continue as everyday write requests are processed, so that the coverage probability prediction model is updated and the accuracy of subsequent predictions improves.
In summary, in this embodiment machine-learning training is performed on the IO characteristics obtained from the storage system and their corresponding IO coverage states to obtain a coverage probability prediction model capable of predicting the coverage probability of a given block of IO data. The probability that the data in a large block space will be overwritten, i.e. will turn from valid data into garbage data, can then be predicted with this model. Garbage collection is skipped for spaces highly likely to turn into garbage data and performed on spaces unlikely to, i.e. on spaces whose data will largely remain valid. This improves the IO performance of the storage system, avoids the useless reads and writes caused by futile garbage collection, prevents waste of IO, and extends the operating life of the SSD.
In the previous embodiment, the prediction processing performed on the plurality of large block spaces according to the coverage probability prediction model may use any prediction method provided in the prior art. To improve prediction accuracy, the following prediction processing method is adopted on the basis of the previous embodiment.
Referring to fig. 2, fig. 2 is a flowchart illustrating a prediction processing method of a garbage collection method according to an embodiment of the present disclosure.
The method can comprise the following steps:
S201, performing probability prediction processing on all data blocks in a large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
This step aims to perform probability prediction processing on each valid data block in a large block space according to the coverage probability prediction model; for each large block space, the coverage probabilities of all of its data blocks are obtained accordingly.
Specifically, in this step a data block is predicted according to the coverage probability prediction model by obtaining the IO characteristics of the data block and matching them in the model. The matching may search for the closest IO characteristics recorded in the model, obtain the corresponding IO coverage state, and calculate the coverage probability of the data block from the proximity of the IO characteristics. Alternatively, the matching may compute on the IO characteristics within the model to obtain the likelihood of the IO coverage state, that is, the coverage probability. When the coverage probability prediction model is a curve model, the matching may also look up the point corresponding to the obtained IO characteristics in the curve to obtain the coverage probability. The prediction processing in this step is therefore not limited to a single method and is not specifically limited herein.
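The first matching mode above (find the closest recorded IO characteristics and weight the recorded coverage state by proximity) can be sketched as a nearest-neighbour lookup. The feature encoding (a single numeric value per IO, e.g. a normalized logical address) and the proximity weighting are assumptions made for illustration only.

```python
def predict_coverage(model, feature):
    """model: list of (io_feature, covered) pairs learned from history,
    where covered is 1 if that IO's data was later overwritten, else 0.

    Returns a coverage probability for the query feature by finding the
    nearest recorded feature and trusting its recorded state according
    to how close the match is.
    """
    nearest = min(model, key=lambda rec: abs(rec[0] - feature))
    distance = abs(nearest[0] - feature)
    proximity = 1.0 / (1.0 + distance)   # closer record -> more trust
    # Pull the recorded state toward an uninformative 0.5 as the
    # match gets weaker.
    return nearest[1] * proximity + 0.5 * (1.0 - proximity)

p = predict_coverage([(0.1, 1), (0.8, 0)], 0.1)
# exact match with a covered IO -> probability 1.0
```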
S202, adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
On the basis of step S201, this step aims to sum the coverage probabilities of all data blocks of each large block space to obtain the coverage probability of that large block space.
It should be noted that the present embodiment is a method for calculating the coverage probability of one large block space, and when calculating the coverage probabilities of a plurality of large block spaces, the steps of the present embodiment need to be repeated many times to complete the prediction process.
Suppose there are 4 large block spaces, A, B, C and D, each containing 4 data blocks: A1 to A4, B1 to B4, C1 to C4 and D1 to D4.

All data blocks in each large block space are subjected to prediction processing through the coverage probability prediction model, that is, data blocks A1 through D4 are processed, obtaining in turn the coverage probabilities of all data blocks in the A, B, C and D large block spaces: the coverage probabilities of A1, A2, A3 and A4 in A; of B1, B2, B3 and B4 in B; of C1, C2, C3 and C4 in C; and of D1, D2, D3 and D4 in D.

The coverage probability of A is then obtained by adding the coverage probabilities of A1, A2, A3 and A4; the coverage probability of B by adding those of B1, B2, B3 and B4; the coverage probability of C by adding those of C1, C2, C3 and C4; and the coverage probability of D by adding those of D1, D2, D3 and D4.
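The per-space summation in the example can be written directly. The function name and the probability values are illustrative assumptions.

```python
def space_coverage(block_probs):
    """block_probs: dict mapping data-block id -> predicted coverage
    probability. As in step S202, the coverage probability of the
    large block space is the sum of its data blocks' probabilities.
    """
    return sum(block_probs.values())

A = {"A1": 0.1, "A2": 0.2, "A3": 0.1, "A4": 0.1}
space_coverage(A)  # ≈ 0.5
```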
In this embodiment, the coverage probability of a large block space is obtained from the coverage probabilities of all of its data blocks; the unit at which the coverage probability is calculated is reduced, which improves the accuracy of the calculation.
In the previous embodiment, the coverage probabilities of all data blocks are used to calculate the coverage probability of the large block space. To speed up this calculation, the present embodiment, on the basis of the previous one, calculates the coverage probability of the large block space from the coverage probabilities of only a part of the data blocks.
Referring to fig. 3, fig. 3 is a flowchart of another prediction processing method of the garbage collection method according to the embodiment of the present application.
The method can comprise the following steps:
S301, selecting data blocks to be predicted in a large block space according to a data block selection rule;
the data block selection rule in this step is mainly to select a part of data blocks in the large block space as data blocks to be predicted. The number of data blocks is reduced, the time for calculating the coverage probability is reduced, and the speed of prediction processing is improved.
The data block selection rule may randomly select a preset number of data blocks from all the data blocks, randomly select a preset proportion of the data blocks, or select data blocks at intervals of a preset unit as the data blocks to be predicted. The rule for selecting data blocks in this step is therefore not unique and is not specifically limited herein, as long as a part of the data blocks is selected from all the data blocks as the data blocks to be predicted, so as to reduce the number of data blocks for which the coverage probability is calculated.
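The three selection rules named above can be sketched in one helper. The function name, parameter names, and the seeded RNG are illustrative assumptions; the patent does not prescribe any particular values.

```python
import random

def select_blocks(blocks, rule="interval", n=2, ratio=0.5, step=2, seed=0):
    """Three illustrative data block selection rules from the text:
    - "count":    randomly pick a preset number of blocks;
    - "ratio":    randomly pick a preset proportion of blocks;
    - "interval": pick every `step`-th block (preset-unit intervals).
    """
    if rule == "count":
        return random.Random(seed).sample(blocks, n)
    if rule == "ratio":
        k = max(1, int(len(blocks) * ratio))
        return random.Random(seed).sample(blocks, k)
    # default: blocks at fixed intervals of a preset unit
    return blocks[::step]

blocks = ["B1", "B2", "B3", "B4"]
select_blocks(blocks)  # interval rule → ["B1", "B3"]
```

Only the selected blocks are then run through the coverage probability prediction model, reducing computation as described.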
The prediction processing performed on each data block according to the coverage probability prediction model is the same as that described in step S201 of the previous embodiment and is not repeated here.
S302, performing probability prediction processing on the data block to be predicted in the large block space according to a coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
on the basis of step S301, this step aims to perform probability prediction processing on all data blocks to be predicted in a large block space, so as to obtain a plurality of data block coverage probabilities.
S303, adding the coverage probabilities of all the data blocks to be predicted in each large block space to obtain the corresponding coverage probability.
On the basis of step S302, this step is intended to add the obtained coverage probabilities of all data blocks to obtain the coverage probability corresponding to the large block space.
It should be noted that the present embodiment is a method for calculating the coverage probability of one large block space, and when calculating the coverage probabilities of a plurality of large block spaces, the steps of the present embodiment need to be repeated many times to complete the prediction process.
Suppose there are 4 large block spaces, A, B, C and D, each containing 4 data blocks: A1 to A4, B1 to B4, C1 to C4 and D1 to D4. By the selection rule, the first two data blocks of each large block space are chosen as the data blocks to be predicted, namely A1, A2, B1, B2, C1, C2, D1 and D2.
The data blocks to be predicted in each large block space are subjected to prediction processing through the coverage probability prediction model, that is, A1, A2, B1, B2, C1, C2, D1 and D2 are processed, obtaining in turn the coverage probabilities of all data blocks to be predicted in the A, B, C and D large block spaces: the coverage probabilities of A1 and A2 in A, of B1 and B2 in B, of C1 and C2 in C, and of D1 and D2 in D.
The coverage probability of A can be obtained by adding the coverage probabilities of the two data blocks A1 and A2, the coverage probability of B can be obtained by adding the coverage probabilities of the two data blocks B1 and B2, the coverage probability of C can be obtained by adding the coverage probabilities of the two data blocks C1 and C2, and the coverage probability of D can be obtained by adding the coverage probabilities of the two data blocks D1 and D2.
In this embodiment, since coverage probabilities are calculated only for part of the data blocks, the number of data blocks to be computed is reduced and the speed of the prediction processing is correspondingly improved.
In the following, a garbage collection system provided by an embodiment of the present application is introduced; the garbage collection system described below and the garbage collection method described above may be mutually referenced.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a garbage collection system of a storage system according to an embodiment of the present disclosure.
The system may include:
the machine learning training module 100 is configured to acquire an IO characteristic and an IO coverage state corresponding to the IO characteristic, and perform machine learning training processing according to the IO characteristic and the IO coverage state to obtain a coverage probability prediction model;
a coverage probability prediction module 200, configured to perform prediction processing on each large block space according to the coverage probability prediction model to obtain multiple coverage probabilities;
the to-be-recovered marking module 300 is configured to mark the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and the garbage collection module 400 is used for performing garbage collection processing on all the large block spaces to be recovered.
Optionally, the coverage probability prediction module 200 may include:
the data block probability prediction unit is used for carrying out probability prediction processing on all data blocks in the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and the coverage probability adding unit is used for adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
Optionally, the machine learning training module 100 may include:
the IO characteristic acquisition unit is used for acquiring an IO logical address and an IO coverage state corresponding to the IO logical address in a preset time period;
and the training unit is used for performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain a coverage probability prediction model.
An embodiment of the present application further provides a server, including:
a memory for storing a computer program;
a processor for implementing the steps of the garbage collection method according to the above embodiments when executing the computer program.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the garbage collection method according to the above embodiment are implemented.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The garbage collection method, the garbage collection system, the server and the computer-readable storage medium of the storage system provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (8)

1. A garbage collection method for a storage system, comprising:
obtaining an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain a coverage probability prediction model;
respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
and performing garbage collection processing on all the large block spaces to be recovered.
2. The garbage collection method according to claim 1, wherein the step of performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability comprises:
performing probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
3. The garbage collection method according to claim 1, wherein the step of performing prediction processing on the large block space according to the coverage probability prediction model to obtain a corresponding coverage probability comprises:
selecting a data block to be predicted in the large block space according to a data block selection rule;
performing probability prediction processing on all the data blocks to be predicted in the large block space according to the coverage probability prediction model to obtain the coverage probability of a plurality of data blocks corresponding to the large block space;
and adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
4. The garbage collection method according to any one of claims 1 to 3, wherein the garbage collection processing of all the large block spaces to be recovered comprises:
migrating the valid data of the large block spaces to be recovered whose coverage probability difference is smaller than a preset variation into the same new large block space, so as to complete the garbage collection processing.
5. A garbage collection system for a storage system, comprising:
the machine learning training module is used for acquiring IO characteristics and IO coverage states corresponding to the IO characteristics, and performing machine learning training processing according to the IO characteristics and the IO coverage states to obtain a coverage probability prediction model;
the coverage probability prediction module is used for respectively carrying out prediction processing on each large block space according to the coverage probability prediction model to obtain a plurality of coverage probabilities;
the to-be-recovered marking module is used for marking the large block space with the coverage probability smaller than the preset coverage probability as a large block space to be recovered;
the garbage collection module is used for performing garbage collection processing on all the large block spaces to be recovered;
wherein the machine learning training module comprises:
the IO characteristic acquisition unit is used for acquiring an IO logical address in a preset time period and an IO coverage state corresponding to the IO logical address;
and the training unit is used for performing machine learning processing on the IO logical address and the IO coverage state corresponding to the IO logical address according to a preset time period to obtain the coverage probability prediction model.
6. The garbage collection system of claim 5, wherein the coverage probability prediction module comprises:
the data block probability prediction unit is used for carrying out probability prediction processing on all data blocks of the large block space according to the coverage probability prediction model to obtain a plurality of data block coverage probabilities corresponding to the large block space;
and the coverage probability adding unit is used for adding the coverage probabilities of all the data blocks in the large block space to obtain the corresponding coverage probability.
7. A server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the garbage collection method according to any one of claims 1 to 4 when executing said computer program.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the garbage collection method according to any one of claims 1 to 4.
CN201811087264.XA 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device Active CN109284233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811087264.XA CN109284233B (en) 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811087264.XA CN109284233B (en) 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device

Publications (2)

Publication Number Publication Date
CN109284233A CN109284233A (en) 2019-01-29
CN109284233B true CN109284233B (en) 2022-02-18

Family

ID=65181006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811087264.XA Active CN109284233B (en) 2018-09-18 2018-09-18 Garbage recovery method of storage system and related device

Country Status (1)

Country Link
CN (1) CN109284233B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913649B (en) * 2019-05-09 2022-05-06 深圳大普微电子科技有限公司 Data processing method and device for solid state disk
KR20210017401A (en) * 2019-08-08 2021-02-17 에스케이하이닉스 주식회사 Data Storage Apparatus,Operating Method Thereof and Controller Therefor
CN111158598B (en) * 2019-12-29 2022-03-22 北京浪潮数据技术有限公司 Garbage recycling method, device, equipment and medium for full-flash disk array
CN113971137A (en) * 2020-07-22 2022-01-25 华为技术有限公司 Garbage recovery method and device
CN112860593A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 GC performance prediction method, system, medium and equipment of storage system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412826A (en) * 2013-07-18 2013-11-27 记忆科技(深圳)有限公司 Garbage collection method and system of solid state disk
CN103577338A (en) * 2013-11-14 2014-02-12 华为技术有限公司 Junk data recycling method and storage device
CN104216665A (en) * 2014-09-01 2014-12-17 上海新储集成电路有限公司 Storage management method of multi-layer unit solid state disk
US9141457B1 (en) * 2013-09-25 2015-09-22 Emc Corporation System and method for predicting multiple-disk failures
CN105204783A (en) * 2015-10-13 2015-12-30 华中科技大学 Solid-state disk garbage recycling method based on data life cycle
CN106874213A (en) * 2017-01-12 2017-06-20 杭州电子科技大学 A kind of solid state hard disc dsc data recognition methods for merging various machine learning algorithms
CN107102954A (en) * 2017-04-27 2017-08-29 华中科技大学 A kind of solid-state storage grading management method and system based on failure probability
CN107479825A (en) * 2017-06-30 2017-12-15 华为技术有限公司 A kind of storage system, solid state hard disc and date storage method
CN108241471A (en) * 2017-11-29 2018-07-03 深圳忆联信息***有限公司 A kind of method for promoting solid state disk performance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527544B1 (en) * 2011-08-11 2013-09-03 Pure Storage Inc. Garbage collection in a storage system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A flash translation layer strategy based on page-write correlation; Chen Jinzhong; Journal on Communications; 2013-06-30; Vol. 34, No. 6; pp. 76-84 *

Also Published As

Publication number Publication date
CN109284233A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN109284233B (en) Garbage recovery method of storage system and related device
CN111045956B (en) Solid state disk garbage recycling method and device based on multi-core CPU
CN1645516B (en) Data recovery apparatus and method used for flash memory
CN110673789B (en) Metadata storage management method, device, equipment and storage medium of solid state disk
CN108776614B (en) Recovery method and device of storage block
CN112328169B (en) Wear leveling method and device for solid state disk and computer readable storage medium
CN110134215B (en) Data processing method and device, electronic equipment and readable storage medium
US20220327018A1 (en) Behavior-driven die management on solid-state drives
CN115756312A (en) Data access system, data access method, and storage medium
CN111813347B (en) Garbage recycling space management method and device and computer readable storage medium
CN104156321A (en) Data pre-fetching method and device
CN114968839A (en) Hard disk garbage recycling method, device and equipment and computer readable storage medium
US9785374B2 (en) Storage device management in computing systems
CN104021226A (en) Method and device for updating prefetch rule
CN111221468A (en) Storage block data deleting method and device, electronic equipment and cloud storage system
CN110647294B (en) Storage block recovery method and device, storage medium and electronic equipment
Tsai et al. Learning-assisted write latency optimization for mobile storage
CN107346288B (en) Data writing method and device
CN105095197A (en) Method and device for processing data
JP6429197B2 (en) Logical physical address conversion table control method and memory device
CN114518849B (en) Data storage method and device and electronic equipment
CN117806837B (en) Method, device, storage medium and system for managing hard disk tasks
CN116795298B (en) IO optimization method and system for NVME memory under Linux
US11385798B1 (en) Method and system for application aware, management of write operations on non-volatile storage
CN113835638B (en) Method, device and equipment for determining garbage recycling destination block in storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant