CN117873388A - Data writing method, device, computer equipment and storage medium

Info

Publication number: CN117873388A
Application number: CN202311842087.2A
Authority: CN
Inventors: 李家骥; 唐轩; 陈建春; 季旻; 郭照斌
Original assignee: Tianjin Zhongke Shuguang Storage Technology Co ltd
Current assignee: Tianjin Zhongke Shuguang Storage Technology Co ltd
Priority date: 2023-12-28
Filing date: 2023-12-28
Publication date: 2024-04-12

Abstract

The present application relates to a data writing method, apparatus, computer device, storage medium and computer program product. The method comprises: acquiring a data stream to be written corresponding to a storage system, wherein the storage system comprises a plurality of logical block groups; identifying a life cycle of each data object contained in the data stream to be written; and dividing data objects whose life cycles meet a preset similarity condition into the same group to obtain a plurality of data groups, and writing the data objects contained in each data group into the same logical block group. With this method, the life cycle of each data object in the data stream to be written is identified (predicted) based on a target life cycle identification model, and the data objects are grouped by life cycle, so that the data in the same logical block group has similar life cycles and becomes invalid at about the same time. This improves the performance of the solid state disk, reduces the amount of data moved during data migration, and prolongs the service life of the solid state disk.

Description

Data writing method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of storage technologies, and in particular, to a data writing method, apparatus, computer device, storage medium, and computer program product.

Background

With the continuous development of storage technology, all-flash array technology has emerged. An all-flash array is a storage system comprising SSDs (Solid State Disks, also called Solid State Drives) and a system controller. Because an SSD must erase old data before writing new data, an all-flash system adopts a redirect-on-write mode, in which logical addresses are allocated from the SSDs in units of logical block groups for writing data. When the remaining writable space in the SSDs is insufficient, space must be freed by system-level garbage collection.

In the related art, data is generally written into a logical block group sequentially, in the order in which the data objects appear in the data stream. When garbage collection later reclaims a logical block group, all valid data in that group must be migrated to a new logical block group. As a result, a large amount of valid data is moved within the SSDs, which increases write amplification and degrades the performance of the storage system.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a data writing method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the performance of a storage system.

In a first aspect, the present application provides a data writing method. The method comprises the following steps:

obtaining a data stream to be written corresponding to a storage system, wherein the storage system comprises a plurality of logic block groups;

identifying the life cycle of each data object contained in the data stream to be written;

dividing each data object with the life cycle meeting the preset similar condition into the same group to obtain a plurality of data groups, and writing the plurality of data objects contained in each data group into the same logic block group.

In this embodiment, the life cycle of each data object in the data stream to be written is identified (predicted) by the target life cycle identification model, so that data objects with different life cycles are written into different logical block groups and the data in the same logical block group becomes invalid at about the same time, which further improves the performance of the solid state disk.

In one embodiment, the identifying the life cycle of each data object included in the data stream to be written includes:

identifying the life cycle of the data object through a target life cycle identification model and address identification information of the data object.

In this embodiment, the query on the life cycle of each data object is implemented through the pre-generated life cycle identification model, so that the query efficiency and accuracy of the life cycle of the data object are improved.

In one embodiment, the identifying the life cycle of the data object by the target life cycle identification model and the address identification information of the data object includes:

inquiring in a target life cycle identification model based on the address identification information of the data object;

if the target life cycle corresponding to the address identification information of the data object is queried, determining the target life cycle as the life cycle of the data object;

and if the target life cycle corresponding to the address identification information of the data object is not queried, determining that the default life cycle is the life cycle of the data object.

In this embodiment, the life cycle of each data object is recorded from historical data, which provides a stable data basis for subsequently identifying the life cycle of newly written data objects.

In one embodiment, the method further comprises:

acquiring the life cycle of each valid data object contained in a target logical block group under the condition that the target logical block group meets a preset data recovery condition;

if the current time meets a preset time delay condition, dividing the valid data objects into a plurality of target data packets based on the life cycle of each valid data object;

and based on each target data packet, performing data migration on each valid data object, and performing release processing on the target logical block group.

In this embodiment, the data objects can be grouped based on their life cycles and the preset similarity condition, so that the data objects contained in the same logical block group become invalid at close to the same time. This reduces the amount of valid data that must be moved within the solid state disk during data migration, reduces write amplification, improves migration efficiency, and prolongs the service life of the solid state disk.

In one embodiment, the dividing the valid data objects into a plurality of target data packets based on the life cycle of each valid data object includes:

determining a life cycle level of each valid data object based on the life cycle of each valid data object, and dividing the valid data objects into a plurality of target data packets based on the life cycle level of each valid data object.

In this embodiment, the valid data objects are divided into a plurality of data packets based on the life cycle level of each valid data object, which improves the efficiency of data packet division and the efficiency of data recovery.

In one embodiment, the method further comprises:

acquiring the total data capacity of the target logical block group and the invalid data capacity of the invalid data object in the target logical block group;

and if the ratio of the invalid data capacity to the total data capacity is greater than a preset recovery threshold, determining that the target logic block group meets a preset data recovery condition.

In this embodiment, by determining whether to perform data reclamation by the invalid data capacity of the logical block group, the total data capacity of the logical block group, and the preset reclamation threshold, the triggering condition of data reclamation can be quantified, triggering of garbage reclamation under a higher garbage proportion is ensured, and garbage reclamation efficiency of the storage system can be improved.
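As an illustration only, the recovery condition above can be sketched as a single ratio comparison. The function name, the byte-based units, and the default threshold of 0.8 are assumptions made for this sketch; the application itself only requires that the ratio be greater than a preset recovery threshold.

```python
def meets_reclaim_condition(invalid_bytes: int, total_bytes: int,
                            threshold: float = 0.8) -> bool:
    """Return True when the invalid-data ratio of a logical block group
    is strictly greater than the preset recovery threshold.

    The 0.8 default is a hypothetical value chosen for illustration.
    """
    if total_bytes <= 0:
        # A group with no capacity information is never selected here.
        return False
    return invalid_bytes / total_bytes > threshold
```

For example, with a threshold of 0.5, a group holding 51 invalid units out of 100 qualifies for recovery, while a group holding exactly 50 does not, because the comparison is strict.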

In one embodiment, the method further comprises:

acquiring the first time when the target logic block group meets the preset data recovery condition;

and under the condition that the time difference value between the first time and the current time is larger than or equal to the preset delay time length, determining that the current time meets the preset time delay condition.

In this embodiment, by configuring the preset delay time, the proportion of invalid data in the logic block group during data recovery is further improved, the data volume of valid data to be moved in the data moving process is reduced, the write amplification is reduced, and the service performance of the solid state disk is improved.

In one embodiment, the method further comprises:

under the condition that the life cycle of each data object contained in the target logic block group is inconsistent, acquiring the longest life cycle in the target logic block group, and determining the longest life cycle as a preset delay time;

or determining that the default duration is a preset delay duration under the condition that the life cycle of each data object contained in the target logic block group is consistent.

In this embodiment, the preset delay duration is determined according to whether the life cycles of the data objects are consistent, which further increases the proportion of invalid data in the logical block group at the time of data recovery, reduces the movement of valid data during data migration, and helps the data in the same logical block group become invalid at about the same time.
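The two preceding embodiments (choosing the preset delay duration and checking the time delay condition) can be sketched together as follows. This is a minimal sketch under stated assumptions: times and life cycles are plain numbers in seconds, and the names `preset_delay` and `delay_elapsed` are hypothetical.

```python
def preset_delay(lifecycles, default_delay):
    """Choose the delay to wait before reclaiming a logical block group.

    lifecycles: life cycles (in seconds) of the data objects in the group.
    If they are inconsistent, the longest life cycle is used; if they are
    consistent (or the group is empty), the default duration is used.
    """
    distinct = set(lifecycles)
    if len(distinct) > 1:
        return max(distinct)
    return default_delay


def delay_elapsed(first_met_time, now, delay):
    """True when the time since the group first met the recovery
    condition is greater than or equal to the preset delay duration."""
    return now - first_met_time >= delay
```

Waiting for the longest life cycle in a mixed group gives the longer-lived objects a chance to expire on their own before migration is attempted.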

In one embodiment, the method further comprises:

acquiring a data update time of the data object in response to a data modification operation of the data object;

determining a default life cycle as the life cycle of the data object under the condition that the data object is determined to satisfy a first-write condition;

calculating the life cycle of the data object based on the data update time of the data object under the condition that the data object is determined not to satisfy the first-write condition; and determining a life cycle level of the data object based on a preset level determination policy.

In this embodiment, by dynamically adjusting the lifecycle levels corresponding to the lifecycles of the data objects, the performance overhead of the storage system may be further reduced, and more computing resources may be reserved for the data writing process.
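A minimal sketch of this update path is given below. The application does not specify the level determination policy, so the fixed upper bounds in `LEVEL_BOUNDS`, the default life cycle of 7 days, and the function names are all assumptions made for illustration.

```python
from bisect import bisect_left

DAY = 86400
# Hypothetical policy: upper bounds (seconds) of levels 1..3; longer is level 4.
LEVEL_BOUNDS = [1 * DAY, 3 * DAY, 7 * DAY]
# Hypothetical default life cycle used on the first write of an object.
DEFAULT_LIFECYCLE = 7 * DAY


def lifecycle_level(lifecycle):
    """Map a life cycle in seconds to a 1-based level under the assumed policy."""
    return bisect_left(LEVEL_BOUNDS, lifecycle) + 1


def on_modify(last_update, key, now):
    """Handle a modification of the object identified by `key`.

    last_update: dict mapping key -> time of the previous write.
    Returns (lifecycle, level). A first write gets the default life cycle;
    otherwise the life cycle is the time since the previous update.
    """
    previous = last_update.get(key)
    last_update[key] = now
    lifecycle = DEFAULT_LIFECYCLE if previous is None else now - previous
    return lifecycle, lifecycle_level(lifecycle)
```

Keeping the policy as a sorted list of bounds makes the level lookup a binary search, which keeps the per-write overhead small.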

In a second aspect, the present application further provides a data writing apparatus. The device comprises:

the first acquisition module is used for acquiring a data stream to be written corresponding to a storage system, wherein the storage system comprises a plurality of logic block groups;

a first determining module, configured to identify a life cycle of each data object included in the data stream to be written;

the first writing module is used for dividing data objects whose life cycles meet the preset similarity condition into the same group to obtain a plurality of data groups, and writing the data objects contained in each data group into the same logical block group.

In one embodiment, the first determining module is specifically configured to: and identifying the life cycle of the data object through the target life cycle identification model and the address identification information of the data object.

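In one embodiment, the first determining module is further specifically configured to: query the target life cycle identification model based on the address identification information of the data object; if a target life cycle corresponding to the address identification information of the data object is found, determine the target life cycle as the life cycle of the data object; and if no target life cycle corresponding to the address identification information of the data object is found, determine a default life cycle as the life cycle of the data object.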

In one embodiment, the data writing apparatus further includes:

the second acquisition module is used for acquiring the life cycle of each effective data object contained in the target logic block group under the condition that the target logic block group meets the preset data recovery condition;

the first dividing module is used for dividing the valid data objects into a plurality of target data packets based on the life cycle of each valid data object if the current time meets the preset time delay condition;

and the release module is used for carrying out data migration on each effective data object based on each target data packet and carrying out release processing on the target logic block group.

In one embodiment, the first dividing module is specifically configured to determine a life cycle level of each valid data object based on a life cycle of each valid data object, and divide each valid data into a plurality of target data packets based on the life cycle level of each valid data object.

In one embodiment, the data writing apparatus further includes:

a third obtaining module, configured to obtain a total data capacity of the target logical block group and an invalid data capacity of an invalid data object in the target logical block group;

and the second determining module is used for determining that the target logic block group meets the preset data recovery condition if the ratio of the invalid data capacity to the total data capacity is larger than a preset recovery threshold.

In one embodiment, the data writing apparatus further includes:

a fourth obtaining module, configured to obtain a first time when the target logic block group meets the preset data recovery condition;

and the third determining module is used for determining that the current time meets a preset time delay condition under the condition that the time difference between the first time and the current time is larger than or equal to a preset delay time length.

In one embodiment, the data writing apparatus further includes:

a fifth obtaining module, configured to obtain a longest life cycle in the target logical block group and determine that the longest life cycle is a preset delay duration when the life cycle of each data object included in the target logical block group is inconsistent;

and the fourth determining module is used for determining a default duration as the preset delay duration under the condition that the life cycles of the data objects contained in the target logical block group are consistent.

In one embodiment, the data writing apparatus further includes:

a sixth obtaining module, configured to obtain a data update time of the data object in response to a data modification operation of the data object;

A fifth determining module, configured to determine the default life cycle as a life cycle of the data object if it is determined that the data object meets a data first-time writing condition;

a sixth determining module, configured to calculate a life cycle of the data object based on a data update time of the data object if it is determined that the data object does not satisfy the first writing condition of the data; and determining a lifecycle level of the data object based on a preset level determination policy.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the data writing method of the first aspect.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the data writing method of the first aspect.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the data writing method of the first aspect.

In the above data writing method, apparatus, computer device, storage medium and computer program product, the method comprises: obtaining a data stream to be written corresponding to a storage system, wherein the storage system comprises a plurality of logical block groups; identifying the life cycle of each data object contained in the data stream to be written; and dividing data objects whose life cycles meet a preset similarity condition into the same group to obtain a plurality of data groups, and writing the data objects contained in each data group into the same logical block group. With this method, the life cycle of each data object in the data stream to be written is identified (predicted) based on the target life cycle identification model, and the data objects are grouped based on their life cycles and the preset similarity condition, so that the data in the same logical block group becomes invalid at about the same time. This improves the performance of the solid state disk, reduces the amount of data moved during data migration, and prolongs the service life of the solid state disk.

Drawings

FIG. 1 is a schematic diagram of a storage system in a data writing method according to an embodiment;

FIG. 2 is a flow chart of a data writing method according to an embodiment;

FIG. 3 is a flow chart of a step of determining a life cycle in one embodiment;

FIG. 4 is a flow chart of a release step in one embodiment;

FIG. 5 is a flowchart illustrating a step of determining whether a predetermined data recovery condition is satisfied in one embodiment;

FIG. 6 is a flowchart illustrating a step of determining whether a predetermined time delay condition is satisfied in one embodiment;

FIG. 7 is a flowchart illustrating a step of determining a predetermined delay period in one embodiment;

FIG. 8 is a flow diagram of the steps for determining a lifecycle level in one embodiment;

FIG. 9 is a diagram of data writing in a storage system according to one embodiment;

FIG. 10 is a diagram of data migration and space release after data becomes invalid in one embodiment;

FIG. 11 is a schematic diagram of a garbage collection relocation data writing process in one embodiment;

FIG. 12 is a system architecture diagram of a storage system in one embodiment;

FIG. 13 is a flow chart of a data writing step in another embodiment;

FIG. 14 is a flow chart of data lifecycle learning in one embodiment;

FIG. 15 is a schematic diagram of a process flow of system garbage collection in one embodiment;

FIG. 16 is a block diagram showing a structure of a data writing device in one embodiment;

FIG. 17 is an internal structural view of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The data writing method provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1. The storage system may include a plurality of logical block groups, each logical block group may include a plurality of logical blocks, and the data contained in the logical blocks of each logical block group is written to a plurality of SSDs respectively. For example, the storage system may include logical block group 1 and logical block group 2. Logical block group 1 may include logical block 1, logical block 2, logical block 3, and logical block 4; logical block group 2 may include logical block 5, logical block 6, logical block 7, and logical block 8. The data of logical block 1 in logical block group 1 and the data of logical block 5 in logical block group 2 can be written to SSD 1; correspondingly, the data of logical block 2 and the data of logical block 6 can be written to SSD 2; the data of logical block 3 and the data of logical block 7 can be written to SSD 3; and the check data of logical block 4 and the check data of logical block 8 can be written to SSD 4.
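The layout in this example can be expressed as a small mapping from logical block number to logical block group and SSD. This is only a sketch of the example above: it assumes that the numbering pattern (four blocks per group, the last block of each group holding check data on SSD 4) continues for further groups, which the application does not state, and the function name `locate` is hypothetical.

```python
SSD_COUNT = 4          # from the example: SSD 1-3 hold data, SSD 4 holds check data
BLOCKS_PER_GROUP = 4   # from the example: four logical blocks per group


def locate(block_no):
    """Return (group_no, ssd_no, is_check_block) for a 1-based logical block number."""
    index = block_no - 1
    group_no = index // BLOCKS_PER_GROUP + 1
    ssd_no = index % SSD_COUNT + 1
    # In the example, logical blocks 4 and 8 hold the check data on SSD 4.
    is_check_block = ssd_no == SSD_COUNT
    return group_no, ssd_no, is_check_block
```

Under these assumptions, logical blocks 1 and 5 both map to SSD 1, and logical blocks 4 and 8 both map to SSD 4 as check blocks, matching the example.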

In one embodiment, as shown in FIG. 2, a data writing method is provided. This embodiment is described as applied to a terminal, where the terminal may communicate with a storage system through a network. The terminal may be, but is not limited to, a personal computer, notebook computer, smart phone, or tablet computer. In this embodiment, the data writing method includes the following steps:

Step 202, obtaining a data stream to be written corresponding to a storage system.

The storage system comprises a plurality of logical block groups, and the data stream to be written comprises a plurality of data objects to be written.

Specifically, after receiving a data writing request, the terminal can parse the request to obtain the data stream to be written that it carries, and acquire the plurality of data objects to be written contained in that data stream. The data stream to be written is a data stream used for writing data into the storage system, specifically for writing the plurality of data objects to be written into the storage system.

Step 204, identify a lifecycle of each data object contained in the data stream to be written.

Specifically, for each data object included in the data stream to be written, the terminal may identify (predict) the life cycle of the data object based on the target life cycle identification model, and determine the predicted life cycle of the data object based on the output of the target life cycle identification model.

Step 206, dividing data objects whose life cycles meet the preset similarity condition into the same group to obtain a plurality of data groups, and writing the data objects contained in each data group into the same logical block group.

The preset similarity condition may be that the life cycles of the data objects are the same, or that the life cycles of the data objects are close, for example that the time difference between them is smaller than a preset threshold.

Specifically, the terminal may divide the data objects based on the specific value of the life cycle of each data object and the specific content of the preset similarity condition, placing the one or more data objects that satisfy the preset similarity condition into the same data packet. That is, the terminal assigns the data objects that satisfy the preset similarity condition to the same life cycle level, and places data objects with the same life cycle level into the same data packet. In this way, the terminal obtains a plurality of data packets. For each of the data packets, the terminal may write the data objects contained in that data packet to the same logical block group.

In one example, after determining the life cycles of the data objects, the terminal may determine the content of the preset similarity condition based on the number of distinct life cycles and the number of life cycle levels preconfigured by the storage system. If the number of distinct life cycles of the current data objects is less than or equal to the number of preconfigured life cycle levels, the terminal may determine that the preset similarity condition is that the life cycles of the data objects are the same. If the number of distinct life cycles is greater than the number of preconfigured life cycle levels, the terminal may determine that the preset similarity condition is that the life cycles of the data objects are close, for example that the time difference is smaller than a preset threshold. Each life cycle level is one tier of life cycle.

In a specific example, the life cycles of the data objects currently determined by the terminal may be 1 day, 3 days, 7 days, and 15 days. The terminal may then assign a life cycle of 1 day to a first life cycle level, 3 days to a second life cycle level, 7 days to a third life cycle level, and 15 days to a fourth life cycle level. In another specific example, the life cycles of the data objects currently identified by the terminal may be 1 day, 3 days, 7 days, 15 days, 30 days, and 60 days, and the number of life cycle levels preconfigured by the storage system may be 4. Because the number of distinct life cycles (6) is greater than the number of preconfigured life cycle levels (4), the terminal may merge life cycles. Since 1 day, 3 days, and 7 days satisfy the preset similarity condition, the terminal may assign them to the same life cycle level: life cycles of 1, 3, and 7 days form the first life cycle level, 15 days the second, 30 days the third, and 60 days the fourth.
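The grouping in step 206 can be sketched as follows. The application does not prescribe how "close" life cycles are merged, so this sketch assumes one possible rule: repeatedly merge the two adjacent life cycle clusters with the smallest gap until the number of clusters fits the preconfigured number of levels. The function names and the use of days as the unit are also assumptions.

```python
from collections import defaultdict


def assign_levels(lifecycles, max_levels):
    """Map each distinct life cycle (in days) to a 1-based life cycle level.

    Assumed merge rule: while there are more clusters than levels, merge the
    adjacent pair of clusters separated by the smallest gap.
    """
    if max_levels < 1:
        raise ValueError("max_levels must be at least 1")
    clusters = [[lc] for lc in sorted(set(lifecycles))]
    while len(clusters) > max_levels:
        i = min(range(len(clusters) - 1),
                key=lambda k: clusters[k + 1][0] - clusters[k][-1])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return {lc: level
            for level, cluster in enumerate(clusters, start=1)
            for lc in cluster}


def group_by_level(objects, max_levels):
    """objects: iterable of (key, lifecycle_days). Returns {level: [keys]}.

    Each resulting list is one data packet, intended to be written to a
    single logical block group.
    """
    objects = list(objects)
    levels = assign_levels([lc for _, lc in objects], max_levels)
    groups = defaultdict(list)
    for key, lc in objects:
        groups[levels[lc]].append(key)
    return dict(groups)
```

With the life cycles of the second example (1, 3, 7, 15, 30, and 60 days) and four levels, this rule places 1, 3, and 7 days in the first level and 15, 30, and 60 days in the second, third, and fourth levels respectively. Other merge rules, such as a fixed time-difference threshold, would fit the description equally well.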

In one embodiment, the specific process of step "identify the lifecycle of each data object contained in the data stream to be written" includes:

the lifecycle of the data object is identified by the target lifecycle identification model and address identification information of the data object.

The target life cycle identification model may be obtained by training on the write-delete records of the data objects contained in a plurality of data streams written into the storage system during a target time period, where the write-delete record of a data object includes the write time and the deletion time of the data object. The target life cycle identification model may be, for example, a business write-delete model, and may include a plurality of pairs of address identification information and life cycle.

Specifically, for each of the data objects included in the data stream to be written, the terminal first obtains the address identification information of the data object, queries the target life cycle identification model for address identification information consistent with that of the data object, and determines the life cycle corresponding to the matching address identification information as the life cycle of the data object.

In one embodiment, as shown in FIG. 3, the specific process of the step of identifying the life cycle of the data object through the target life cycle identification model and the address identification information of the data object includes:

Step 302, querying the target life cycle identification model based on the address identification information of the data object.

The address identification information of the data object may be a key of the data object, for example, may be data of different logical addresses, and the corresponding address identification information may be a logical location (volid+lba) of the data object. The target lifecycle identification model may include a plurality of mappings of address identification information to lifecycles.

Specifically, the process of constructing the target life cycle identification model (business write-delete model) may include the following. For each piece of data that has been written, the terminal may record the write time of the piece of data in the metadata corresponding to the piece of data, or in a trace log of the storage system. After the piece of data is deleted or overwritten, the terminal may detect the time at which it was deleted or overwritten. The life cycle of a piece of data is its data update period, that is, the time interval from being written to being deleted or overwritten. Because the metadata of the storage system records the write time of the piece of data, when the piece of data is deleted or overwritten the terminal may calculate the time difference between the current time and the recorded write time, and determine that time difference as the life cycle of the piece of data. The terminal may then add the address identification information of the piece of data and its corresponding life cycle to the preset target life cycle identification model. The target life cycle identification model may thus characterize key intervals with more write-delete operations and key intervals with fewer write-delete operations.

Step 304, if the target life cycle corresponding to the address identification information of the data object is found, determining the target life cycle as the life cycle of the data object.

Specifically, the terminal may query the target life cycle identification model based on the address identification information of the data object to be written. If the terminal finds, in the target life cycle identification model, target address identification information matching the address identification information of the data object to be written, the terminal may determine the target life cycle corresponding to the target address identification information as the life cycle of the data object to be written.

Step 306, if the target life cycle corresponding to the address identification information of the data object is not found, determining a default life cycle as the life cycle of the data object.

Specifically, the terminal queries the target life cycle identification model based on the address identification information of the data object to be written. If no address identification information matching that of the data object to be written exists in the target life cycle identification model, the terminal determines that no target life cycle corresponding to the address identification information of the data object has been found. In that case, the terminal may acquire the default life cycle of the storage system and determine the default life cycle as the life cycle of the data object to be written.
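Steps 302 to 306 can be sketched as a dictionary lookup with a fallback. This is a simplified sketch, not the trained model of the application: it assumes the model is a plain mapping from (volume ID, LBA) to a life cycle in seconds, built directly from write-delete records, and the names `build_model`, `identify_lifecycle`, and the 7-day default are hypothetical.

```python
DEFAULT_LIFECYCLE = 7 * 24 * 3600  # hypothetical default life cycle, in seconds


def build_model(records):
    """Build a mapping from address identification to life cycle.

    records: iterable of (vol_id, lba, write_time, invalidated_time), where
    invalidated_time is when the data was deleted or overwritten.
    If an address appears more than once, the latest record wins.
    """
    model = {}
    for vol_id, lba, write_time, invalidated_time in records:
        model[(vol_id, lba)] = invalidated_time - write_time
    return model


def identify_lifecycle(model, vol_id, lba, default=DEFAULT_LIFECYCLE):
    """Return the recorded life cycle for (vol_id, lba), or the default
    life cycle when the address is not present in the model."""
    return model.get((vol_id, lba), default)
```

An address that has been written and invalidated before returns its recorded interval, while an address seen for the first time falls back to the default life cycle.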

In one embodiment, as shown in fig. 4, the data writing method further includes:

Step 402, acquiring the life cycle of each valid data object contained in the target logical block group if the target logical block group meets a preset data recovery condition.

The life cycles of the data objects contained in the target logical block group meet the preset similarity condition. Each logical block group contains a plurality of logical blocks, and each logical block contains a plurality of data objects. The life cycle of each data object is determined based on the write time of the data object and the time at which it was deleted, or based on its write time and the time at which it was overwritten; the life cycle represents the duration for which the data object remains valid. The preset data recovery condition may be that the proportion of invalid data objects contained in the logical block group is greater than or equal to a preset proportion threshold, or the like. The valid data objects contained in the target logical block group may be data objects whose life cycle has not reached a preset duration.

Specifically, the terminal may determine whether each logical block group satisfies the preset data recovery condition, and in case the terminal determines that there is a target logical block group satisfying the preset data recovery condition, the terminal may determine whether there is a valid data object in the target logical block group, and in case that there is a valid data object, the terminal may acquire a life cycle of each valid data object.

In step 404, if the current time satisfies the preset time delay condition, the valid data objects are divided into a plurality of target data packets based on the life cycle of each valid data object.

The preset time delay condition may be that the time difference between the current time and the time at which the target logical block group met the preset data recovery condition is greater than or equal to the preset delay duration; the life cycles of the valid data objects in the same target data packet satisfy the preset similarity condition.

Specifically, when the terminal determines, based on the current time and the time at which the target logical block group met the preset data recovery condition, that the preset time delay condition is satisfied, the terminal may acquire the valid data objects currently contained in the target logical block group and determine the life cycle of each valid data object. Based on these life cycles, the terminal may place the valid data objects whose life cycles satisfy the preset similarity condition into the same data packet, obtaining a plurality of target data packets, where the life cycles of the valid data objects contained in the same target data packet satisfy the preset similarity condition.

Optionally, the preset similarity condition may be that the life cycles of the data objects are the same, or that the life cycles of the data objects are close to one another or differ by less than a preset threshold, and so on.

Step 406, based on each target data packet, performing data migration on each valid data object, and performing release processing on the target logical block group.

Specifically, for each of the plurality of target data packets, the terminal may write the valid data objects contained in that target data packet, which share the same life cycle level, into the same logical block group, flush each logical block group into which valid data objects have been written to disk, and release the storage space of the target logical block group after the flush is completed.

In an example, the terminal may flush the data to disk by writing the data contained in each logical block group into which valid data objects have been written to the solid state disk.

In one embodiment, the specific process of the step of "dividing the valid data objects into a plurality of target data packets based on the life cycle of each valid data object" includes:

determining the life cycle level of each valid data object based on its life cycle, and dividing the valid data objects into a plurality of target data packets based on the life cycle level of each valid data object.

The preset time delay condition may be that the time difference between the current time and the time at which the target logical block group met the preset data recovery condition is greater than or equal to the preset delay duration; the life cycle level of each valid data object may be a level (gear) obtained by partitioning data objects according to the value of their life cycle; the valid data objects in the same target data packet have life cycles that satisfy the preset similarity condition and have the same life cycle level.

Specifically, when determining that the current time has met the preset time delay condition based on the current time and the time that the target logical block group meets the preset data recovery condition, the terminal may acquire each valid data object included in the target logical block group under the current condition, and determine the life cycle of each valid data object. The terminal can determine the lifecycle levels respectively corresponding to the effective data objects based on the lifecycle of the effective data objects and the number of the lifecycle levels which are configured in advance, so that the terminal can determine that the effective data objects with the same lifecycle level are data objects meeting preset similar conditions; that is, the terminal divides each valid data object included in the target logical block group, and may divide a plurality of valid data objects having the same life cycle level into the same data packet, to obtain a plurality of target data packets.
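The level-based grouping above can be illustrated with a short sketch. It assumes that each life cycle level is defined by an upper bound in days; the bounds in `LEVEL_BOUNDS_DAYS` and the function names are hypothetical, since the application leaves the concrete level boundaries to the storage system configuration.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical upper bounds (in days) of the preconfigured life cycle levels.
LEVEL_BOUNDS_DAYS = [7, 15, 30, 60]


def lifecycle_level(lifecycle_days: float,
                    bounds: Sequence[float] = LEVEL_BOUNDS_DAYS) -> int:
    """Map a life cycle to a 1-based level; values above the last bound use the last level."""
    idx = bisect_left(bounds, lifecycle_days)
    return min(idx, len(bounds) - 1) + 1


def group_by_level(valid_objects: Iterable[Tuple[int, float]]) -> Dict[int, List[int]]:
    """Group (object_id, lifecycle_days) pairs into target data packets keyed by level."""
    packets: Dict[int, List[int]] = defaultdict(list)
    for object_id, days in valid_objects:
        packets[lifecycle_level(days)].append(object_id)
    return dict(packets)


# Valid data objects remaining in a target logical block group (assumed values).
packets = group_by_level([(2, 7), (6, 7), (3, 30), (12, 30)])
```

Each key of `packets` then corresponds to one target data packet, whose members would be written into the same new logical block group.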

In one embodiment, as shown in fig. 5, the data writing method further includes:

step 502, obtaining the total data capacity of the target logical block group and the invalid data capacity of the invalid data object in the target logical block group.

Wherein the total data capacity of the target logical block group may be determined based on the data capacities of the plurality of logical blocks included in the target logical block group.

Specifically, the terminal may obtain the data capacity of the target logical block group based on the configuration information, and may calculate the total data capacity of the target logical block group from the data capacities of the logical blocks it contains. Accordingly, the terminal determines whether each data object is a valid data object or an invalid data object based on the life cycle of each data object in the target logical block group, and counts the invalid data capacity corresponding to the one or more invalid data objects contained in the target logical block group.

In step 504, if the ratio of the invalid data capacity to the total data capacity is greater than or equal to the preset reclamation threshold, it is determined that the target logical block group meets the preset data recovery condition.

The specific value of the preset reclamation threshold may be determined based on an actual application scenario, for example, may be eighty percent, and the specific value of the preset reclamation threshold is not specifically limited in this disclosure.

Specifically, after determining the invalid data capacity contained in the target logical block group, the terminal may calculate the garbage amount ratio, namely the ratio of the invalid data capacity of the target logical block group to its total data capacity. When the terminal detects that the garbage amount ratio of the target logical block group is greater than or equal to the preset reclamation threshold, the terminal can determine that the target logical block group meets the preset data recovery condition, that is, the terminal can perform data recovery on the target logical block group.
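As a worked illustration of steps 502 and 504, the following sketch computes the garbage amount ratio and compares it with the threshold. The function names and the capacities used are assumptions made for the example; the eighty percent threshold is the example value mentioned above.

```python
from typing import Sequence


def total_capacity(block_capacities: Sequence[int]) -> int:
    """Total data capacity of a logical block group: sum of its logical block capacities."""
    return sum(block_capacities)


def garbage_ratio(invalid_capacity: int, total: int) -> float:
    """Ratio of invalid data capacity to total data capacity."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return invalid_capacity / total


def meets_recovery_condition(invalid_capacity: int, total: int,
                             threshold: float = 0.8) -> bool:
    """True when the garbage amount ratio is greater than or equal to the threshold."""
    return garbage_ratio(invalid_capacity, total) >= threshold


# Assumed example: four logical blocks of 256 units each, 900 units already invalid.
total = total_capacity([256, 256, 256, 256])
ready = meets_recovery_condition(900, total)      # 900 / 1024 is above 0.8
not_ready = meets_recovery_condition(512, total)  # 512 / 1024 is 0.5
```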

In one embodiment, as shown in fig. 6, the data writing method further includes:

step 602, obtaining a first time when the target logical block group meets a preset data recovery condition.

Specifically, the terminal may determine whether the target logical block group in the current situation meets the preset data recovery condition based on the ratio of the invalid data capacity and the total data capacity of the target logical block group, and when the terminal determines that the current target logical block group meets the preset data recovery condition, the terminal may acquire the time when the target logical block group meets the preset data recovery condition, that is, the first time.

In step 604, in the case that the time difference between the first time and the current time is greater than or equal to the preset delay time length, it is determined that the current time satisfies the preset time delay condition.

The preset delay duration may be a delayed-recovery threshold, that is, the time to wait after it is determined that the target logical block group meets the preset data recovery condition.

Specifically, after determining that the target logical block group meets the preset data recovery condition, the terminal may simultaneously record the first time at which the target logical block group met the condition. The terminal may then acquire the current time, calculate the time difference between the current time and the first time, and compare the time difference with the preset delay duration. If the terminal determines that the time difference is greater than or equal to the preset delay duration, the terminal can determine that the preset delay duration has elapsed since the target logical block group met the preset data recovery condition, that is, that the current time satisfies the preset time delay condition. In one example, if the terminal determines that the time difference between the current time and the first time is less than the preset delay duration, the terminal may determine that the current time does not satisfy the preset time delay condition, and repeat the steps of acquiring the current time and calculating the time difference between the current time and the first time until the calculated time difference is greater than or equal to the preset delay duration.
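The delay check and the repeated re-evaluation can be sketched as below. The clock and sleep functions are injected so that the example is deterministic; `delay_elapsed`, `wait_until_delay_elapsed`, and the polling interval are hypothetical names and parameters introduced only for illustration.

```python
from typing import Callable


def delay_elapsed(first_time: float, current_time: float, preset_delay: float) -> bool:
    """True when the time since first_time is greater than or equal to preset_delay."""
    return (current_time - first_time) >= preset_delay


def wait_until_delay_elapsed(first_time: float, preset_delay: float,
                             clock: Callable[[], float],
                             sleep: Callable[[float], None] = lambda seconds: None,
                             poll_interval: float = 1.0) -> float:
    """Re-read the clock until the preset time delay condition holds; return that time."""
    while True:
        now = clock()
        if delay_elapsed(first_time, now, preset_delay):
            return now
        sleep(poll_interval)  # wait before acquiring the current time again


# Deterministic fake clock for the example: successive readings of the current time.
readings = iter([100.0, 105.0, 110.0, 120.0])
satisfied_at = wait_until_delay_elapsed(100.0, 10.0, clock=lambda: next(readings))
```

In a real system the clock would typically be `time.monotonic` and the sleep function `time.sleep`; they are replaced here only so the behaviour can be checked without waiting.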

In one embodiment, as shown in fig. 7, the data writing method further includes:

step 702, in the case that the life cycle of each data object included in the target logical block group is inconsistent, acquiring the longest life cycle in the target logical block group, and determining the longest life cycle as the preset delay duration.

Here, the life cycles of the data objects contained in the target logical block group being inconsistent means that the target logical block group contains data objects whose life cycles differ from one another.

Specifically, the terminal may obtain the life cycle of each data object included in the target logical block group, determine whether there are data objects with different life cycles in the target logical block group, and if the terminal determines that the life cycles of the data objects in the current target logical block group are different, the terminal may extract the life cycle of the data object with the longest life cycle, and configure the longest life cycle as a preset delay duration corresponding to the target logical block group.

In step 704, in the case that the life cycles of the data objects included in the target logical block group are consistent, the default duration is determined to be the preset delay duration.

Specifically, under the condition that the life cycle of each data object included in the target logic block group is determined to be uniform and consistent, the terminal can acquire a default duration preset by the storage system, and determine the default duration as a preset delay duration corresponding to the target logic block group.
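Steps 702 and 704 amount to a simple selection rule, sketched below under the assumption that life cycles and the default duration are expressed in the same unit (days here). The function name and the example values are hypothetical.

```python
from typing import Sequence


def preset_delay_duration(lifecycles: Sequence[float], default_duration: float) -> float:
    """Pick the delay for a logical block group.

    If the life cycles of its data objects differ, use the longest one;
    if they are all the same (or the group is empty), use the default duration.
    """
    if len(set(lifecycles)) > 1:
        return max(lifecycles)
    return default_duration


mixed = preset_delay_duration([7, 7, 30], default_duration=1)   # inconsistent life cycles
uniform = preset_delay_duration([7, 7, 7], default_duration=1)  # consistent life cycles
```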

In one embodiment, as shown in fig. 8, the data writing method further includes:

step 802, in response to a data modification operation of a data object, a data update time of the data object is obtained.

The data modification operation may be an operation on the data object, and may include, for example, a plurality of operations such as a data add operation, a data update operation, a data delete operation, and a data query operation.

Specifically, in response to a data modification operation for a data object, the terminal may record a modification time corresponding to the data modification operation and record the modification time as a data update time, and in one example, the terminal may add the data update time of the data object to metadata of the data.

In step 804, in the case that the data object is determined to satisfy the data first-write condition, the default lifecycle is determined as the lifecycle of the data object.

The data first-write condition may be that the data object is written into the storage system for the first time. The default life cycle may be preconfigured by the storage system; the life cycle level corresponding to the default life cycle may be, for example, the first life cycle level, and may be determined by the storage system based on the actual application scenario.

Specifically, after the terminal obtains the data update time of the data object, it may determine whether the data object is first writing, and in the case that the terminal determines that the data object is first writing in the storage system, the terminal may determine that the data object meets the first writing condition of the data, then the terminal may obtain a default life cycle and a default life cycle level of the storage system, and determine the default life cycle and the default life cycle level as the life cycle and the life cycle level of the data object corresponding to the data modification operation.

In step 806, in the case that the data object is determined not to satisfy the first writing condition of the data, the life cycle of the data object is calculated based on the data update time of the data object, and the life cycle level of the data object is determined based on the preset level determination policy.

Specifically, in the case that the terminal determines that the data object is not written into the storage system for the first time, the terminal may determine that the data object does not satisfy the condition of first writing of data, based on which the terminal may acquire the data update time of the data object and the last writing time of the logical address corresponding to the data object, calculate a time difference between the data update time (current time) and the last writing time of the logical address corresponding to the data object, and determine the calculated time difference as the life cycle of the data object.
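The first-write check and the life cycle calculation of steps 802 to 806 can be sketched as a small class. It assumes that the last write time is tracked per logical address in memory; `LifecycleLearner`, `on_update`, and the time unit are hypothetical and serve only to illustrate the calculation.

```python
from typing import Dict


class LifecycleLearner:
    """Tracks the last write time per logical address and derives life cycles."""

    def __init__(self, default_lifecycle: float) -> None:
        self.default_lifecycle = default_lifecycle
        self._last_write: Dict[int, float] = {}

    def on_update(self, logical_address: int, update_time: float) -> float:
        """Record an update and return the life cycle attributed to the address."""
        last = self._last_write.get(logical_address)
        self._last_write[logical_address] = update_time
        if last is None:
            # First write: no history exists, so the default life cycle applies.
            return self.default_lifecycle
        # Otherwise: time difference between this update and the previous write.
        return update_time - last


learner = LifecycleLearner(default_lifecycle=7.0)
first = learner.on_update(0x10, update_time=100.0)   # first write
second = learner.on_update(0x10, update_time=103.0)  # rewritten 3 time units later
```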

In one example, the preset level determination policy may include the following. In the case where the number of distinct life cycles of the data objects is less than the number of life cycle levels preconfigured by the storage system, the terminal may assign a separate life cycle level to each life cycle. For example, if the life cycles of the data objects currently determined by the terminal are 1 day, 3 days, 7 days and 15 days, the terminal may determine that a life cycle of 1 day is the first life cycle level, a life cycle of 3 days is the second life cycle level, a life cycle of 7 days is the third life cycle level, and a life cycle of 15 days is the fourth life cycle level.

In the case where the number of distinct life cycles of the data objects currently determined by the terminal is greater than or equal to the number of life cycle levels preconfigured by the storage system, the terminal may merge life cycles: data objects whose life cycles are close to one another, or whose life cycles differ by less than a preset threshold, are placed in the same life cycle level. For example, when the terminal currently recognizes life cycles of 1 day, 3 days, 7 days, 15 days, 30 days and 60 days, it may merge them; since the life cycles of 1 day, 3 days and 7 days are close to one another, or differ by less than the preset threshold, the terminal may assign them to the same life cycle level. That is, life cycles of 1, 3 and 7 days may be the first life cycle level, a life cycle of 15 days may be the second life cycle level, a life cycle of 30 days may be the third life cycle level, and a life cycle of 60 days may be the fourth life cycle level.
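One possible way to realize such a merging policy is sketched below. The application only states that close life cycles are merged; the greedy rule used here (repeatedly merging the two adjacent clusters with the smallest gap until the number of clusters fits the configured number of levels) is an assumption chosen for illustration, and it merges only when the number of distinct life cycles exceeds the number of levels. The function name `assign_levels` is hypothetical.

```python
from typing import Dict, List, Sequence


def assign_levels(lifecycles: Sequence[float], max_levels: int) -> Dict[float, int]:
    """Map each distinct life cycle to a 1-based level, merging close values if needed."""
    if max_levels < 1:
        raise ValueError("max_levels must be at least 1")
    # Start with one cluster per distinct life cycle, in ascending order.
    clusters: List[List[float]] = [[value] for value in sorted(set(lifecycles))]
    while len(clusters) > max_levels:
        # Gap between neighbours: smallest value on the right minus largest on the left.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]  # merge the closest pair
    return {value: level
            for level, cluster in enumerate(clusters, start=1)
            for value in cluster}


few = assign_levels([1, 3, 7, 15], max_levels=5)           # no merging needed
many = assign_levels([1, 3, 7, 15, 30, 60], max_levels=4)  # 1, 3 and 7 days are merged
```

With four levels and the six life cycles of the example above, this rule reproduces the grouping given in the text: 1, 3 and 7 days share the first level, and 15, 30 and 60 days take the second, third and fourth levels.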

The following describes in detail, in connection with a specific embodiment, a specific implementation procedure of the above data writing method:

With the continuous development of storage technology, the full flash memory array has emerged; a full flash memory array is a storage system comprising SSD solid state disks and a system controller. Because an SSD must erase old data before writing new data, a full flash memory system often adopts a redirect write mode and allocates logical addresses from the SSD solid state disks in units of logical block groups to write data. When the storage system has insufficient space, space must be freed through system-level garbage collection. In a system-level garbage collection operation, all valid data in the old logical block group must be moved to a new logical block group, which causes valid data to be relocated within the SSD solid state disks, increases the write amplification of the disks, and affects their service life and performance.

A logical block group is formed by dividing and managing the SSD disk space in logical blocks of fixed size, and grouping logical blocks located on different SSD disks into a group with RAID attributes. Garbage collection refers to the following: when new data is written to an SSD, the flash memory blocks occupied by the original data must first be erased. Owing to the characteristics of flash memory chips, written data cannot be overwritten in place; instead, the original data is marked as invalid and an erase operation is then performed. This process is called garbage collection. The data life cycle of a data object refers to the interval between the time the data is updated (rewritten) and the previous write time of the same logical address or object. In other words, the life cycle of a data object reflects the write heat of the data: data with high write heat has a short life cycle, and data with low write heat has a long life cycle.

Fig. 9 is a schematic diagram of data writing in a storage system: the write data stream may include a plurality of data objects, e.g., data object 1, ..., data object 12; the write data stream may be data to be written into the full flash memory system, and the terminal may identify and predict the life cycle of each data object contained in the write data stream through the life cycle prediction module. Levels are divided according to the different life cycles, and data of different levels is written into different logical block groups, so that data with different life cycles belongs to different logical block groups and the life cycles of the data within the same logical block group are the same or similar.

For example, the terminal may determine that the life cycles of data object 2, data object 4, data object 6, data object 8, data object 9 and data object 10 are all 7 days, and that the life cycles of data object 1, data object 3, data object 5, data object 7, data object 11 and data object 12 are all 30 days, so the terminal may place the data objects with the same life cycle into the same data packet. For example, data object 2, data object 4, data object 6, data object 8, data object 9 and data object 10 may be written into logical block 1, logical block 2 and logical block 3 contained in logical block group 1, and the check data of these data objects may be written into logical block 4 of logical block group 1; similarly, data object 1, data object 3, data object 5, data object 7, data object 11 and data object 12 may be written into logical block 5, logical block 6 and logical block 7 contained in logical block group 2, and the check data of these data objects may be written into logical block 8 of logical block group 2. The data objects contained in logical block group 1 and logical block group 2 are then flushed to disk: logical block 1 and logical block 5 are written to SSD disk 1, logical block 2 and logical block 6 are written to SSD disk 2, logical block 3 and logical block 7 are written to SSD disk 3, and logical block 4 and logical block 8 are written to SSD disk 4.
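The example above can be reproduced with a short sketch that groups the twelve data objects by identical life cycle and maps logical block numbers to SSD disks. The modulo mapping in `ssd_for_block` is inferred from the block-to-disk assignment of this particular example and is an assumption, as are the function names; the application does not prescribe a placement formula.

```python
from collections import defaultdict
from typing import Dict, List

BLOCKS_PER_GROUP = 4  # three data blocks plus one check block, as in the example

# Life cycle in days of data objects 1..12, as stated in the example.
LIFECYCLES_DAYS: Dict[int, int] = {
    1: 30, 2: 7, 3: 30, 4: 7, 5: 30, 6: 7,
    7: 30, 8: 7, 9: 7, 10: 7, 11: 30, 12: 30,
}


def group_stream(lifecycles: Dict[int, int]) -> Dict[int, List[int]]:
    """Group object ids by identical life cycle (the similarity condition used here)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for object_id in sorted(lifecycles):
        groups[lifecycles[object_id]].append(object_id)
    return dict(groups)


def ssd_for_block(block_number: int, blocks_per_group: int = BLOCKS_PER_GROUP) -> int:
    """Assumed placement: the k-th logical block of every group lands on SSD disk k."""
    return (block_number - 1) % blocks_per_group + 1


groups = group_stream(LIFECYCLES_DAYS)
placement = {block: ssd_for_block(block) for block in range(1, 9)}
```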

Thus, fig. 10 shows data invalidation, data relocation and space release: after data is updated and rewritten, the data in the same logical block group becomes invalid at the same time or within a similar time period. When the space occupied by the logical block group needs to be recovered, the logical block group contains no valid data or only a small amount of it, so system garbage collection can release the space of the logical block group without relocating data, or by relocating only a small amount of data.

FIG. 11 is a schematic diagram of the data writing process for garbage collection and relocation: when the terminal determines that data needs to be relocated during garbage collection, the storage system writes the data into different logical block groups according to life cycle level, so that the life cycles of the data within each logical block group remain the same or similar. Specifically, the invalidated data may include data object 4, data object 8, data object 9 and data object 10 in logical block group 1, and data object 1, data object 5, data object 7 and data object 11 in logical block group 2; the valid data objects may include data object 2 and data object 6, and data object 3 and data object 12. The terminal performs life cycle identification again on the valid data objects and assigns them to life cycle levels, which may include a 7-day level, a 15-day level, a 30-day level and a 60-day level. If the life cycles of data object 2 and data object 6 are determined to be 7 days, and the life cycles of data object 3 and data object 12 are determined to be 30 days, the data objects with the same life cycle can be written into the same logical block group. For example, the logical block group pool may include logical block group 3, logical block group 4, logical block group 5 and logical block group 6; the terminal may write data object 2 and data object 6 into logical block group 3, and write data object 3 and data object 12 into logical block group 5, each logical block group corresponding to one life cycle level.

Fig. 12 is a schematic diagram of a system architecture of a storage system according to an embodiment of the present application, which mainly includes a data receiving module, a life cycle prediction module, a life cycle learning module, a space allocation module, a storage module, and a space recycling module. The data receiving module is responsible for receiving service read and write data; the life cycle prediction module is responsible for identifying the life cycle of the data and classifying the data according to the supported life cycle levels; the life cycle learning module is responsible for learning the life cycle of the data and pushing the life cycle levels to be configured to the life cycle prediction module; the space allocation module is responsible for allocating and releasing logical block groups; the storage module is responsible for persisting logical block group data to disk; and the space recycling module is responsible for releasing logical block group space.

As shown in fig. 13, the data writing process specifically includes: a user on the host writes data; the life cycle of the data is identified, and it is judged whether the life cycle can be identified; if the life cycle cannot be identified, the data is written to the default level and flushed to disk; if it can be identified, the data is written to the corresponding life cycle level and flushed to disk. Specifically, the storage system may perform life cycle identification on the written data, and if the life cycle of the data cannot be determined, the data is written according to the default level. When the system has just been put into use, the amount of data written is small, so the life cycle learning module does not have enough write records to learn from and cannot accurately identify the life cycle of the written data; writes can therefore be performed according to the default level. After the system has run for a period of time, the user's service write and delete pattern is basically stable, and the learning module can accurately learn the data update period from the write and delete records, then compute the data life cycle levels and push them to the prediction module for identification.

As shown in fig. 14, the data life cycle learning flow specifically includes: when data at a logical address is written or deleted, the data update time is recorded, and the system judges whether the data is being written for the first time; if so, the data is assigned to the default level; if not, the system looks up the last write time of the logical address, calculates the life cycle of the data at that logical address, and updates the life cycle level of the data; when there are too many levels, similar life cycles are merged into the same level, and the life cycle levels to be configured are periodically pushed to the prediction module. Specifically, when data is modified, the learning module records the data update time and judges whether the data is being written for the first time. If so, the data is assigned to the default level; if not, the time of the last modification is queried, the data life cycle is calculated, and the data life cycle level is then updated. If too many life cycle levels have been recorded, similar life cycles are merged to reduce the system maintenance cost.

Fig. 15 is a schematic diagram of the processing flow of system garbage collection, in which the storage system performs data recovery processing once it determines that a logical block group has reached the garbage collection condition. The flow specifically includes: data is updated and written, and the old data becomes invalid; when the logical block group meets the recovery condition, the terminal queries the data life cycle of the logical block group and judges whether the current time has reached the delay time; if the delay time has not been reached, the terminal waits for a certain time before processing again, and returns to the step of judging whether the logical block group meets the recovery condition; if the delay time has been reached, the terminal searches for the valid data in the logical block group, classifies the valid data by life cycle, writes data of different levels into different logical block groups, flushes the data of the new logical block groups to disk, and releases the space of the old logical block group. Specifically, during garbage collection, the data life cycle of the logical block group is first queried to judge whether the current time has reached the set delayed-recovery threshold; if the threshold has not been reached, processing is retried after waiting for a certain time, and if the threshold has been reached, the logical block group is recovered. When valid data is recovered, it is still classified by life cycle; data with different life cycles is migrated and written into different new logical block groups, and the space of the old logical block group is released after the migration is completed.

The purpose of delaying for a certain time before reprocessing is to let as much of the data in the logical block group become invalid as possible. This avoids processing the group immediately once the recovery condition is reached (for example, when the garbage amount of the logical block group reaches a certain water level), which would relocate data that is about to become invalid anyway and thereby cause write amplification.
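The delayed recovery flow can be sketched end to end as follows. This is a simplified in-memory model under stated assumptions: the `DataObject` and `BlockGroup` classes, the `try_reclaim` function, and all numeric values are hypothetical, and flushing to disk is reduced to returning the migration plan.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataObject:
    object_id: int
    size: int
    level: int          # life cycle level of the object
    valid: bool = True


@dataclass
class BlockGroup:
    capacity: int
    objects: List[DataObject] = field(default_factory=list)
    first_time: Optional[float] = None  # when the recovery condition was first met


def try_reclaim(group: BlockGroup, now: float, threshold: float,
                delay: float) -> Optional[Dict[int, List[int]]]:
    """Return {level: [object ids]} to migrate, or None if the group is not reclaimed yet."""
    invalid = sum(obj.size for obj in group.objects if not obj.valid)
    if invalid / group.capacity < threshold:
        return None                      # recovery condition not met
    if group.first_time is None:
        group.first_time = now           # record the first time the condition held
    if now - group.first_time < delay:
        return None                      # still within the delay window
    plan: Dict[int, List[int]] = defaultdict(list)
    for obj in group.objects:
        if obj.valid:
            plan[obj.level].append(obj.object_id)  # one target group per level
    group.objects.clear()                # release the old logical block group
    group.first_time = None
    return dict(plan)


group = BlockGroup(capacity=100, objects=[
    DataObject(1, 30, 1, valid=False), DataObject(2, 30, 1, valid=False),
    DataObject(3, 25, 1, valid=False), DataObject(4, 10, 3), DataObject(5, 5, 1),
])
early = try_reclaim(group, now=0.0, threshold=0.8, delay=10.0)   # delayed
group.objects[4].valid = False        # object 5 becomes invalid while waiting
late = try_reclaim(group, now=10.0, threshold=0.8, delay=10.0)   # only object 4 moves
```

In this example, waiting lets data object 5 become invalid on its own, so only data object 4 has to be migrated, which is the write amplification saving the delay is meant to achieve.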

With the above data writing method, by predicting the life cycle of data, data with different life cycles can be written into different logical block groups, so that the data within a logical block group becomes invalid at the same time. This reduces the amount of data relocated by system garbage collection and reduces system write amplification. It can also improve the garbage collection efficiency of the full flash memory system and reduce the usage of controller CPU and SSD bandwidth, so that more system resources serve foreground services and foreground service performance improves. Furthermore, reducing write amplification can prolong the service life of the SSD disks.

It should be understood that, although the steps in the flowcharts related to the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to that order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages need not be performed sequentially either, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the application also provides a data writing device for realizing the above related data writing method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the data writing device provided below may refer to the limitation of the data writing method hereinabove, and will not be repeated here.

In one embodiment, as shown in FIG. 16, there is provided a data writing apparatus 1600 comprising:

a first obtaining module 1602, configured to obtain a data stream to be written corresponding to a storage system, where the storage system includes a plurality of logic block groups;

a first determining module 1604, configured to identify a life cycle of each data object included in the data stream to be written;

the first writing module 1606 is configured to divide each data object whose life cycle satisfies a preset similar condition into the same group, obtain a plurality of data groups, and write a plurality of data objects included in each data group into the same logical block group.

In one embodiment, the first determining module is specifically configured to: the lifecycle of the data object is identified by the target lifecycle identification model and address identification information of the data object.

In one embodiment, the first determining module is further specifically configured to: inquiring in the target life cycle identification model based on the address identification information of the data object;

if the target life cycle corresponding to the address identification information of the data object is found, determining the target life cycle as the life cycle of the data object;

if the target life cycle corresponding to the address identification information of the data object is not found, determining the default life cycle as the life cycle of the data object.

In one embodiment, the data writing apparatus further includes:

a second determining module, configured to determine that the target logical block group satisfies the preset data recovery condition if the ratio of the invalid data capacity to the total data capacity is greater than a preset recovery threshold.
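As a minimal sketch of this check, the recovery condition reduces to a strict comparison of a ratio against a threshold. The function name, the byte-based units, and the 0.5 threshold are assumptions for illustration; the source does not give a threshold value.

```python
def meets_recovery_condition(invalid_bytes, total_bytes, threshold=0.5):
    """True if invalid capacity / total capacity is strictly greater than the threshold."""
    if total_bytes <= 0:
        return False  # an empty block group has nothing to recover
    return invalid_bytes / total_bytes > threshold
```

For example, a block group with 60 invalid units out of 100 satisfies the condition at a threshold of 0.5, while one with exactly 50 out of 100 does not, because the comparison is "greater than" rather than "greater than or equal to".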

In one embodiment, the data writing apparatus further includes:

a fourth obtaining module, configured to obtain a first time at which the target logical block group satisfies the preset data recovery condition;

a third determining module, configured to determine that the current time satisfies the preset delay condition when the time difference between the first time and the current time is greater than or equal to the preset delay duration.
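The delay condition can be sketched as a single comparison. The function and parameter names are hypothetical, and times are assumed to be expressed in seconds on a common clock.

```python
def meets_delay_condition(first_time_s, current_time_s, delay_s):
    """True once at least `delay_s` has elapsed since the recovery condition was first met."""
    return current_time_s - first_time_s >= delay_s
```

Unlike the recovery threshold, this comparison is inclusive: the condition holds as soon as the elapsed time equals the preset delay duration.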

In one embodiment, the data writing apparatus further includes:

a fifth obtaining module, configured to, when the life cycles of the data objects contained in the target logical block group are inconsistent, obtain the longest life cycle in the target logical block group and determine the longest life cycle as the preset delay duration;

a fourth determining module, configured to determine a default duration as the preset delay duration when the life cycles of the data objects contained in the target logical block group are consistent.
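The selection of the preset delay duration can be sketched as below, reading the text as stated: the longest life cycle is used when the life cycles in the block group differ, and a default duration is used when they are all the same. The function name, the default value, and the treatment of an empty block group are assumptions made for illustration.

```python
def preset_delay_duration(lifecycles_s, default_delay_s=0.0):
    """Pick the delay to wait before recovering a logical block group.

    lifecycles_s: life cycles (seconds) of the data objects in the block group.
    """
    if not lifecycles_s:
        return default_delay_s          # assumption: empty group uses the default
    if len(set(lifecycles_s)) > 1:      # life cycles are inconsistent
        return max(lifecycles_s)        # wait for the longest-lived object
    return default_delay_s              # life cycles are consistent
```

Waiting for the longest life cycle in a mixed block group gives the remaining valid data a chance to become invalid on its own, which is what reduces the amount of data that must be moved during recovery.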

In one embodiment, the data writing apparatus further includes:

a sixth obtaining module, configured to obtain a data update time of the data object in response to a data modification operation on the data object;

a fifth determining module, configured to determine a default life cycle as the life cycle of the data object if the data object is determined to satisfy a data first-write condition;

a sixth determining module, configured to calculate the life cycle of the data object based on the data update time of the data object if the data object is determined not to satisfy the data first-write condition, and to determine a life cycle level of the data object based on a preset level determination policy.
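The three modules above can be sketched together as a small tracker. Several points are assumptions rather than statements of the source: the life cycle is taken to be the interval between two consecutive updates of the same object, the preset level determination policy is modelled as a list of ascending boundaries, and the level is computed for first writes as well. `LifecycleTracker`, `LEVEL_BOUNDS_S`, and `DEFAULT_LIFECYCLE_S` are hypothetical names and values.

```python
import bisect

DEFAULT_LIFECYCLE_S = 3600.0            # hypothetical default life cycle
LEVEL_BOUNDS_S = [60.0, 600.0, 3600.0]  # hypothetical level boundaries, ascending


class LifecycleTracker:
    """Tracks update times per object and derives a life cycle and a level."""

    def __init__(self):
        self._last_update = {}  # address id -> time of the previous update

    def on_modify(self, address_id, update_time_s):
        previous = self._last_update.get(address_id)
        self._last_update[address_id] = update_time_s
        if previous is None:
            # First write: no history yet, so use the default life cycle.
            lifecycle = DEFAULT_LIFECYCLE_S
        else:
            # Assumed rule: life cycle = time between consecutive updates.
            lifecycle = update_time_s - previous
        # Level = number of boundaries that are <= the life cycle.
        level = bisect.bisect_right(LEVEL_BOUNDS_S, lifecycle)
        return lifecycle, level


tracker = LifecycleTracker()
first = tracker.on_modify(1, 100.0)   # first write of object 1
second = tracker.on_modify(1, 130.0)  # rewritten 30 seconds later
```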

The modules in the above data writing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 17. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores life cycle data. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a data writing method.

It will be appreciated by those skilled in the art that the structure shown in FIG. 17 is merely a block diagram of the part of the structure relevant to the present application and does not limit the computer device to which the present application is applied. A particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that the user information (including, but not limited to, user equipment information and user personal information) and the data (including, but not limited to, data for analysis, stored data, and presented data) referred to in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant regulations.

Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above embodiments represent only a few implementations of the present application. They are described in specific detail, but this is not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and all of these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of writing data, the method comprising:

2. The method of claim 1, wherein said identifying a lifecycle of each data object contained in said data stream to be written comprises:

3. The method of claim 2, wherein the identifying the lifecycle of the data object by the target lifecycle identification model and the address identification information of the data object comprises:

4. The method according to claim 1, wherein the method further comprises:

5. The method of claim 4, wherein dividing each of the valid data into a plurality of target data packets based on a lifecycle of each of the valid data objects comprises:

6. The method according to claim 4, wherein the method further comprises:

7. The method according to claim 4, wherein the method further comprises:

8. The method of claim 7, wherein the method further comprises:

9. A method according to claim 3, characterized in that the method further comprises:

10. A data writing apparatus, the apparatus comprising:

11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.

12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9.