WO2024051957A1

WO2024051957A1 - Method and apparatus for writing data to magnetic tape

Info

Publication number: WO2024051957A1
Application number: PCT/EP2022/075187
Authority: WO
Inventors: Aviv Kuvent; Assaf Natanzon; Yair Toaff; Idan Zach; Michael Sternberg
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2022-09-09
Filing date: 2022-09-09
Publication date: 2024-03-14

Abstract

A number of tape wraps K to be used for writing deduplicated input data to a magnetic tape is determined. Priorities are assigned to data segments in each logical cluster based on priority rules and an order of writing of data segments from the logical cluster to the magnetic tape is determined by ordering the data segments according to the assigned priorities. For each logical cluster a writing plan is determined comprising a tape wrap and an offset for writing each data segment of the logical cluster to the magnetic tape so that to plan writing of each next data segment into a tape wrap different from a tape wrap of a previous data segment in a nearest available location to a location of the previous data segment in accordance with distances between the tape wraps and offsets reached or planned in each of the determined K tape wraps.

Description

METHOD AND APPARATUS FOR WRITING DATA TO MAGNETIC TAPE

TECHNICAL FIELD

The present disclosure relates generally to the field of data storage and more specifically, to a method and an apparatus for writing data to a magnetic tape.

BACKGROUND

Generally, various storage devices are used for storing digital data, such as hard disks, pen drives, memory cards, and the like. Moreover, the need for storing digital data is increasing, such that tape technologies are used with deduplication for secondary storage because of their low cost and long retention. However, the seek time of a conventional magnetic tape is relatively long, due to which the magnetic tape is typically used without any optimization besides the built-in compression.

Conventionally, certain attempts have been made to solve the problem of seek time when using deduplication on the magnetic tape, such as by writing the data on the magnetic tape in its original (i.e., non-deduplicated) form. Another existing solution is to maintain a closed unit, such as a similarity locality (SILO), that includes all the deduped data, and to read the entire unit of data whenever it is required. In certain scenarios, the problem of seek time can also be resolved by storing the deduped data on the magnetic tape so as to reduce the restoration time, such as by writing common data in two files so that the common data can be read sequentially from the magnetic tape. However, such attempts require extensive preprocessing of the common data and also lower the deduplication ratio drastically. Thus, there exists a technical problem of how to allow deduplication with reduced seek time when reading files (or objects) from the magnetic tape.

Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with conventional memory systems.

SUMMARY

The present disclosure provides a method and an apparatus for writing data on a magnetic tape. The present disclosure provides a solution to the existing problem of how to allow deduplication while optimizing future reads by reducing seek time that is required to read objects (or files) from the magnetic tape. An aim of the present disclosure is to provide a solution that overcomes at least partially the problem encountered in the prior art and provides an improved method of writing data to the magnetic tape and an improved apparatus for writing data to the magnetic tape, such as an optimized writing of deduped data to the magnetic tape.

One or more objectives of the present disclosure are achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.

In one aspect, the present disclosure provides a method of writing data to magnetic tape, comprising deduplicating input data and splitting the deduplicated input data into logical clusters, wherein each logical cluster comprises a set of files and/or data objects of the deduplicated input data that contain a number of common data segments in the deduplicated input data. The method comprises determining the number of tape wraps K to be used for writing the deduplicated input data to a magnetic tape and a head of tape offset already reached in the magnetic tape. For each logical cluster, the method comprises assigning priorities to data segments in the logical cluster based on priority rules that define higher priorities for more common data segments and lower priorities for less common data segments among the set of files and/or data objects comprised in the logical cluster. Further, the method comprises determining an order of writing of data segments from the logical cluster to the magnetic tape by ordering the data segments in accordance with the assigned priorities, and determining a writing plan that comprises a tape wrap and an offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape, so as to plan writing of each next data segment, in the order of writing, into a tape wrap different from a tape wrap of a previous data segment, within the determined K tape wraps, in a nearest available location to a location of the previous data segment in accordance with distances between the tape wraps and offsets reached in each of the determined K tape wraps in all the writing plans. Finally, the method comprises writing the deduplicated input data to the magnetic tape using the writing plans of all the logical clusters.

The method writes the data to the magnetic tape efficiently and with reduced seek time. The method deduplicates and splits the input data, assigns priorities to the data segments according to how common each data segment is, and orders the data segments according to those priorities. In addition, the buffering of the input data increases the life span of the magnetic tape. Furthermore, the method provides an optimal order in which to read the data more efficiently and with reduced complexity, such as by moving the head of the tape drive across tape wraps rather than along the length of a tape wrap. Thus, the method writes the magnetic tape such that the files (or the data objects) can later be read from the magnetic tape with reduced seek time and reduced complexity.

In an implementation form, the method further comprises buffering input data until it reaches a pre-determined size before the deduplicating of the input data.

In such an implementation, the buffering of the input data increases the life span of the magnetic tape by reducing the running of the magnetic tape in back-and-forth motion.

In a further implementation form, the determining of the number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape is based on a tape wrap size, a size of the deduplicated input data and the head of the tape offset already reached in the magnetic tape.

The K tape wraps are used for writing the deduplicated input data to the magnetic tape sequentially without any holes.
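By way of a non-limiting illustration, the following sketch shows one way in which K could follow from the three quantities named above. It assumes, purely for illustration, that the data is laid out back to back starting at the head of tape offset inside the current tape wrap, so that K is a simple capacity estimate; the function name `wraps_needed` and the use of bytes as the unit are hypothetical and are not taken from the present disclosure.

```python
import math


def wraps_needed(wrap_size: int, data_size: int, head_offset: int) -> int:
    """Estimate K, the number of tape wraps needed for `data_size` bytes.

    Assumption (not from the disclosure): writing starts `head_offset`
    bytes into the current wrap and wraps are filled without holes.
    """
    if wrap_size <= 0:
        raise ValueError("wrap_size must be positive")
    if not 0 <= head_offset < wrap_size:
        raise ValueError("head_offset must lie inside the current wrap")
    if data_size <= 0:
        return 0
    # Capacity consumed is the part of the wrap already used plus the new data.
    return math.ceil((head_offset + data_size) / wrap_size)
```

For instance, with a wrap size of 100 units, 250 units of data, and a head of tape offset of 50, the estimate is three tape wraps.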

In a further implementation form, the splitting of the deduplicated input data into logical clusters comprises, for each file and/or data object of the deduplicated input data, creating a list of data segments that are contained in the file and/or data object, and assigning two files and/or data objects to the same logical cluster if an overlap of their lists of data segments is above a pre-determined threshold.

The assignment of two files and/or two data objects to the same logical cluster groups together the data segments with common data, which provides an efficient and improved reading of the data on the magnetic tape.
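By way of a non-limiting illustration, the overlap-based assignment can be sketched as follows. The disclosure does not fix how the overlap is measured or how more than two files are grouped, so the sketch assumes a Jaccard-style ratio (shared segments divided by all segments of both files) and transitive grouping through a small union-find structure. The names `overlap_ratio` and `split_into_clusters` are hypothetical, and the default threshold of 0.4 mirrors the 40% example given in the detailed description.

```python
from typing import Dict, List, Set


def overlap_ratio(a: Set[str], b: Set[str]) -> float:
    """Shared segments divided by all segments of both files (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_into_clusters(files: Dict[str, Set[str]],
                        threshold: float = 0.4) -> List[Set[str]]:
    """Group file names whose segment lists overlap above `threshold`."""
    parent = {name: name for name in files}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(files)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if overlap_ratio(files[a], files[b]) > threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, Set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

The pairwise comparison is quadratic in the number of files, which is acceptable for a sketch; a practical system would likely use an index over segment identifiers instead.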

In a further implementation form, the method comprises determining a first writing location to start the writing of the deduplicated input data to the magnetic tape based on the head of the tape offset.

The first writing location is used for providing the location to start the writing of the deduplicated input data to the magnetic tape that is based on the head of the tape offset.

In a further implementation form, the nearest neighbour to the last written data segment is determined by a minimum distance between offsets reached in all the determined K tape wraps and an offset in a tape wrap of the last written data segment.

The nearest neighbour provides an optimal order to read the data more efficiently with reduced complexity, such as by moving the head of the magnetic tape drive across tape wraps rather than along the length of the tape wrap.

In a further implementation form, the priority rules further define a higher priority for one or more data segments contained in one or more files and/or data objects comprised in the logical cluster.

In such implementation, the method is used to order the common data with higher priority to provide an efficient and improved readability of the data.

In yet another aspect, the present disclosure provides an apparatus for writing data to magnetic tape, comprising a magnetic-tape data storage comprising a magnetic tape with a plurality of tape wraps and a data processing module. The data processing module is configured for deduplicating input data and splitting the deduplicated input data into logical clusters, wherein each logical cluster comprises a set of files and/or data objects of the deduplicated input data that contain a number of common data segments in the deduplicated input data. The data processing module is configured for determining the number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape and a head of tape offset already reached in the magnetic tape. For each logical cluster, the data processing module is configured for assigning priorities to data segments in the logical cluster based on priority rules that define higher priorities for more common data segments and lower priorities for less common data segments among the set of files and/or data objects comprised in the logical cluster, determining an order of writing of data segments from the logical cluster to the magnetic tape by ordering the data segments in accordance with the assigned priorities, and determining a writing plan that comprises a tape wrap and an offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape, so as to plan writing of each next data segment, in the order of writing, into a tape wrap different from a tape wrap of a previous data segment, within the determined K tape wraps, in a nearest available location to a location of the previous data segment in accordance with distances between the tape wraps and offsets reached in each of the determined K tape wraps in all the writing plans. The data processing module is further configured for controlling the magnetic-tape data storage to write the deduplicated input data to the magnetic tape using the writing plans of all the logical clusters.

The apparatus achieves all the advantages and technical effects of the method of the present disclosure.

It is to be appreciated that all the aforementioned implementation forms can be combined. It is to be noted that all devices, elements, circuitry, units, and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application, as well as the functionalities described to be performed by the various entities, are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity that performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.

Additional aspects, advantages, features, and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers. Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:

FIG. 1 is a flowchart of a method of writing data to a magnetic tape, in accordance with an embodiment of the present disclosure;

FIG. 2 is a block diagram that depicts an apparatus, in accordance with an embodiment of the present disclosure;

FIG. 3 is a diagram that depicts reordering of data segments, in accordance with an embodiment of the present disclosure;

FIG. 4 is a diagram that depicts reordering of data segments without prioritizing specific file read optimization, in accordance with an embodiment of the present disclosure; and

FIG. 5 is a diagram that depicts reordering of data segments by prioritizing file read optimization, in accordance with another embodiment of the present disclosure.

In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.

DETAILED DESCRIPTION OF EMBODIMENTS

The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practising the present disclosure are also possible.

FIG. 1 is a flowchart of a method of writing data to a magnetic tape, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a flowchart of a method 100 that includes steps 102 to 114.

There is provided the method 100 of writing data to a magnetic tape. In an implementation, the magnetic tape is composed of tracks on which the data is written and from which the data is read. Moreover, the magnetic tape is divided into bands, and a tape drive head covers the width of a single band. In addition, each band is composed of an even number of tape wraps. For example, linear tape open ninth-generation (LTO 9) tapes are constructed according to the specifications of the LTO 9. In an example, there are thirty-two (32) tracks per wrap, fifty-two (52) tape wraps per band, including twenty-six (26) start-to-end tape wraps and twenty-six (26) end-to-start tape wraps, and four (4) bands. Thus, the method 100 of writing data to the magnetic tape provides an efficient and reliable use of recommended access order and time-based access order system techniques for optimized reading order of objects (or files), such as by reordering the data before writing.
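By way of a non-limiting illustration, the totals implied by the figures of this example can be worked out as follows; the constant names are illustrative only.

```python
# Figures taken from the example above.
TRACKS_PER_WRAP = 32
WRAPS_PER_BAND = 52
BANDS = 4

# Half of the wraps in a band run start-to-end, the other half end-to-start.
wraps_per_direction = WRAPS_PER_BAND // 2

# Total wraps on the tape and total tracks across all wraps.
total_wraps = WRAPS_PER_BAND * BANDS
total_tracks = TRACKS_PER_WRAP * total_wraps

print(wraps_per_direction, total_wraps, total_tracks)
```

With these figures, a tape has 208 tape wraps and 6656 tracks in total.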

At step 102, the method 100 comprises deduplicating input data. In an example, a data processing module is used for deduplicating the input data, the result of which is referred to as deduplicated input data.

At step 104, the method 100 comprises splitting the deduplicated input data into logical clusters. In an implementation, the logical clusters correspond to a set of data segments belonging to a set of files or data objects that are re-ordered to optimize future reads of the files and the data objects from the magnetic tape with improved read performance. Moreover, each logical cluster includes a set of files and/or data objects of the deduplicated input data that contain a number of common data segments in the deduplicated input data. In an implementation, the logical clusters correspond to a set of data segments in the input data (i.e., the deduplicated input data), such as a set of files (or objects) that are reordered together to provide an optimal reading of each individual file (or object) in the logical cluster. In an implementation, each logical cluster includes the set of files and the data objects of the deduplicated input data that includes the number of common data segments in the deduplicated input data. In another implementation, each logical cluster includes the set of files of the deduplicated input data that contain the number of common data segments in the deduplicated input data. In yet another implementation, each logical cluster includes the data objects of the deduplicated input data that contain the number of common data segments in the deduplicated input data.

In accordance with an embodiment, the splitting of the deduplicated input data into logical clusters includes creating, for each file and/or data object of the deduplicated input data, a list of data segments that are contained in the file and/or data object. Further, the splitting of the deduplicated input data into the logical clusters includes assigning two files and/or data objects to the same logical cluster if an overlap of their lists of data segments is above a pre-determined threshold. In an implementation, the splitting of the deduplicated input data into logical clusters includes creating the list of data segments that are contained in the file and the data object for each file and for each data object of the deduplicated input data. In another implementation, the splitting of the deduplicated input data into logical clusters includes creating the list of data segments that are contained in the file for each file of the deduplicated input data. In yet another implementation, the splitting of the deduplicated input data into logical clusters includes creating the list of data segments that are contained in the data object for each data object of the deduplicated input data. Firstly, the input data, such as the deduplicated input data, is processed, and thereafter, the list that includes the data segments used by each file (or each data object) is created. After that, the overlapping entries of files (or data objects) in the lists are determined, and if the overlap is above the pre-determined threshold (e.g., 40% of data segments overall), then the same logical cluster is assigned to both the files (or both data objects). In an implementation, the splitting of the deduplicated input data into the logical clusters includes assigning two files and two data objects to the same logical cluster. In another implementation, the splitting of the deduplicated input data into the logical clusters includes assigning two files to the same logical cluster. In yet another implementation, the splitting of the deduplicated input data into the logical clusters includes assigning two data objects to the same logical cluster. Thus, the assignment of the two files and/or two data objects to the same logical cluster groups together the data segments with common data, which provides an efficient and improved reading of the data on the magnetic tape with reduced complexity.

In accordance with an embodiment, the method 100 further comprises buffering the input data until it reaches a pre-determined size before the deduplicating of the input data. In an implementation, the buffering of the input data occurs in a storage media, such as a hard disk drive (HDD) or a solid-state drive (SSD), which increases the efficiency and the durability of the magnetic tape. In addition, the buffering of the input data increases the life span of the magnetic tape by reducing the running of the magnetic tape in a back-and-forth motion (i.e., shoe-shining).
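By way of a non-limiting illustration, the buffering of the input data up to a pre-determined size can be sketched as follows. The sketch keeps the buffered data in memory as a stand-in for the HDD or SSD staging area; the class name `InputBuffer` and the callback `on_full`, which stands for the subsequent deduplication and planning, are hypothetical.

```python
from typing import Callable, List


class InputBuffer:
    """Collect incoming chunks and hand them on as one batch once
    `threshold` bytes have accumulated (in-memory stand-in for HDD/SSD)."""

    def __init__(self, threshold: int, on_full: Callable[[bytes], None]):
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.on_full = on_full
        self._chunks: List[bytes] = []
        self._size = 0

    def add(self, chunk: bytes) -> None:
        """Store a chunk and release the batch if the threshold is reached."""
        self._chunks.append(chunk)
        self._size += len(chunk)
        if self._size >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Pass everything buffered so far to `on_full` and reset the buffer."""
        if self._chunks:
            batch = b"".join(self._chunks)
            self._chunks, self._size = [], 0
            self.on_full(batch)
```

Releasing the data in large batches means the tape is written in long passes instead of many short ones, which is the effect on back-and-forth motion described above.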

At step 106, the method 100 further comprises determining a number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape and a head of tape offset already reached in the magnetic tape. In an implementation, the head of the tape drive includes multiple read and write elements to allow reading and writing of multiple adjacent tracks, which are also referred to as a tape wrap. In accordance with an embodiment, the determining of the number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape is based on a tape wrap size, the size of the deduplicated input data, and the head of tape offset already reached in the magnetic tape. For example, the tape wrap size depends on the magnetic tape format, and the last offset (or the head of the tape offset) written on the magnetic tape, together with the size of the input data (e.g., X), provides the number of tape wraps (e.g., a first tape wrap, a second tape wrap, and the like) on which the input data is written. Thus, the K tape wraps are used for writing the deduplicated input data to the magnetic tape sequentially without any holes.

In accordance with an embodiment, the method 100 comprises determining a first writing location to start the writing of the deduplicated input data to the magnetic tape based on the head of the tape offset. For example, the offsets reached in four tape wraps are ten (10), one hundred (100), one thousand (1000), and ten thousand (10000), respectively, and the head of the magnetic tape is in another tape wrap X at another offset, such as the offset two hundred (200). Thus, the writing order for the current logical cluster, in which each data segment has a size of one hundred (100), can be tape wrap two at the offset one hundred (100), which is closest to the head location at tape wrap X, offset two hundred (200), then tape wrap one at the offset ten (10), then tape wrap two at the offset two hundred (200), and the like, until the offsets reach one thousand (1000). Thereafter, new data segments are also added to tape wrap three at the offset one thousand (1000). Thus, the first writing location is used for providing the location to start the writing of the deduplicated input data to the magnetic tape that is based on the head of the tape offset.

At step 108, the method 100 comprises assigning, for each logical cluster, priorities to data segments in the logical cluster based on priority rules that define higher priorities for more common data segments and lower priorities for less common data segments among the set of files and/or data objects comprised in the logical cluster. In an implementation, the priorities are assigned to the data segments in the logical cluster based on the priority rules, such that the data segments that include common data are assigned high priority among the set of files and the data objects included in the logical cluster. In another implementation, the priorities are assigned to the data segments in the logical cluster based on the priority rules, such that the data segments that include common data are assigned high priority among the set of files or the data objects included in the logical cluster. In accordance with an embodiment, the priority rules further define a higher priority for one or more data segments contained in one or more files and/or data objects comprised in the logical cluster. In an implementation, the priority rules define the higher priority for the one or more data segments contained in the one or more files and the data objects that are included in the logical cluster. In another implementation, the priority rules define the higher priority for the one or more data segments contained in the one or more files or the data objects that are included in the logical cluster. The higher priority for one or more data segments in the logical cluster is used to determine the order of writing of the data segments. Thus, the method 100 includes ordering the common data with higher priority to provide an efficient and improved readability of the data.
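By way of a non-limiting illustration, the priority rules can be sketched as a sort over the data segments of one logical cluster. The sketch assumes that "more common" means "contained in more files of the cluster", that ties are broken by segment identifier to keep the order deterministic, and that the additional rule for selected files simply moves their data segments ahead of all others; none of these details is fixed by the present disclosure, and the names `order_segments` and `preferred_files` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def order_segments(cluster: Dict[str, Set[str]],
                   preferred_files: Iterable[str] = ()) -> List[str]:
    """Return the segment ids of one cluster in writing order.

    `cluster` maps each file name to the set of segment ids it contains.
    """
    # Priority of a segment = number of files in the cluster that contain it.
    counts = Counter(seg for segments in cluster.values() for seg in segments)

    # Segments of preferred files are boosted ahead of all others (assumption).
    boosted: Set[str] = set()
    for name in preferred_files:
        boosted |= cluster[name]

    # Sort key: boosted first, then most common first, then by id for ties.
    return sorted(counts, key=lambda seg: (seg not in boosted, -counts[seg], seg))
```

For a cluster in which segment "a" is contained in three files, "b" in two, and "c" and "d" in one each, the resulting order is a, b, c, d; naming the file that contains "a" and "d" as preferred changes the order to a, d, b, c.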

At step 110, the method 100 comprises determining an order of writing of data segments from the logical cluster to the magnetic tape by ordering the data segments in accordance with the assigned priorities. In an implementation, the data is recorded on the magnetic tape in a linear serpentine manner, such that the first tape wrap is written from start to end. Thereafter, the second tape wrap is written from end to start, then the third tape wrap from start to end, and the like. In addition, the direction of recording and of reading is the direction of the relevant tape wrap. Thus, the even-numbered tape wraps (e.g., 0, 2, 4, ...) run from the start of the magnetic tape to the end of the magnetic tape, while the odd-numbered tape wraps (e.g., 1, 3, 5, ...) run from the end of the magnetic tape to the start of the magnetic tape. Finally, a band is composed of an equal number of start-to-end tape wraps and end-to-start tape wraps. In an example, the input data is written to K tape wraps, such as from a tape wrap n to a tape wrap n+K-1. Thereafter, the data segments are ordered according to the assigned priorities. Furthermore, the method 100 includes determining the tape wrap on which to write the next data segment by nearest neighbour, with some thresholds to ensure that all K tape wraps are utilized. Thus, the data segments are ordered to ensure an optimal read order from the magnetic tape.
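By way of a non-limiting illustration, the linear serpentine layout can be sketched with two small helpers. The sketch assumes that an offset within a tape wrap is measured along the writing direction of that tape wrap, so that offset 0 of an odd-numbered tape wrap lies at the end of the magnetic tape; this convention and the names `wrap_direction` and `longitudinal_position` are hypothetical.

```python
def wrap_direction(wrap_index: int) -> str:
    """Even-numbered wraps run start-to-end, odd-numbered wraps end-to-start."""
    return "start-to-end" if wrap_index % 2 == 0 else "end-to-start"


def longitudinal_position(wrap_index: int, offset: int, wrap_size: int) -> int:
    """Distance from the start of the tape for `offset` within a wrap.

    Assumption: offsets count along the writing direction of the wrap,
    so they are mirrored for odd-numbered (end-to-start) wraps.
    """
    if not 0 <= offset <= wrap_size:
        raise ValueError("offset must lie within the wrap")
    return offset if wrap_index % 2 == 0 else wrap_size - offset
```

With a wrap size of 100, offset 30 in tape wrap 0 lies 30 units from the start of the magnetic tape, whereas offset 30 in tape wrap 1 lies 70 units from the start.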

At step 112, the method 100 further comprises determining a writing plan that includes a tape wrap and an offset in the tape wrap. The tape wrap and the offset in the tape wrap are used for writing each data segment of the logical cluster to the magnetic tape. The method 100 is used to plan the writing of each next data segment, in the order of writing, into a tape wrap different from the tape wrap of a previous data segment, within the determined K tape wraps, in the nearest available location to a location of the previous data segment. The nearest available location to the location of the previous data segment is determined in accordance with distances between the tape wraps and offsets reached in each of the determined K tape wraps in all the writing plans. After dividing the input data into logical clusters, for each data segment, the number of files (or data objects) in the logical cluster that contain the data segment is determined, and the data segments are ordered from the most common data to the least common data. For example, the entire input data is going to be written to K tape wraps, from tape wrap n to tape wrap n+K-1. Thus, for each tape wrap, the offset reached in the tape wrap is stored during the re-ordering. Thereafter, the method 100 includes determining the tape wrap on which to write the next data segment by the nearest neighbour, with some thresholds to ensure that all K tape wraps are utilized. In an implementation, A[] is an array of data segments, ordered from most common (e.g., A[0]) to least common (e.g., A[N-1]), where N is the number of data segments in the current logical cluster. Moreover, Wrap[] is an array storing, for each tape wrap, the offset reached so far in it across all logical clusters of the current input data, so that Wrap[i] is the offset last written to in tape wrap i. Further, K is the number of tape wraps that are used to store the data segments, starting from tape wrap 0, offset 0. Finally, Out[] is the output array that maps each data segment i to the tape wrap and the offset in the tape wrap, as shown in the below-mentioned pseudo-code:

    Reorder_cluster(A[]):
        curr_wrap = 0
        Out[0] = <curr_wrap, Wrap[curr_wrap]>
        Wrap[curr_wrap] += A[0].size
        For (i = 1; i < N; ++i)
            curr_wrap = min(distance(Wrap[], curr_wrap))
            Out[i] = <curr_wrap, Wrap[curr_wrap]>
            Wrap[curr_wrap] += A[i].size
        Return Out[]

In accordance with an embodiment, the nearest neighbour to the last written data segment is determined by a minimum distance between offsets reached in all the determined K tape wraps and the offset in the tape wrap of the last written data segment. In an implementation, the minimum distance corresponds to a function that takes the offsets reached in all the used tape wraps and the current tape wrap and returns the tape wrap where the offset reached is the closest to the last offset written in the current tape wrap. In another implementation, the actual head of the magnetic tape is located in a single offset on a single tape wrap and is considered as the nearest neighbour, which is also referred to as a starting point. The nearest neighbour provides an optimal order to read the data more efficiently with reduced complexity, such as by moving the head of the magnetic tape drive across tape wraps rather than along the length of the tape wrap.
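By way of a non-limiting illustration, the pseudo-code and the minimum-distance function can be sketched in runnable form as follows. The sketch assumes that the candidate tape wraps exclude the current tape wrap, that ties are resolved in favour of the physically closer tape wrap and then the lower index, and it omits the thresholds that ensure all K tape wraps are utilized; the names `Segment`, `nearest_wrap`, and `reorder_cluster` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    name: str
    size: int


def nearest_wrap(wrap: List[int], curr: int) -> int:
    """Return the wrap, other than `curr`, whose reached offset is closest
    to the offset just reached in `curr` (tie-break rules are assumptions)."""
    target = wrap[curr]
    candidates = [j for j in range(len(wrap)) if j != curr]
    return min(candidates,
               key=lambda j: (abs(wrap[j] - target), abs(j - curr), j))


def reorder_cluster(segments: List[Segment], wrap: List[int],
                    start_wrap: int = 0) -> List[Tuple[int, int]]:
    """Plan a (wrap index, offset) pair for each segment, given most common first.

    `wrap[i]` is the offset reached so far in wrap i; it is updated in
    place so that the plans of later clusters continue from it.
    """
    if len(wrap) < 2:
        raise ValueError("at least two wraps are needed to alternate")
    plan: List[Tuple[int, int]] = []
    curr = start_wrap
    for i, seg in enumerate(segments):
        if i > 0:
            curr = nearest_wrap(wrap, curr)  # always a different wrap
        plan.append((curr, wrap[curr]))
        wrap[curr] += seg.size
    return plan


if __name__ == "__main__":
    offsets = [0, 0, 0]  # K = 3 empty wraps
    segs = [Segment("a", 10), Segment("b", 5), Segment("c", 5), Segment("d", 20)]
    print(reorder_cluster(segs, offsets), offsets)
```

In this small run, the third tape wrap is never selected, which indicates why a plain nearest-neighbour rule may need the thresholds mentioned in the description in order to utilize all K tape wraps.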

At step 114, the method 100 comprises writing the deduplicated input data to the magnetic tape using the writing plans of all the logical clusters. Firstly, the method 100 is used for assigning priorities to the data segments in the logical clusters based on the priority rules. Thereafter, the method 100 includes determining the order of writing of the data segments from the logical cluster to the magnetic tape by ordering the data segments in accordance with the assigned priorities. After that, the method 100 includes determining a writing plan that includes the tape wrap and the offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape, so as to plan the writing of each next data segment. Finally, the method 100 includes writing the deduplicated input data to the magnetic tape using the writing plans of all the logical clusters.

The method 100 writes the data to the magnetic tape efficiently and with reduced seek time. The method 100 deduplicates and splits the input data, assigns priorities to the data segments according to how common each data segment is, and orders the data segments according to those priorities. In addition, the buffering of the input data increases the life span of the magnetic tape. Furthermore, the method 100 provides an optimal order in which to read the data more efficiently and with reduced complexity, such as by moving the head of the tape drive across tape wraps rather than along the length of a tape wrap. Thus, the method 100 writes the magnetic tape such that the files (or the data objects) can later be read from the magnetic tape with reduced seek time and reduced complexity.

The steps 102 to 114 are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.

FIG. 2 is a block diagram that depicts an apparatus, in accordance with an embodiment of the present disclosure. With reference to FIG. 2 there is shown a block diagram 200 of an apparatus 202 that includes a magnetic-tape data storage 204, a magnetic tape 206, a plurality of tape wraps 208, a communication interface 214, a memory 212, and a data processing module 210.

The apparatus 202 is configured to write data to the magnetic tape 206 that is included in the magnetic-tape data storage 204. Moreover, the magnetic-tape data storage 204 includes the magnetic tape 206 and the plurality of tape wraps 208. The magnetic tape 206 is composed of tracks on which data is written and from which data is read. A track runs the length of the magnetic tape 206, from tape start to tape end. In addition, the head of a tape drive used for reading and writing contains multiple read and write elements that allow reading and writing on multiple adjacent tracks, and such sets of adjacent tracks are referred to as the plurality of tape wraps 208, such as a first tape wrap 208A, a second tape wrap 208B, and the like.

The communication interface 214 is used by the apparatus 202 to communicate with another device(s). Examples of implementation of the communication interface 214 may include but are not limited to a network interface, a computer port, a network socket, a network interface controller (NIC), and any other network interface device.

The data processing module 210 is configured to write data to the magnetic tape 206. Examples of implementation of the data processing module 210 may include but are not limited to a central data processing device, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a state machine, and other processors or control circuitry.

The memory 212 is configured to store the instructions to write to the magnetic tape 206. Examples of implementation of the memory 212 may include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Dynamic Random-Access Memory (DRAM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, a Secure Digital (SD) card, Solid-State Drive (SSD), and/or CPU cache memory.

The apparatus 202 is configured to write data to the magnetic tape 206. In an implementation, the magnetic tape 206 is composed of tracks on which the data is written and from which the data is read. Moreover, the magnetic tape 206 is divided into bands, and the tape drive head covers the width of a single band. In addition, each band is composed of an even number of tape wraps. For example, linear tape open ninth-generation (LTO 9) tapes are constructed according to the specifications of the LTO 9. Thus, the apparatus 202 is configured to write data to the magnetic tape 206 efficiently and reliably through a recommended access order and time-based access order system techniques for optimized reading order of objects (or files), such as by reordering the data before writing. The magnetic-tape data storage 204 includes the magnetic tape 206 with the plurality of tape wraps 208, such as the first tape wrap 208A, the second tape wrap 208B and the Nth tape wrap 208N. In an implementation, the magnetic-tape data storage 204 includes the magnetic tape 206 that includes the first tape wrap 208A. Similarly, the magnetic-tape data storage 204 includes the second tape wrap 208B and the like. Furthermore, the apparatus 202 includes the data processing module 210, which is configured to deduplicate input data and split the deduplicated input data into logical clusters. Moreover, each logical cluster includes a set of files and/or data objects of the deduplicated input data that contain a number of common data segments in the deduplicated input data. In an implementation, the logical clusters correspond to a set of data segments in the input data (i.e., the deduplicated input data), such as a set of files (or objects) that are reordered together to provide an optimal reading of each individual file (or object) in the logical cluster. 
In an implementation, each logical cluster includes the set of files and the data objects of the deduplicated input data that includes the number of common data segments in the deduplicated input data. In another implementation, each logical cluster includes the set of files of the deduplicated input data that contain a number of common data segments in the deduplicated input data. In yet another implementation, each logical cluster includes the data objects of the deduplicated input data that contain the number of common data segments in the deduplicated input data.

In accordance with an embodiment, the data processing module 210 is configured to split the deduplicated input data into logical clusters. For each file and/or data object of the deduplicated input data, a list of the data segments that are contained in the file and/or data object is created, and two files and/or data objects are assigned to the same logical cluster if an overlap of their lists of data segments is above a pre-determined threshold. In an implementation, the data processing module 210 is configured to split the deduplicated input data into logical clusters and to create the list of data segments that are contained in the file and the data object for each file and for each data object of the deduplicated input data. In another implementation, the data processing module 210 is configured to split the deduplicated input data into logical clusters and to create the list of data segments that are contained in the file for each file of the deduplicated input data. In yet another implementation, the splitting of the deduplicated input data into logical clusters includes creating the list of data segments that are contained in the data object for each data object of the deduplicated input data. Firstly, the input data, such as the deduplicated input data, is processed, and thereafter, the list is created that includes the data segments used by each file (or each data object). After that, the overlapping entries of the files (or the data objects) in the lists are determined, and if the overlap is above the pre-determined threshold (e.g., 40% of the data segments overall), then the same logical cluster is assigned to both the files (or both data objects). In an implementation, the apparatus 202 is configured to assign two files and two data objects to the same logical cluster. In another implementation, the apparatus 202 is configured to assign two files to the same logical cluster. 
In yet another implementation, the apparatus 202 is configured to assign two data objects to the same logical cluster. Thus, the assignment of the two files and/or two data objects to the same logical cluster groups the data segments with common data, which provides an efficient and improved reading of the data on the magnetic tape 206 with reduced complexity. In accordance with an embodiment, the data processing module 210 is further configured to buffer the input data until it reaches a predetermined size before the deduplicating of the input data. In an implementation, the buffering of the input data occurs in a storage medium, such as a hard disk drive (HDD) or a solid-state drive (SSD), which increases the efficiency and the durability of the magnetic tape 206. In addition, the buffering of the input data increases the life span of the magnetic tape 206 by reducing the running time of the magnetic tape 206 in back-and-forth motion (i.e., shoe-shining).
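As a minimal, non-limiting sketch, the overlap-based assignment of files and/or data objects to logical clusters described above may be illustrated as follows. The function names, the example file names, and the overlap measure (shared data segments divided by the size of the smaller list) are assumptions made for illustration only; the disclosure specifies only that an overlap above a pre-determined threshold places two files and/or data objects in the same logical cluster.

```python
from itertools import combinations


def overlap_ratio(a: set, b: set) -> float:
    """Assumed overlap measure: shared segments relative to the smaller list."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def split_into_clusters(segment_lists: dict, threshold: float = 0.4) -> list:
    """Group files whose segment lists overlap above `threshold`.

    A small union-find merges files transitively, so that if A overlaps B
    and B overlaps C, all three end up in one logical cluster (an assumption).
    """
    parent = {name: name for name in segment_lists}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for f, g in combinations(segment_lists, 2):
        if overlap_ratio(segment_lists[f], segment_lists[g]) > threshold:
            parent[find(f)] = find(g)

    clusters = {}
    for name in segment_lists:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: per-file lists of data segment identifiers.
files = {
    "file_a": {"s1", "s2", "s3"},
    "file_b": {"s2", "s3", "s4"},
    "file_c": {"s9"},
}
clusters = split_into_clusters(files)
print(clusters)
```

In this hypothetical input, file_a and file_b share two of three data segments and are therefore grouped together, while file_c forms its own logical cluster.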

The data processing module 210 is configured to determine a number of tape wraps K used to write the deduplicated input data to the magnetic tape 206 and a head of the tape offset already reached in the magnetic tape 206. In an implementation, the head of the tape offset (or tape drive) includes multiple read and write elements to allow reading and writing of the number of tape wraps K, which is also referred to as the multiple adjacent tracks. In accordance with an embodiment, the data processing module 210 is configured to determine the number of tape wraps K used to write the deduplicated input data to the magnetic tape 206 based on a tape wrap size, a size of the deduplicated input data and the head of the tape offset already reached in the magnetic tape 206. For example, the tape wrap size depends on the magnetic tape format and the last offset (or the head of the tape offset) written on the magnetic tape provides the size of the input data (e.g., X) and the number of tape wraps (e.g., a first wrap, a second wrap, and the like) that is written in the input data. Thus, the number of tape wraps K are used for writing the deduplicated input data to the magnetic tapes sequentially without any holes. In accordance with an embodiment, the data processing module 210 is further configured for determining a first writing location to start the writing of the deduplicated input data to the magnetic tape based on the head of the tape offset. For example, the number of offsets required to write four tape wraps are ten (10), one hundred (100), one thousand (1000), and ten thousand (10000) for each tape wrap respectively, and the head of the magnetic tape is in another tape wrap of another offset, such as in the offset two hundred (200). 
Thus, the writing order for the current given logical cluster (or the data segment), including the size of one hundred (100), can be two tape wraps, including the offset one hundred (100) that is closest to the head location of tape wrap X offset two hundred (200), one tape wrap including the offset 10, two tape wraps including offset two hundred (200), and the like until the area of the offset reaches one thousand (1000). Thereafter, new logical clusters (or data segments) are added to a third tape wrap, including the offset one thousand (1000). Thus, the first writing location is used for providing the location to start the writing of the deduplicated input data to the magnetic tape that is based on the head of the tape offset.
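As a minimal sketch, one possible way of determining the number of tape wraps K from the tape wrap size, the size of the deduplicated input data, and the head of the tape offset is shown below. The formula is an assumption made for illustration: it treats the head of the tape offset as a linear position, assumes that writing continues from the position already reached in the current tape wrap, and assumes that the data is written sequentially without holes. The function name wraps_needed and all numeric values are hypothetical.

```python
def wraps_needed(wrap_size: int, data_size: int, head_offset: int) -> int:
    """Number of tape wraps K touched when writing `data_size` units.

    Assumption: the part of the current wrap before `head_offset` is already
    used, and the new data fills the remaining space wrap after wrap.
    """
    if wrap_size <= 0 or data_size <= 0 or head_offset < 0:
        raise ValueError("sizes must be positive and the offset non-negative")
    used_in_current_wrap = head_offset % wrap_size
    total = used_in_current_wrap + data_size
    return (total + wrap_size - 1) // wrap_size  # integer ceiling


# Hypothetical values in arbitrary units.
k = wraps_needed(wrap_size=100, data_size=250, head_offset=30)
print(k)
```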

For each logical cluster, the data processing module 210 is configured to assign priorities to data segments in the logical cluster based on priority rules that define higher priorities for more common data segment and lower priorities for less common data segment among the set of files and/or data objects comprised in the logical cluster. In an implementation, the priorities are assigned to the data segments in the logical cluster based on the priority rules, such as the data segments that include common data are assigned high priority among the set of files and the data objects included in the logical cluster. In another implementation, the priorities are assigned to the data segments in the logical cluster based on the priority rules, such as the data segments that include common data are assigned high priority among the set of files or the data objects included in the logical cluster. In accordance with an embodiment, the priority rules are configured to be set by a user to define a higher priority for one or more data segments contained in one or more files and/or data objects comprised in the logical cluster. In an implementation, the priority rules define the higher priority for the one or more data segments contained in the one or more files and the data objects that are included in the logical cluster. In another implementation, the priority rules define the higher priority for the one or more data segments contained in the one or more files or the data objects that are included in the logical cluster. The higher priority for one or more data segments in the logical cluster is used to determine the order of writing of the data segments. Thus, the apparatus 202 is configured to order the common data with higher priority to provide an efficient and improved readability of the data.
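As a minimal sketch, the priority rules described above may be illustrated by counting, for each data segment, the number of files and/or data objects in the logical cluster that use it, and ordering the data segments from most common to least common. The function name order_segments, the boosted parameter that models a user-set higher priority, and the tie-break by segment name are assumptions made for illustration.

```python
from collections import Counter


def order_segments(cluster_files: dict, boosted: frozenset = frozenset()) -> list:
    """Return segment ids ordered from highest to lowest priority.

    Priority (assumed): user-boosted segments first, then segments used by
    more files, then by name so that the result is deterministic.
    """
    counts = Counter(
        seg for segs in cluster_files.values() for seg in set(segs)
    )
    return sorted(counts, key=lambda s: (s not in boosted, -counts[s], s))


# Hypothetical logical cluster: file name -> data segments it uses.
cluster = {
    "f1": ["a", "b", "c"],
    "f2": ["a", "b", "d"],
    "f3": ["a", "e"],
}
default_order = order_segments(cluster)
user_order = order_segments(cluster, boosted=frozenset({"e"}))
print(default_order, user_order)
```

In this hypothetical cluster, segment a is used by three files and is written first, followed by segment b, which is used by two files. When the user boosts segment e, it moves to the front of the order.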

The data processing module 210 is configured to determine an order of writing of data segments from the logical cluster to the magnetic tape by ordering the data segments in accordance with the assigned priorities. In an implementation, the data is recorded on the magnetic tape 206 in a linear serpentine manner, such that the first tape wrap 208A is written from start to end. Thereafter, the second tape wrap 208B is written from end to start, then a subsequent tape wrap from start to end, and the like. In addition, the direction of recording and of reading is the direction of the relevant tape wrap. Thus, the even-numbered tape wraps (e.g., 0, 2, 4, ...) run from the start of the magnetic tape 206 to the end of the magnetic tape 206, while the odd-numbered tape wraps (e.g., 1, 3, 5, ...) run from the end of the magnetic tape 206 to the start of the magnetic tape 206. Finally, a band is composed of an equal number of start-to-end tape wraps and end-to-start tape wraps. In an example, the input data is written to K tape wraps, such as from a tape wrap n to a tape wrap n+K-1. Thereafter, the data segments within each tape wrap are ordered according to the assigned priorities. Furthermore, the apparatus 202 is configured to determine the tape wrap in which to write the next data segment by nearest neighbour, with some thresholds to ensure that all K tape wraps are utilized. Thus, the data segments are ordered to ensure an optimal read order from the magnetic tape 206.
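As a minimal sketch, the linear serpentine recording direction described above may be illustrated as follows. The function names are hypothetical, and the mapping from an offset within a tape wrap to a distance from the physical start of the tape assumes that offsets are measured in the writing direction of the tape wrap and that all tape wraps have the same length; neither assumption is stated in the disclosure.

```python
def wrap_direction(wrap_index: int) -> str:
    """Even-numbered wraps run start-to-end, odd-numbered wraps end-to-start."""
    return "start_to_end" if wrap_index % 2 == 0 else "end_to_start"


def longitudinal_position(wrap_index: int, offset: int, wrap_size: int) -> int:
    """Distance from the physical start of the tape for an offset in a wrap.

    Assumption: `offset` is measured along the writing direction of the wrap.
    """
    if not 0 <= offset <= wrap_size:
        raise ValueError("offset must lie within the wrap")
    return offset if wrap_index % 2 == 0 else wrap_size - offset


# Hypothetical values: a wrap of 100 units, offset 10 in wraps 0 and 1.
print(wrap_direction(0), longitudinal_position(0, 10, 100))
print(wrap_direction(1), longitudinal_position(1, 10, 100))
```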

The data processing module 210 is configured to determine a writing plan that includes a tape wrap and an offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape 206. The data processing module 210 is configured to plan writing of each next data segment, in the order of writing, into a tape wrap different from a tape wrap of a previous data segment, within the determined K tape wraps, in the nearest available location to a location of the previous data segment. The nearest available location is determined in accordance with distances between the tape wraps and the offsets reached in each of the determined K tape wraps in all the writing plans. After the division of the input data into logical clusters, for each data segment, the number of files (or data objects) in the logical cluster that use the data segment is determined, and the data segments are ordered from the most common to the least common. For example, the entire input data is going to be written to the determined K tape wraps, from tape wrap n to tape wrap n+K-1. Thus, for each tape wrap, the offset reached in the tape wrap is stored during the re-ordering. Thereafter, the data processing module 210 is configured to determine the tape wrap on which to write the next data segment by the nearest neighbour, with some thresholds to ensure that all K tape wraps are utilized. In an implementation, A[] is an array of data segments, ordered from most common (e.g., A[0]) to least common (e.g., A[N-1]), where N is the number of data segments in the current logical cluster. Moreover, Wrap[] is an array storing, for each tape wrap, the offset reached so far in it across all logical clusters of the current input data, so that Wrap[i] is the offset last written to in tape wrap i. Thereafter, K is the number of tape wraps that are used to store the data segments, starting from tape wrap 0, offset 0. Finally, Out[] is the output array that maps each data segment i to the tape wrap and the offset in the tape wrap, as shown in the below-mentioned pseudocode:

Reorder_cluster(A[]):
    curr_wrap = 0
    Out[0] = <curr_wrap, Wrap[curr_wrap]>
    Wrap[curr_wrap] += A[0].size
    Return Out[]

In accordance with an embodiment, the data processing module 210 is configured for determining the nearest neighbour to the last written data segment by a minimum distance between offsets reached in all the determined K tape wraps and an offset in a tape wrap of the last written data segment. In an implementation, the minimum distance corresponds to a function that takes the offsets reached in all the used tape wraps and the current tape wrap, and returns the tape wrap in which the offset reached is the closest to the last offset written in the current tape wrap. In another implementation, the actual head of the magnetic tape 206 is located at a single offset on a single tape wrap, and this location is considered as the starting point for determining the nearest neighbour. The nearest neighbour provides an order in which the data can be read more efficiently and with reduced complexity, for example by moving the head of the tape drive across the tape wraps rather than along the length of a tape wrap.
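As a minimal sketch, the nearest-neighbour planning described above may be illustrated as follows. The pseudocode above shows only the placement of the first data segment; the loop over the remaining data segments, the min_distance helper, the tie-break towards the lowest tape wrap index, and the Segment type are assumptions derived from the textual description. The thresholds that ensure that all K tape wraps are utilized are not specified in the description and are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    size: int


def min_distance(wrap: list, curr_wrap: int) -> int:
    """Return the wrap, other than `curr_wrap`, whose reached offset is
    closest to the offset last written in `curr_wrap` (ties: lowest index)."""
    last = wrap[curr_wrap]
    candidates = [w for w in range(len(wrap)) if w != curr_wrap]
    return min(candidates, key=lambda w: (abs(wrap[w] - last), w))


def reorder_cluster(a: list, wrap: list) -> list:
    """Plan a (tape wrap, offset) pair for each segment of `a`, in order.

    `wrap[i]` holds the offset reached so far in tape wrap i and is updated
    in place, so it carries over to the next logical cluster.
    """
    if len(wrap) < 2:
        raise ValueError("at least two tape wraps are needed to alternate")
    out = []
    if not a:
        return out
    curr_wrap = 0
    out.append((curr_wrap, wrap[curr_wrap]))
    wrap[curr_wrap] += a[0].size
    for seg in a[1:]:
        # Each next segment goes to a different wrap than the previous one.
        curr_wrap = min_distance(wrap, curr_wrap)
        out.append((curr_wrap, wrap[curr_wrap]))
        wrap[curr_wrap] += seg.size
    return out


# Hypothetical cluster, already ordered from most to least common, K = 3.
segments = [Segment("a", 5), Segment("b", 5), Segment("c", 20), Segment("d", 5)]
offsets_reached = [0, 0, 0]
plan = reorder_cluster(segments, offsets_reached)
print(plan, offsets_reached)
```

In this hypothetical run, the third tape wrap is never selected, which illustrates why the description mentions additional thresholds to ensure that all K tape wraps are utilized.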

The data processing module 210 is configured to control the magnetic-tape data storage 204 to write the deduplicated input data to the magnetic tape 206 using the writing plans of all the logical clusters. Firstly, the data processing module 210 is configured to assign priorities to the data segments in the logical clusters based on the priority rules. Thereafter, the data processing module 210 is configured to determine the order of writing of the data segments from the logical cluster to the magnetic tape 206 by ordering the data segments in accordance with the assigned priorities. After that, the data processing module 210 is configured to determine a writing plan that includes the tape wrap and the offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape 206 so that to plan the writing of each next data segment. Finally, the data processing module 210 is configured to write the deduplicated input data to the magnetic tape 206 using the writing plans of all the logical clusters.

The apparatus 202 is configured to write the data to the magnetic tape 206 of the magnetic-tape data storage 204 more efficiently with reduced seek time. The apparatus 202 includes the data processing module 210, which is configured to deduplicate and split the input data, to assign priorities to the data segments according to how common each data segment is, and to order the data segments as per the priority. In addition, the buffering of the input data increases the life span of the magnetic tape 206. Furthermore, the apparatus 202 provides an order in which the data can be read more efficiently and with reduced complexity, for example by moving the head of the tape drive across tape wraps rather than along the length of a tape wrap. Thus, the apparatus 202 is configured to write data to the magnetic tape 206 such that the files (or the data objects) can be read from the magnetic tape 206 with reduced seek time and reduced complexity.

FIG. 3 is a diagram that depicts reordering of data segments, in accordance with an embodiment. FIG. 3 is described in conjunction with elements from FIG. 2. With reference to FIG. 3, there is shown a diagram 300 that includes a first file 302, a second file 304, and a third file 306. Each file uses data segments, such as a first data segment 308 (i.e., data1), a second data segment 310 (i.e., data2), a third data segment 316 (i.e., data3), a fourth data segment 318 (i.e., data4), a fifth data segment 320 (i.e., data5), a sixth data segment 314 (i.e., data6), and a seventh data segment 322 (i.e., data7). The size of each file, such as the first file 302, the second file 304, and the third file 306, is 30 GB. The first file 302 and the second file 304 share the first data segment 308 that is of 5 GB. In addition, the first file 302 and the third file 306 share the third data segment 316 that is of 20 GB. Furthermore, the second file 304 and the third file 306 share the sixth data segment 314 that is of 5 GB. In an implementation, the first file 302, the second file 304, and the third file 306 are the parts of the same logical cluster that are required to be re-ordered. Thus, the first data segment 308 is used by the first file 302, the second file 304, and the third file 306. Similarly, the second data segment 310 is used by the first file 302 and the third file 306. Furthermore, the sixth data segment 314 is used by the second file 304 and the third file 306. Finally, the third data segment 316, the fourth data segment 318 that is of 5 GB, the fifth data segment 320 that is of 15 GB, and the seventh data segment 322 that is of 15 GB, are used by the first file 302.

FIG. 4 is a diagram that depicts reordering of data segments without prioritizing specific file read optimization, in accordance with an embodiment. FIG. 4 is described in conjunction with elements from FIGs. 2 and 3. With reference to FIG. 4, there is shown a diagram 400 that includes a magnetic tape (e.g., the magnetic tape 206 of FIG. 2). Moreover, the magnetic tape includes a first tape wrap 402 (i.e., tape wrap 0), a second tape wrap 404 (i.e., tape wrap 1), and a third tape wrap 406 (i.e., tape wrap 2). The data processing module 210 (of FIG. 2) is configured to reorder the data segments to read the file (i.e., the third file 306). The first data segment 308 and the fourth data segment 318 are arranged on the first tape wrap 402 (i.e., tape wrap 0), where the offset of the first data segment 308 is zero (0) and the offset of the fourth data segment 318 is 5 GB. Similarly, the second data segment 310 with offset zero (0), the third data segment 316 with an offset of 5 GB, and the seventh data segment 322 with an offset of 25 GB are arranged on the second tape wrap 404 (i.e., tape wrap 1). Furthermore, the sixth data segment 314 with offset zero (0) and the fifth data segment 320 with an offset of 5 GB are arranged on the third tape wrap 406 (i.e., tape wrap 2). Thus, reordering the data segments without prioritizing increases the seek time to read the files, such as the third file 306. Hence, reordering of the data segments with prioritizing is required to increase the efficiency with reduced seek time.

FIG. 5 is a diagram that depicts re-ordering of data segments by prioritizing file read optimization, in accordance with an embodiment. FIG. 5 is described in conjunction with elements from FIGs. 2, 3, and 4. With reference to FIG. 5, there is shown a diagram 500 that includes the reordering of the data segments according to the priority. The data processing module 210 (of FIG. 2) is configured to reorder the data segments to read the file (i.e., the third file 306) by prioritizing the data segments that include the common data. The first data segment 308 with offset zero (0), the second data segment 310 with an offset of 5 GB, and the third data segment 316 with an offset of 10 GB are arranged on the first tape wrap 402 (i.e., tape wrap 0). Similarly, the sixth data segment 314 with offset zero (0), the fourth data segment 318 with an offset of 5 GB, and the fifth data segment 320 with an offset of 10 GB are arranged on the second tape wrap 404 (i.e., tape wrap 1). Furthermore, the seventh data segment 322 with offset zero (0) is arranged on the third tape wrap 406 (i.e., tape wrap 2). Thus, reordering of the data segments with prioritizing increases the efficiency with reduced seek time and reduced complexity.
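As a minimal worked example, the offsets stated for FIG. 4 and FIG. 5 can be recomputed from the data segment sizes by accumulating the sizes of the data segments that precede each data segment in its tape wrap. The sizes of data1 and data3 to data7 are taken from the description of FIG. 3; the 5 GB size of data2 is not stated there and is inferred here from the offsets given for FIG. 4, which is an assumption. The function name offsets and the list-of-lists layout representation are hypothetical.

```python
from itertools import accumulate

# Sizes in GB; data2 is inferred from the FIG. 4 offsets (assumption).
SIZES_GB = {
    "data1": 5, "data2": 5, "data3": 20, "data4": 5,
    "data5": 15, "data6": 5, "data7": 15,
}


def offsets(layout: list) -> dict:
    """Map each segment to (tape wrap index, start offset in GB).

    `layout[i]` lists the segments of tape wrap i in writing order.
    """
    result = {}
    for wrap_index, names in enumerate(layout):
        # Start offsets are the running sums of the preceding segment sizes.
        starts = accumulate((SIZES_GB[n] for n in names[:-1]), initial=0)
        for name, start in zip(names, starts):
            result[name] = (wrap_index, start)
    return result


fig4 = offsets([["data1", "data4"], ["data2", "data3", "data7"], ["data6", "data5"]])
fig5 = offsets([["data1", "data2", "data3"], ["data6", "data4", "data5"], ["data7"]])
print(fig4)
print(fig5)
```

Under these sizes, the computed start offsets agree with the offsets stated for both figures, for example 25 GB for data7 in FIG. 4 and 10 GB for data3 and data5 in FIG. 5.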

Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the present disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Claims

1. A method (100) of writing data to magnetic tape (206), comprising: deduplicating input data, splitting the deduplicated input data into logical clusters, wherein each logical cluster comprises a set of files and/or data objects of the deduplicated input data that contain a number of common data segments in the deduplicated input data, determining a number of tape wraps K to be used for writing the deduplicated input data to a magnetic tape (206) and a head of tape offset already reached in the magnetic tape (206), for each logical cluster: assigning priorities to data segments in the logical cluster based on priority rules that define higher priorities for more common data segment and lower priorities for less common data segment among the set of files and/or data objects comprised in the logical cluster, determining an order of writing of data segments from the logical cluster to the magnetic tape (206) by ordering the data segments in accordance with the assigned priorities, and determining a writing plan that comprises a tape wrap and an offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape (206) so that to plan writing of each next data segment, in the order of writing, into a tape wrap different from a tape wrap of a previous data segment, within the determined K tape wraps, in a nearest available location to a location of the previous data segment in accordance with distances between the tape wraps and offsets reached in each of the determined K tape wraps in all the writing plans, and writing the deduplicated input data to the magnetic tape (206) using the writing plans of all the logical clusters.

2. The method (100) of claim 1 further comprising buffering input data until it reaches a predetermined size before the deduplicating of the input data.

3. The method (100) of claim 1 or 2, wherein the determining of the number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape (206) is based on a tape wrap size, a size of the deduplicated input data and the head of tape offset already reached in the magnetic tape (206).

4. The method (100) of any one of claims 1 to 3, wherein the splitting of the deduplicated input data into logical clusters comprises: for each file and/or data object of the deduplicated input data, creating a list of data segments that are contained in the file and/or data object, and assigning two files and/or data objects to the same logical cluster if an overlap of their lists of data segments is above a pre-determined threshold.

5. The method (100) of any one of claims 1 to 4 further comprising determining a first writing location to start the writing of the deduplicated input data to the magnetic tape (206) based on the head of tape offset.

6. The method (100) of any one of claims 1 to 5, wherein the nearest neighbor to the last written data segment is determined by a minimum distance between offsets reached in all the determined K tape wraps and an offset in a tape wrap of the last written data segment.

7. The method (100) of any one of claims 1 to 6, wherein the priority rules further define a higher priority for one or more data segments contained in one or more of files and/or data objects comprised in the logical cluster.

8. An apparatus (202) for writing data to magnetic tape (206), comprising: a magnetic-tape data storage (204) comprising a magnetic tape (206) with a plurality of tape wraps (208), and a data processing module (210) configured for: deduplicating input data, splitting the deduplicated input data into logical clusters, wherein each logical cluster comprises a set of files and/or data objects of the deduplicated input data that contain a number of common data segments in the deduplicated input data, determining a number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape (206) and a head of tape offset already reached in the magnetic tape (206), for each logical cluster: assigning priorities to data segments in the logical cluster based on priority rules that define higher priorities for a more common data segments and lower priorities for less common data segment among the set of files and/or data objects comprised in the logical cluster, determining an order of writing of data segments from the logical cluster to the magnetic tape (206) by ordering the data segments in accordance with the assigned priorities, and determining a writing plan that comprises a tape wrap and an offset in the tape wrap for writing each data segment of the logical cluster to the magnetic tape (206) so that to plan writing of each next data segment, in the order of writing, into a tape wrap different from a tape wrap of a previous data segment, within the determined K tape wraps, in a nearest available location to a location of the previous data segment in accordance with distances between the tape wraps and offsets reached in each of the determined K tape wraps in all the writing plans, and controlling the magnetic-tape data storage (204) to write the deduplicated input data to the magnetic tape (206) using the writing plans of all the logical clusters.

9. The apparatus (202) of claim 8, wherein the data processing module (210) is further configured for buffering input data until it reaches a pre-determined size before the deduplicating of the input data.

10. The apparatus (202) of claim 8 or 9, wherein the data processing module (210) is configured for determining the number of tape wraps K to be used for writing the deduplicated input data to the magnetic tape (206) based on a tape wrap size, a size of the deduplicated input data and the head of tape offset already reached in the magnetic tape (206).

11. The apparatus (202) of any one of claims 8 to 10, wherein the data processing module (210) is configured for splitting the deduplicated input data into logical clusters by means of: for each file and/or data object of the deduplicated input data, creating a list of data segments that are contained in the file and/or data object, and assigning two files and/or data objects to the same logical cluster if an overlap of their lists of data segments is above a pre-determined threshold.

12. The apparatus (202) of any one of claims 8 to 11, wherein the data processing module (210) is further configured for determining a first writing location to start the writing of the deduplicated input data to the magnetic tape (206) based on the head of tape offset.

13. The apparatus (202) of any one of claims 8 to 12, wherein the data processing module (210) is configured for determining the nearest neighbour to the last written data segment by a minimum distance between offsets reached in all the determined K tape wraps and an offset in a tape wrap of the last written data segment.

14. The apparatus (202) of any one of claims 8 to 13, wherein the priority rules are configured to be set by a user to define a higher priority for one or more data segments contained in one or more of files and/or data objects comprised in the logical cluster.