CN108132759B

CN108132759B - Method and device for managing data in file system

Info

Publication number: CN108132759B
Application number: CN201810035872.XA
Authority: CN
Inventors: 林烽; 陈文生
Original assignee: Wangsu Science and Technology Co Ltd
Current assignee: Wangsu Science and Technology Co Ltd
Priority date: 2018-01-15
Filing date: 2018-01-15
Publication date: 2021-04-16
Anticipated expiration: 2038-01-15
Also published as: CN108132759A

Abstract

The invention discloses a method and a device for managing data in a file system, and belongs to the technical field of data storage. The method comprises the following steps: formatting an original file system, establishing a new file system based on a pre-allocated high IOPS storage medium partition and a pre-allocated low IOPS storage medium, and setting an I-node form in the high IOPS storage medium partition; and storing the metadata to be stored into the I-node form, and storing the file data to be stored into the low IOPS storage medium. The invention can improve the service performance of the file system.

Description

Method and device for managing data in file system

Technical Field

The present invention relates to the field of data storage technologies, and in particular, to a method and an apparatus for managing data in a file system.

Background

A CDN (Content Delivery Network) server generally stores a large number of files, and a file system is established to manage them. Data in a file system can be divided into file data and metadata. File data is the actual content of a file, while metadata is the system data that describes file attributes, such as access rights, file owner, distribution information of storage areas, and description information of the file system (for example, its available space).

A CDN server needs a large data storage capacity, so a mechanical hard disk, which offers large capacity at low cost, is generally selected to store the data of the file system. Specifically, a file system for managing files may be established on the mechanical hard disk; when storing a file, the CDN server stores both the metadata and the file data of the file in the mechanical hard disk and manages the file through the file system. When an external file access request for a certain file is received, the CDN server performs an I/O (Input/Output) operation on the mechanical hard disk through the file system, obtains the metadata of the file from the mechanical hard disk, locates the storage location of the file data through the metadata, and then returns the file data stored in the mechanical hard disk to the requester.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

a CDN server often needs to process a large number of file access requests concurrently, so the demand on the IOPS (Input/Output Operations Per Second) of the storage medium keeps increasing. However, the IOPS capability of a mechanical hard disk is poor, and file access requests cannot be answered quickly, so the service performance of the file system is poor.

Disclosure of Invention

In order to solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for managing data in a file system. The technical scheme is as follows:

in a first aspect, a method for managing data in a file system is provided, the method comprising:

formatting an original file system, establishing a new file system based on a pre-allocated high IOPS storage medium partition and a pre-allocated low IOPS storage medium, and setting an I-node form in the high IOPS storage medium partition;

and storing the metadata to be stored into the I-node form, and storing the file data to be stored into the low IOPS storage medium.

Optionally, the method further includes:

if the high IOPS storage medium partition has a residual storage space, storing directory files and indirect block data to be stored into the residual storage space;

and when the residual storage space is insufficient, continuously storing the directory file to be stored and indirect block data into the low IOPS storage medium.

Optionally, the method further includes:

estimating the average file size of all files to be stored, and estimating the number of I-nodes based on the capacity of the low IOPS storage medium and the average file size;

and determining the storage capacity of the high IOPS storage medium partition according to the number of the I-nodes, the unit capacity of the I-nodes and the unit storage capacity of the low IOPS storage medium.

Optionally, the storing the file data to be stored in the low IOPS storage medium includes:

dividing the low IOPS storage medium into a plurality of continuous storage areas with uniform size according to the maximum parallel number of process reading and writing of the file system;

and for a file to be stored, selecting a target storage area from the plurality of storage areas by adopting a preset random algorithm, and writing file data of the file to be stored from a first available storage unit of the target storage area.

Optionally, the selecting a target storage area in the plurality of storage areas by using a preset random algorithm, and writing the file data of the file to be stored from a first available storage unit of the target storage area includes:

if the file size of the file to be stored is larger than a preset value, selecting a target storage area in the plurality of storage areas by adopting a preset random algorithm, and writing file data of the file to be stored from a first available storage unit of the target storage area;

and if the file size of the file to be stored is not larger than the preset value, writing file data of the file to be stored from the idle storage unit pointed by the position cursor, and updating the idle storage unit pointed by the position cursor, wherein the idle storage unit pointed by the position cursor is always the first idle storage unit in the plurality of storage areas.

Optionally, the high IOPS storage medium is a solid state disk, and the low IOPS storage medium is a mechanical hard disk.

In a second aspect, an apparatus for managing data in a file system is provided, the apparatus comprising:

an establishing module, configured to format an original file system, establish a new file system based on a pre-allocated high IOPS storage medium partition and a pre-allocated low IOPS storage medium, and set an I-node form in the high IOPS storage medium partition;

and the storage module is used for storing the metadata to be stored into the I-node form and storing the file data to be stored into the low IOPS storage medium.

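Optionally, the storage module is further configured to:

if the high IOPS storage medium partition has a residual storage space, store directory files and indirect block data to be stored into the residual storage space;

and when the residual storage space is insufficient, continue storing the directory files and indirect block data to be stored into the low IOPS storage medium.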

Optionally, the apparatus further comprises:

the estimation module is used for estimating the average file size of all files to be stored and estimating the number of the I-nodes based on the capacity of the low IOPS storage medium and the average file size;

and the determining module is used for determining the storage capacity of the high IOPS storage medium partition according to the number of the I-nodes, the unit capacity of the I-nodes and the unit storage capacity of the low IOPS storage medium.

Optionally, the storage module is specifically configured to:

In a third aspect, there is provided a file storage device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, the at least one instruction, the at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement a method of managing data in a file system as claimed in any one of claims 1 to 6.

In a fourth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of managing data in a file system according to any one of claims 1 to 6.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

in the embodiment of the invention, an original file system is formatted, a new file system is established based on a pre-allocated high IOPS storage medium partition and a low IOPS storage medium, an I-node form is set in the high IOPS storage medium partition, metadata to be stored is stored in the I-node form, and file data to be stored is stored in the low IOPS storage medium. The metadata is therefore stored on a high IOPS storage medium with fast response capability, such as a solid state disk, which relieves the low IOPS storage medium, such as a mechanical hard disk, of the IOPS pressure caused by metadata access in the file system, so that the service performance of the file system can be improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a data storage diagram of a file system according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for managing data in a file system according to an embodiment of the present invention;

FIG. 3 is a block diagram of an apparatus for managing data in a file system according to an embodiment of the present invention;

FIG. 4 is a block diagram of an apparatus for managing data in a file system according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a file storage device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiment of the invention provides a method for managing data in a file system, wherein an execution main body of the method can be a file storage device, wherein the file storage device can be any device which stores a large number of files and has a file management function, and can be a terminal or a server. At least two storage media with different IOPS exist in the file storage device, and the file storage device can construct a local file system based on a single storage medium or a plurality of storage media. The file storage device may be provided with a processor, a memory, and a transceiver, the processor may be configured to process a process of managing data in the file system, the memory may be configured to store data required and generated in the following process, and the transceiver may be configured to perform data interaction between the file storage device and the outside. In this embodiment, a case where the file storage device is a CDN server, the high IOPS storage medium is a solid state disk, and the low IOPS storage medium is a mechanical hard disk is taken as an example for description, and other cases are similar and will not be described one by one. It can be understood that the method described in the embodiment of the present invention may also be applied to servers or file storage devices in other non-CDN fields.

FIG. 1 is a schematic diagram of data storage on the solid state disk and the mechanical hard disk in this embodiment. A technician on the CDN server side may set aside in advance, on the solid state disk, a solid state disk partition to be used for establishing the file system. The solid state disk partition may include an I-node form and a reserved storage space. The I-node form is a table composed of a large number of I-nodes (information nodes) of the same size; each file in the file system corresponds to one I-node in the form, and the I-node records attribute information of the file, such as the file size, the file owner, and the creation time. The mechanical hard disk may contain a large number of storage units for storing file data; all the storage units have the same size, and each corresponds to a unique number.
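Because all I-nodes in the form have the same size, the position of any I-node can be computed directly from its number. The following Python sketch illustrates this under stated assumptions: the 128-byte record size is borrowed from the sizing example later in the description, while the field layout (file size, owner identifier, creation time) and the names `INODE_SIZE`, `pack_inode`, `unpack_inode` and `inode_offset` are hypothetical and are not specified by the embodiment.

```python
import struct

# Hypothetical fixed record size; the description only requires that every
# I-node in the form has the same size.
INODE_SIZE = 128

# Assumed field layout: file size (8 bytes), owner id (4 bytes),
# creation time in seconds (8 bytes), little-endian, no padding.
_FIELDS = struct.Struct("<QIQ")


def inode_offset(table_start: int, inode_number: int) -> int:
    """Byte offset of an I-node inside the solid state disk partition."""
    return table_start + inode_number * INODE_SIZE


def pack_inode(size: int, owner: int, ctime: int) -> bytes:
    """Serialize the attributes and zero-pad the record to INODE_SIZE bytes."""
    return _FIELDS.pack(size, owner, ctime).ljust(INODE_SIZE, b"\x00")


def unpack_inode(record: bytes) -> tuple:
    """Read the attributes back from the start of a fixed-size record."""
    return _FIELDS.unpack_from(record)
```

With fixed-size records, looking up the attributes of a file costs a single read at a computed offset on the high IOPS medium, with no search involved.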

The process flow shown in fig. 2 will be described in detail below with reference to the specific embodiments, and the contents may be as follows:

step 201, formatting an original file system, establishing a new file system based on a pre-allocated high IOPS storage medium partition and a pre-allocated low IOPS storage medium, and setting an I-node form in the high IOPS storage medium partition.

In implementation, a technician on the CDN server side may control the CDN server to store the data inside the file system separately. Specifically, when the CDN server detects a preset trigger condition for the data management process described below, the CDN server may format the existing original file system established on the mechanical hard disk, and may establish a new file system based on a pre-allocated solid state disk partition and the mechanical hard disk. The CDN server may then set an I-node form in the solid state disk partition. It can be understood that the preset trigger condition may be set arbitrarily by a technician according to the operating state of the CDN server; for example, it may be that a start instruction from a user is received, that a preset time point is reached, or that the current load is detected to be not greater than a preset load value.
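The three example trigger conditions can be expressed as a single predicate. The sketch below is illustrative only; the function name `should_rebuild` and its parameters are assumptions, and the embodiment leaves the actual trigger condition to the technician.

```python
from datetime import datetime
from typing import Optional


def should_rebuild(start_requested: bool,
                   now: datetime,
                   scheduled_at: Optional[datetime],
                   current_load: float,
                   load_limit: float) -> bool:
    """Return True when any of the three example trigger conditions holds."""
    if start_requested:  # a start instruction was received from a user
        return True
    if scheduled_at is not None and now >= scheduled_at:  # preset time reached
        return True
    return current_load <= load_limit  # load not greater than the preset value
```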

Step 202, storing the metadata to be stored into an I-node form, and storing the file data to be stored into a low IOPS storage medium.

In implementation, after the CDN server establishes a new file system based on the solid state disk partition and the mechanical hard disk, the CDN server may record the start and stop addresses of the solid state disk partition and of the mechanical hard disk. The CDN server may then acquire the metadata and file data of a file to be stored, store the metadata into the I-node form, and store the file data into the mechanical hard disk.

Optionally, the CDN server may also preferentially store the directory file and the indirect block data in the solid state disk, and the corresponding processing may be as follows: if the high IOPS storage medium partition has the residual storage space, storing the directory file and indirect block data to be stored into the residual storage space; and when the residual storage space is insufficient, continuously storing the directory file and the indirect block data to be stored into the low IOPS storage medium.

In order to manage a file directory, the file directory is usually stored as a file, and the file is called a directory file.

If the size of the file data exceeds the capacity of one storage unit on the mechanical hard disk, the file data needs to be stored in a plurality of storage units. The CDN server may record the distribution of the file data in a disk sequence number list contained in the I-node corresponding to the file. When the number of storage units occupied by the file data exceeds the capacity of the disk sequence number list, the numbers of some of the storage units need to be recorded in a storage unit outside the disk sequence number list, and a pointer to that storage unit is recorded in the disk sequence number list. The CDN server can find those storage units through the pointer, and the content of the storage unit pointed to by the pointer can be regarded as indirect block data.
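The relationship between the disk sequence number list and indirect block data can be illustrated with a small in-memory model. This is a simplified sketch, not the on-disk format of the embodiment: the list capacity `DIRECT_SLOTS`, the class `INode`, and the helper names are hypothetical, and the pointer is kept in a separate field rather than in a slot of the list itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DIRECT_SLOTS = 4  # hypothetical capacity of the disk sequence number list


@dataclass
class INode:
    # Storage-unit numbers recorded directly in the I-node.
    direct: List[int] = field(default_factory=list)
    # Number of the storage unit that holds the overflow list, if any.
    indirect: Optional[int] = None


def record_units(inode: INode, units: List[int],
                 indirect_store: Dict[int, List[int]],
                 indirect_unit: int) -> None:
    """Record where a file's data lives, spilling to an indirect block if needed."""
    inode.direct = units[:DIRECT_SLOTS]
    overflow = units[DIRECT_SLOTS:]
    if overflow:
        indirect_store[indirect_unit] = overflow  # contents of the indirect block
        inode.indirect = indirect_unit            # pointer kept with the I-node


def all_units(inode: INode, indirect_store: Dict[int, List[int]]) -> List[int]:
    """Follow the pointer, if present, to rebuild the full list of storage units."""
    extra = indirect_store[inode.indirect] if inode.indirect is not None else []
    return inode.direct + extra
```

Reading a file whose list has overflowed requires one extra lookup to fetch the indirect block, which is why placing indirect block data on the faster medium can help.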

In implementation, if there is available storage space (i.e., remaining storage space) in the solid state disk partition in addition to the I-node form, the CDN server may store the directory files and indirect block data to be stored in the remaining storage space. Specifically, after the CDN server constructs the new file system, the CDN server stores data to be stored in units of files; it may first store the metadata in the I-node form, and then store the related directory files and indirect block data in the solid state disk partition. When the remaining storage space in the solid state disk partition is insufficient and directory files and indirect block data can no longer be stored there, the CDN server may continue by storing subsequent directory files and indirect block data into the mechanical hard disk.
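The placement rule for directory files and indirect block data amounts to "use the remaining space of the solid state disk partition while it lasts, then fall back to the mechanical hard disk". A minimal sketch follows, assuming a simple byte counter for the remaining space; the names `ReservedSpace` and `place_block` are hypothetical.

```python
class ReservedSpace:
    """Tracks the free bytes left in the SSD partition besides the I-node form."""

    def __init__(self, capacity: int) -> None:
        self.free = capacity

    def try_reserve(self, size: int) -> bool:
        """Reserve `size` bytes if they fit; report whether it succeeded."""
        if size > self.free:
            return False
        self.free -= size
        return True


def place_block(reserved: ReservedSpace, size: int) -> str:
    """Choose the device for a directory file or indirect block of `size` bytes."""
    return "ssd" if reserved.try_reserve(size) else "hdd"
```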

Optionally, the CDN server may preset the storage capacity of the solid state disk partition according to the storage status of the file, and the corresponding processing may be as follows: estimating the average file size of all files to be stored, and estimating the number of I-nodes based on the capacity of the low IOPS storage medium and the average file size; and determining the storage capacity of the high IOPS storage medium partition according to the number of the I-nodes, the unit capacity of the I-nodes and the unit storage capacity of the low IOPS storage medium.

In implementation, the CDN server may estimate in advance the average file size of all files to be stored, and then estimate the number of files that can be stored in the mechanical hard disk based on the capacity of the mechanical hard disk and the average file size, that is, estimate the maximum number of I-nodes of the file system. For example, if the average file size is estimated to be 1MB and the capacity of the mechanical hard disk is 1TB, at least 1 million I-nodes are expected to be required. Then, the CDN server may first calculate the storage capacity occupied by the I-node form according to the number of I-nodes and the unit capacity of an I-node; for example, if the unit capacity of one I-node is 128B and the number of I-nodes is 1 million, the storage capacity of the I-node form is 128MB. Further, a part of the storage space may be reserved in the solid state disk partition to store the directory files and the indirect block data. Specifically, the size of the reserved storage space may be calculated according to the number of directory files and indirect block data and the unit storage capacity of the mechanical hard disk. Assuming that, for 1 million I-nodes, the number of directory files and indirect block data is 1/4 of the number of I-nodes, and the unit storage capacity of the mechanical hard disk is 4KB, a storage space of 1000000 × 1/4 × 4KB, that is, about 1GB, needs to be reserved. The storage capacity of the solid state disk partition is then determined by adding the storage capacity of the I-node form and the reserved storage space.
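The sizing arithmetic above can be reproduced with a short calculation. The sketch below assumes decimal units (1KB = 1000B, and so on) so that the results match the round figures in the example; the function name `estimate_ssd_partition` and the parameter `extra_fraction` (the share of I-nodes that need a directory file or indirect block) are hypothetical.

```python
KB, MB, TB = 10**3, 10**6, 10**12  # decimal units, assumed to match the example


def estimate_ssd_partition(hdd_capacity: int, avg_file_size: int,
                           inode_size: int, unit_size: int,
                           extra_fraction: float) -> dict:
    """Estimate the SSD partition size from the HDD capacity and file profile."""
    # Number of files the mechanical disk can hold = maximum number of I-nodes.
    inode_count = hdd_capacity // avg_file_size
    # Space taken by the I-node form itself.
    inode_form_bytes = inode_count * inode_size
    # One HDD-sized storage unit per expected directory file / indirect block.
    reserved_bytes = int(inode_count * extra_fraction) * unit_size
    return {
        "inode_count": inode_count,
        "inode_form_bytes": inode_form_bytes,
        "reserved_bytes": reserved_bytes,
        "partition_bytes": inode_form_bytes + reserved_bytes,
    }
```

Calling `estimate_ssd_partition(1 * TB, 1 * MB, 128, 4 * KB, 0.25)` gives 1,000,000 I-nodes, a 128MB I-node form, a 1GB reserved space, and a partition of about 1.128GB under these assumptions.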

Optionally, the storage area of the mechanical hard disk may be divided, and then a storage area is randomly selected from a plurality of storage areas when storing the file data, and accordingly, step 202 may be as follows: dividing the low IOPS storage medium into a plurality of continuous storage areas with uniform size according to the maximum parallel number of process reading and writing of the file system; and for a file to be stored, selecting a target storage area from the plurality of storage areas by adopting a preset random algorithm, and writing file data of the file to be stored from a first available storage unit of the target storage area.

In implementation, the CDN server may count the maximum parallel number of process reads and writes of the file system, that is, the maximum number of file management operations in progress at the same time, and may then divide the mechanical hard disk into a plurality of continuous storage areas of uniform size according to the maximum parallel number, so that the number of storage areas is greater than the maximum parallel number; each storage area may include a plurality of storage units. Then, when storing the file data of a file to be stored, the CDN server may select one storage area (i.e., the target storage area) from the plurality of storage areas by using a preset random algorithm, and may write the file data of the file to be stored starting from the first available storage unit of the target storage area. This effectively avoids the situation in which a plurality of files written at the same time are stored interleaved with one another and a large number of file fragments are generated.
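A possible shape for the area division and random selection is sketched below. It is a simplified, append-only model under stated assumptions: the number of areas is taken as the maximum parallel number plus one (the embodiment only requires it to be greater), Python's `random.Random.choice` stands in for the unspecified preset random algorithm, and the names `Region`, `split_into_regions` and `allocate` are hypothetical.

```python
import random
from typing import List


class Region:
    """A contiguous run of storage units [start, start + length)."""

    def __init__(self, start: int, length: int) -> None:
        self.start = start
        self.length = length
        self.used = 0  # units handed out so far (append-only in this sketch)


def split_into_regions(total_units: int, max_parallel: int) -> List[Region]:
    """Create max_parallel + 1 equal areas so that areas outnumber writers."""
    count = max_parallel + 1
    size = total_units // count  # any remainder units are left unused here
    return [Region(i * size, size) for i in range(count)]


def allocate(regions: List[Region], units_needed: int,
             rng: random.Random) -> int:
    """Pick a random area with enough room; return the first unit written."""
    if units_needed <= 0:
        raise ValueError("units_needed must be positive")
    candidates = [r for r in regions if r.length - r.used >= units_needed]
    if not candidates:
        raise OSError("no storage area has enough free units")
    region = rng.choice(candidates)
    first = region.start + region.used  # first available unit of the area
    region.used += units_needed
    return first
```

Because there are more areas than concurrent writers, two files written at the same time are likely to land in different areas, so each file's units stay contiguous.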

Optionally, files may be divided into large files and small files, which are stored in different manners, and the corresponding processing may be as follows: if the file size of the file to be stored is larger than a preset value, selecting a target storage area among the plurality of storage areas by adopting a preset random algorithm, and writing file data of the file to be stored from a first available storage unit of the target storage area; and if the file size of the file to be stored is not larger than the preset value, writing the file data of the file to be stored from the idle storage unit pointed by the position cursor, and updating the idle storage unit pointed by the position cursor.

And the idle storage unit pointed by the position cursor is always the first idle storage unit in the plurality of storage areas.

In implementation, the CDN server may set a value (i.e., the preset value) as the dividing line between large files and small files according to the size distribution of the files to be stored: a file whose size is greater than the preset value is a large file, and a file whose size is less than or equal to the preset value is a small file. When storing the file data of a file to be stored, if the file is a large file, the CDN server may select a target storage area among the plurality of storage areas by using a preset random algorithm, and write the file data starting from the first available storage unit of the target storage area. If the file is a small file, the CDN server may write the file data starting from the first free storage unit in the plurality of storage areas, which reduces the IOPS pressure on the mechanical hard disk when small files are written concurrently in the file system. Here, the CDN server may use the position cursor to point to the first free storage unit in the storage areas, so that when storing the file data of a small file, the free storage unit can be found through the position cursor, and after the storage is completed, the free storage unit pointed to by the position cursor is updated.
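The two storage paths can be sketched together in a small model of the mechanical hard disk. The sketch makes several assumptions that the embodiment does not fix: file sizes are measured in storage units, files are never deleted (so the cursor only moves forward), `random.Random.randrange` stands in for the preset random algorithm, and the class name `Disk` and its methods are hypothetical.

```python
import random
from typing import List


class Disk:
    def __init__(self, region_count: int, region_size: int,
                 threshold: int) -> None:
        self.region_count = region_count
        self.region_size = region_size
        self.threshold = threshold  # preset value, in storage units
        self.free = [True] * (region_count * region_size)
        self.cursor = 0  # position cursor: first free unit on the disk

    def _take(self, start: int, stop: int, count: int) -> List[int]:
        """Occupy the first `count` free units found in [start, stop)."""
        picked = [u for u in range(start, stop) if self.free[u]][:count]
        if len(picked) < count:
            raise OSError("not enough free units")
        for u in picked:
            self.free[u] = False
        return picked

    def _refresh_cursor(self) -> None:
        """Move the cursor to the first free unit (end of disk if none)."""
        while self.cursor < len(self.free) and not self.free[self.cursor]:
            self.cursor += 1

    def write(self, units_needed: int, rng: random.Random) -> List[int]:
        if units_needed > self.threshold:
            # Large file: random target area, first available unit onward.
            start = rng.randrange(self.region_count) * self.region_size
            picked = self._take(start, start + self.region_size, units_needed)
        else:
            # Small file: write from the unit the position cursor points to.
            picked = self._take(self.cursor, len(self.free), units_needed)
        self._refresh_cursor()  # keep the cursor on the first free unit
        return picked
```

Small files are packed one after another from the cursor, so concurrent small writes touch neighbouring units instead of seeking across areas, while large files still benefit from the random spread over areas.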

In the embodiment of the invention, an original file system is formatted, a new file system is established based on a pre-allocated high IOPS storage medium partition and a low IOPS storage medium, metadata to be stored is stored in an I-node form, and file data to be stored is stored in the low IOPS storage medium. The metadata is therefore stored on a high IOPS storage medium with fast response capability, such as a solid state disk, which relieves the low IOPS storage medium, such as a mechanical hard disk, of the IOPS pressure caused by metadata access in the file system, so that the service performance of the file system can be improved.

In addition, the ratio of metadata to file data in a typical file system is below 1:100. Because the high IOPS storage medium, such as a solid state disk, is used only for storing metadata, the required storage capacity is low, and a solid state disk partition of a suitable size can be estimated and allocated according to the application scenario. This saves storage resources of the high IOPS storage medium and effectively reduces cost.

Based on the same technical concept, there is provided an apparatus for managing data in a file system, as shown in fig. 3, the apparatus comprising:

an establishing module 301, configured to format an original file system, establish a new file system based on a pre-allocated high IOPS storage medium partition and a pre-allocated low IOPS storage medium, and set an I-node form in the high IOPS storage medium partition;

a storage module 302, configured to store the metadata to be stored in the I-node form, and store the file data to be stored in the low IOPS storage medium.

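Optionally, the storage module 302 is further configured to:

if the high IOPS storage medium partition has a residual storage space, store directory files and indirect block data to be stored into the residual storage space;

and when the residual storage space is insufficient, continue storing the directory files and indirect block data to be stored into the low IOPS storage medium.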

Optionally, as shown in fig. 4, the apparatus further includes:

the estimation module 303 is configured to estimate an average file size of all files to be stored, and estimate the number of I-nodes based on the capacity of the low IOPS storage medium and the average file size;

a determining module 304, configured to determine the storage capacity of the partition of the high IOPS storage medium according to the number of I-nodes, the unit capacity of I-nodes, and the unit storage capacity of the low IOPS storage medium.

Optionally, the storage module 302 is specifically configured to:

It should be noted that, when the apparatus for managing data in a file system provided in the foregoing embodiment manages data in the file system, the division into the functional modules described above is only used as an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for managing data in a file system and the method embodiment for managing data in a file system provided in the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiment and is not described herein again.

Fig. 5 is a schematic structural diagram of a file storage device according to an embodiment of the present invention. The file storage device 500 may vary widely in configuration or performance and may include one or more central processors 522 (e.g., one or more processors) and memory 532, one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a sequence of instructions operating on a file storage device. Still further, the central processor 522 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the file storage device 500.

The file storage device 500 may also include one or more power supplies 529, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.

The file storage device 500 may include a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the above-described management of data in the file system.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for managing data in a file system, the method comprising:

storing metadata to be stored into the I-node form, and storing file data to be stored into the low IOPS storage medium;

the method further comprises the following steps:

and presetting the storage capacity of the high IOPS storage medium partition according to the storage condition of the file and the unit storage capacity of the low IOPS storage medium, wherein the storage condition comprises the storage capacity of the I-node form, the number of the directory files and the number of indirect block data.

2. The method of claim 1, further comprising:

3. The method of claim 1, further comprising:

4. The method of claim 1, wherein storing the file data to be stored in the low IOPS storage medium comprises:

5. The method according to claim 4, wherein the selecting a target storage area among the plurality of storage areas by using a preset random algorithm, and writing the file data of the file to be stored from a first available storage unit of the target storage area comprises:

6. The method of any of claims 1-5, wherein the high IOPS storage media is a solid state disk and the low IOPS storage media is a mechanical hard disk.

7. An apparatus for managing data in a file system, the apparatus comprising:

an establishing module, configured to format an original file system, establish a new file system based on a pre-allocated high IOPS storage medium partition and a pre-allocated low IOPS storage medium, and set an I-node form in the high IOPS storage medium partition; the storage capacity of the high IOPS storage medium partition is preset according to the storage condition of files and the unit storage capacity of the low IOPS storage medium, wherein the storage condition comprises the storage capacity of the I-node form, the number of directory files and the number of indirect block data;

8. The apparatus of claim 7, wherein the storage module is further configured to:

9. The apparatus of claim 7, further comprising:

10. The apparatus of claim 7, wherein the storage module is specifically configured to:

11. The apparatus of claim 10, wherein the storage module is specifically configured to:

12. The apparatus of any of claims 7-11, wherein the high IOPS storage media is a solid state disk and the low IOPS storage media is a mechanical hard disk.

13. A file storage device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, the at least one instruction, the at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement a method of managing data in a file system according to any one of claims 1 to 6.

14. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of managing data in a file system according to any one of claims 1 to 6.