CN111427841A - Data management method and device, computer equipment and storage medium

Info

Publication number: CN111427841A
Application number: CN202010120131.9A
Authority: CN
Inventors: 刘昌鑫; 李立帅
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2020-07-17
Also published as: WO2021169113A1

Abstract

The application discloses a data management method, a data management device, computer equipment and a storage medium. The method comprises: storing a directory tree of a file system in a metadata server; partitioning the file data of each file contained in the file system into blocks and storing the blocks respectively in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system into blocks and storing the blocks respectively in a plurality of nodes of the cluster server; acquiring a data read-write instruction sent by a user; judging whether the directory tree contains the specified directory information; if yes, extracting the specified file identifier corresponding to the specified directory information from the metadata server; determining, according to the specified file identifier, the designated nodes of the cluster server across which the file data of the file to be read and written is distributed; and completing the read-write operation with each designated node according to the read-write operation information. The method and the device reduce the operating pressure on the metadata server and improve data reading efficiency in high-concurrency scenarios.

Description

Data management method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data management method, apparatus, computer device, and storage medium.

Background

A file system typically includes a large amount of file data distributed across its directories and subdirectories. A directory may contain a large number of subdirectories or files, and each subdirectory may in turn contain a large number of files. The metadata of the file system includes a directory tree and attribute information. The directory tree records the mapping between the logical location and the physical location of the file data, and the attribute information records attributes such as file size, modification time and read-write permission. In the related art, directory entry metadata, file data and attribute information are generally managed centrally. In a high-concurrency scenario, mutual exclusion on data access keeps service concurrency low, makes data reads slow and lowers system efficiency.

Disclosure of Invention

The present application mainly aims to provide a data management method, an apparatus, a computer device and a storage medium, and aims to solve the problems of centralized management of file data, long time consumption for data reading and low system efficiency in the prior art.

The application provides a data management method, which comprises the following steps:

storing a directory tree of a file system in a metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;

acquiring a data reading and writing instruction sent by a user, wherein the data reading and writing instruction carries specified directory information and reading and writing operation information of a file to be read and written;

judging whether the directory tree contains the specified directory information or not, wherein the directory tree is stored in a key-value mode, the directory information of each file is taken as a key, and a file identifier corresponding to each file is taken as a value;

if yes, extracting a specified file identifier corresponding to the specified directory information from the metadata server;

determining, according to the specified file identifier, the designated nodes of the cluster server across which the file data corresponding to the file to be read and written is distributed;

and completing the read-write operation with each designated node according to the read-write operation information.

Further, the step of blocking file data of each file included in the file system and storing each block in a plurality of nodes of the cluster server includes:

dividing the file data of each file into a plurality of first blocks of data according to a first preset size;

determining each first node for storing each first block of data from all nodes of the cluster server according to a preset hash algorithm and respectively according to the number of the first blocks of data corresponding to each file, and establishing a first mapping relation between each first block of data and each first node;

and respectively storing the first block data corresponding to each file to each first node according to the first mapping relation.

Further, the read-write operation information includes a data read start value and a data read length value, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:

extracting the data reading starting point value and the data reading length value from the read-write operation information;

determining a first target node corresponding to the data reading starting point value from each designated node;

in the first target node, reading the read data corresponding to the data read length value from the first block of data corresponding to the first target node with the data read start point value as a start point.

Further, the read-write operation information includes a data operation start point value and data write-in information, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:

extracting the data operation starting point value and the data writing information from the read-write operation information;

determining a second target node corresponding to the data operation starting point value from a plurality of designated nodes;

in the second target node, with the data operation starting point value as a starting point, writing the data writing information into the first block data corresponding to the second target node to obtain update block data corresponding to the second target node;

and redistributing the updated block data among the nodes of the cluster server according to a preset hash algorithm.

Further, the step of determining, according to a preset hash algorithm, each first node for storing each first block of data from all nodes of the cluster server according to the number of the first block of data corresponding to each file, and establishing a first mapping relationship between each first block of data and each first node includes:

numbering all nodes in the cluster server in sequence according to Arabic numbers starting from 0 to respectively obtain the number of each node;

numbering all first block data corresponding to one file in sequence according to a preset numbering rule to obtain all file block numbers corresponding to the file;

according to the preset hash algorithm, respectively carrying out hash calculation on a file identifier corresponding to one file and a file block number corresponding to each first block of data corresponding to the file identifier, and respectively obtaining a hash value corresponding to each first block of data;

taking the hash value corresponding to each first block of data modulo the total number of nodes to obtain a modulo operation result;

and mapping each first block of data to each first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.

Further, the metadata server is a distributed metadata server cluster, and the step of storing the directory tree of the file system in the metadata server includes:

storing the directory tree of the file system in each metadata server in the metadata server cluster, wherein one metadata server is used as a main metadata server, and the other metadata servers are used as auxiliary metadata servers, wherein the main metadata server is used for providing metadata service for the outside;

judging whether the main metadata server fails or not;

and if the main metadata server fails, selecting one from the metadata servers as a new main metadata server, and adopting the new main metadata server to provide metadata service for the outside.

Further, after the step of determining whether the directory tree includes the specified directory information, the method includes:

if not, distributing a corresponding new file identifier for the specified directory information in the directory tree;

and distributing nodes for storing the file data corresponding to the new file identification for the new file identification through a preset hash algorithm.

The present application further provides a data management apparatus, including:

the data storage unit is used for storing the directory tree of the file system in the metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;

the instruction acquisition unit is used for acquiring a data read-write instruction sent by a user, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written;

the judging unit is used for judging whether the specified directory information is contained in the directory tree or not, wherein the directory tree is stored in a key-value mode, the directory information of each file is taken as a key, and the file identifier corresponding to each file is taken as a value;

the identification extracting unit is used for extracting the specified file identification corresponding to the specified directory information from the metadata server if the specified directory information is contained;

the node determining unit is used for determining each designated node distributed in the cluster server by the file data corresponding to the file to be read and written according to the designated file identifier;

and the read-write operation unit is used for finishing the read-write operation with each appointed node according to the read-write operation information.

The present application further proposes a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of any of the above-mentioned methods when executing the computer program.

The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.

The beneficial effect of this application:

according to the data management method, the data management device, the computer equipment and the storage medium, the directory tree of the file system is stored in the metadata server; partitioning file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of the cluster server; partitioning attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server; when data reading and writing operations are carried out, only the appointed file identification corresponding to the file to be read and written is searched from the metadata server, the reading and writing operations can be carried out on the appointed node corresponding to the file to be read and written according to the appointed file identification, and the reading and writing operations are not required to be carried out on the metadata server, so that on one hand, the operating pressure of the metadata server is greatly reduced, and on the other hand, the data reading efficiency of a high concurrency scene is greatly improved.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating a data management method according to an embodiment of the present application;

FIG. 2 is a block diagram schematically illustrating a structure of a data management apparatus according to an embodiment of the present application;

FIG. 3 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to fig. 1, an embodiment of the present application provides a data management method, including:

S1, storing the directory tree of the file system in a metadata server; partitioning the file data of each file contained in the file system into blocks, and storing the blocks respectively in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system into blocks, and storing the blocks respectively in a plurality of nodes of the cluster server;

S2, acquiring a data read-write instruction sent by a user, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written;

S3, judging whether the directory tree contains the specified directory information, wherein the directory tree is stored in key-value form, the directory information of each file is taken as a key, and the file identifier corresponding to each file is taken as a value;

S4, if yes, extracting the specified file identifier corresponding to the specified directory information from the metadata server;

S5, determining, according to the specified file identifier, the designated nodes of the cluster server across which the file data corresponding to the file to be read and written is distributed;

and S6, completing the read-write operation with each designated node according to the read-write operation information.

In this embodiment, in step S1, the file data is the data that makes up the content of the files in the file system, and the attribute information is the data that records attributes such as file size, modification time and read-write permission. The directory tree is stored in a dedicated metadata server, the file data is stored in a distributed manner in a plurality of nodes of the cluster server, and the attribute information is likewise stored in a distributed manner in a plurality of nodes of the cluster server.

The metadata server is a server dedicated to storing the directory tree. It may be a single server, or it may be a distributed metadata server cluster made up of a plurality of metadata servers. The servers of the cluster may operate in a fully peer-to-peer mode, in which every metadata server is an equal peer, each can independently provide metadata service to the outside, and the data of all metadata servers is kept synchronized. Alternatively, when a distributed metadata server cluster is used, one metadata server acts as the main metadata server and provides metadata service to the outside, while the remaining metadata servers keep their data synchronously updated but do not provide metadata service to the outside; when the main metadata server fails, one of the remaining metadata servers is selected as the new main metadata server. This ensures the reliability of the service that the metadata server provides to the outside.

The cluster server comprises a plurality of nodes and is used for storing file data and attribute information in a distributed mode. In a cluster server, each node stores a portion of data, and a plurality of nodes collectively constitute complete data. The distributed cluster server respectively disperses the file data and the attribute information to a plurality of nodes. When a user needs to read and write data of a certain file, the data can be read and written at the corresponding node without reading and writing from the metadata server, and the pressure of the metadata server is effectively reduced.

In the above steps S2 to S4, the directory tree is stored in the metadata server in key-value form. Key-value storage gives the distributed storage management of the file system good scalability: when more file data has to be handled, the mapping relationship of the new file data is simply added to the key-value directory tree. Table 1 is a specific example of a directory tree stored in key-value form, for a file system in which folder FS1 contains the sub-folders dir1, dir2 and dir3, and the sub-folder dir3 contains the files file1 and file2.

TABLE 1 File System directory Tree

Key	Value
FS1	0001
0001/dir1	0002
0001/dir2	0003
0001/dir3	0004
0001/0004/file1	0005
0001/0004/file2	0006

When a user wants to open file2 and perform a read-write operation on it, the data read-write instruction carries the specified directory information of the file to be read and written, namely <FS1/dir3/file2>. For the example of Table 1, the metadata server first finds that the value of FS1 is 0001. It then looks for the directory named "dir3" under the prefix "0001", that is, it finds that the value corresponding to the key "0001/dir3" is "0004". Finally it looks for the file named "file2" under the prefix "0001/0004", that is, it finds that the value corresponding to the key "0001/0004/file2" is "0006". When the metadata server finds the specified file identifier corresponding to the specified directory information, it is determined that the directory tree contains the specified directory information, and the specified file identifier is extracted; in the example of Table 1 this is the value "0006". The logical location of the file data on the cluster server is then determined according to the value "0006".
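The lookup just described can be sketched as follows. This is only an illustration under stated assumptions: the directory tree is modelled as an in-memory Python dictionary holding the entries of Table 1, and the names `DIRECTORY_TREE` and `resolve_file_id` are hypothetical and do not come from the application.

```python
# Hypothetical in-memory stand-in for the key-value directory tree of Table 1.
DIRECTORY_TREE = {
    "FS1": "0001",
    "0001/dir1": "0002",
    "0001/dir2": "0003",
    "0001/dir3": "0004",
    "0001/0004/file1": "0005",
    "0001/0004/file2": "0006",
}


def resolve_file_id(tree, path):
    """Resolve a path such as 'FS1/dir3/file2' one component at a time.

    Returns the file identifier, or None when the directory tree does not
    contain the specified directory information.
    """
    components = [part for part in path.split("/") if part]
    if not components:
        return None
    # The root folder is keyed by its bare name.
    current_id = tree.get(components[0])
    prefix = ""
    for name in components[1:]:
        if current_id is None:
            return None
        # A child key is the identifiers of its ancestors joined by '/',
        # followed by the child's own name.
        prefix = current_id if not prefix else f"{prefix}/{current_id}"
        current_id = tree.get(f"{prefix}/{name}")
    return current_id
```

Called with the path from the example, `resolve_file_id(DIRECTORY_TREE, "FS1/dir3/file2")` performs the same three lookups as the text (FS1, then 0001/dir3, then 0001/0004/file2).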

In the above steps S5 to S6, the designated nodes of the cluster server across which the file to be read and written is distributed are determined from the specified file identifier of that file; that is, the specified file identifier determines which nodes hold the distributed file data. The corresponding read-write operation is then completed in the designated nodes according to the read-write operation information in the data read-write instruction.

In the data management method of the embodiment, a directory tree of a file system is stored in a metadata server; partitioning file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of the cluster server; partitioning attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server; when data read-write operation is carried out, firstly, a data read-write instruction sent by a user is obtained, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written; when the directory tree contains the specified directory information, extracting the specified file identification corresponding to the specified directory information from the metadata server; determining each designated node distributed in the cluster server by file data corresponding to the file to be read and written according to the designated file identifier; finally, completing the read-write operation between the designated nodes according to the read-write operation information; therefore, only the designated file identification corresponding to the file to be read and written is searched from the metadata server, the read-write operation can be carried out on the corresponding designated node according to the designated file identification, and the read-write operation is not required to be carried out on the metadata server, so that on one hand, the operating pressure of the metadata server is greatly reduced, and on the other hand, the data reading efficiency of a high-concurrency scene is greatly improved.

In an embodiment, the step S1 of partitioning the file data of each file included in the file system and storing each block in each of the plurality of nodes of the cluster server includes:

S101, dividing the file data of each file into a plurality of first blocks of data according to a first preset size;

S102, determining, through a preset hash algorithm and according to the numbers of the first blocks of data corresponding to each file, each first node for storing each first block of data from all nodes of the cluster server, and establishing a first mapping relation between each first block of data and each first node;

S103, storing the first blocks of data corresponding to each file to the respective first nodes according to the first mapping relation.

In this embodiment, the plurality of first block data are distributed to the plurality of nodes of the cluster server by a preset hash algorithm. The first node is selected from a cluster server, and refers to a single node storing a single first block of data. The first block data of each file data is distributed through a preset hash algorithm, each first block data is positioned to a corresponding first node, and a mapping relation between each first block data and each first node is established, so that the load of each node in the cluster server is kept uniform to the maximum extent, and the problems that the load of some nodes is too large and the load of some nodes is too small can be solved.
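As a minimal sketch of step S101 only, the following function cuts file data into fixed-size blocks. The name `split_into_blocks` is hypothetical, and the tiny block size in the usage note is chosen purely for readability; the application does not fix the first preset size at this point.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Divide file data into consecutive blocks of at most block_size bytes.

    Block n (zero-based) holds bytes [n * block_size, (n + 1) * block_size);
    only the last block may be shorter.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[start:start + block_size]
            for start in range(0, len(data), block_size)]
```

For instance, ten bytes split with a block size of four give two full blocks and one two-byte remainder. The index of each block in the returned list can serve as its file block number when the blocks are later assigned to nodes.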

In an embodiment, the step S6 of completing the read-write operation with each designated node according to the read-write operation information includes:

S601, extracting the data reading start point value and the data reading length value from the read-write operation information;

S602, determining a first target node corresponding to the data reading start point value from the designated nodes;

S603, in the first target node, reading the read data corresponding to the data reading length value from the first block of data corresponding to the first target node, taking the data reading start point value as the start point.

In this embodiment, in step S601, when the user needs to read a data segment from the file to be read and written, the corresponding data reading start point value and data reading length value are written into the read-write operation information. For example, if the user wants to read a data segment of length 2M starting at an offset of 60M from the first character of the file, the data reading start point value in the read-write operation information is <off = 60M> and the data reading length value is <len = 2M>.

In step S602, the first target node where the data reading start point is located is determined according to the data reading start point value. In the foregoing example, step S5 has already determined, from the specified file identifier, the designated nodes across which the file is distributed; step S602 determines on which of these designated nodes the data reading start point value <off = 60M> is located, and that node is the first target node. Specifically, the data reading start point value is matched against the first mapping relation between the first blocks of data and the first nodes. For example, if the file data was divided in sequence into first blocks of data of size 25M, then <off = 60M> lies in the 3rd first block of data (which covers offsets 50M to 75M); looking up this block in the first mapping relation yields its first node, which is the first target node.

In step S603, in the first target node, read data corresponding to the data read length value is read from the data read start point value. The reading operation of the data is completed in the nodes of the cluster server, and the operation in the metadata server is not needed, so that the pressure of the metadata server is greatly reduced, and the high-concurrency scene service is facilitated.
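Steps S601 to S603 can be illustrated with the short sketch below. It rests on several assumptions that are not part of the application: the cluster is modelled as a dictionary of in-memory nodes, the first mapping relation is a plain dictionary from block number to node number, block numbers are zero-based, and a read is served from a single block as in the text. The names `locate_block` and `read_segment` are hypothetical.

```python
MIB = 1024 * 1024


def locate_block(offset: int, block_size: int) -> tuple[int, int]:
    """Return (zero-based block number, offset inside that block)."""
    if offset < 0 or block_size <= 0:
        raise ValueError("offset must be >= 0 and block_size must be > 0")
    return divmod(offset, block_size)


def read_segment(nodes, mapping, file_id, offset, length, block_size):
    """Read `length` bytes starting at `offset` from the block holding it.

    nodes:   {node_number: {(file_id, block_number): block_bytes}}
    mapping: {block_number: node_number}, i.e. the first mapping relation
    """
    block_number, inner_offset = locate_block(offset, block_size)
    target_node = mapping[block_number]  # the "first target node"
    block = nodes[target_node][(file_id, block_number)]
    # Assumption: the read does not continue into the following block.
    return block[inner_offset:inner_offset + length]
```

With 25M blocks, `locate_block(60 * MIB, 25 * MIB)` gives block number 2 (the 3rd block) and an in-block offset of 10M, matching the example in the text. The metadata server is not involved in any of these calls.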

In an embodiment, the step S6 of completing the read/write operation with each designated node according to the read/write operation information includes:

S611, extracting the data operation start point value and the data writing information from the read-write operation information;

S612, determining a second target node corresponding to the data operation start point value from the designated nodes;

S613, in the second target node, writing the data writing information into the first block of data corresponding to the second target node, taking the data operation start point value as the start point, to obtain updated block data corresponding to the second target node;

and S614, redistributing the updated block data among the nodes of the cluster server according to a preset hash algorithm.

In this embodiment, in step S611, when the user needs to write a data segment into the file to be read and written, the corresponding data operation start point value and data writing information are written into the read-write operation information. For example, if the user wants to write a data segment with the content <xxxx> starting at an offset of 60M from the first character of the file, the data operation start point value in the read-write operation information is <off = 60M> and the data writing information is <xxxx>.

In step S612, the second target node where the data operation start point is located is determined according to the data operation start point value. In the foregoing example, step S5 has already determined, from the specified file identifier, the designated nodes across which the file is distributed; step S612 determines on which of these designated nodes the data operation start point value <off = 60M> is located, and that node is the second target node. Specifically, the data operation start point value is matched against the first mapping relation between the first blocks of data and the first nodes. For example, if the file data was divided in sequence into first blocks of data of size 25M, then <off = 60M> lies in the 3rd first block of data; looking up this block in the first mapping relation yields its first node, which is the second target node.

In steps S613 to S614, the second target node writes corresponding data write information from the data operation start point value to obtain updated block data. And if the size of the updated block data exceeds the first preset size, the updated block data is stored in a plurality of nodes of the cluster server in a blocking mode through a preset hash algorithm again, so that the load of each node of the whole cluster server is kept balanced to the maximum extent. After step S614, the first mapping relationship is further updated.
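The following sketch illustrates steps S613 and S614 at the level of a single block. It assumes that writing inserts the new data at the start point (which is one way the updated block could grow beyond the first preset size; the application does not say whether the write inserts or overwrites), and it leaves out the renumbering of later blocks and the update of the first mapping relation. The names `apply_write` and `resplit_if_oversized` are hypothetical.

```python
def apply_write(block: bytes, inner_offset: int, payload: bytes) -> bytes:
    """Return the updated block after inserting payload at inner_offset.

    Assumption: the write is an insertion, so the block can grow.
    """
    if not 0 <= inner_offset <= len(block):
        raise ValueError("inner_offset lies outside the block")
    return block[:inner_offset] + payload + block[inner_offset:]


def resplit_if_oversized(updated: bytes, block_size: int) -> list[bytes]:
    """Re-block the updated data when it exceeds the first preset size.

    Each returned piece would then be placed on a node again through the
    preset hash algorithm (not shown here).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(updated) <= block_size:
        return [updated]
    return [updated[start:start + block_size]
            for start in range(0, len(updated), block_size)]
```

An updated block that still fits within the preset size is returned as a single piece and can stay where it is; only an oversized block yields several pieces that need to be redistributed.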

The data writing operation of the embodiment is completed in the nodes of the cluster server, and the operation in the metadata server is not needed, so that the pressure of the metadata server is greatly reduced, and the high-concurrency scene service is facilitated.

In an embodiment, the step S102 of determining, according to the number of the first block data corresponding to each file, each first node for storing each first block data from all nodes of the cluster server according to a preset hash algorithm, and establishing a first mapping relationship between each first block data and each first node includes:

S1021, numbering all nodes in the cluster server in sequence with Arabic numerals starting from 0 to obtain the number of each node;

S1022, numbering all first blocks of data corresponding to one file in sequence according to a preset numbering rule to obtain all file block numbers corresponding to the file;

S1023, performing, according to the preset hash algorithm, a hash calculation on the file identifier corresponding to the file together with the file block number of each first block of data of that file, to obtain a hash value corresponding to each first block of data;

S1024, taking the hash value corresponding to each first block of data modulo the total number of nodes to obtain a modulo operation result;

and S1025, mapping each first block of data to its first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.

In this embodiment, the nodes in the cluster server are numbered, for example, from 0 to N-1. A hash calculation is performed on the file identifier of the file (the value stored for the file in the key-value directory tree) together with the file block number of each first block of data, giving a hash value for each first block of data. The hash value is taken modulo N to obtain a remainder i, and the node numbered i in the cluster server is the first node corresponding to that first block of data. Hash algorithms are prior art and are not described in detail herein.
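The placement rule can be sketched as below. The application does not name the preset hash algorithm or say how the file identifier and block number are combined, so SHA-256 over the string "<file identifier>:<block number>" is used here purely as an assumed stand-in; `node_for_block` and `build_first_mapping` are hypothetical names.

```python
import hashlib


def node_for_block(file_id: str, block_number: int, node_count: int) -> int:
    """Map one block to a node number in the range 0 .. node_count - 1."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Assumption: hash the identifier and block number joined by ':'.
    key = f"{file_id}:{block_number}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # Remainder i selects the node numbered i.
    return hash_value % node_count


def build_first_mapping(file_id: str, block_count: int, node_count: int) -> dict[int, int]:
    """Build the first mapping relation: block number -> node number."""
    return {number: node_for_block(file_id, number, node_count)
            for number in range(block_count)}
```

Because the node number is computed from the file identifier and the block number alone, any client that knows the identifier can recompute where a block lives without asking the metadata server again. One known limitation of plain modulo placement is that changing N moves most blocks to different nodes.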

In one embodiment, the metadata server is a distributed metadata server cluster, and the step S1 of storing the directory tree of the file system in the metadata server includes:

S121, storing the directory tree of the file system in each metadata server in the metadata server cluster, taking one metadata server as the main metadata server and the other metadata servers as slave metadata servers, wherein the main metadata server is used for providing metadata service to the outside;

S122, judging whether the main metadata server fails;

and S123, if the main metadata server fails, selecting one of the slave metadata servers as a new main metadata server, and providing metadata service to the outside through the new main metadata server.
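A minimal sketch of steps S122 and S123 follows. The application only says that one server is selected as the new main metadata server; picking the first healthy slave, and representing failure with a boolean `healthy` flag, are assumptions made for illustration. `MetadataServer` and `elect_main` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MetadataServer:
    name: str
    healthy: bool = True  # assumption: failure detection sets this flag


def elect_main(servers: list[MetadataServer], current_main: MetadataServer) -> MetadataServer:
    """Keep the current main server while it is healthy, else pick a slave.

    Assumption: the first healthy slave in list order becomes the new main.
    """
    if current_main.healthy:
        return current_main
    for server in servers:
        if server is not current_main and server.healthy:
            return server
    raise RuntimeError("no healthy metadata server available")
```

Since every server in the cluster already stores the full directory tree (step S121), the newly selected server can start answering lookups without first copying data.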

In an embodiment, the step S1 of storing the attribute information corresponding to the file system in a distributed manner in the cluster server includes:

S111, dividing the attribute information into a plurality of second block data according to a second preset size;

S112, according to the number of the second block data, determining each second node for storing each second block data from all nodes of the cluster server through a preset hash algorithm, and establishing a second mapping relation between each second block data and each second node;

and S113, respectively storing the second block data to the second nodes according to the second mapping relation.

In this embodiment, the plurality of second block data are distributed to the plurality of nodes of the cluster server by a preset hash algorithm. A second node is a node selected from the cluster server that stores a single second block of data. The second block data of the attribute information are distributed through the preset hash algorithm, each second block of data is located on its corresponding second node, and a mapping relationship between each second block of data and each second node is established, so that the load across the nodes in the cluster server is kept as uniform as possible and the situation in which some nodes are overloaded while others are underloaded is avoided. Specifically, the allocation process of the preset hash algorithm includes: numbering the nodes in the cluster server, for example, from 0 to N-1; performing hash calculation on the file block number of each second block of data to obtain a hash value corresponding to each second block of data; taking the hash value modulo N to obtain a remainder i; and determining the node numbered i in the cluster server as the second node corresponding to that second block of data. The specific process of hash calculation is prior art and is not described in detail here.
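
A minimal sketch of steps S111 to S113 follows. It rests on several assumptions that are not part of the application: the attribute information is treated as a byte string, SHA-256 stands in for the preset hash algorithm, each node is modeled as an in-memory dictionary, and the function names are hypothetical.

```python
import hashlib


def split_blocks(data: bytes, block_size: int) -> list:
    """S111: divide the attribute information into blocks of a preset size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def second_mapping(block_count: int, node_count: int) -> dict:
    """S112: hash each file block number and take it modulo N."""
    mapping = {}
    for block_no in range(block_count):
        digest = hashlib.sha256(str(block_no).encode("utf-8")).digest()
        mapping[block_no] = int.from_bytes(digest[:8], "big") % node_count
    return mapping


def store_attribute_info(attr: bytes, block_size: int, node_count: int):
    """S113: store each second block on the node given by the second mapping."""
    blocks = split_blocks(attr, block_size)
    mapping = second_mapping(len(blocks), node_count)
    nodes = {n: {} for n in range(node_count)}  # node number -> stored blocks
    for block_no, block in enumerate(blocks):
        nodes[mapping[block_no]][block_no] = block
    return mapping, nodes
```

The returned mapping is the second mapping relationship; concatenating nodes[mapping[i]][i] for increasing i reassembles the original attribute information.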

In an embodiment, after the step S6 of completing the read/write operation with each of the designated nodes according to the read/write operation information, the method includes:

and S7, updating the attribute information corresponding to the file system based on the read-write operation.

In this embodiment, the attribute information records attributes such as file size, modification time, and read-write permission. When a file of the file system is read or written, the attribute information of the file system is modified accordingly, based on the specific read-write operation. Specifically, the node on which the attribute information is stored is determined according to the second mapping relationship, and the attribute information is modified at that node.

In an embodiment, after the step S3 of determining whether the directory tree includes the specified directory information, the method includes:

S401, if not, allocating a corresponding new file identifier to the specified directory information in the directory tree;

S501, allocating to the new file identifier, through a preset hash algorithm, the nodes for storing the file data corresponding to the new file identifier.

In this embodiment, in step S401, when the directory tree does not include the specified directory information, a new file identifier is allocated to the specified directory information, and the directory tree is updated. For example, if the specified directory information is <FS1/dir3/file3> and the corresponding directory information cannot be found in the directory tree in Table 1, a new file identifier, for example "0007", is allocated to the directory information, and the directory tree is updated with a new key "0001/0004/file3" whose corresponding value is "0007".

In step S501, the nodes for the file data corresponding to the new file identifier are allocated by the preset hash algorithm, so that the file corresponding to the specified directory information is created on those nodes. Through steps S401 to S501, the user can add new file data to the file system. Only the directory tree needs to be updated in the metadata server; the remaining operations of creating the file and writing its data are completed at the nodes of the cluster server, which greatly reduces the pressure on the metadata server and favors services in high-concurrency scenarios.
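
Steps S401 and S501 can be sketched as follows, reusing the example key "0001/0004/file3" and identifier "0007" from the text. Everything else is an assumption made for illustration: identifiers are allocated sequentially as four-digit strings, the pre-existing entry "0001/0002/file1" is hypothetical, SHA-256 stands in for the preset hash algorithm, and the names DirectoryTree, resolve_or_create and allocate_node are not from the application.

```python
import hashlib
import itertools


class DirectoryTree:
    """Key-value directory tree: directory information -> file identifier."""

    def __init__(self, entries, next_id):
        self.entries = dict(entries)
        # Assumed policy: new identifiers are handed out sequentially.
        self._ids = itertools.count(next_id)

    def resolve_or_create(self, key):
        """Return (file_id, created). S401 runs only when the key is absent."""
        file_id = self.entries.get(key)
        if file_id is not None:
            return file_id, False
        file_id = f"{next(self._ids):04d}"
        self.entries[key] = file_id  # the only update made on the metadata server
        return file_id, True


def allocate_node(file_id: str, block_no: int, node_count: int) -> int:
    """S501: choose the node for a block of the new file by hashing."""
    digest = hashlib.sha256(f"{file_id}:{block_no}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count
```

With tree = DirectoryTree({"0001/0002/file1": "0003"}, next_id=7), the call tree.resolve_or_create("0001/0004/file3") allocates "0007" and records the new key, after which allocate_node("0007", 0, N) selects the node for the first block without any further metadata server access.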

Referring to fig. 2, an embodiment of the present application provides a data management apparatus, including:

a data storage unit 10 for storing a directory tree of the file system in the metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;

the instruction obtaining unit 20 is configured to obtain a data read-write instruction sent by a user, where the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written;

a determining unit 30, configured to determine whether the directory tree includes the specified directory information, where the directory tree is stored in a key-value form, directory information of each file is used as a key, and a file identifier corresponding to each file is used as a value;

an identifier extracting unit 40, configured to extract, if the specified directory information is included, a specified file identifier corresponding to the specified directory information from the metadata server;

a node determining unit 50, configured to determine, according to the specified file identifier, each specified node where file data corresponding to the file to be read and written is distributed in the cluster server;

and a read-write operation unit 60, configured to complete read-write operation with each of the designated nodes according to the read-write operation information.

In this embodiment, the implementation processes of the functions and actions of the data storage unit 10, the instruction obtaining unit 20, the determining unit 30, the identifier extracting unit 40, the node determining unit 50, and the read-write operating unit 60 in the data management apparatus are specifically described in the implementation processes of steps S1-S6 in the data management method, and are not described herein again.

The data management device of this embodiment stores the directory tree of the file system in the metadata server; partitions the file data of each file contained in the file system into blocks and stores the blocks in a plurality of nodes of the cluster server; and partitions the attribute information corresponding to the file system into blocks and stores the blocks in a plurality of nodes of the cluster server. When a data read-write operation is performed, a data read-write instruction sent by a user is first obtained, where the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written. When the directory tree contains the specified directory information, the specified file identifier corresponding to the specified directory information is extracted from the metadata server. The designated nodes in the cluster server across which the file data of the file to be read and written is distributed are then determined according to the specified file identifier. Finally, the read-write operation is completed with each designated node according to the read-write operation information. Thus only the specified file identifier corresponding to the file to be read and written is looked up on the metadata server; the read-write operation itself is performed on the corresponding designated nodes according to that identifier and not on the metadata server. On the one hand, this greatly reduces the operating pressure on the metadata server; on the other hand, it greatly improves data reading efficiency in high-concurrency scenarios.

In one embodiment, the data storage unit 10 includes:

the first dividing subunit is used for dividing the file data of each file into a plurality of first blocks of data according to a first preset size;

the first allocating subunit is configured to determine, according to the number of first block data corresponding to each file, each first node for storing each first block data according to a preset hash algorithm from all nodes of the cluster server, and establish a first mapping relationship between each first block data and each first node;

and the first storage subunit is configured to store the first block data corresponding to each file to each first node according to the first mapping relationship.

In this embodiment, the implementation processes of the functions and actions of the first dividing subunit, the first allocating subunit, and the first storage subunit in the data management apparatus are specifically described in the implementation processes of steps S101 to S103 in the data management method, and are not described herein again.

In one embodiment, the read/write operation unit 60 includes:

the first reading subunit is used for extracting the data reading starting point value and the data reading length value from the read-write operation information;

the first determining subunit is used for determining a first target node corresponding to the data reading starting point value from each designated node;

a first operation subunit, configured to read, in the first target node, data of the length given by the data read length value from the first block of data corresponding to the first target node, using the data read start point value as the start point.

In this embodiment, the implementation processes of the functions and actions of the first reading subunit, the first determining subunit, and the first operating subunit in the data management apparatus are specifically described in the implementation processes of steps S601 to S603 in the data management method, and are not described herein again.
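
The read path handled by these three subunits can be sketched as below. The sketch makes assumptions the application does not state: the data read start value is interpreted as a byte offset within the file, blocks have a fixed size BLOCK_SIZE (set very small here for demonstration), SHA-256 stands in for the preset hash algorithm, nodes are modeled as in-memory dictionaries, and a read that crosses a block boundary continues on the node holding the next block. All function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical first preset size, tiny for demonstration


def node_for(file_id: str, block_no: int, node_count: int) -> int:
    """Locate the node of a block by hashing (file identifier, block number)."""
    digest = hashlib.sha256(f"{file_id}:{block_no}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def write_file(nodes: list, file_id: str, data: bytes) -> None:
    """Helper for the demo: split data into blocks and place them on nodes."""
    for block_no, start in enumerate(range(0, len(data), BLOCK_SIZE)):
        node = node_for(file_id, block_no, len(nodes))
        nodes[node][(file_id, block_no)] = data[start:start + BLOCK_SIZE]


def read_range(nodes: list, file_id: str, start: int, length: int) -> bytes:
    """Read `length` bytes beginning at byte offset `start` of the file."""
    out = bytearray()
    pos, end = start, start + length
    while pos < end:
        # The start value determines the block, and the block determines
        # the target node, without contacting the metadata server.
        block_no, offset = divmod(pos, BLOCK_SIZE)
        node = node_for(file_id, block_no, len(nodes))
        block = nodes[node].get((file_id, block_no))
        if block is None or offset >= len(block):
            break  # reached the end of the file
        chunk = block[offset:offset + (end - pos)]
        out += chunk
        pos += len(chunk)
    return bytes(out)
```

For example, after nodes = [dict() for _ in range(3)] and write_file(nodes, "0003", b"abcdefghij"), the call read_range(nodes, "0003", 2, 5) returns b"cdefg", reading the tail of block 0 and the head of block 1.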

In one embodiment, the read/write operation unit 60 includes:

the second reading subunit is used for extracting the data operation starting point value and the data writing information from the reading and writing operation information;

the second determining subunit is used for determining a second target node corresponding to the data operation starting point value from the plurality of designated nodes;

a second operation subunit, configured to write, in the second target node, the data write information into the first block data corresponding to the second target node using the data operation start point value as a start point, so as to obtain update block data corresponding to the second target node;

and the redistribution subunit is used for reallocating, according to a preset hash algorithm, the nodes in the cluster server that store the updated block data.

In this embodiment, the implementation processes of the functions and actions of the second reading subunit, the second determining subunit, the second operating subunit, and the redistribution subunit in the data management apparatus are specifically described in the implementation processes of steps S611 to S614 in the data management method, and are not described herein again.

In one embodiment, the first allocation subunit includes:

the first numbering module is used for numbering all the nodes in the cluster server in sequence according to Arabic numbers starting from 0 to respectively obtain the number of each node;

the second numbering module is used for numbering all the first block data corresponding to one file in sequence according to a preset numbering rule to obtain all file block numbers corresponding to the file;

the hash calculation module is used for respectively carrying out hash calculation on a file identifier corresponding to one file and a file block number corresponding to each first block of data corresponding to the file identifier according to the preset hash algorithm to respectively obtain a hash value corresponding to each first block of data;

the modulo operation module is used for taking the hash value corresponding to each first block of data modulo the maximum node number to obtain a modulo operation result;

and the mapping module is used for mapping each first block of data to each first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.

In this embodiment, the implementation processes of the functions and actions of the first numbering module, the second numbering module, the hash calculation module, the modulo operation module, and the mapping module in the first allocating subunit are specifically described in the implementation processes of steps S1021 to S1025 in the data management method, and are not described herein again.

In an embodiment, the metadata server is a distributed metadata server cluster, and the data storage unit 10 includes:

the directory tree storage subunit is configured to store a directory tree of the file system in each metadata server in the metadata server cluster, and use one metadata server as a master metadata server and the other metadata servers as slave metadata servers, where the master metadata server is configured to provide a metadata service to the outside;

a judging subunit, configured to judge whether the main metadata server has failed;

and the reselection subunit is used for selecting one of the slave metadata servers as a new main metadata server if the main metadata server fails, and providing the metadata service externally through the new main metadata server.

In this embodiment, the implementation processes of the functions and actions of the directory tree storage subunit, the judging subunit, and the reselection subunit in the data storage unit 10 are specifically described in the implementation processes of steps S121 to S123 in the data management method, and are not described herein again.

In one embodiment, the data storage unit 10 includes:

the second dividing subunit is used for dividing the attribute information into a plurality of second block data according to a second preset size;

a second allocating subunit, configured to determine, according to the number of the second block data, each second node for storing each second block data from all nodes of the cluster server through a preset hash algorithm, and establish a second mapping relationship between each second block data and each second node;

and the second storage subunit is configured to store, according to the second mapping relationship, each piece of the second block data to each second node, respectively.

In this embodiment, the implementation processes of the functions and actions of the second dividing subunit, the second allocating subunit, and the second storage subunit in the data management apparatus are specifically described in the implementation processes of steps S111 to S113 in the data management method, and are not described herein again.

In one embodiment, the data management apparatus further includes:

and the attribute updating unit is used for updating the attribute information corresponding to the file system based on the read-write operation.

In this embodiment, the detailed implementation process of the function and the action of the attribute updating unit in the data management apparatus is described in the implementation process corresponding to step S7 in the data management method, and is not described herein again.

In one embodiment, the data management apparatus further includes:

the identifier allocating unit is used for allocating a corresponding new file identifier to the specified directory information in the directory tree if the specified directory information is not contained;

and the node allocating unit is used for allocating to the new file identifier, through a preset hash algorithm, the nodes for storing the file data corresponding to the new file identifier.

In this embodiment, the implementation process of the functions and actions of the identifier allocating unit and the node allocating unit in the data management apparatus is specifically described in the implementation process corresponding to steps S401 to S501 in the data management method, and is not described herein again.

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the directory tree. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a data management method.

The processor executes the data management method, and includes:

In an embodiment, the step of the processor partitioning file data of each file included in the file system and storing each block in a plurality of nodes of the cluster server includes:

In an embodiment, the read-write operation information includes a data read start value and a data read length value, and the step of the processor completing the read-write operation with each of the designated nodes according to the read-write operation information includes:

In an embodiment, the read-write operation information includes a data operation start point value and data write-in information, and the step of the processor completing the read-write operation with each of the designated nodes according to the read-write operation information includes:

In an embodiment, the processor determines, according to a preset hash algorithm, each first node for storing each first block of data from all nodes of the cluster server according to the number of the first block of data corresponding to each file, and the step of establishing a first mapping relationship between each first block of data and each first node includes:

In an embodiment, the metadata server is a distributed metadata server cluster, and the step of the processor storing the directory tree of the file system in the metadata server includes:

judging whether the main metadata server fails or not;

In an embodiment, after the step of determining whether the directory tree includes the specified directory information, the processor includes:

Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.

An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a data management method, and specifically:

judging whether the main metadata server fails or not;

In summary, in the data management method, apparatus, computer device and storage medium of the present application, a directory tree of a file system is stored in a metadata server; partitioning file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of the cluster server; partitioning attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server; when data reading and writing operations are carried out, only the appointed file identification corresponding to the file to be read and written is searched from the metadata server, the reading and writing operations can be carried out on the appointed node corresponding to the file to be read and written according to the appointed file identification, and the reading and writing operations are not required to be carried out on the metadata server, so that on one hand, the operating pressure of the metadata server is greatly reduced, and on the other hand, the data reading efficiency of a high concurrency scene is greatly improved.

It will be understood by those of ordinary skill in the art that all or part of the processes of the method embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims

1. A method for managing data, comprising:

2. The data management method according to claim 1, wherein the step of blocking the file data of each file included in the file system and storing each block in each of a plurality of nodes of a cluster server comprises:

3. The data management method according to claim 2, wherein the read-write operation information includes a data read start value and a data read length value, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:

4. The data management method according to claim 2, wherein the read/write operation information includes a data operation start point value and data write information, and the step of completing the read/write operation with each of the designated nodes according to the read/write operation information includes:

5. The data management method according to claim 2, wherein the step of determining, according to a preset hash algorithm, each first node for storing each first block of data from all nodes of the cluster server according to the number of the first block of data corresponding to each file, and establishing the first mapping relationship between each first block of data and each first node comprises:

6. The data management method of claim 1, wherein the metadata server is a distributed metadata server cluster, and the step of storing the directory tree of the file system in the metadata server comprises:

judging whether the main metadata server fails or not;

7. The data management method according to claim 1, wherein said step of determining whether or not said directory tree includes said specified directory information comprises:

8. A data management apparatus, comprising:

9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.