WO2023179787A1 - Metadata management method and apparatus for distributed file system - Google Patents

Metadata management method and apparatus for distributed file system

Info

Publication number
WO2023179787A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
directory
index node
node number
level
Prior art date
Application number
PCT/CN2023/083879
Other languages
French (fr)
Chinese (zh)
Inventor
苏昆辉
殳鑫鑫
杨彦斌
郑锴
王道远
孙大鹏
曹杰
孙立晟
Original Assignee
阿里巴巴(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司 filed Critical 阿里巴巴(中国)有限公司
Publication of WO2023179787A1 publication Critical patent/WO2023179787A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Definitions

  • This specification relates to the field of storage technology, and in particular, to a metadata management method and device for a distributed file system.
  • this specification provides a metadata management method and device for a distributed file system.
  • a metadata management method for a distributed file system applied to a distributed file system.
  • the distributed file system stores a mapping relationship between keywords of directories at all levels and index node numbers of directory metadata, wherein the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the current-level directory name.
  • the method includes:
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • a keyword for the current level directory is generated based on the directory name and preset characters.
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced in a preset order to generate the keyword for the current-level directory.
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path, including:
  • when the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database, the metadata of the file path is obtained from the cloud database based on the index node number of the file path.
  • Optionally, the method also includes:
  • the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, the method also includes:
  • the index node number of the directory metadata at this level is searched from the cloud database based on the keyword, and the mapping relationship is updated based on the found index node number.
  • a data access method for distributed file systems, applied to distributed file systems including:
  • the aforementioned metadata management method is used to perform metadata query based on the path.
  • a metadata management device for a distributed file system applied to a distributed file system.
  • the distributed file system stores a mapping relationship between keywords of directories at all levels and index node numbers of directory metadata, wherein the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the current-level directory name.
  • the device includes:
  • the name acquisition unit extracts the directory names of directories at all levels from the file path
  • the keyword generation unit generates keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata for each extracted directory name in the order of the directory from upper level to lower level;
  • a number search unit that searches the index node number of the current level directory metadata in the mapping relationship based on the keyword
  • a metadata acquisition unit that acquires the metadata of the file path from the cloud database based on the index node number of the file path.
  • a data access device for a distributed file system applied to the distributed file system, including:
  • the metadata query unit responds to the data access request sent by the client and queries the corresponding metadata according to the path of the data to be accessed;
  • a data access unit returns the metadata to the client so that the client can access data based on the metadata
  • the aforementioned metadata management method is used to perform metadata query based on the path.
  • a metadata management device for a distributed file system including:
  • Memory used to store machine-executable instructions
  • the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of directory metadata.
  • the keyword of the directory at each level is generated based on the index node number of the upper-level directory metadata and the name of the directory at this level.
  • a computer-readable storage medium stores a computer program, and the computer program is used to cause a processor to execute the above metadata management method.
  • the mapping relationship between directory keywords at all levels and the index node numbers of directory metadata is locally stored in the distributed file system.
  • the method of jointly storing metadata in a distributed file system and a cloud database solves the performance bottleneck of single-machine metadata services, improves the scalability of the system, and can support file storage at a scale of more than one billion files.
  • Figure 1 is a schematic diagram of the architecture of a distributed file system in related technologies.
  • Figure 2 is a schematic flowchart of a metadata management method for a distributed file system according to an exemplary embodiment of this specification.
  • FIG. 3 is a schematic flowchart of another metadata management method of a distributed file system according to an exemplary embodiment of this specification.
  • Figure 4 is a schematic architectural diagram of a distributed file system according to an exemplary embodiment of this specification.
  • Figure 5 is a schematic flowchart of a data access method in a distributed file system according to an exemplary embodiment of this specification.
  • FIG. 6 is a hardware structure diagram of an electronic device in which a metadata management device of a distributed file system is located according to an exemplary embodiment of this specification.
  • FIG. 7 is a block diagram of a metadata management device of a distributed file system according to an exemplary embodiment of this specification.
  • FIG. 8 is a block diagram of a data access device of a distributed file system according to an exemplary embodiment of this specification.
  • first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be called second information, and similarly, the second information may also be called first information.
  • the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to determining.”
  • Figure 1 is a schematic architectural diagram of a distributed file system according to an exemplary embodiment of this specification.
  • a distributed file system may include name nodes and data nodes.
  • the namenode is responsible for managing the namespace of the distributed file system, maintaining the file system tree, and the metadata of each file and folder in the file system.
  • Datanode is used to store data in the form of blocks.
  • When a distributed file system client performs data access, it can send an access request to the name node. Taking a data read request as an example, the name node searches for the corresponding metadata and returns it to the client. The client can then determine, based on the metadata, the data block where the data is located, and read the corresponding data from the data node based on that data block.
  • Taking a data write request as an example, the name node also searches for the corresponding metadata. If the metadata is found, it can be returned to the client; the client can then determine the data block where the data is located based on the metadata and write the corresponding data to the data node based on that data block. If the metadata is not found, the name node can create a new index node number and return metadata, such as the location of the data block to be written, to the client, and the client can then write data in the corresponding data block.
  • This specification provides a metadata management solution for a distributed file system.
  • the distributed file system can jointly implement metadata storage in conjunction with the cloud database, thus solving the limitation of disk capacity on metadata storage.
  • Metadata is data about data. It can be used to describe data attributes and supports functions such as indicating storage location, historical data, resource search, and file recording.
  • Metadata includes necessary descriptive information for reading and writing, such as real path, size, creation time, permissions, etc.
  • the file path can usually point to a specific file, and the file can be accessed through the file path.
  • a path usually includes multiple levels of directories, and each level of directory can have a corresponding directory name.
  • for example, the file path /user/hive/warehouse/file includes four levels of directories.
  • the names of these four levels of directories are the folder names, namely user, hive, warehouse and file.
  • the distributed file system can store the mapping relationship between the keywords of directories at all levels and the index node numbers of directory metadata, without the need to store the full amount of metadata.
  • the mapping relationship may be stored in the form of key-value, for example, in the Namenode.
  • the index node number is the inode (index node) number, that is, the Inode ID.
  • An inode is a data structure that allows metadata to be looked up based on the inode number.
  • the keyword may be generated based on the index node number of the upper-level directory metadata and the current-level directory name.
  • the name of the current-level directory and the index node number of the metadata of the upper-level directory are spliced to obtain the keywords of the current-level directory.
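  • for example, splicing the index node number 100 of the upper-level directory metadata with the directory name hive generates the keyword 100hive.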
  • the current-level directory name and the index node number of the upper-level directory metadata are calculated to obtain the keywords of the current-level directory.
  • for the keyword of the first-level directory, there is no upper-level directory.
  • in this case, the keyword of the first-level directory can be generated based on the first-level directory name and a preset character.
  • for example, the keyword 0user can be generated based on the preset character 0 and the directory name user.
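  • The splicing rule described above can be sketched as follows. This is a minimal illustration, not the implementation in this specification: the function name make_keyword, the constant ROOT_PREFIX, and the choice to place the index node number before the directory name (as in 100hive) are assumptions made for the example.

```python
ROOT_PREFIX = "0"  # assumed preset character for a directory with no upper-level directory


def make_keyword(parent_inode, name):
    """Splice the upper-level index node number and the directory name, in that order."""
    prefix = ROOT_PREFIX if parent_inode is None else str(parent_inode)
    return prefix + name


print(make_keyword(None, "user"))  # 0user
print(make_keyword(100, "hive"))   # 100hive
```

  • Plain concatenation can be ambiguous (index node number 10 with name 0x and index node number 100 with name x both give 100x), so a practical implementation might insert a separator that cannot appear in a directory name. The source does not state how this is handled.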
  • the cloud database can store the full amount of metadata, and can also store the mapping relationship between the keywords of directories at all levels and the index node numbers of the directory metadata, so that the distributed file system can update its stored mapping relationship.
  • the distributed file system and the cloud database jointly realize the storage of metadata. There is no need to store all metadata in the distributed file system.
  • This distributed metadata storage method effectively removes the limit that the disk capacity of the distributed file system places on metadata storage, and is suitable for application scenarios with massive numbers of files, such as data lakes.
  • Figure 2 is a schematic flowchart of a metadata management method for a distributed file system according to an exemplary embodiment of this specification.
  • the metadata management method of the distributed file system can be applied to the distributed file system, for example, applied to the name node in the distributed file system, including the following steps:
  • Step 202 Extract the directory names of directories at each level from the file path.
  • the user-side client can send read and write requests to the distributed file system.
  • the distributed file system usually needs to search for the metadata of the file (that is, the metadata of the file path), and sometimes also the metadata of the directory where the file is located, the metadata of the directory above that, and so on. Based on this metadata, information such as file type, file size, creation time, modification time, user, and executable permissions can be obtained.
  • the distributed file system when performing metadata search, can first extract the directory names of directories at all levels from the file path.
  • the directory names user, hive, warehouse, and file at all levels can be extracted.
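  • Step 202 amounts to splitting the path on its separator. A minimal sketch, assuming a POSIX-style path with / as the separator (the function name split_path is hypothetical):

```python
def split_path(path):
    """Return the directory names at each level, from upper level to lower level."""
    # Empty parts come from the leading slash or from repeated slashes; drop them.
    return [part for part in path.split("/") if part]


print(split_path("/user/hive/warehouse/file"))  # ['user', 'hive', 'warehouse', 'file']
```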
  • Step 204 According to the order of directories from upper level to lower level, for each extracted directory name, generate a keyword for the current level directory based on the directory name and the index node number of the upper level directory metadata.
  • Step 206 Search the index node number of the current level directory metadata in the mapping relationship based on the keyword.
  • the distributed file system can find the index node numbers of the directories at each level on the file path based on the mapping relationship between the locally stored keywords of the directories at each level and the index node numbers of the directory metadata.
  • Before performing a lookup, the distributed file system can generate the keywords needed to look up the index node numbers.
  • because the keywords in this specification are generated based on the index node number of the upper-level directory metadata, the keywords for each level of directory can be generated in order from the upper level to the lower level when querying index node numbers, so that the index node numbers of directories at all levels are queried in turn.
  • the keyword of the first-level directory can be generated first, and the index node number of the first-level directory metadata can then be found in the mapping relationship stored locally in the distributed file system based on that keyword. Next, the index node number of the second-level directory metadata can be found in the mapping relationship based on the directory name of the second-level directory and the index node number of the first-level directory metadata. Then, the index node number of the third-level directory metadata can be found based on the directory name of the third-level directory and the index node number of the second-level directory metadata. By analogy, the index node numbers of directory metadata at all levels on the file path can be found.
  • the distributed file system can store the mapping relationship shown in Table 2 in key-value form.
  • Table 2 is only an illustrative description. In actual applications, there is no need to store the left directory column. Moreover, in addition to storing the index node number, the value field can also store some metadata of the directory, such as directory name, directory size, etc.
  • when querying index node numbers, the first-level directory keyword 0user can be generated based on the first-level directory name user and the preset character 0, and the mapping relationship shown in Table 2 can be queried based on the keyword 0user to find the index node number 100 of the first-level directory metadata.
  • the second-level directory keyword 100hive can be generated based on the second-level directory name hive and the index node number 100 of the first-level directory metadata, and the mapping relationship shown in Table 2 can be queried based on the keyword 100hive to find the index node number 101 of the second-level directory metadata.
  • the third-level directory keyword 101warehouse can be generated based on the third-level directory name warehouse and the index node number 101 of the second-level directory metadata, and the mapping relationship shown in Table 2 can be queried based on the keyword 101warehouse to find the index node number 102 of the third-level directory metadata.
  • the fourth-level directory keyword 102file can be generated based on the fourth-level directory name file and the index node number 102 of the third-level directory metadata, and the mapping relationship shown in Table 2 can be queried based on the keyword 102file to find the index node number 103 of the fourth-level directory metadata.
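  • The level-by-level lookup in this example can be sketched as a loop over a key-value mapping. The dictionary below holds the four keyword and index node number pairs named in the example; the names LOCAL_MAPPING and resolve are hypothetical, and a plain Python dict stands in for whatever key-value store the name node actually uses.

```python
# Keyword -> index node number pairs taken from the worked example above.
LOCAL_MAPPING = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}


def resolve(path, mapping, root_prefix="0"):
    """Return the index node numbers of every directory level on the path."""
    inodes = []
    prefix = root_prefix  # the first level has no upper-level directory
    for name in (part for part in path.split("/") if part):
        keyword = prefix + name      # step 204: generate the keyword
        inode = mapping[keyword]     # step 206: look it up (KeyError on a miss)
        inodes.append(inode)
        prefix = str(inode)          # this level becomes the upper level of the next
    return inodes


print(resolve("/user/hive/warehouse/file", LOCAL_MAPPING))  # [100, 101, 102, 103]
```

  • Each lookup depends on the result of the previous one, which is why the levels have to be processed in order from upper to lower.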
  • step 202 can be executed before step 204, that is, before generating keywords, the directory names of directories at each level are extracted from the file path.
  • Step 202 can also be executed in conjunction with the loop process of steps 204-206. That is, in step 202 the first-level directory name is first extracted from the file path, and steps 204-206 are then executed to generate the keyword of the first-level directory and search for the index node number of the first-level directory metadata. The process can then return to step 202 to extract the second-level directory name from the file path, and execute steps 204-206 to generate the keyword of the second-level directory and search for the index node number of the second-level directory metadata, and so on, executing steps 202-206 in a loop. This specification does not impose special restrictions on this.
  • Step 208 Obtain the metadata of the file path from the cloud database based on the index node number of the file path.
  • the distributed file system can obtain the full amount of metadata pointed to by the index node numbers from the cloud database.
  • metadata can be obtained in the cloud database based on access requirements.
  • the metadata of the file path /user/hive/warehouse/file can be obtained based on the index node number 103. If the metadata of the file's upper-level directory is also needed, the metadata of the third-level directory /user/hive/warehouse can be obtained based on the index node number 102; on this basis, the metadata of the second-level directory /user/hive can be obtained based on the index node number 101, and so on.
  • the mapping relationship between directory keywords at all levels and the index node numbers of directory metadata is locally stored in the distributed file system.
  • the method of jointly storing metadata in a distributed file system and a cloud database solves the performance bottleneck of single-machine metadata services, improves the scalability of the system, and can support file storage at a scale of more than one billion files.
  • batch processing can be used to merge the multiple index node numbers that need to be queried, and then obtain the metadata corresponding to these index node numbers from the cloud database at one time.
  • Batch processing is used to obtain metadata from the cloud database.
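  • The batch idea can be illustrated with an in-memory stand-in for the cloud database. The class InMemoryCloudDB, its batch_get method, and the metadata fields are all hypothetical; the source does not name a database product or API. The point of the sketch is only that several index node numbers are merged into one request instead of being fetched one by one.

```python
class InMemoryCloudDB:
    """Hypothetical stand-in for the cloud database holding the full metadata."""

    def __init__(self, metadata_by_inode):
        self._metadata = dict(metadata_by_inode)
        self.request_count = 0  # counts simulated round trips

    def batch_get(self, inodes):
        """Return metadata for all requested index node numbers in one request."""
        self.request_count += 1
        return {i: self._metadata[i] for i in inodes if i in self._metadata}


db = InMemoryCloudDB({
    101: {"path": "/user/hive", "type": "dir"},
    102: {"path": "/user/hive/warehouse", "type": "dir"},
    103: {"path": "/user/hive/warehouse/file", "type": "file"},
})
result = db.batch_get([101, 102, 103])  # one request for three index node numbers
print(sorted(result), db.request_count)  # [101, 102, 103] 1
```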
  • if the index node number of the directory metadata cannot be found in the mapping relationship based on the generated keyword in step 206, it means either that the distributed file system has not yet stored locally the mapping relationship, held in the cloud database, between directory keywords at all levels and the index node numbers of directory metadata, or that the original directory name has been modified, so that the keyword generated by the distributed file system from the new directory name has no corresponding index node number.
  • When the distributed file system cannot find the index node number of the directory metadata in the local mapping relationship based on the generated keyword, it can search for the index node number in the cloud database based on the generated keyword, update the local mapping relationship accordingly, and obtain the directory metadata based on the index node number found in the cloud database.
  • when the distributed file system obtains the index node number of each level of directory metadata on the new file path /user/hive001/warehouse/file, it first generates the first-level directory keyword 0user based on the first-level directory name user and the preset character 0, queries the locally stored mapping relationship shown in Table 2 based on the keyword 0user, and finds the index node number 100 of the first-level directory metadata.
  • the second-level directory keyword 100hive001 is then generated based on the second-level directory name hive001 and the index node number 100 of the first-level directory metadata. Based on this keyword 100hive001, no corresponding index node number can be found in the locally stored Table 2.
  • the distributed file system can then query the index node number in the cloud database, that is, query the index node number 101 corresponding to the keyword 100hive001 in the mapping relationship shown in Table 3 stored in the cloud database.
  • the distributed file system can also update the locally stored mapping relationship based on the correspondence between the keyword 100hive001 and the index node number 101 queried in the cloud database, that is, update the locally stored mapping relationship shown in Table 2 to the mapping relationship shown in Table 3.
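  • The fallback in this example can be sketched as a lookup that consults the cloud mapping only on a local miss and then caches the result. Both mappings are modelled as plain dicts, and the function name lookup_with_fallback is hypothetical. The local pairs are those of the earlier example, and the cloud pairs assume that only the second-level directory was renamed from hive to hive001.

```python
def lookup_with_fallback(path, local, cloud, root_prefix="0"):
    """Resolve a path level by level, filling local misses from the cloud mapping."""
    inodes = []
    prefix = root_prefix
    for name in (part for part in path.split("/") if part):
        keyword = prefix + name
        inode = local.get(keyword)
        if inode is None:
            inode = cloud[keyword]   # local miss: query the cloud mapping instead
            local[keyword] = inode   # update the locally stored mapping
        inodes.append(inode)
        prefix = str(inode)
    return inodes


local = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud = {"0user": 100, "100hive001": 101, "101warehouse": 102, "102file": 103}
print(lookup_with_fallback("/user/hive001/warehouse/file", local, cloud))
# [100, 101, 102, 103]
print(local["100hive001"])  # 101
```

  • This sketch leaves the stale keyword 100hive in the local mapping. Removing such entries, so that the local mapping ends up identical to the cloud one, would need an extra step that is not shown here.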
  • when a directory name is modified, only the keyword of the corresponding directory needs to be modified.
  • the distributed file system can also query the cloud database for the index node numbers of the metadata of each lower-level directory of the second-level directory, that is, further query the index node numbers of the third-level and fourth-level directory metadata in the cloud database, and update the locally stored mapping relationship based on the query results. This avoids problems such as local query failures or inaccurate queries caused by the names of lower-level directories also having been modified.
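  • The observation that a rename touches only one keyword follows from the keyword format: lower-level keywords are built from the index node number of the renamed directory, not from its name. A minimal sketch, with the hypothetical helper rename_directory operating on a dict that stands in for the stored mapping:

```python
def rename_directory(mapping, parent_inode, old_name, new_name, root_prefix="0"):
    """Re-key a single directory entry; its index node number stays the same."""
    prefix = root_prefix if parent_inode is None else str(parent_inode)
    mapping[prefix + new_name] = mapping.pop(prefix + old_name)


mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
rename_directory(mapping, 100, "hive", "hive001")
print(mapping)
# {'0user': 100, '101warehouse': 102, '102file': 103, '100hive001': 101}
```

  • The keywords 101warehouse and 102file are untouched, so the cost of the rename does not depend on how many entries sit below the renamed directory.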
  • the distributed file system can also periodically obtain the latest mapping relationship from the cloud database and update the latest mapping relationship locally. This specification does not impose special restrictions on this.
  • The metadata management solution for distributed file systems provided in this specification can also ensure that the distributed file system obtains accurate metadata when metadata changes, and avoids the problem of obtaining incorrect metadata because the locally stored mapping relationship was not updated in time.
  • FIG. 3 is a schematic flowchart of another metadata management method of a distributed file system according to an exemplary embodiment of this specification.
  • the metadata management method of the distributed file system can be applied to the distributed file system and includes the following steps:
  • Step 302 Extract the directory names of directories at each level from the file path.
  • Step 304 According to the order of directories from upper level to lower level, for each extracted directory name, generate a keyword for the current level directory based on the directory name and the index node number of the upper level directory metadata.
  • Step 306 Search for the index node number of the current level directory metadata in the local mapping relationship based on the keyword.
  • steps 302-306 may refer to the implementation of steps 202-206 in the embodiment shown in FIG. 2, and will not be described in detail here.
  • Step 308 Search the cloud database for the index node number of the directory metadata corresponding to the directory keywords at each level of the file path.
  • after the distributed file system finds the index node numbers of directory metadata at all levels on the file path based on the locally stored mapping relationship, it can also query the cloud database for the index node numbers of the directories at all levels based on the generated keywords of those directories.
  • batch processing is used to merge multiple index node numbers that need to be queried, and then query the cloud database.
  • after the distributed file system finds the index node numbers 100-103 of the directory metadata at all levels in the locally stored mapping relationship, it can use the keywords of the directories at all levels, 0user, 100hive, 101warehouse and 102file, to query the index node numbers of directory metadata at all levels in the cloud database. That is, the index node numbers are queried based on the mapping relationship between the directory keywords at all levels and the index node numbers stored in the cloud database.
  • Step 310 Determine whether the index node number of the directory metadata at each level found based on the mapping relationship is the same as the index node number found in the cloud database.
  • the distributed file system determines whether the index node number found in the local mapping relationship is the same as the index node number found in the cloud database.
  • if they are the same, step 312 can be executed.
  • if they are different, step 314 can be executed.
  • Step 312 When the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtain the metadata of the file path from the cloud database based on the index node number of the file path.
  • if the index node number found in the local mapping relationship is the same as the index node number found in the cloud database, this indicates that the locally stored mapping relationship is the latest one and is consistent with the mapping relationship stored in the cloud database.
  • the metadata has not changed and can be obtained from the cloud database based on the index node number.
  • Step 314 When the index node numbers of the directory metadata at all levels found based on the mapping relationship are different from the index node numbers found in the cloud database, update the mapping relationship based on the index node numbers found in the cloud database, and obtain metadata based on the index node numbers queried in the cloud database.
  • if the index node number found in the local mapping relationship is different from the index node number found in the cloud database, that is, the same directory keyword corresponds to different metadata index node numbers, it means that the directory name in the cloud database may have been updated while the local mapping relationship has not been updated in time.
  • the index node number found in the local mapping relationship based on the new directory name is then not the index node number of the updated directory metadata being sought, and may instead be the index node number of historical directory metadata originally stored in the cloud database.
  • for example, the index node number corresponding to the keyword 100hive001 can be found in the locally stored mapping relationship, but the index node number found there is 200, which is different from the index node number 101 of 100hive001 stored in the cloud database. This means that the local mapping relationship has not been updated in time.
  • 200 may be the index node number of a historical directory /user/hive001 in the cloud database, which may no longer exist or may have been modified.
  • on the one hand, the distributed file system can update the locally stored mapping relationship based on the index node numbers queried in the cloud database; on the other hand, it can obtain metadata based on the index node numbers queried in the cloud database.
  • for example, the distributed file system finds index node numbers 100-103 for the directory metadata at all levels in the locally stored mapping relationship, while the index node numbers found in the cloud database are 100, 101, 105 and 106. That is, the index node numbers of the third-level and fourth-level directory metadata differ from those stored locally.
  • based on the query results of the cloud database, the distributed file system can modify the index node number 102 of the third-level directory metadata stored in the local mapping relationship to 105, and modify the index node number 103 of the fourth-level directory metadata stored in the local mapping relationship to 106.
  • the distributed file system can also obtain third-level directory metadata and fourth-level directory metadata based on index node numbers 105 and 106.
  • before obtaining metadata from the cloud database, the distributed file system determines whether the index node numbers queried in the cloud database are the same as those queried based on the local mapping relationship, and obtains metadata if they are the same.
  • if they are different, accurate metadata can still be obtained based on the index node numbers queried in the cloud database, which effectively avoids problems such as metadata acquisition errors caused by the local mapping relationship of the distributed file system not being updated in time in high-concurrency scenarios.
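  • Steps 302-314 can be sketched as a local lookup followed by a comparison against the cloud mapping. As before, both mappings are plain dicts and the function name resolve_verified is hypothetical. Following the example above, the cloud mapping is queried with the same keywords that were generated during the local lookup.

```python
def resolve_verified(path, local, cloud, root_prefix="0"):
    """Resolve a path locally, verify against the cloud, and repair local entries."""
    keywords, local_inodes = [], []
    prefix = root_prefix
    for name in (part for part in path.split("/") if part):   # steps 302-306
        keyword = prefix + name
        inode = local[keyword]
        keywords.append(keyword)
        local_inodes.append(inode)
        prefix = str(inode)

    # Step 308: query the cloud mapping (in practice this could be one merged request).
    cloud_inodes = [cloud[keyword] for keyword in keywords]

    # Steps 310-314: where the two disagree, the cloud value replaces the local one.
    for keyword, local_inode, cloud_inode in zip(keywords, local_inodes, cloud_inodes):
        if local_inode != cloud_inode:
            local[keyword] = cloud_inode
    return cloud_inodes  # metadata is then fetched using these index node numbers


local = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud = {"0user": 100, "100hive": 101, "101warehouse": 105, "102file": 106}
print(resolve_verified("/user/hive/warehouse/file", local, cloud))  # [100, 101, 105, 106]
print(local["101warehouse"], local["102file"])  # 105 106
```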
  • this specification also provides a data access method of the distributed file system, which can be applied to the name node in the distributed file system. Referring to Figure 4 and Figure 5, the method includes the following steps:
  • Step 502 In response to the data access request sent by the client, query the corresponding metadata according to the path of the data to be accessed.
  • the data access request may be a data read request or a data write request.
  • the name node queries the corresponding metadata based on the data path to be read.
  • the metadata query can be implemented based on the metadata query solution described in the embodiment of FIG. 2 or FIG. 3 mentioned above in this specification. For example, the name node first queries the index node number of the path in the mapping relationship between locally stored directory keywords at all levels and metadata index node numbers, and then obtains the corresponding metadata from the cloud database.
  • Step 504: Return the metadata to the client so that the client can access data based on the metadata.
  • the metadata can be returned to the client.
  • the client can locate the data block holding the data based on the metadata, and then read the corresponding data from the data node based on that data block.
  • the name node can also implement metadata query based on the metadata query scheme recorded in the embodiment of Figure 2 or Figure 3 mentioned above in this specification.
  • this specification also provides embodiments of the metadata management apparatus of the distributed file system.
  • the embodiments of the metadata management apparatus of the distributed file system in this specification can be applied in electronic devices.
  • the device embodiments may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
  • Taking software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them.
  • At the hardware level, Figure 6 is a hardware structure diagram of the electronic device where the metadata management apparatus of the distributed file system in this specification is located.
  • the electronic device where the device in the embodiment is located may also include other hardware based on the actual functions of the electronic device, which will not be described again.
  • FIG. 7 is a block diagram of a metadata management device of a distributed file system according to an exemplary embodiment of this specification.
  • the metadata management apparatus 700 of the distributed file system can be applied to the electronic device shown in Figure 6, and the electronic device can be the name node of the distributed file system.
  • the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata.
  • the directory keywords at this level are generated based on the index node numbers of the upper-level directory metadata and the directory name at this level.
  • the device 700 includes:
  • the name acquisition unit 701 extracts the directory names of directories at each level from the file path;
  • the keyword generation unit 702 generates keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata for each extracted directory name in the order of the directory from upper level to lower level;
  • the number search unit 703 searches the mapping relationship, based on the keyword, for the index node number of the metadata of the current-level directory;
  • the metadata obtaining unit 704 obtains the metadata of the file path from the cloud database based on the index node number of the file path.
  • Optionally, generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • when the current-level directory is a first-level directory, a keyword for the current-level directory is generated based on the directory name and preset characters.
  • Optionally, generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced based on a preset order to generate a keyword for the current-level directory.
  • Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
  • searching the cloud database for the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
  • determining whether the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database;
  • if they are the same, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
  • Optionally, further including:
  • if the index node numbers of the directory metadata at all levels found based on the mapping relationship differ from the index node numbers found in the cloud database, the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, further including:
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path found in the cloud database.
  • Optionally, further including:
  • if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, it is searched for in the cloud database based on the keyword, and the mapping relationship is updated based on the index node number found.
  • this specification also provides embodiments of the data access apparatus of the distributed file system.
  • the embodiment of the data access device of the distributed file system in this specification can be applied in electronic equipment.
  • the device embodiments may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
  • Taking software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them.
  • the hardware structure of the electronic device where the data access device of the distributed file system in this specification is located can be similar to the electronic device shown in Figure 6, and this specification does not place special restrictions on this.
  • FIG. 8 is a block diagram of a data access device of a distributed file system according to an exemplary embodiment of this specification.
  • the data access device 800 of the distributed file system can be applied in the name node of the distributed file system, including:
  • the metadata query unit 801, in response to the data access request sent by the client, queries the corresponding metadata according to the path of the data to be accessed.
  • the metadata query can be implemented using the metadata management method provided in this specification.
  • the data access unit 802 returns the metadata to the client so that the client can perform data access based on the metadata.
  • since the device embodiment basically corresponds to the method embodiment, please refer to the description of the method embodiment for relevant details.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units; that is, they may be located in one place, or they may be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Persons of ordinary skill in the art can understand and implement the method without any creative effort.
  • a typical implementation device is a computer, which may take the form of a personal computer, a laptop, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet, a wearable device, or a combination of any of these devices.
  • this specification also provides a metadata management device of a distributed file system, which includes: a processor and a memory for storing machine-executable instructions.
  • the processor and the memory are usually connected to each other through an internal bus.
  • the device may also include an external interface to be able to communicate with other devices or components.
  • the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of directory metadata, where the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory.
  • Optionally, generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • when the current-level directory is a first-level directory, a keyword for the current-level directory is generated based on the directory name and preset characters.
  • Optionally, generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced based on a preset order to generate a keyword for the current-level directory.
  • Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
  • searching the cloud database for the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
  • determining whether the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database;
  • if they are the same, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
  • Optionally, further including:
  • if the index node numbers of the directory metadata at all levels found based on the mapping relationship differ from the index node numbers found in the cloud database, the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, further including:
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path found in the cloud database.
  • Optionally, further including:
  • if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, it is searched for in the cloud database based on the keyword, and the mapping relationship is updated based on the index node number found.
  • this specification also provides a data access device of the distributed file system, which includes: a processor and a memory for storing machine-executable instructions.
  • the processor and the memory are usually connected to each other through an internal bus.
  • the device may also include an external interface to be able to communicate with other devices or components.
  • the metadata query can be implemented using the metadata management method provided in this specification.
  • the distributed file system stores a mapping relationship between the keywords of the directories at each level and the index node numbers of the directory metadata, where the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory.
  • This specification also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium. When the program is executed by the processor, the following steps are implemented:
  • Optionally, generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • when the current-level directory is a first-level directory, a keyword for the current-level directory is generated based on the directory name and preset characters.
  • Optionally, generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced based on a preset order to generate a keyword for the current-level directory.
  • Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
  • searching the cloud database for the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
  • determining whether the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database;
  • if they are the same, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
  • Optionally, further including:
  • if the index node numbers of the directory metadata at all levels found based on the mapping relationship differ from the index node numbers found in the cloud database, the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, further including:
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path found in the cloud database.
  • Optionally, further including:
  • if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, it is searched for in the cloud database based on the keyword, and the mapping relationship is updated based on the index node number found.
  • this specification also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the program is executed by the processor, the following steps are implemented:
  • the metadata query can be implemented using the metadata management method provided in this specification.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the description are a metadata management method and apparatus for a distributed file system. The metadata management method and apparatus are applied to a distributed file system, and the distributed file system stores a mapping relationship between keywords and index node numbers of directory metadata of directories at all levels, wherein a keyword of a directory at the current level is generated on the basis of an index node number of metadata of a directory at an upper level and the name of the directory at the current level. The method comprises: extracting, from a file path, directory names of directories at all levels; according to the order of the directories from upper to lower levels, with regard to each directory name that is extracted and on the basis of the directory name and an index node number of directory metadata at an upper level, generating a keyword of a directory at the current level; on the basis of the keyword, searching the mapping relationship for an index node number of directory metadata at the current level; and on the basis of an index node number of the file path, acquiring metadata of the file path from a cloud database.

Description

Metadata management method and apparatus for distributed file system
This application claims priority to Chinese patent application No. 202210307777.7, filed with the China Patent Office on March 25, 2022 and entitled "Metadata management method and apparatus for distributed file system", the entire contents of which are incorporated herein by reference.
Technical field
This specification relates to the field of storage technology, and in particular to a metadata management method and apparatus for a distributed file system.
Background
In traditional distributed file systems, metadata is usually stored on a central node. Constrained by factors such as the central node's disk capacity, this approach to metadata management is no longer suitable for application scenarios involving massive numbers of files.
Summary
In view of this, this specification provides a metadata management method and apparatus for a distributed file system.
Specifically, this specification is implemented through the following technical solutions:
A metadata management method for a distributed file system, applied to the distributed file system, wherein the distributed file system stores a mapping relationship between keywords of directories at each level and index node numbers of directory metadata, and the keyword of a current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory, the method comprising:
extracting the directory names of the directories at each level from a file path;
in order from upper-level to lower-level directories, for each extracted directory name, generating a keyword for the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
searching the mapping relationship, based on the keyword, for the index node number of the current-level directory metadata;
obtaining the metadata of the file path from a cloud database based on the index node number of the file path.
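The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the local mapping and the cloud database are stood in for by plain dictionaries, the preset character "0" used for first-level directories is hypothetical, and the function names `make_key` and `resolve_inode` do not come from the specification. The index node numbers follow the /user/hive/warehouse/file example used elsewhere in this document.

```python
from typing import Dict, Optional

ROOT_PREFIX = "0"  # hypothetical preset character for first-level directories


def make_key(parent_inode: Optional[int], name: str) -> str:
    """Build a directory keyword: parent index node number followed by the name."""
    prefix = ROOT_PREFIX if parent_inode is None else str(parent_inode)
    return f"{prefix}{name}"


def resolve_inode(path: str, mapping: Dict[str, int]) -> int:
    """Walk the path from the top level down and return the last index node number."""
    parent: Optional[int] = None
    for name in (part for part in path.split("/") if part):
        parent = mapping[make_key(parent, name)]  # raises KeyError on a miss
    if parent is None:
        raise ValueError("path has no directory components")
    return parent


# Local mapping kept by the file system: keyword -> index node number.
local_mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
# Stand-in for the cloud database: index node number -> metadata record.
cloud_db = {103: {"size": 4096, "permissions": "rw-r--r--"}}

inode = resolve_inode("/user/hive/warehouse/file", local_mapping)
metadata = cloud_db[inode]
```

Because each keyword embeds the parent's index node number, every level is resolved with a single key-value lookup, and only the final metadata fetch touches the cloud database.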
Optionally, generating the keyword for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
when the current-level directory is a first-level directory, generating the keyword for the current-level directory based on the directory name and a preset character.
Optionally, generating the keyword for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
splicing the directory name and the index node number of the upper-level directory metadata in a preset order to generate the keyword for the current-level directory.
Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
searching the cloud database for the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
determining whether the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database;
if the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
Optionally, the method further includes:
if the index node numbers of the directory metadata at each level found based on the mapping relationship differ from the index node numbers found in the cloud database, updating the mapping relationship based on the index node numbers found in the cloud database.
Optionally, the method further includes:
obtaining the metadata of the file path from the cloud database based on the index node number of the file path found in the cloud database.
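The consistency check and update described in the optional steps above can be sketched as follows. This is a minimal illustration, not the specification's implementation: dictionaries stand in for the local mapping and the cloud database, the preset character "0" and the names `lookup_chain` and `get_metadata` are hypothetical, and removing stale local keys before the update is a design choice of this sketch. The data mirrors the example in which index node numbers 102 and 103 have become 105 and 106 in the cloud database.

```python
from typing import Dict, List, Tuple

ROOT_PREFIX = "0"  # hypothetical preset character for first-level directories


def lookup_chain(path: str, index: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (keyword, index node number) for each level of the path, top down."""
    chain: List[Tuple[str, int]] = []
    prefix = ROOT_PREFIX
    for name in (part for part in path.split("/") if part):
        key = f"{prefix}{name}"
        inode = index[key]
        chain.append((key, inode))
        prefix = str(inode)
    return chain


def get_metadata(path: str, local_mapping: Dict[str, int],
                 cloud_index: Dict[str, int], cloud_metadata: Dict[int, dict]) -> dict:
    """Compare local and cloud index node numbers, refresh the mapping if stale, then fetch."""
    local_chain = lookup_chain(path, local_mapping)
    cloud_chain = lookup_chain(path, cloud_index)
    if [inode for _, inode in local_chain] != [inode for _, inode in cloud_chain]:
        for key, _ in local_chain:       # drop the stale local entries
            local_mapping.pop(key, None)
        local_mapping.update(cloud_chain)  # adopt the cloud database's view
    return cloud_metadata[cloud_chain[-1][1]]


local_mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud_index = {"0user": 100, "100hive": 101, "101warehouse": 105, "105file": 106}
cloud_metadata = {106: {"size": 4096}}

meta = get_metadata("/user/hive/warehouse/file", local_mapping, cloud_index, cloud_metadata)
```

After the call, the local mapping resolves the path to index node number 106, so a later lookup no longer relies on the outdated entries.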
Optionally, the method further includes:
if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, searching the cloud database for the index node number of the current-level directory metadata based on the keyword, and updating the mapping relationship based on the index node number found.
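The fallback on a local miss can be sketched as a small read-through lookup. The sketch assumes dictionary stand-ins for the local mapping and for the cloud database's keyword index; the function name `lookup_with_fallback` is hypothetical.

```python
from typing import Dict


def lookup_with_fallback(key: str, local_mapping: Dict[str, int],
                         cloud_index: Dict[str, int]) -> int:
    """Look up a keyword locally; on a miss, ask the cloud database and cache the answer."""
    inode = local_mapping.get(key)
    if inode is None:
        inode = cloud_index[key]    # raises KeyError if the directory is unknown everywhere
        local_mapping[key] = inode  # update the local mapping relationship
    return inode


local_mapping = {"0user": 100}
cloud_index = {"0user": 100, "100hive": 101}

found = lookup_with_fallback("100hive", local_mapping, cloud_index)
```

The second lookup of the same keyword is then served from the local mapping without contacting the cloud database.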
A data access method for a distributed file system, applied to the distributed file system, comprising:
in response to a data access request sent by a client, querying the corresponding metadata according to the path of the data to be accessed;
returning the metadata to the client so that the client can access data based on the metadata;
wherein the metadata is queried based on the path using the aforementioned metadata management method.
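A rough sketch of this request flow is given below. It assumes a hypothetical `AccessRequest` type, a `blocks` field in the metadata, and in-memory dictionaries in place of the metadata query and the data nodes; none of these names come from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AccessRequest:
    path: str
    mode: str  # "read" or "write"


def handle_access(request: AccessRequest, query_metadata: Callable[[str], dict]) -> dict:
    """Name-node side: query the metadata for the path and return it to the client."""
    if request.mode not in ("read", "write"):
        raise ValueError(f"unsupported access mode: {request.mode}")
    return query_metadata(request.path)


def client_read(metadata: dict, data_nodes: Dict[str, bytes]) -> bytes:
    """Client side: use the block list in the metadata to read from the data nodes."""
    return b"".join(data_nodes[block_id] for block_id in metadata["blocks"])


# Stand-ins: a metadata query backed by a dict, and data nodes keyed by block id.
metadata_store = {"/user/hive/warehouse/file": {"blocks": ["blk_1", "blk_2"]}}
data_nodes = {"blk_1": b"hello, ", "blk_2": b"world"}

request = AccessRequest(path="/user/hive/warehouse/file", mode="read")
metadata = handle_access(request, metadata_store.__getitem__)
content = client_read(metadata, data_nodes)
```

In a real deployment the `query_metadata` callable would be the metadata management method itself, that is, the local keyword lookup followed by the cloud database fetch.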
A metadata management apparatus for a distributed file system, applied to the distributed file system, wherein the distributed file system stores a mapping relationship between keywords of directories at each level and index node numbers of directory metadata, and the keyword of a current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory, the apparatus comprising:
a name acquisition unit, which extracts the directory names of the directories at each level from a file path;
a keyword generation unit, which, in order from upper-level to lower-level directories, generates for each extracted directory name a keyword for the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
a number search unit, which searches the mapping relationship, based on the keyword, for the index node number of the current-level directory metadata;
a metadata acquisition unit, which obtains the metadata of the file path from a cloud database based on the index node number of the file path.
A data access apparatus for a distributed file system, applied to the distributed file system, comprising:
a metadata query unit, which, in response to a data access request sent by a client, queries the corresponding metadata according to the path of the data to be accessed;
a data access unit, which returns the metadata to the client so that the client can access data based on the metadata;
wherein the metadata is queried based on the path using the aforementioned metadata management method.
A metadata management apparatus for a distributed file system, comprising:
a processor;
a memory for storing machine-executable instructions;
wherein the distributed file system stores a mapping relationship between keywords of directories at each level and index node numbers of directory metadata, the keyword of a current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory, and, by reading and executing the machine-executable instructions stored in the memory that correspond to the metadata management logic of the distributed file system, the processor is caused to:
extract the directory names of the directories at each level from a file path;
in order from upper-level to lower-level directories, for each extracted directory name, generate a keyword for the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
search the mapping relationship, based on the keyword, for the index node number of the current-level directory metadata;
obtain the metadata of the file path from a cloud database based on the index node number of the file path.
A computer-readable storage medium storing a computer program, the computer program being used to cause a processor to execute the above metadata management method.
With the distributed file system metadata management solution provided in this specification, the mapping relationship between the directory keywords at each level and the index node numbers of the directory metadata is stored locally in the distributed file system. When metadata is to be obtained, the index node numbers of the directory metadata at each level can first be found locally, and the metadata can then be looked up in the cloud database based on the index node number. Storing metadata jointly in the distributed file system and the cloud database removes the performance bottleneck of a single-machine metadata service, improves the scalability of the system, and can support file storage at a scale of more than one billion files.
Description of the drawings
Figure 1 is a schematic diagram of the architecture of a distributed file system in the related art.
Figure 2 is a schematic flowchart of a metadata management method for a distributed file system according to an exemplary embodiment of this specification.
Figure 3 is a schematic flowchart of another metadata management method for a distributed file system according to an exemplary embodiment of this specification.
Figure 4 is a schematic architectural diagram of a distributed file system according to an exemplary embodiment of this specification.
Figure 5 is a schematic flowchart of a data access method for a distributed file system according to an exemplary embodiment of this specification.
Figure 6 is a hardware structure diagram of an electronic device in which a metadata management apparatus of a distributed file system is located, according to an exemplary embodiment of this specification.
Figure 7 is a block diagram of a metadata management apparatus of a distributed file system according to an exemplary embodiment of this specification.
Figure 8 is a block diagram of a data access apparatus of a distributed file system according to an exemplary embodiment of this specification.
Detailed description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification, as detailed in the appended claims.
The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit this specification. As used in this specification and the appended claims, the singular forms "a", "said", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
Figure 1 is a schematic architectural diagram of a distributed file system according to an exemplary embodiment of this specification.
Referring to Figure 1, a distributed file system may include a name node and data nodes.
The name node (Namenode) is responsible for managing the namespace (Namespace) of the distributed file system, and maintains the file system tree as well as the metadata of each file and folder in the file system.
The data node (Datanode) is used to store data, which it stores in the form of blocks.
When a distributed file system client accesses data, it can send an access request to the name node. Taking a data read request as an example, the name node looks up the corresponding metadata and returns it to the client; the client can then use the metadata to locate the data block holding the data and read the corresponding data from the data node based on that data block. Taking a data write request as an example, the name node also looks up the corresponding metadata. If the metadata is found, it can be returned to the client, and the client can then use the metadata to locate the data block holding the data and write the corresponding data to the data node based on that data block. If the metadata is not found, a new index node number can be created and metadata such as the data block location can be written; this metadata is then returned to the client, which can write the data into the corresponding data block.
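The read and write handling described above can be sketched as a toy name node. The class and field names (`NameNode`, `inode`, `blocks`) and the numbering scheme are assumptions made for illustration; the sketch keeps all metadata in memory, as in the traditional architecture of Figure 1.

```python
import itertools
from typing import Dict


class NameNode:
    """Toy name node that keeps all metadata in memory."""

    def __init__(self) -> None:
        self.metadata: Dict[str, dict] = {}        # path -> metadata
        self._next_inode = itertools.count(100)    # hypothetical numbering scheme
        self._next_block = itertools.count(1)

    def handle_read(self, path: str) -> dict:
        """Return the metadata for an existing path (KeyError if it is unknown)."""
        return self.metadata[path]

    def handle_write(self, path: str) -> dict:
        """Return existing metadata, or create a new index node number and block location."""
        meta = self.metadata.get(path)
        if meta is None:
            meta = {
                "inode": next(self._next_inode),
                "blocks": [f"blk_{next(self._next_block)}"],
            }
            self.metadata[path] = meta
        return meta


name_node = NameNode()
first = name_node.handle_write("/user/hive/warehouse/file")
again = name_node.handle_write("/user/hive/warehouse/file")
```

Because every record lives in the name node's own storage, the number of files is bounded by that node's capacity, which is the limitation the next paragraphs address.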
In traditional distributed file systems, metadata is usually stored on the name node. Constrained by factors such as the name node's disk capacity, this approach to metadata management is no longer suitable for application scenarios involving massive numbers of files.
This specification provides a metadata management solution for a distributed file system in which the distributed file system works together with a cloud database to store metadata, thereby removing the limit that disk capacity places on metadata storage.
Metadata is data about data. It can be used to describe the attributes of data and to support functions such as indicating storage locations, recording history, locating resources, and keeping file records.
In a distributed file system, paths, directories, files, links, and so on can all have metadata. The metadata includes the descriptive information needed for reading and writing, such as the real path, size, creation time, and permissions.
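As an illustration of what such a record might hold, the sketch below models the fields listed above as a Python dataclass. The field names, the types, and the use of POSIX-style mode bits for permissions are assumptions; the specification does not prescribe a layout.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Metadata:
    inode: int           # index node number used to look the record up
    real_path: str       # real path of the file or directory
    size: int            # size in bytes
    created: datetime    # creation time
    permissions: int     # POSIX-style mode bits (an assumption of this sketch)


record = Metadata(
    inode=103,
    real_path="/user/hive/warehouse/file",
    size=4096,
    created=datetime(2022, 3, 25, tzinfo=timezone.utc),
    permissions=0o644,
)
```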
A file path usually points to a specific file, and the file can be accessed through its path. A path usually contains multiple levels of directories, each with a corresponding directory name.
Table 1
For example, suppose a file path is /user/hive/warehouse/file. As in the example of Table 1, this path contains four levels of directories, whose names are the folder names user, hive, warehouse, and file.
In this specification, the distributed file system can store the mapping between the key of each directory level and the index node number of that directory's metadata, without storing the full metadata.
The mapping can be stored in key-value form, for example in the name node (Namenode).
The index node number is the number of an inode (index node), that is, the Inode ID. An inode is a data structure, and metadata can be looked up by index node number.
The key can be generated from the index node number of the parent directory's metadata and the name of the current directory.
For example, the current directory name and the index node number of the parent directory's metadata are concatenated in a preset order to obtain the key of the current directory.
For instance, if the current directory name is hive and the index node number of the parent directory's metadata is 100, the key 100hive can be generated.
As another example, a preset algorithm is applied to the current directory name and the index node number of the parent directory's metadata to obtain the key of the current directory.
Of course, the keys of the directories at each level can also be generated in other ways, and this specification places no special limitation on this.
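The algorithm-based variant mentioned above is not specified further in the text. As a purely hypothetical illustration, the sketch below hashes the parent index node number together with the directory name; the choice of SHA-256, the "/" separator, and the function name `make_hashed_key` are all assumptions, not part of the described method.

```python
import hashlib


def make_hashed_key(parent_inode_id, name):
    """Derive a fixed-length key from the parent inode ID and the directory name.

    Assumption: SHA-256 over "<parent>/<name>" stands in for the unspecified
    "preset algorithm"; any deterministic function of both inputs would do.
    """
    raw = f"{parent_inode_id}/{name}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


key = make_hashed_key(100, "hive")
print(len(key))  # a SHA-256 hex digest is always 64 characters long
```

A hashed key has a fixed length regardless of the directory name, at the cost of no longer being human-readable.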
A first-level directory has no parent directory, so its key can be generated from the first-level directory name and a preset character.
Taking the file path /user/hive/warehouse/file as an example, its first-level directory is /user, and the key 0user can be generated from the preset character 0 and the directory name user.
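The concatenation-based key generation, including the first-level case, can be sketched as follows. This is a minimal illustration only: the function name `make_key`, the constant `ROOT_PREFIX`, and the use of `None` to mark "no parent directory" are assumptions introduced here.

```python
ROOT_PREFIX = "0"  # preset character used for first-level directories in the example


def make_key(parent_inode_id, name):
    """Concatenate the parent inode ID (or the preset character) with the directory name."""
    prefix = ROOT_PREFIX if parent_inode_id is None else str(parent_inode_id)
    return prefix + name


print(make_key(None, "user"))  # 0user
print(make_key(100, "hive"))   # 100hive
```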
In this specification, the cloud database can store the full metadata. It can also store the mapping between the keys of the directories at each level and the index node numbers of the directory metadata, so that the distributed file system can update the mapping it stores.
In this specification, the distributed file system and the cloud database store metadata jointly, so the full metadata need not be stored in the distributed file system. This distributed way of storing metadata effectively removes the limit that the distributed file system's disk capacity places on metadata storage, and it suits application scenarios with massive numbers of files, such as data lakes.
Figure 2 is a schematic flowchart of a metadata management method for a distributed file system according to an exemplary embodiment of this specification.
Referring to Figure 2, the metadata management method can be applied to a distributed file system, for example to the name node in the distributed file system, and includes the following steps:
Step 202: extract the directory name of each directory level from the file path.
In this specification, when a file is read or written, the user-side client can send a read or write request to the distributed file system. The distributed file system usually needs to look up the metadata of the file and of the file path, and sometimes also the metadata of the directory containing the file, of the file's parent directories, and so on. From this metadata it can obtain information such as the file type, file size, creation time, modification time, owning user, and execute permission.
In this specification, when looking up metadata, the distributed file system can first extract the directory name of each directory level from the file path.
Taking the aforementioned file path /user/hive/warehouse/file as an example, the directory names user, hive, warehouse, and file can be extracted.
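Step 202 amounts to splitting the path on its separator. A minimal sketch, assuming "/" as the separator and a hypothetical helper name `split_path`:

```python
def split_path(path):
    """Return the directory names of a path, from the first level to the last."""
    # Empty components (leading, trailing, or doubled separators) are dropped.
    return [part for part in path.split("/") if part]


print(split_path("/user/hive/warehouse/file"))  # ['user', 'hive', 'warehouse', 'file']
```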
Step 204: in order from upper-level to lower-level directories, for each extracted directory name, generate the key of the current directory from the directory name and the index node number of the parent directory's metadata.
Step 206: look up the index node number of the current directory's metadata in the mapping based on the key.
In this specification, the distributed file system can find the index node number of each directory level on the file path from the locally stored mapping between directory keys and the index node numbers of directory metadata.
Before the lookup, the distributed file system first generates the keys needed to look up the index node numbers.
Because a key in this specification is generated from the index node number of the parent directory's metadata, the keys are generated level by level, from upper-level to lower-level directories, and the index node number of each level is looked up in turn.
For example, the key of the first-level directory is generated first and used to find the index node number of the first-level directory's metadata in the mapping stored locally by the distributed file system. The index node number of the second-level directory's metadata is then found in the mapping from the second-level directory name and the index node number of the first-level directory's metadata. Next, the index node number of the third-level directory's metadata is found from the third-level directory name and the index node number of the second-level directory's metadata. Proceeding in the same way yields the index node numbers of the metadata of every directory level on the file path.
Taking the aforementioned file path /user/hive/warehouse/file as an example, suppose the index node numbers of its first-level through fourth-level directory metadata are 100 to 103. The distributed file system can store the mapping shown in Table 2 below in key-value form.
Table 2
Note that Table 2 is only illustrative. In practice the directory column on the left need not be stored. In addition to the index node number, the value field can also store part of the directory's metadata, such as the directory name and directory size.
In this embodiment, when the index node numbers are looked up, the first-level key 0user is first generated from the first-level directory name user and the preset character 0. The mapping shown in Table 2 is queried with the key 0user, which yields the index node number 100 of the first-level directory's metadata.
Then the second-level key 100hive is generated from the second-level directory name hive and the index node number 100 of the first-level directory's metadata. The mapping shown in Table 2 is queried with the key 100hive, which yields the index node number 101 of the second-level directory's metadata.
Next, the third-level key 101warehouse is generated from the third-level directory name warehouse and the index node number 101 of the second-level directory's metadata. The mapping shown in Table 2 is queried with the key 101warehouse, which yields the index node number 102 of the third-level directory's metadata.
Finally, the fourth-level key 102file is generated from the fourth-level directory name file and the index node number 102 of the third-level directory's metadata. The mapping shown in Table 2 is queried with the key 102file, which yields the index node number 103 of the fourth-level directory's metadata.
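The level-by-level walk in steps 204 and 206 can be sketched as a loop in which each lookup result becomes the prefix of the next key. The dictionary below holds the key-to-number pairs stated in the walkthrough above; the names `LOCAL_MAPPING` and `resolve_inodes` are assumptions introduced for illustration.

```python
# Key -> index node number pairs taken from the walkthrough in the text.
LOCAL_MAPPING = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}


def resolve_inodes(path, mapping, root_prefix="0"):
    """Return the index node number of every directory level on the path."""
    inode_ids = []
    prefix = root_prefix  # first-level keys start with the preset character
    for name in (part for part in path.split("/") if part):
        key = prefix + name
        inode_id = mapping[key]  # raises KeyError if the key is not stored locally
        inode_ids.append(inode_id)
        prefix = str(inode_id)   # the next level's key uses this level's number
    return inode_ids


print(resolve_inodes("/user/hive/warehouse/file", LOCAL_MAPPING))  # [100, 101, 102, 103]
```

Each level costs one key-value lookup, so the total work grows with the depth of the path rather than with the number of stored entries.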
Note that step 202 in this embodiment can be executed before step 204, that is, the directory names of all levels are extracted from the file path before any key is generated. Step 202 can also be interleaved with the loop of steps 204 to 206: step 202 first extracts the first-level directory name from the file path, and steps 204 to 206 then generate the first-level key and look up the index node number of the first-level directory's metadata; execution then returns to step 202 to extract the second-level directory name, and steps 204 to 206 generate the second-level key and look up the index node number of the second-level directory's metadata; and so on, with steps 202 to 206 executed in a loop. This specification places no special limitation on this.
Step 208: obtain the metadata of the file path from the cloud database based on the index node number of the file path.
Following the preceding steps, once the index node numbers of the metadata of every directory level on the file path have been found, the distributed file system can obtain from the cloud database the full metadata that those index node numbers point to.
In this embodiment, metadata can be obtained from the cloud database according to access needs.
Taking the aforementioned file path /user/hive/warehouse/file as an example, if the metadata of the file's parent directories is not needed, the metadata of the file path /user/hive/warehouse/file can be obtained with index node number 103. If the metadata of the file's parent directories is needed, the metadata of the third-level directory /user/hive/warehouse can be obtained with index node number 102, and, in addition, the metadata of the second-level directory /user/hive can be obtained with index node number 101, and so on.
With the metadata management solution described above, the distributed file system stores locally the mapping between the keys of the directories at each level and the index node numbers of the directory metadata. To obtain metadata, it first finds the index node numbers locally and then looks up the metadata in the cloud database by index node number. Storing metadata jointly in the distributed file system and the cloud database removes the performance bottleneck of a single-machine metadata service, improves the scalability of the system, and can support file storage at the billion-file scale and beyond.
Moreover, after the index node numbers of the directories at each level have been found from the locally stored mapping, the index node numbers to be queried can be merged into a batch, and the metadata they point to can be fetched from the cloud database in a single request. Compared with conventional techniques that store metadata in a cloud database and must query it recursively, once per directory level, batch retrieval greatly reduces query overhead and improves the efficiency of metadata retrieval and, in turn, of subsequent file access.
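The batch retrieval described above can be illustrated with a stand-in for the cloud database. The class `FakeCloudDB`, its `batch_get` method, and the metadata fields are hypothetical placeholders and do not correspond to any real database client API; the point is only that several index node numbers are resolved in one round trip.

```python
class FakeCloudDB:
    """In-memory stand-in for the cloud database (hypothetical, for illustration only)."""

    def __init__(self, metadata_by_inode):
        self._metadata = dict(metadata_by_inode)
        self.round_trips = 0  # counts simulated requests to the database

    def batch_get(self, inode_ids):
        """Fetch the metadata of several index node numbers in one request."""
        self.round_trips += 1
        return {i: self._metadata[i] for i in inode_ids if i in self._metadata}


db = FakeCloudDB({
    100: {"name": "user"},
    101: {"name": "hive"},
    102: {"name": "warehouse"},
    103: {"name": "file"},
})

# The numbers were already resolved from the local mapping, so one request suffices.
result = db.batch_get([101, 102, 103])
print(db.round_trips)  # 1
```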
In this specification, if the index node number of a directory's metadata cannot be found in the mapping with the key generated in step 206, either the distributed file system has not yet stored locally the mapping held in the cloud database, or the original directory name has been modified and the key generated from the new directory name matches no index node number.
When the distributed file system cannot find the index node number of a directory's metadata in the local mapping with the generated key, it can look up the index node number in the cloud database with that key, update the local mapping, and obtain the directory metadata with the index node number found in the cloud database.
Still taking the file path /user/hive/warehouse/file as an example, suppose the second-level directory name hive has been changed to hive001 while the mapping stored locally by the distributed file system has not been updated and is still that of Table 2.
The cloud database stores the latest mapping. In the example of renaming hive to hive001, with the key-to-index-node-number mapping described in this specification, only the key of the second-level directory in the cloud database's mapping needs to change, that is, 100hive becomes 100hive001. Compared with a mapping that uses the full directory path of each level as the key, there is no need to modify the keys of the lower-level directories recursively, which greatly reduces the key modification overhead caused by renaming.
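The difference in rename cost can be made concrete by counting how many keys change under each scheme. The sketch below is illustrative only: `inode_keyed` follows the key scheme of this specification, `path_keyed` models the full-path-as-key alternative it is compared against, and both helper functions are assumptions.

```python
inode_keyed = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
path_keyed = {
    "/user": 100,
    "/user/hive": 101,
    "/user/hive/warehouse": 102,
    "/user/hive/warehouse/file": 103,
}


def rename_inode_keyed(mapping, parent_inode_id, old_name, new_name):
    """Rename one directory; only its own key changes. Returns the number of keys changed."""
    mapping[f"{parent_inode_id}{new_name}"] = mapping.pop(f"{parent_inode_id}{old_name}")
    return 1


def rename_path_keyed(mapping, old_prefix, new_prefix):
    """Rename one directory; every key at or below it changes. Returns the number changed."""
    changed = 0
    for key in list(mapping):  # iterate over a snapshot because the dict is modified
        if key == old_prefix or key.startswith(old_prefix + "/"):
            mapping[new_prefix + key[len(old_prefix):]] = mapping.pop(key)
            changed += 1
    return changed


changed_inode_keyed = rename_inode_keyed(inode_keyed, 100, "hive", "hive001")
changed_path_keyed = rename_path_keyed(path_keyed, "/user/hive", "/user/hive001")
print(changed_inode_keyed, changed_path_keyed)  # 1 3
```

With keys derived from the parent's index node number, the lower-level keys 101warehouse and 102file are untouched, because the renamed directory keeps its index node number 101.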
In this embodiment, the latest mapping stored in the cloud database is shown in Table 3 below.
Table 3
In this embodiment, to obtain the index node numbers of the directory metadata on the new file path /user/hive001/warehouse/file, the distributed file system first generates the first-level key 0user from the first-level directory name user and the preset character 0, queries the locally stored mapping shown in Table 2 with the key 0user, and finds the index node number 100 of the first-level directory's metadata.
It then generates the second-level key 100hive001 from the second-level directory name hive001 and the index node number 100 of the first-level directory's metadata. No index node number can be found for the key 100hive001 in the locally stored Table 2, so the distributed file system queries the cloud database instead: it looks up the key 100hive001 in the mapping shown in Table 3, stored in the cloud database, and obtains the index node number 101.
The distributed file system can also update the locally stored mapping with the correspondence between the key 100hive001 and the index node number 101 found in the cloud database, that is, update the locally stored mapping from that of Table 2 to that of Table 3. For the distributed file system too, when a directory name is modified, only the key of the corresponding directory needs to change.
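The miss-then-refresh behavior in the hive001 example can be sketched as follows. The two dictionaries mirror the local (Table 2) and cloud (Table 3) mappings as described in the text. The function name `lookup_with_fallback` is an assumption, and so is the detail of deleting the stale local key that pointed to the same index node number; the text only says that the local mapping is brought in line with the cloud mapping.

```python
local_mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud_mapping = {"0user": 100, "100hive001": 101, "101warehouse": 102, "102file": 103}


def lookup_with_fallback(key, local, cloud):
    """Look up a key locally; on a miss, query the cloud mapping and refresh the local one."""
    inode_id = local.get(key)
    if inode_id is None:
        inode_id = cloud[key]  # raises KeyError if the cloud mapping lacks the key too
        # Assumption: drop any stale local key that still points to this inode.
        for stale_key in [k for k, v in local.items() if v == inode_id]:
            del local[stale_key]
        local[key] = inode_id
    return inode_id


found = lookup_with_fallback("100hive001", local_mapping, cloud_mapping)
print(found)  # 101
```

After the call, the local mapping holds 100hive001 instead of 100hive, while the keys of the lower-level directories are unchanged.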
Note that, to ensure accurate results, the distributed file system can also query the cloud database for the index node numbers of the metadata of every directory below the second-level directory, that is, further query the cloud database for the index node numbers of the third-level and fourth-level directory metadata, and update the locally stored mapping with the results. This avoids local lookup failures or inaccurate results caused by lower-level directory names having been modified as well.
Optionally, in other examples, the distributed file system can periodically obtain the latest mapping from the cloud database and update its local copy. This specification places no special limitation on this.
With the metadata management solution provided in this specification, the distributed file system can still obtain accurate metadata when metadata changes, avoiding the retrieval of wrong metadata caused by a locally stored mapping that was not updated in time.
Figure 3 is a schematic flowchart of another metadata management method for a distributed file system according to an exemplary embodiment of this specification.
Referring to Figure 3, the metadata management method can be applied to a distributed file system and includes the following steps:
Step 302: extract the directory name of each directory level from the file path.
Step 304: in order from upper-level to lower-level directories, for each extracted directory name, generate the key of the current directory from the directory name and the index node number of the parent directory's metadata.
Step 306: look up the index node number of the current directory's metadata in the local mapping based on the key.
In this embodiment, steps 302 to 306 can be implemented in the same way as steps 202 to 206 of the embodiment shown in Figure 2, and the details are not repeated here.
Step 308: look up, in the cloud database, the index node numbers of the directory metadata corresponding to the keys of the directories at each level of the file path.
In this embodiment, after the distributed file system has found the index node numbers of the directory metadata at each level of the file path from the locally stored mapping, it can also query the cloud database for those index node numbers using the generated keys.
For example, the multiple index node numbers to be queried are merged into a batch, and the cloud database is then queried.
Still taking the file path /user/hive/warehouse/file as an example, after the distributed file system has found the index node numbers 100 to 103 of the directory metadata in the locally stored mapping, it can query the cloud database for the index node numbers of the directory metadata using the keys 0user, 100hive, 101warehouse, and 102file. That is, the index node numbers are looked up in the mapping between directory keys and index node numbers stored in the cloud database.
Step 310: determine whether the index node numbers of the directory metadata at each level found from the mapping are the same as the index node numbers found in the cloud database.
Following step 308, after the distributed file system has found the index node numbers of the directory metadata in the cloud database, it determines whether the index node numbers found in the local mapping are the same as those found in the cloud database.
If they are the same, step 312 is executed.
If they are not the same, step 314 is executed.
Step 312: when the index node numbers of the directory metadata at each level found from the mapping are the same as the index node numbers found in the cloud database, obtain the metadata of the file path from the cloud database based on the index node number of the file path.
Based on the result of step 310, if the index node numbers found in the local mapping are the same as those found in the cloud database, the locally stored mapping is the latest mapping and the metadata stored in the cloud database has not changed, so the metadata can be obtained from the cloud database by index node number.
Step 314: when the index node numbers of the directory metadata at each level found from the mapping differ from the index node numbers found in the cloud database, update the mapping with the index node numbers found in the cloud database, and obtain the metadata with the index node numbers found in the cloud database.
Based on the result of step 310, if the index node numbers found in the local mapping differ from those found in the cloud database, that is, the same directory key corresponds to different metadata index node numbers, a directory name in the cloud database may have been updated while the local mapping was not updated in time. The index node number found in the local mapping for the new directory name is then not that of the updated directory metadata being sought, and may be the index node number of historical directory metadata formerly held in the cloud database.
Still taking the renaming of the directory hive to hive001 as an example, suppose an index node number for the key 100hive001 can be found in the locally stored mapping, say 200, which differs from the index node number 101 stored for 100hive001 in the cloud database. This indicates that the local mapping was not updated in time. The number 200 may be the index node number of a historical directory /user/hive001 in the cloud database, which may no longer exist or may have been modified.
In this case, the distributed file system can update the locally stored mapping with the index node number found in the cloud database, and can also obtain the metadata with that index node number.
Table 4
For example, referring to Table 4, the distributed file system finds the index node numbers 100 to 103 for the directory metadata in the locally stored mapping, while the index node numbers found in the cloud database are 100, 101, 105, and 106. The index node numbers of the third-level and fourth-level directory metadata thus differ from those stored locally. Based on the result from the cloud database, the distributed file system can change the index node number of the third-level directory metadata in the local mapping from 102 to 105, and that of the fourth-level directory metadata from 103 to 106.
Of course, if the value field also stores other metadata and that metadata has changed, it must be updated at the same time.
The distributed file system can then obtain the third-level and fourth-level directory metadata with the index node numbers 105 and 106.
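Steps 308 to 314 can be sketched as a comparison of the local and cloud lookup results for the same keys. The data below mirrors the Table 4 example as described in the text (local results 100 to 103, cloud results 100, 101, 105, and 106 for the same four keys); the function name `verify_and_refresh` is an assumption.

```python
keys = ["0user", "100hive", "101warehouse", "102file"]
local_mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud_mapping = {"0user": 100, "100hive": 101, "101warehouse": 105, "102file": 106}


def verify_and_refresh(keys, local, cloud):
    """Compare local and cloud results per key and refresh any stale local entry.

    Returns the index node numbers, as confirmed by the cloud mapping, that
    should be used to fetch metadata.
    """
    confirmed = []
    for key in keys:
        cloud_id = cloud[key]
        if local.get(key) != cloud_id:  # step 314: local entry is stale or missing
            local[key] = cloud_id
        confirmed.append(cloud_id)      # steps 312 and 314 both fetch with this number
    return confirmed


confirmed_ids = verify_and_refresh(keys, local_mapping, cloud_mapping)
print(confirmed_ids)  # [100, 101, 105, 106]
```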
With the metadata management solution provided in this specification, before obtaining metadata from the cloud database, the distributed file system determines whether the index node numbers found in the cloud database are the same as those found from the local mapping, and obtains the metadata when they are the same. Accurate metadata can therefore still be obtained when the metadata in the cloud database changes, which effectively avoids metadata retrieval errors caused by the local mapping of the distributed file system not being updated in time in high-concurrency scenarios.
Building on the foregoing metadata management method, this specification also provides a data access method for a distributed file system, which can be applied to the name node in the distributed file system. Referring to Figure 4 and Figure 5, the method includes the following steps:
Step 502: in response to a data access request sent by a client, query the corresponding metadata according to the path of the data to be accessed.
In this embodiment, the data access request can be a data read request or a data write request. Taking a data read request as an example, the name node queries the corresponding metadata according to the path of the data to be read. The metadata query can be implemented with the metadata query scheme described in the embodiments of Figure 2 or Figure 3. For example, the name node first looks up the index node number of the path in the locally stored mapping between directory keys and metadata index node numbers, and then obtains the corresponding metadata from the cloud database.
Step 504: return the metadata to the client so that the client can access the data based on the metadata.
Following step 502, after the metadata has been obtained from the cloud database, it can be returned to the client. Still taking a data read request as an example, the client uses the metadata to identify the data blocks that hold the data and then reads the data from the data nodes that store those blocks.
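Steps 502 and 504 for a read request can be sketched end to end: resolve the path locally, then fetch the metadata by index node number. Everything named here is hypothetical, including `handle_read_request`, the in-memory dictionary standing in for the cloud database, and the `blocks` field with its block identifiers.

```python
LOCAL_MAPPING = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}

# Stand-in for the cloud database: index node number -> full metadata (made-up fields).
CLOUD_METADATA = {103: {"name": "file", "blocks": ["blk_1", "blk_2"]}}


def handle_read_request(path, local_mapping, cloud_metadata, root_prefix="0"):
    """Name-node side of a read: return the metadata the client needs to locate blocks."""
    prefix = root_prefix
    inode_id = None
    for name in (part for part in path.split("/") if part):
        inode_id = local_mapping[prefix + name]  # step 502: local key lookup per level
        prefix = str(inode_id)
    return cloud_metadata[inode_id]              # step 502: fetch from the cloud database


metadata = handle_read_request("/user/hive/warehouse/file", LOCAL_MAPPING, CLOUD_METADATA)
print(metadata["blocks"])  # ['blk_1', 'blk_2']
```

The client would then use the returned block list to read from the data nodes, which is outside the scope of this sketch.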
在本实施例中,针对数据写请求,名字节点也可基于本说明书前述图2或图3实施例中记载的元数据查询方案实现元数据的查询,其他数据写入过程可参考相关技术,本说明书在此不再一一赘述。In this embodiment, for data write requests, the name node can also implement metadata query based on the metadata query scheme recorded in the embodiment of Figure 2 or Figure 3 mentioned above in this specification. For other data writing processes, please refer to related technologies. This document The instructions will not go into details here.
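The read flow of steps 502 and 504 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `FileMetadata` record, the `NameNode` class, the `client_read` helper, and the dictionary standing in for the data nodes are all hypothetical names that do not appear in the specification, and the metadata query itself is passed in as a callable rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FileMetadata:
    """Hypothetical metadata record: an index node number plus ordered block IDs."""
    inode: int
    blocks: List[str]


class NameNode:
    """Hypothetical name node that delegates the metadata query to a callable."""

    def __init__(self, query_metadata: Callable[[str], Optional[FileMetadata]]):
        self._query_metadata = query_metadata

    def handle_access(self, path: str) -> Optional[FileMetadata]:
        # Step 502: query the metadata for the requested path.
        # Step 504: return it to the caller (the client).
        return self._query_metadata(path)


def client_read(name_node: NameNode, path: str, data_nodes: Dict[str, bytes]) -> bytes:
    """Client side: use the returned metadata to locate blocks, then read them."""
    meta = name_node.handle_access(path)
    if meta is None:
        raise FileNotFoundError(path)
    # data_nodes is a stand-in for real data nodes, keyed by block ID.
    return b"".join(data_nodes[block_id] for block_id in meta.blocks)
```

In this sketch the name node only serves metadata, and the data itself is read by the client from the data nodes, which matches the division of work described for the read request.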
Corresponding to the foregoing embodiments of the metadata management method for a distributed file system, this specification further provides embodiments of a metadata management apparatus for a distributed file system.
The embodiments of the metadata management apparatus for a distributed file system in this specification can be applied to an electronic device. The apparatus embodiments may be implemented in software, in hardware, or in a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed when the processor of the electronic device in which it resides reads the corresponding computer program instructions from non-volatile memory into memory and runs them. At the hardware level, FIG. 6 shows a hardware structure diagram of the electronic device in which the metadata management apparatus for a distributed file system of this specification resides. In addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 6, the electronic device in which the apparatus of an embodiment resides may also include other hardware depending on the actual functions of the electronic device, which is not described further here.
FIG. 7 is a block diagram of a metadata management apparatus for a distributed file system according to an exemplary embodiment of this specification.
Referring to FIG. 7, the metadata management apparatus 700 for a distributed file system can be applied to the electronic device shown in FIG. 3 above, and the electronic device may be a name node of the distributed file system. The distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata, where the keyword of a current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory. The apparatus 700 includes:
a name acquisition unit 701, which extracts the directory names of the directories at each level from a file path;
a keyword generation unit 702, which, in order from upper-level to lower-level directories, generates for each extracted directory name the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
a number lookup unit 703, which looks up the index node number of the current-level directory metadata in the mapping relationship based on the keyword;
a metadata acquisition unit 704, which obtains the metadata of the file path from the cloud database based on the index node number of the file path.
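The cooperation of units 701 to 704 can be illustrated with a minimal sketch. Several details here are assumptions made only for illustration and are not stated in the specification: the key format `"<parent inode>:<name>"`, the preset character `"0"` used as the parent of first-level directories, the treatment of the final path component like any other level, and the plain dictionaries standing in for the local mapping relationship and the cloud database.

```python
from typing import Dict, Optional

PRESET_CHAR = "0"  # assumed placeholder parent for first-level directories


def resolve_inode(path: str, mapping: Dict[str, int]) -> Optional[int]:
    """Walk the path top-down and return the index node number of its last component."""
    names = [part for part in path.split("/") if part]  # role of unit 701
    parent = PRESET_CHAR
    inode: Optional[int] = None
    for name in names:
        key = f"{parent}:{name}"   # role of unit 702 (assumed key format)
        inode = mapping.get(key)   # role of unit 703
        if inode is None:
            return None            # no local entry for this level
        parent = str(inode)        # this level becomes the parent of the next
    return inode


def get_metadata(path: str, mapping: Dict[str, int],
                 cloud_db: Dict[int, dict]) -> Optional[dict]:
    """Role of unit 704: fetch metadata from the stand-in cloud database by index node number."""
    inode = resolve_inode(path, mapping)
    return None if inode is None else cloud_db.get(inode)
```

Because each key depends only on the parent's index node number and the directory's own name, renaming or moving an upper-level directory does not require rewriting the keys of everything beneath it, which appears to be the motivation for this key design.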
Optionally, generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
when the current-level directory is a first-level directory, generating the keyword of the current-level directory based on the directory name and a preset character.
Optionally, generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
concatenating the directory name and the index node number of the upper-level directory metadata in a preset order to generate the keyword of the current-level directory.
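A minimal sketch of the two optional keyword rules above is given below. The specification does not fix the preset character, the separator, or the concatenation order, so `PRESET_CHAR`, `SEPARATOR`, and the `inode_first` flag are hypothetical choices for illustration only.

```python
from typing import Optional

PRESET_CHAR = "0"  # hypothetical preset character used for first-level directories
SEPARATOR = ":"    # hypothetical separator between the two parts of a keyword


def make_key(name: str, parent_inode: Optional[int], inode_first: bool = True) -> str:
    """Build a directory keyword from its name and its parent's index node number.

    parent_inode is None for a first-level directory, in which case the
    preset character takes the place of the parent's index node number.
    """
    parent_part = PRESET_CHAR if parent_inode is None else str(parent_inode)
    # The "preset order" is modelled as a simple flag: parent part first, or name first.
    parts = (parent_part, name) if inode_first else (name, parent_part)
    return SEPARATOR.join(parts)
```

For example, `make_key("logs", 101)` gives `"101:logs"`, while `make_key("logs", None)` gives `"0:logs"`. Putting the numeric part first keeps the keyword unambiguous even if a directory name happens to contain the separator, since the split point is then the first separator.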
Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
looking up, in the cloud database, the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
determining whether the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database;
when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
Optionally, further including:
when the index node numbers of the directory metadata at each level found based on the mapping relationship differ from the index node numbers found in the cloud database, updating the mapping relationship based on the index node numbers found in the cloud database.
Optionally, further including:
obtaining the metadata of the file path from the cloud database based on the index node number of the file path found in the cloud database.
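The check-then-fetch behaviour described above can be sketched as follows. This is a simplified illustration, not the specification's implementation: the key format `"<parent inode>:<name>"`, the preset character `"0"`, and the dictionaries `db_index` and `db_metadata` standing in for the cloud database are assumptions. The sketch also assumes that, because a lower-level key depends on its parent's index node number, the path is re-walked against the cloud database rather than re-using the locally generated keys.

```python
from typing import Callable, Dict, List, Optional, Tuple

PRESET_CHAR = "0"  # assumed placeholder parent for first-level directories


def walk(names: List[str],
         lookup: Callable[[str], Optional[int]]) -> Optional[List[Tuple[str, int]]]:
    """Resolve names top-down; return (key, inode) pairs, or None on a miss."""
    parent = PRESET_CHAR
    resolved: List[Tuple[str, int]] = []
    for name in names:
        key = f"{parent}:{name}"
        inode = lookup(key)
        if inode is None:
            return None
        resolved.append((key, inode))
        parent = str(inode)
    return resolved


def get_metadata_verified(path: str, mapping: Dict[str, int],
                          db_index: Dict[str, int],
                          db_metadata: Dict[int, dict]) -> Optional[dict]:
    names = [part for part in path.split("/") if part]
    local = walk(names, mapping.get)     # result from the local mapping relationship
    remote = walk(names, db_index.get)   # result from the stand-in cloud database
    if not remote:
        return None                      # empty path, or path unknown to the cloud database
    if local != remote:
        mapping.update(remote)           # refresh stale or missing local entries
    # In both branches, fetch by the index node number confirmed in the cloud database.
    return db_metadata.get(remote[-1][1])
```

Note that this sketch leaves superseded keys in the local mapping; a real implementation would presumably also evict them, which the specification does not detail here.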
Optionally, further including:
if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, looking up the index node number of the current-level directory metadata in the cloud database based on the keyword, and updating the mapping relationship based on the index node number found.
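The fallback on a local miss amounts to a read-through cache. A minimal sketch, assuming the local mapping relationship and the cloud database index are both modelled as dictionaries keyed by directory keyword (the function name `lookup_inode` and both dictionaries are hypothetical):

```python
from typing import Dict, Optional


def lookup_inode(key: str, mapping: Dict[str, int],
                 db_index: Dict[str, int]) -> Optional[int]:
    """Look up a keyword locally first, then fall back to the cloud database."""
    inode = mapping.get(key)
    if inode is None:
        inode = db_index.get(key)   # stand-in for a cloud database query
        if inode is not None:
            mapping[key] = inode    # update the local mapping relationship
    return inode
```

After the first miss, later lookups of the same keyword are served from the local mapping without another cloud database query.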
Corresponding to the foregoing embodiments of the data access method for a distributed file system, this specification further provides embodiments of a data access apparatus for a distributed file system.
The embodiments of the data access apparatus for a distributed file system in this specification can be applied to an electronic device. The apparatus embodiments may be implemented in software, in hardware, or in a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed when the processor of the electronic device in which it resides reads the corresponding computer program instructions from non-volatile memory into memory and runs them. At the hardware level, the hardware structure of the electronic device in which the data access apparatus for a distributed file system of this specification resides may be similar to that of the electronic device shown in FIG. 6, and this specification places no particular limitation on this.
FIG. 8 is a block diagram of a data access apparatus for a distributed file system according to an exemplary embodiment of this specification.
Referring to FIG. 8, the data access apparatus 800 for a distributed file system can be applied in a name node of the distributed file system, and includes:
a metadata query unit 801, which, in response to a data access request sent by a client, queries the corresponding metadata according to the path of the data to be accessed,
where the metadata query may be implemented using the metadata management method provided in this specification;
a data access unit 802, which returns the metadata to the client, so that the client can access the data based on the metadata.
For details of how the functions and roles of each unit in the above apparatus are implemented, see the implementation of the corresponding steps in the above method, which is not repeated here.
Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this specification. A person of ordinary skill in the art can understand and implement it without creative effort.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the foregoing embodiments of the metadata management method for a distributed file system, this specification further provides a metadata management apparatus for a distributed file system, the apparatus including a processor and a memory for storing machine-executable instructions. The processor and the memory are typically connected to each other by an internal bus. In other possible implementations, the apparatus may also include an external interface to enable communication with other devices or components.
In this embodiment, the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata, where the keyword of a current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory. By reading and executing the machine-executable instructions stored in the memory that correspond to the metadata management logic of the distributed file system, the processor is caused to:
extract the directory names of the directories at each level from a file path;
in order from upper-level to lower-level directories, generate for each extracted directory name the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
look up the index node number of the current-level directory metadata in the mapping relationship based on the keyword;
obtain the metadata of the file path from the cloud database based on the index node number of the file path.
Optionally, generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
when the current-level directory is a first-level directory, generating the keyword of the current-level directory based on the directory name and a preset character.
Optionally, generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
concatenating the directory name and the index node number of the upper-level directory metadata in a preset order to generate the keyword of the current-level directory.
Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
looking up, in the cloud database, the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
determining whether the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database;
when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
Optionally, further including:
when the index node numbers of the directory metadata at each level found based on the mapping relationship differ from the index node numbers found in the cloud database, updating the mapping relationship based on the index node numbers found in the cloud database.
Optionally, further including:
obtaining the metadata of the file path from the cloud database based on the index node number of the file path found in the cloud database.
Optionally, further including:
if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, looking up the index node number of the current-level directory metadata in the cloud database based on the keyword, and updating the mapping relationship based on the index node number found.
Corresponding to the foregoing embodiments of the data access method for a distributed file system, this specification further provides a data access apparatus for a distributed file system, the apparatus including a processor and a memory for storing machine-executable instructions. The processor and the memory are typically connected to each other by an internal bus. In other possible implementations, the apparatus may also include an external interface to enable communication with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the data access logic of the distributed file system, the processor is caused to:
in response to a data access request sent by a client, query the corresponding metadata according to the path of the data to be accessed;
return the metadata to the client, so that the client can access the data based on the metadata;
where the metadata query may be implemented using the metadata management method provided in this specification.
Corresponding to the foregoing embodiments of the metadata management method for a distributed file system, the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata, where the keyword of a current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory. This specification further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
extracting the directory names of the directories at each level from a file path;
in order from upper-level to lower-level directories, generating for each extracted directory name the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
looking up the index node number of the current-level directory metadata in the mapping relationship based on the keyword;
obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
Optionally, generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
when the current-level directory is a first-level directory, generating the keyword of the current-level directory based on the directory name and a preset character.
Optionally, generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
concatenating the directory name and the index node number of the upper-level directory metadata in a preset order to generate the keyword of the current-level directory.
Optionally, obtaining the metadata of the file path from the cloud database based on the index node number of the file path includes:
looking up, in the cloud database, the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
determining whether the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database;
when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
Optionally, further including:
when the index node numbers of the directory metadata at each level found based on the mapping relationship differ from the index node numbers found in the cloud database, updating the mapping relationship based on the index node numbers found in the cloud database.
Optionally, further including:
obtaining the metadata of the file path from the cloud database based on the index node number of the file path found in the cloud database.
Optionally, further including:
if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, looking up the index node number of the current-level directory metadata in the cloud database based on the keyword, and updating the mapping relationship based on the index node number found.
Corresponding to the foregoing embodiments of the data access method for a distributed file system, this specification further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
in response to a data access request sent by a client, querying the corresponding metadata according to the path of the data to be accessed;
returning the metadata to the client, so that the client can access the data based on the metadata;
where the metadata query may be implemented using the metadata management method provided in this specification.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.

Claims (12)

  1. A metadata management method for a distributed file system, applied to a distributed file system, wherein the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata, the keyword of a current-level directory being generated based on the index node number of the upper-level directory metadata and the name of the current-level directory, the method comprising:
    extracting the directory names of the directories at each level from a file path;
    in order from upper-level to lower-level directories, generating for each extracted directory name the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
    looking up the index node number of the current-level directory metadata in the mapping relationship based on the keyword;
    obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
  2. The method according to claim 1, wherein generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata comprises:
    when the current-level directory is a first-level directory, generating the keyword of the current-level directory based on the directory name and a preset character.
  3. The method according to claim 1, wherein generating the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata comprises:
    concatenating the directory name and the index node number of the upper-level directory metadata in a preset order to generate the keyword of the current-level directory.
  4. The method according to claim 1, wherein obtaining the metadata of the file path from the cloud database based on the index node number of the file path comprises:
    looking up, in the cloud database, the index node numbers of the directory metadata corresponding to the directory keywords at each level of the file path;
    determining whether the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database;
    when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtaining the metadata of the file path from the cloud database based on the index node number of the file path.
  5. The method according to claim 4, further comprising:
    when the index node numbers of the directory metadata at each level found based on the mapping relationship differ from the index node numbers found in the cloud database, updating the mapping relationship based on the index node numbers found in the cloud database.
  6. The method according to claim 5, further comprising:
    obtaining the metadata of the file path from the cloud database based on the index node number of the file path found in the cloud database.
  7. The method according to claim 1, further comprising:
    if the index node number of the current-level directory metadata cannot be found in the mapping relationship based on the keyword, looking up the index node number of the current-level directory metadata in the cloud database based on the keyword, and updating the mapping relationship based on the index node number found.
  8. A data access method for a distributed file system, applied to a distributed file system, comprising:
    in response to a data access request sent by a client, querying the corresponding metadata according to the path of the data to be accessed;
    returning the metadata to the client, so that the client can access the data based on the metadata;
    wherein the metadata is queried based on the path using the method according to any one of claims 1-7.
  9. A metadata management apparatus for a distributed file system, applied to a distributed file system, wherein the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata, the keyword of a current-level directory being generated based on the index node number of the upper-level directory metadata and the name of the current-level directory, the apparatus comprising:
    a name acquisition unit, which extracts the directory names of the directories at each level from a file path;
    a keyword generation unit, which, in order from upper-level to lower-level directories, generates for each extracted directory name the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
    a number lookup unit, which looks up the index node number of the current-level directory metadata in the mapping relationship based on the keyword;
    a metadata acquisition unit, which obtains the metadata of the file path from the cloud database based on the index node number of the file path.
  10. A data access apparatus for a distributed file system, applied to a distributed file system, comprising:
    a metadata query unit, which, in response to a data access request sent by a client, queries the corresponding metadata according to the path of the data to be accessed;
    a data access unit, which returns the metadata to the client, so that the client can access the data based on the metadata;
    wherein the metadata is queried based on the path using the method according to any one of claims 1-7.
  11. A metadata management apparatus for a distributed file system, comprising:
    a processor;
    a memory for storing machine-executable instructions;
    wherein the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata, the keyword of a current-level directory being generated based on the index node number of the upper-level directory metadata and the name of the current-level directory, and by reading and executing the machine-executable instructions stored in the memory that correspond to the metadata management logic of the distributed file system, the processor is caused to:
    extract the directory names of the directories at each level from a file path;
    in order from upper-level to lower-level directories, generate for each extracted directory name the keyword of the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
    look up the index node number of the current-level directory metadata in the mapping relationship based on the keyword;
    obtain the metadata of the file path from the cloud database based on the index node number of the file path.
  12. A computer-readable storage medium storing a computer program, the computer program being used to cause a processor to execute the metadata management method according to any one of claims 1-7.
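
The lookup recited in claim 11 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `MetadataStore`, `make_key`, and `ROOT_INODE`, and the `"<parent inode>/<name>"` keyword format, are assumptions made for illustration only. Two in-memory dictionaries stand in for the stored mapping relationship and for the cloud database, and the final path component (the file) is resolved through the same mapping as the directories, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ROOT_INODE = 1  # assumed index node number of the root directory "/"


def make_key(parent_inode: int, name: str) -> str:
    """Build a keyword from the parent's inode number and the entry name.

    The "<inode>/<name>" layout is hypothetical; it is unambiguous because
    a single path component cannot contain "/".
    """
    return f"{parent_inode}/{name}"


@dataclass
class Metadata:
    inode: int
    name: str
    is_dir: bool


class MetadataStore:
    """Toy stand-in for the keyword mapping plus the metadata database."""

    def __init__(self) -> None:
        # keyword -> index node number (the "mapping relationship")
        self.key_to_inode: Dict[str, int] = {}
        # index node number -> metadata (stands in for the cloud database)
        self.inode_to_meta: Dict[int, Metadata] = {
            ROOT_INODE: Metadata(ROOT_INODE, "/", True)
        }
        self._next_inode = ROOT_INODE + 1

    def add(self, parent_inode: int, name: str, is_dir: bool) -> int:
        """Register a new entry under parent_inode and return its inode number."""
        inode = self._next_inode
        self._next_inode += 1
        self.key_to_inode[make_key(parent_inode, name)] = inode
        self.inode_to_meta[inode] = Metadata(inode, name, is_dir)
        return inode

    def resolve(self, path: str) -> Optional[Metadata]:
        """Walk the path from upper level to lower level, one keyword lookup per level."""
        inode = ROOT_INODE
        for name in (part for part in path.split("/") if part):
            found = self.key_to_inode.get(make_key(inode, name))
            if found is None:
                return None  # some level of the path does not exist
            inode = found
        # Fetch the metadata by the inode number resolved for the full path.
        return self.inode_to_meta.get(inode)


if __name__ == "__main__":
    store = MetadataStore()
    a = store.add(ROOT_INODE, "a", is_dir=True)
    b = store.add(a, "b", is_dir=True)
    store.add(b, "f.txt", is_dir=False)
    print(store.resolve("/a/b/f.txt"))
    print(store.resolve("/a/missing"))
```

Because each keyword depends only on the parent's index node number and the entry's own name, renaming a directory in this sketch would change a single keyword and leave the keywords of its descendants untouched.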
PCT/CN2023/083879 2022-03-25 2023-03-24 Metadata management method and apparatus for distributed file system WO2023179787A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210307777.7A CN114840487A (en) 2022-03-25 2022-03-25 Metadata management method and device for distributed file system
CN202210307777.7 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023179787A1 (en) 2023-09-28

Family

ID=82564017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083879 WO2023179787A1 (en) 2022-03-25 2023-03-24 Metadata management method and apparatus for distributed file system

Country Status (2)

Country Link
CN (1) CN114840487A (en)
WO (1) WO2023179787A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840487A (en) * 2022-03-25 2022-08-02 阿里巴巴(中国)有限公司 Metadata management method and device for distributed file system
CN117873967B (en) * 2024-03-08 2024-05-17 腾讯科技(深圳)有限公司 Data management method, device, equipment and storage medium of distributed file system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160147479A1 (en) * 2014-11-26 2016-05-26 International Business Machines Corporation Metadata storing technique
CN108009254A (en) * 2017-12-05 2018-05-08 北京百度网讯科技有限公司 More indexing means and device, cloud system and computer-readable recording medium
CN111694791A (en) * 2020-04-01 2020-09-22 新华三大数据技术有限公司 Data access method and device in distributed basic framework
CN112988062A (en) * 2021-01-28 2021-06-18 腾讯科技(深圳)有限公司 Metadata reading limiting method and device, electronic equipment and medium
CN113010476A (en) * 2021-03-15 2021-06-22 腾讯科技(深圳)有限公司 Metadata searching method, device and equipment and computer readable storage medium
CN114116613A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Metadata query method, equipment and storage medium based on distributed file system
CN114840487A (en) * 2022-03-25 2022-08-02 阿里巴巴(中国)有限公司 Metadata management method and device for distributed file system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685148B (en) * 2012-05-31 2014-10-15 清华大学 Method for realizing secure network backup system under cloud storage environment
JP5843965B2 (en) * 2012-07-13 2016-01-13 株式会社日立ソリューションズ Search device, search device control method, and recording medium
CN103634616B (en) * 2012-08-27 2018-04-17 中兴通讯股份有限公司 A kind of stream media ordering method and device based on cloud storage

Also Published As

Publication number Publication date
CN114840487A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
JP7113040B2 (en) Versioned hierarchical data structure for distributed data stores
US10754562B2 (en) Key value based block device
US10754878B2 (en) Distributed consistent database implementation within an object store
US11182356B2 (en) Indexing for evolving large-scale datasets in multi-master hybrid transactional and analytical processing systems
US9411840B2 (en) Scalable data structures
WO2023179787A1 (en) Metadata management method and apparatus for distributed file system
US10275489B1 (en) Binary encoding-based optimizations at datastore accelerators
WO2018064962A1 (en) Data storage method, electronic device and computer non-volatile storage medium
US20170075909A1 (en) In-line policy management with multi-level object handle
US7469257B2 (en) Generating and monitoring a multimedia database
US9659023B2 (en) Maintaining and using a cache of child-to-parent mappings in a content-addressable storage system
US11151081B1 (en) Data tiering service with cold tier indexing
WO2018097846A1 (en) Edge store designs for graph databases
US7844596B2 (en) System and method for aiding file searching and file serving by indexing historical filenames and locations
US10146833B1 (en) Write-back techniques at datastore accelerators
US10558636B2 (en) Index page with latch-free access
Hua et al. SANE: Semantic-aware namespace in ultra-large-scale file systems
CN115918110A (en) Spatial search using key-value store
CN107273443B (en) Mixed indexing method based on metadata of big data model
Cheng et al. A Multi-dimensional Index Structure Based on Improved VA-file and CAN in the Cloud
EP3995972A1 (en) Metadata processing method and apparatus, and computer-readable storage medium
US20210286793A1 (en) Indexing stored data objects using probabilistic filters
JP4825504B2 (en) Data registration / retrieval system and data registration / retrieval method
Zhou et al. HDKV: supporting efficient high‐dimensional similarity search in key‐value stores
CN117540056B (en) Method, device, computer equipment and storage medium for data query

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23774030

Country of ref document: EP

Kind code of ref document: A1