CN112395354B

CN112395354B - Distributed relational database based on HDFS metadata server and construction method

Info

Publication number: CN112395354B
Application number: CN202011224970.1A
Authority: CN
Inventors: 李发明
Original assignee: Shenzhen China Blog Imformation Technology Co ltd
Current assignee: Shenzhen China Blog Imformation Technology Co ltd
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2022-08-02
Anticipated expiration: 2040-11-05
Also published as: CN112395354A

Abstract

The invention provides a distributed relational database based on an HDFS (Hadoop Distributed File System) metadata server and a construction method thereof, belonging to the field of distributed databases. The construction method partitions data according to the coverage area of the HDFS system and stores the data in regional HDFS instances according to the partitions; each regional HDFS generates metadata from the stored resource data and sends the metadata to its corresponding child node; the child node stores the metadata, generates a unique identifier from it, and sends the metadata and the unique identifier to an HDFS metadata server; the metadata server stores the unique identifier, sends the metadata and the unique identifier to the root node, and provides a user interface through which user requests are resolved and answered layer by layer within the database. The invention establishes the relationship between the parent node and the child nodes through the metadata server, and realizes distributed relational data storage for cross-region, multi-branch data sources.

Description

Distributed relational database based on HDFS metadata server and construction method

Technical Field

The invention belongs to the field of distributed databases, and particularly relates to a distributed relational database based on an HDFS (Hadoop Distributed File System) metadata server and a construction method thereof.

Background

In an internet platform architecture, the data storage layer is the foundation of the whole architecture. It must organize massive data effectively and also provide an efficient interface to the upper-layer database system, so that the requirements of large-scale structured data analysis are met. For example, in a mature cloud storage system, the database system should have a stable architecture, good extensibility and compatibility, and user-friendly query and index functions. The Hadoop Distributed File System (HDFS) is a distributed file system that runs on general-purpose hardware; it is highly fault tolerant, well suited to deployment on inexpensive machines, and highly compatible. However, when HDFS is applied to the base layer of a specific information platform, each platform architecture has its own unique structure, and HDFS cannot be applied directly to such an architecture.

Disclosure of Invention

In view of the above defects or shortcomings in the prior art, the present invention aims to provide a distributed relational database based on an HDFS metadata server and a construction method thereof, which establish the relationship between a parent node and child nodes through the metadata server and realize distributed relational data storage for cross-region, multi-branch data sources.

In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:

In a first aspect, an embodiment of the present invention provides a distributed relational database based on an HDFS metadata server, where the distributed relational database includes: a plurality of regional HDFS, child nodes equal in number to the regional HDFS, an HDFS metadata server, a root node and a user interface; wherein,

each regional HDFS is communicatively connected to a child node and is used for storing the resource data of its region, generating metadata of the resource data, sending the metadata to the connected child node, receiving metadata requests from the child node, and feeding back the resource data corresponding to the metadata to the child node;

the child nodes are equal in number to the regional HDFS, and all child nodes are connected to the HDFS metadata server; each child node is used for storing the metadata sent by its regional HDFS, generating a unique identifier for each piece of metadata, and sending the metadata and the unique identifier to the HDFS metadata server; each child node is also used for receiving a unique identifier request from the HDFS metadata server, matching it to the corresponding metadata, and sending a metadata request to the regional HDFS;

the HDFS metadata server is communicatively connected to the root node and is used for sending the metadata and the unique identifier to the root node while retaining the unique identifier, and for receiving a user request and sending it to the root node; it is also used for matching the corresponding unique identifier according to the metadata request fed back by the root node, identifying the corresponding child node, and sending a unique identifier request to that child node;

the root node is used for storing all metadata, receiving a user request from the metadata server, generating a metadata request with a unique identifier according to the user request and feeding back the metadata request to the HDFS metadata server.

In the above scheme, the unique identifier includes a child node identification field, a metadata identification field, a storage timing field, and a check field; the child node identification field is used for identifying a child node, the metadata identification field is used for matching a unique identifier according to a metadata request, the storage timing field is used for updating data, and the check field is used for verifying matching and identification.

In the above scheme, the HDFS metadata server has an extended interface, and the extended interface includes a JDBC/ODBC interface and a User Shell that directly interacts with the server through a command line.

In the above scheme, each of the plurality of regional HDFS corresponds to an actual geographical area, or to a department or central office within a system.

In the above scheme, the regional HDFS serve branches located in different regions: each branch corresponds to its own regional HDFS, the data and computing resources inside each regional HDFS are different, and the unique identifiers of the corresponding child nodes share the same structure but differ in their specific values.

In the above scheme, when a user in the region stores data, the data is stored directly in the regional HDFS through a storage interface.

In the above scheme, the regional HDFS adopts a cloud storage mode.

In a second aspect, an embodiment of the present invention further provides a method for constructing a distributed relational database based on an HDFS metadata server, where the method for constructing the distributed relational database includes the following steps:

Step S1, partitioning the data according to the coverage of the HDFS system, and storing the data in the regional HDFS according to the partitions;

Step S2, the regional HDFS generates metadata according to the stored resource data and sends the metadata to the corresponding child node;

Step S3, the child node corresponding to the regional HDFS stores the metadata, generates a unique identifier according to the metadata, and sends the metadata and the unique identifier to the HDFS metadata server;

Step S4, the metadata server stores the unique identifier and sends the metadata and the unique identifier to the root node; at the same time, a user interface is set up in the metadata server;

Step S5, the root node stores all metadata and the corresponding unique identifiers.
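The five steps above can be sketched in code. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the class names, the `build` function, the `region_of` callback, the metadata attributes, the example paths, and the `<child id>-<sequence>` identifier format are all hypothetical, and plain dictionaries stand in for the regional HDFS.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ChildNode:
    """Child node paired with one regional HDFS (steps S2-S3)."""
    node_id: int
    metadata_by_uid: dict = field(default_factory=dict)
    _seq: count = field(default_factory=count)

    def register(self, metadata: dict) -> str:
        # Hypothetical identifier format: "<child node id>-<sequence number>".
        uid = f"{self.node_id}-{next(self._seq)}"
        self.metadata_by_uid[uid] = metadata
        return uid


@dataclass
class MetadataServer:
    """Keeps only identifiers and the child node each one belongs to (step S4)."""
    uid_to_child: dict = field(default_factory=dict)


@dataclass
class RootNode:
    """Stores all metadata keyed by unique identifier (step S5)."""
    catalog: dict = field(default_factory=dict)


def build(records, region_of):
    """Run steps S1-S5 over an iterable of (path, payload) records."""
    regions = {}   # region name -> {path: payload}; stands in for a regional HDFS
    children = {}  # region name -> ChildNode
    server, root = MetadataServer(), RootNode()

    for path, payload in records:
        region = region_of(path)                         # S1: choose the partition
        regions.setdefault(region, {})[path] = payload   # S1: store in the regional HDFS
        if region not in children:
            children[region] = ChildNode(node_id=len(children))
        child = children[region]
        metadata = {"path": path, "size": len(payload), "region": region}  # S2
        uid = child.register(metadata)                   # S3: child stores metadata, issues id
        server.uid_to_child[uid] = child.node_id         # S4: server keeps the identifier
        root.catalog[uid] = metadata                     # S5: root keeps metadata and id
    return regions, children, server, root


# Hypothetical records; the region is taken from the first path component.
records = [("/east/a.csv", b"abc"), ("/west/b.csv", b"de"), ("/east/c.csv", b"f")]
regions, children, server, root = build(records, region_of=lambda p: p.split("/")[1])
print(sorted(root.catalog))  # ['0-0', '0-1', '1-0']
```

In this sketch the metadata server holds only the identifier-to-child mapping, while the full metadata lives at the root node and at the owning child node, which mirrors the division of storage described in steps S3 to S5.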

The technical scheme of the embodiment of the invention has the following beneficial effects:

The distributed relational database based on the HDFS metadata server can carry data at the petabyte (PB) scale and above, and system capacity is expanded simply by increasing the number of regional HDFS and child nodes, so the database has excellent scalability. The regional HDFS run on low-cost commodity clusters, which addresses the capacity expansion and cost problems caused by data growth while ensuring the reliability, security and high availability of the data. In addition, the HDFS can effectively support the upper-layer distributed RDBMS, provide high-speed data operation and access interfaces, and provide the ability to aggregate, merge, extract and analyze data. The construction method establishes the relationship between the parent node and the child nodes through the metadata server, and achieves distributed relational data storage for cross-region, multi-branch data sources.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a distributed relational database based on an HDFS metadata server according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for constructing a distributed relational database based on an HDFS metadata server according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

The embodiment of the invention provides a distributed relational database based on an HDFS (Hadoop Distributed File System) metadata server and a construction method thereof.

HDFS is the Hadoop Distributed File System, and it can store data in a centralized or a decentralized mode. In decentralized HDFS, the distribution of metadata is computed by an algorithm, which requires a dedicated metadata calculation and management module. Centralized HDFS has a metadata server that supervises all storage units in the distributed file system globally and performs global scheduling. When the metadata server faces massive data, however, the amount of metadata grows rapidly as well, and the storage pressure, the access pressure, and the network pressure of interacting with the layer beneath the metadata all increase with system scale, so centralized HDFS cannot be applied under big-data conditions.

For example, consider building a national or global information exchange platform for teaching and scientific research, in which a distributed relational database stores, manages, accesses, retrieves, uploads, downloads and schedules all data. Massive data can seriously reduce the response speed of the system, and as the number of interacting users increases the amount of metadata also grows rapidly, so an HDFS system cannot be applied directly.

FIG. 1 is a schematic diagram of the structure of a distributed relational database based on an HDFS metadata server according to an embodiment of the present invention. As shown in FIG. 1, the distributed relational database includes: a plurality of regional HDFS, child nodes equal in number to the regional HDFS, an HDFS metadata server, a root node and a user interface.

Each regional HDFS is communicatively connected to a child node. It stores the resource data of its region, generates metadata of the resource data and sends the metadata to the connected child node, receives metadata requests from the child node, and feeds back the resource data corresponding to the metadata to the child node. When users in the region store data, the data is stored directly in the regional HDFS through a storage interface. The regional HDFS can also adopt a cloud storage mode, which allows system performance to be expanded effectively.

The child nodes are equal in number to the regional HDFS, and all child nodes are connected to the HDFS metadata server. Each child node stores the metadata sent by its regional HDFS, generates a unique identifier for each piece of metadata, and sends the metadata and the unique identifier to the HDFS metadata server. Each child node also receives unique identifier requests from the HDFS metadata server, matches them to the corresponding metadata, and sends metadata requests to the regional HDFS.

The HDFS metadata server is communicatively connected to the root node. It sends the metadata and the unique identifier to the root node while retaining the unique identifier, and it receives user requests and sends them to the root node. It also matches the corresponding unique identifier according to the metadata request fed back by the root node, identifies the corresponding child node, and sends a unique identifier request to that child node.

As described above, the unique identifiers of all metadata sent by all child nodes are stored in the HDFS metadata server, so storage resources for these identifiers must be allocated and managed sensibly. The unique identifier consists of several parts: a child node identification field, a metadata identification field, a storage timing field and a check field. The child node identification field identifies the child node, the metadata identification field is used to match a unique identifier against a metadata request, the storage timing field is used for updating data, and the check field is used for verifying matching and identification. An identifier built from these fields occupies far less storage than the metadata it identifies, which saves storage space on the metadata server. When facing massive data, the HDFS can therefore be scaled out horizontally, increasing the storage capacity of the system without affecting storage efficiency, retrieval efficiency or server performance.
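As an illustration of how such a four-field identifier might be laid out, the following Python sketch packs the fields into a fixed-width byte string. The patent does not specify field widths, encodings, or the check algorithm; the 2/4/4/4-byte layout, the use of CRC32 as the check field, the use of a version counter as the storage timing field, and the names `UniqueId`, `encode` and `decode` are all assumptions made for this example.

```python
import struct
import zlib
from typing import NamedTuple


class UniqueId(NamedTuple):
    child_node: int   # child node identification field
    metadata_id: int  # metadata identification field
    sequence: int     # storage timing field (assumed here to be a version counter)


# Assumed layout, big-endian: 2-byte child node, 4-byte metadata id,
# 4-byte sequence, followed by a 4-byte CRC32 check field.
_BODY = struct.Struct(">HII")
_CHECK = struct.Struct(">I")


def encode(uid: UniqueId) -> bytes:
    """Pack the three data fields and append a CRC32 of them as the check field."""
    body = _BODY.pack(*uid)
    return body + _CHECK.pack(zlib.crc32(body))


def decode(raw: bytes) -> UniqueId:
    """Verify the check field, then unpack the data fields."""
    if len(raw) != _BODY.size + _CHECK.size:
        raise ValueError("unexpected identifier length")
    body, check = raw[:_BODY.size], raw[_BODY.size:]
    if _CHECK.unpack(check)[0] != zlib.crc32(body):
        raise ValueError("check field mismatch")
    return UniqueId(*_BODY.unpack(body))


raw = encode(UniqueId(child_node=3, metadata_id=1024, sequence=7))
print(len(raw))     # 14
print(decode(raw))  # UniqueId(child_node=3, metadata_id=1024, sequence=7)
```

Under these assumed widths every identifier is 14 bytes regardless of how large the metadata record is, which is the kind of saving the paragraph above describes; the child node field alone is enough for the metadata server to route a request.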

With the HDFS metadata server of this embodiment, metadata is stored in the root node, while data matching and identification of the regional HDFS are performed by the child nodes, which improves the performance of the HDFS system effectively.
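The layer-by-layer handling of a user request described above can be sketched as follows. This is a minimal Python illustration under assumptions, not the patented implementation: the class and method names, the exact-match rule on a `name` attribute, the identifier string `A-0001`, and the in-memory dictionaries are all hypothetical.

```python
class RegionalHDFS:
    def __init__(self, files):
        self.files = files  # path -> bytes; stands in for stored resource data

    def fetch(self, metadata):
        # Feed back the resource data that the metadata points to.
        return self.files[metadata["path"]]


class ChildNode:
    def __init__(self, hdfs):
        self.hdfs = hdfs
        self.metadata = {}  # unique identifier -> metadata for this region

    def resolve(self, uid):
        # Match the identifier to metadata, then request the data from the regional HDFS.
        return self.hdfs.fetch(self.metadata[uid])


class RootNode:
    def __init__(self):
        self.catalog = {}  # unique identifier -> metadata for all regions

    def metadata_request(self, user_request):
        # Hypothetical matching rule: exact match on the "name" attribute.
        for uid, meta in self.catalog.items():
            if meta["name"] == user_request:
                return uid
        raise KeyError(user_request)


class MetadataServer:
    def __init__(self, root):
        self.root = root
        self.owner = {}  # unique identifier -> owning child node

    def handle(self, user_request):
        uid = self.root.metadata_request(user_request)  # forward to root, receive identifier
        return self.owner[uid].resolve(uid)             # route to the owning child node


# Wire up one region with one registered item.
hdfs_a = RegionalHDFS({"/a/report.txt": b"region A data"})
child_a = ChildNode(hdfs_a)
root = RootNode()
server = MetadataServer(root)

uid = "A-0001"  # hypothetical identifier
meta = {"name": "report", "path": "/a/report.txt"}
child_a.metadata[uid] = meta
root.catalog[uid] = meta
server.owner[uid] = child_a

print(server.handle("report"))  # b'region A data'
```

The metadata server in this sketch never inspects metadata itself: it only forwards the request to the root node and routes the returned identifier, which keeps its own storage limited to the identifier table.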

Preferably, the HDFS metadata server further has an extension interface, which includes a JDBC/ODBC interface and a User Shell that interacts with the server directly through a command line. Programmers interact with the HDFS metadata server through this interface and, after child nodes or the root node are expanded, extend the metadata server accordingly, so the server has good compatibility and extensibility.
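The patent does not describe the User Shell's commands. As a rough illustration only, a command-line shell over the metadata server's identifier table could be built on Python's standard `cmd` module; the `MetadataShell` class, the `owner` and `count` commands, and the identifier strings below are hypothetical, and the JDBC/ODBC interface is not covered by this sketch.

```python
import cmd
import io


class MetadataShell(cmd.Cmd):
    """Hypothetical command-line shell over a metadata server's identifier table."""
    prompt = "meta> "

    def __init__(self, uid_to_child, stdout=None):
        super().__init__(stdout=stdout)
        self.uid_to_child = uid_to_child  # unique identifier -> child node id

    def do_owner(self, arg):
        """owner <uid>: show which child node holds the metadata for <uid>."""
        uid = arg.strip()
        child = self.uid_to_child.get(uid)
        if child is None:
            self.stdout.write(f"unknown identifier: {uid}\n")
        else:
            self.stdout.write(f"{uid} -> child node {child}\n")

    def do_count(self, arg):
        """count: show how many identifiers the server holds."""
        self.stdout.write(f"{len(self.uid_to_child)}\n")

    def do_quit(self, arg):
        """quit: leave the shell."""
        return True


# Drive the shell programmatically; output is captured in a buffer.
out = io.StringIO()
shell = MetadataShell({"0-0": 0, "1-0": 1}, stdout=out)
shell.onecmd("owner 1-0")
shell.onecmd("count")
print(out.getvalue(), end="")
```

For interactive use, `MetadataShell(table).cmdloop()` would read commands from standard input until `quit` is entered.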

As described above, a large amount of resource data is stored in the regional HDFS, and users retrieve it through metadata. In this embodiment there are a plurality of regional HDFS, each of which may correspond to an actual geographic area, or to a department or central office in a large system. In a large center that provides public services, each office has its own needs for building data warehouses and analyzing data on the system, and several tasks may run in an office at the same time. The system will therefore hold multiple data warehouses, each containing multiple data sets such as data tables, with multiple applications operating on the data concurrently at the application layer. Because the root node and the child nodes integrate the regional HDFS behind the HDFS metadata server, operation tasks can proceed in parallel without blocking when multiple application-layer users operate on the underlying data at the same time.

Preferably, the regional HDFS serve branch offices located in different regions: each branch office corresponds to its own regional HDFS, the data and computing resources inside each regional HDFS are different, and the unique identifiers of the corresponding child nodes share the same structure but differ in their specific values.

For the distributed relational database based on the HDFS metadata server described above, the embodiment of the invention also provides a construction method. As shown in FIG. 2, the construction method includes the following steps:

Step S1, partitioning the data according to the coverage of the HDFS system, and storing the data in the regional HDFS according to the partitions.

Step S2, the regional HDFS generates metadata according to the stored resource data and sends the metadata to the corresponding child node.

Step S3, the child node corresponding to the regional HDFS stores the metadata, generates a unique identifier according to the metadata, and sends the metadata and the unique identifier to the HDFS metadata server.

The distributed relational database based on the HDFS metadata server is constructed by this method. Because the construction method corresponds to the distributed relational database, the description and limitations given for the structure of the database also apply to the construction method and are not repeated here.

As can be seen from the above, with the distributed relational database based on the HDFS metadata server and the construction method provided by the embodiment of the present invention, the constructed database can carry data at the petabyte (PB) scale and above, and system capacity is expanded simply by increasing the number of regional HDFS and child nodes, so the invention has excellent scalability. The regional HDFS run on low-cost commodity clusters, which addresses the capacity expansion and cost problems caused by data growth while ensuring the reliability, security and high availability of the data. In addition, the HDFS can effectively support the upper-layer distributed RDBMS, provide high-speed data operation and access interfaces, and provide the ability to aggregate, merge, extract and analyze data.

The foregoing description presents only preferred embodiments of the invention and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present invention.

Claims

1. A distributed relational database based on an HDFS metadata server, the distributed relational database comprising: a plurality of regional HDFS, child nodes equal in number to the regional HDFS, an HDFS metadata server, a root node and a user interface; wherein,

the child nodes are equal in number to the regional HDFS, and all child nodes are connected to the HDFS metadata server; each child node is used for storing the metadata sent by its regional HDFS, generating a unique identifier for each piece of metadata, and sending the metadata and the unique identifier to the HDFS metadata server; each child node is also used for receiving a unique identifier request from the HDFS metadata server, matching it to the corresponding metadata, and sending a metadata request to the regional HDFS;

the root node is used for storing all metadata, receiving a user request from a metadata server, generating a metadata request with a unique identifier according to the user request and feeding back the metadata request to the HDFS metadata server;

the unique identifier comprises a child node identification field, a metadata identification field, a storage timing field and a check field; the child node identification field is used for identifying a child node, the metadata identification field is used for matching a unique identifier according to a metadata request, the storage timing field is used for updating data, and the check field is used for verifying matching and identification.

2. The HDFS metadata server-based distributed relational database according to claim 1, wherein the HDFS metadata server has an extended interface comprising a JDBC/ODBC interface and a User Shell that directly interacts with the server through a command line.

3. The HDFS metadata server-based distributed relational database according to claim 1, wherein each of the plurality of regional HDFS corresponds to an actual geographic area, or to a department or central office within a system.

4. The HDFS metadata server-based distributed relational database according to claim 3, wherein the regional HDFS serve branches located in different regions, each branch corresponds to its own regional HDFS, the data and computing resources within each regional HDFS are different, and the unique identifiers of the corresponding child nodes have the same structure but differ in their specific values.

5. The HDFS metadata server-based distributed relational database according to any one of claims 1 to 4, wherein, when users in the region store data, the data is stored directly in the regional HDFS through a storage interface.

6. The HDFS metadata server-based distributed relational database according to any one of claims 1 to 4, wherein the regional HDFS adopts a cloud storage mode.