CN112395354B - Distributed relational database based on HDFS metadata server and construction method - Google Patents

Info

Publication number
CN112395354B
Authority
CN
China
Prior art keywords
hdfs
metadata
unique identifier
regional
metadata server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011224970.1A
Other languages
Chinese (zh)
Other versions
CN112395354A (en)
Inventor
李发明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen China Blog Imformation Technology Co ltd
Original Assignee
Shenzhen China Blog Imformation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen China Blog Imformation Technology Co ltd filed Critical Shenzhen China Blog Imformation Technology Co ltd
Priority to CN202011224970.1A priority Critical patent/CN112395354B/en
Publication of CN112395354A publication Critical patent/CN112395354A/en
Application granted granted Critical
Publication of CN112395354B publication Critical patent/CN112395354B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed relational database based on an HDFS (Hadoop Distributed File System) metadata server and a construction method thereof, belonging to the field of distributed databases. The construction method partitions data according to the coverage area of the HDFS system and stores the data in regional HDFS according to the partitions; each regional HDFS generates metadata from the stored resource data and sends the metadata to its corresponding child node; the child node stores the metadata, generates a unique identifier from the metadata, and sends the metadata and the unique identifier to an HDFS metadata server; the metadata server stores the unique identifier, sends the metadata and the unique identifier to the root node, and provides a user interface through which user requests are passed and answered layer by layer within the database. The invention establishes the relationship between parent nodes and child nodes through the metadata server, and realizes distributed relational data storage for cross-region, multi-branch data sources.

Description

Distributed relational database based on HDFS metadata server and construction method
Technical Field
The invention belongs to the field of distributed databases, and particularly relates to a distributed relational database based on an HDFS (Hadoop Distributed File System) metadata server and a construction method thereof.
Background
In an internet platform architecture, the data storage layer is the foundation of the whole architecture: it must not only organize massive amounts of data effectively but also provide an efficient interface to the upper-layer database system, so as to meet the needs of analyzing massive structured data. For example, in a mature cloud storage system, the database system should have a stable architecture, good extensibility and compatibility, and convenient query and indexing functions. The Hadoop Distributed File System (HDFS) is a distributed file system that runs on general-purpose hardware, is highly fault tolerant, is suitable for deployment on inexpensive machines, and has high compatibility. However, each platform architecture has its own structure, so when HDFS is used as the base layer of a specific information platform, it cannot be applied directly to a platform architecture with a unique structure.
Disclosure of Invention
In view of the above defects or shortcomings in the prior art, the present invention aims to provide a distributed relational database based on an HDFS metadata server and a construction method thereof, which establish the relationship between parent nodes and child nodes through the metadata server and implement distributed relational data storage for cross-region, multi-branch data sources.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a distributed relational database based on an HDFS metadata server, where the distributed relational database includes: a plurality of regional HDFS, child nodes equal in number to the regional HDFS, an HDFS metadata server, a root node and a user interface; wherein,
each regional HDFS is communicatively connected to a child node, and is used for storing resource data of its region, generating metadata of the resource data, sending the metadata to the connected child node, receiving metadata requests from the child node, and feeding back the resource data corresponding to the metadata to the child node;
the number of child nodes is the same as the number of regional HDFS, and all child nodes are connected to the HDFS metadata server; each child node is used for storing the metadata sent by its regional HDFS, generating a unique identifier for each piece of metadata, and sending the metadata and the unique identifier to the HDFS metadata server; it is also used for receiving a unique identifier request from the HDFS metadata server, matching the corresponding metadata, and sending a metadata request to the regional HDFS;
the HDFS metadata server is communicatively connected to the root node, and is used for sending the metadata and the unique identifier to the root node while retaining the unique identifier, receiving a user request and sending the user request to the root node, matching the corresponding unique identifier according to the metadata request fed back by the root node, identifying the corresponding child node, and sending a unique identifier request to that child node;
the root node is used for storing all metadata, receiving the user request from the HDFS metadata server, generating a metadata request carrying a unique identifier according to the user request, and feeding the metadata request back to the HDFS metadata server.
In the above scheme, the unique identifier includes a child node identification field, a metadata identification field, a storage timing field, and a check field; the child node identification field is used for identifying the child node, the metadata identification field is used for matching the unique identifier to a metadata request, the storage timing field is used for updating data, and the check field is used for verifying matching and identification.
In the above scheme, the HDFS metadata server has an extended interface, and the extended interface includes a JDBC/ODBC interface and a User Shell that directly interacts with the server through a command line.
In the above scheme, each of the plurality of regional HDFS corresponds to an actual geographical area, or to a department or a central office in a system.
In the above scheme, the regional HDFS serve cross-region branches: each branch corresponds to its own regional HDFS, the data and computing resources inside each regional HDFS are different, and the unique identifiers of the corresponding child nodes have the same structure but different specific values.
In the above scheme, when users in a region store data, the data is stored directly in the regional HDFS through a storage interface.
In the above scheme, the regional HDFS adopts a cloud storage mode.
In a second aspect, an embodiment of the present invention further provides a method for constructing a distributed relational database based on an HDFS metadata server, where the method for constructing the distributed relational database includes the following steps:
step S1, partitioning the data according to the coverage area of the HDFS system, and storing the data in the regional HDFS according to the partitions;
step S2, each regional HDFS generating metadata from the stored resource data and sending the metadata to its corresponding child node;
step S3, the child node corresponding to the regional HDFS storing the metadata, generating a unique identifier from the metadata, and sending the metadata and the unique identifier to the HDFS metadata server;
step S4, the metadata server storing the unique identifier and sending the metadata and the unique identifier to the root node, while a user interface is set up in the metadata server;
step S5, the root node storing all metadata and the corresponding unique identifiers.
The technical scheme of the embodiment of the invention has the following beneficial effects:
The distributed relational database based on the HDFS metadata server can carry data at petabyte (PB) scale and above, and the system capacity is expanded simply by increasing the number of regional HDFS and child nodes, so that the database has excellent scalability; the regional HDFS run on low-cost commodity clusters, which addresses the capacity expansion and cost problems caused by data growth while ensuring the reliability, security and high availability of the data; in addition, the HDFS can effectively support the upper-layer distributed RDBMS, provide a high-speed data operation and access interface, and provide the capability to aggregate, merge, extract and analyze data. The construction method establishes the relationship between parent nodes and child nodes through the metadata server, and achieves distributed relational data storage for cross-region, multi-branch data sources.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a distributed relational database based on an HDFS metadata server according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for constructing a distributed relational database based on an HDFS metadata server according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
The embodiment of the invention provides a distributed relational database based on an HDFS (Hadoop Distributed File System) metadata server and a construction method thereof.
HDFS is the Hadoop Distributed File System and includes two data storage modes, centralized and decentralized. In a decentralized HDFS, the distribution of metadata is computed by an algorithm, and a metadata calculation and management module must be provided. A centralized HDFS has a metadata server that supervises all storage units in the distributed file system globally and thereby achieves global scheduling. However, when the metadata server faces massive data, the amount of metadata also grows rapidly, and the storage pressure, the access pressure and the network pressure of metadata interaction with the bottom layer all increase with the scale of the system, so that a centralized HDFS cannot be applied under big-data conditions.
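The difference between the two modes can be illustrated with a minimal Python sketch. It is not taken from the patent: the node names, the hash-based placement rule and the `CentralMetadataServer` class are assumptions chosen only to show that one mode computes a location while the other looks it up in a table that grows with the data.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical storage nodes


def locate_decentralized(path):
    """Decentralized mode: every client computes the location from the path."""
    digest = hashlib.sha256(path.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]


class CentralMetadataServer:
    """Centralized mode: one server records where every file lives."""

    def __init__(self):
        self.locations = {}

    def register(self, path, node):
        # One entry per file, so the table grows with the amount of data.
        self.locations[path] = node

    def locate(self, path):
        return self.locations[path]
```

In the centralized sketch every lookup and every registration passes through the single `locations` table, which is the pressure point the paragraph above describes.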
For example, suppose a nationwide or global information exchange platform for teaching and scientific research is to be built, with a distributed relational database used to store, manage, access, retrieve, upload, download and schedule all data. Massive data seriously reduces the response speed of the system, and as the number of interacting users increases, the amount of metadata also grows rapidly, so that a conventional HDFS system cannot be applied.
FIG. 1 is a schematic diagram of the structure of a distributed relational database based on an HDFS metadata server according to an embodiment of the present invention. As shown in FIG. 1, the distributed relational database includes: a plurality of regional HDFS, child nodes equal in number to the regional HDFS, an HDFS metadata server, a root node and a user interface.
Each regional HDFS is communicatively connected to a child node, and is used for storing resource data of the region where it is located, generating metadata of the resource data and sending the metadata to the connected child node, receiving metadata requests from the child node, and feeding back the resource data corresponding to the metadata to the child node. When users in a region store data, the data is stored directly in the regional HDFS through a storage interface. The regional HDFS may also adopt a cloud storage mode, so that system performance can be expanded effectively.
The number of child nodes is the same as the number of regional HDFS, and all child nodes are connected to the HDFS metadata server. Each child node is used for storing the metadata sent by its regional HDFS, generating a unique identifier for each piece of metadata, and sending the metadata and the unique identifier to the HDFS metadata server; it is also used for receiving a unique identifier request from the HDFS metadata server, matching the corresponding metadata, and sending a metadata request to the regional HDFS.
The HDFS metadata server is communicatively connected to the root node, and is used for sending the metadata and the unique identifier to the root node while retaining the unique identifier, receiving a user request and sending the user request to the root node, matching the corresponding unique identifier according to the metadata request fed back by the root node, identifying the corresponding child node, and sending a unique identifier request to that child node.
The root node is used for storing all metadata, receiving the user request from the HDFS metadata server, generating a metadata request carrying a unique identifier according to the user request, and feeding the metadata request back to the HDFS metadata server.
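The layer-by-layer request path described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the class names, method names, the string identifier `"uid-1"` and the use of a file path as the metadata are all hypothetical, and network communication is replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class RegionalHDFS:
    resources: dict = field(default_factory=dict)        # metadata -> resource data

    def fetch(self, metadata):
        return self.resources[metadata]


@dataclass
class ChildNode:
    region: RegionalHDFS
    by_uid: dict = field(default_factory=dict)            # unique identifier -> metadata

    def handle_uid_request(self, uid):
        metadata = self.by_uid[uid]                       # match the identifier to metadata
        return self.region.fetch(metadata)                # metadata request to the regional HDFS


@dataclass
class RootNode:
    uid_by_metadata: dict = field(default_factory=dict)   # all metadata -> unique identifier

    def handle_user_request(self, wanted):
        return self.uid_by_metadata[wanted]               # metadata request carrying the identifier


@dataclass
class MetadataServer:
    root: RootNode
    child_by_uid: dict = field(default_factory=dict)      # retained identifiers -> child node

    def query(self, wanted):
        uid = self.root.handle_user_request(wanted)       # forward the user request to the root
        child = self.child_by_uid[uid]                    # identify the owning child node
        return child.handle_uid_request(uid)              # send the unique identifier request


# Usage with one region and one stored resource (illustrative values only).
region = RegionalHDFS({"/courses/algebra.pdf": b"lecture notes"})
child = ChildNode(region, {"uid-1": "/courses/algebra.pdf"})
root = RootNode({"/courses/algebra.pdf": "uid-1"})
server = MetadataServer(root, {"uid-1": child})
print(server.query("/courses/algebra.pdf"))
```

In this sketch the metadata server holds only identifiers and a reference to the owning child node, while the full metadata lives in the root node and the child node, mirroring the division of storage described in the embodiment.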
As described above, the unique identifiers associated with all metadata sent by all child nodes are stored in the HDFS metadata server, so storage resources for all unique identifiers must be allocated and managed reasonably. The unique identifier is composed of several parts: a child node identification field, a metadata identification field, a storage timing field and a check field. The child node identification field is used for identifying the child node, the metadata identification field is used for matching the unique identifier to a metadata request, the storage timing field is used for updating data, and the check field is used for verifying matching and identification. An identifier formed by combining these fields occupies far less storage than the metadata itself, which saves storage space on the metadata server. When facing massive data, the metadata server can therefore scale the HDFS out effectively, and the storage capacity of the system is increased without affecting storage efficiency, calling efficiency or server performance.
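One possible encoding of such a four-field identifier is sketched below. The patent names the fields but does not specify their widths or the check algorithm, so the 2/4/8-byte layout, the CRC32 check and all names in the code are assumptions made for illustration.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical layout: the source names the four fields but not their sizes.
_BODY = struct.Struct(">HIQ")   # child node id (2 B), metadata id (4 B), storage timing (8 B)
_CHECK = struct.Struct(">I")    # CRC32 over the body (4 B), standing in for the check field


@dataclass(frozen=True)
class UniqueIdentifier:
    child_node_id: int   # identifies the child node that issued the identifier
    metadata_id: int     # matches the identifier to a metadata request
    storage_seq: int     # storage timing value, used to order updates

    def pack(self) -> bytes:
        body = _BODY.pack(self.child_node_id, self.metadata_id, self.storage_seq)
        return body + _CHECK.pack(zlib.crc32(body))

    @classmethod
    def unpack(cls, raw: bytes) -> "UniqueIdentifier":
        if len(raw) != _BODY.size + _CHECK.size:
            raise ValueError("unexpected identifier length")
        body, check = raw[:_BODY.size], raw[_BODY.size:]
        if _CHECK.unpack(check)[0] != zlib.crc32(body):
            raise ValueError("check field mismatch")
        return cls(*_BODY.unpack(body))
```

Under these assumed widths each packed identifier occupies 18 bytes regardless of how large the metadata record is, which illustrates why retaining only identifiers keeps the metadata server's storage small.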
With the HDFS metadata server of this embodiment, metadata is stored in the root node, and the matching and identification of data in the regional HDFS are performed by the child nodes, so that the performance of the HDFS system is improved effectively and reasonably.
Preferably, the HDFS metadata server further has an extension interface, and the extension interface includes a JDBC/ODBC interface and a User Shell that interacts with the server directly through a command line. Programmers interact with the HDFS metadata server through these interfaces and can extend the metadata server accordingly after the child nodes or the root node are extended, so that the server has good compatibility and extensibility.
As described above, a large amount of resource data is stored in the regional HDFS, and users call that resource data through metadata. In this embodiment there are a plurality of regional HDFS, and each regional HDFS may correspond to an actual geographic area, or to a department or a central office in a large system. For a large-scale system in which a center provides public services, each office in the center needs to establish data warehouses and perform data analysis on the system, and several pieces of work may be carried out in one office at the same time. The system will therefore hold a plurality of data warehouses, each data warehouse will hold a plurality of data sets such as data tables, and at the application layer a plurality of applications will operate on the data at the same time. The HDFS based on the HDFS metadata server is integrated through the root node and the child nodes, so that when several application-layer users operate on the underlying data simultaneously, their tasks can run in parallel without blocking.
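The idea that tasks against different regional HDFS can proceed in parallel can be sketched with a thread pool. The example is an assumption-laden illustration: the in-memory `regions` structure, the `count_rows` task and the use of Python threads stand in for real warehouses, real analysis jobs and whatever scheduling the actual system uses.

```python
from concurrent.futures import ThreadPoolExecutor


def count_rows(region_name, tables):
    """Stand-in for an analysis task that runs against one regional HDFS."""
    return region_name, sum(len(rows) for rows in tables.values())


def run_in_parallel(regions):
    """regions: {region name: {table name: list of rows}} (hypothetical in-memory data)."""
    with ThreadPoolExecutor(max_workers=len(regions) or 1) as pool:
        # One task per region; no task waits on another region's data.
        futures = [pool.submit(count_rows, name, tables) for name, tables in regions.items()]
        return dict(f.result() for f in futures)
```

Because each task touches only its own region's tables, no locking between regions is needed in this sketch.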
Preferably, the regional HDFS serve cross-region branch offices: each branch office corresponds to its own regional HDFS, the data and computing resources inside each regional HDFS are different, and the unique identifiers of the corresponding child nodes have the same structure but different specific values.
For the distributed relational database based on the HDFS metadata server described above, an embodiment of the invention also provides a construction method. As shown in FIG. 2, the construction method includes the following steps:
Step S1, partitioning the data according to the coverage area of the HDFS system, and storing the data in the regional HDFS according to the partitions.
Step S2, each regional HDFS generating metadata from the stored resource data and sending the metadata to its corresponding child node.
Step S3, the child node corresponding to the regional HDFS storing the metadata, generating a unique identifier from the metadata, and sending the metadata and the unique identifier to the HDFS metadata server.
Step S4, the metadata server storing the unique identifier and sending the metadata and the unique identifier to the root node, while a user interface is set up in the metadata server.
Step S5, the root node storing all metadata and the corresponding unique identifiers.
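Steps S1 to S5 can be traced in a short Python sketch. It is a simplified illustration under stated assumptions: the `(region, path, payload)` input shape, the dictionary-based stores, the metadata fields `path` and `size`, and the truncated SHA-256 identifier are all hypothetical, and the hash here is a stand-in rather than the four-field identifier described earlier.

```python
import hashlib
from collections import defaultdict


def build_database(records):
    """records: iterable of (region, path, payload) tuples (hypothetical input shape)."""
    regional_hdfs = defaultdict(dict)   # S1: region -> {path: payload}
    child_nodes = defaultdict(dict)     # S3: region -> {uid: metadata}
    server_index = {}                   # S4: uid -> region, kept by the metadata server
    root_node = {}                      # S5: uid -> metadata, kept by the root node

    for region, path, payload in records:
        # S1: partition by coverage area and store in the matching regional HDFS.
        regional_hdfs[region][path] = payload
        # S2: the regional HDFS derives metadata from the stored resource data.
        metadata = {"path": path, "size": len(payload)}
        # S3: the child node stores the metadata and generates a unique identifier.
        uid = hashlib.sha256(f"{region}:{path}".encode()).hexdigest()[:16]
        child_nodes[region][uid] = metadata
        # S4: the metadata server keeps the identifier and forwards both to the root.
        server_index[uid] = region
        # S5: the root node stores the metadata together with its identifier.
        root_node[uid] = metadata

    return regional_hdfs, child_nodes, server_index, root_node
```

After the loop, the metadata server's index contains only identifiers and the region each belongs to, while the metadata itself is held by the child nodes and the root node.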
The distributed relational database based on the HDFS metadata server is constructed by this construction method. Because the construction method corresponds to the distributed relational database, the description and limitations of the structure of the relational database also apply to the construction method and are not repeated here.
As can be seen from the above, with the distributed relational database based on the HDFS metadata server and the construction method provided by the embodiment of the present invention, the constructed database can carry data at petabyte (PB) scale and above, and the system capacity is expanded simply by increasing the number of regional HDFS and child nodes, so that the present invention has excellent scalability; the regional HDFS run on low-cost commodity clusters, which addresses the capacity expansion and cost problems caused by data growth while ensuring the reliability, security and high availability of the data. In addition, the HDFS can effectively support the upper-layer distributed RDBMS, provide a high-speed data operation and access interface, and provide the capability to aggregate, merge, extract and analyze data.
The foregoing description covers only the preferred embodiments of the invention and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present invention.

Claims (6)

1. A distributed relational database based on an HDFS metadata server, the distributed relational database comprising: a plurality of regional HDFS, child nodes equal in number to the regional HDFS, an HDFS metadata server, a root node and a user interface; wherein,
each regional HDFS is communicatively connected to a child node, and is used for storing resource data of its region, generating metadata of the resource data, sending the metadata to the connected child node, receiving metadata requests from the child node, and feeding back the resource data corresponding to the metadata to the child node;
the number of child nodes is the same as the number of regional HDFS, and all child nodes are connected to the HDFS metadata server; each child node is used for storing the metadata sent by its regional HDFS, generating a unique identifier for each piece of metadata, and sending the metadata and the unique identifier to the HDFS metadata server; it is also used for receiving a unique identifier request from the HDFS metadata server, matching the corresponding metadata, and sending a metadata request to the regional HDFS;
the HDFS metadata server is communicatively connected to the root node, and is used for sending the metadata and the unique identifier to the root node while retaining the unique identifier, receiving a user request and sending the user request to the root node, matching the corresponding unique identifier according to the metadata request fed back by the root node, identifying the corresponding child node, and sending a unique identifier request to that child node;
the root node is used for storing all metadata, receiving the user request from the HDFS metadata server, generating a metadata request carrying a unique identifier according to the user request, and feeding the metadata request back to the HDFS metadata server;
the unique identifier comprises a child node identification field, a metadata identification field, a storage timing field and a check field; the child node identification field is used for identifying the child node, the metadata identification field is used for matching the unique identifier to a metadata request, the storage timing field is used for updating data, and the check field is used for verifying matching and identification.
2. The HDFS metadata server-based distributed relational database according to claim 1, wherein the HDFS metadata server has an extended interface comprising a JDBC/ODBC interface and a User Shell that directly interacts with the server through a command line.
3. The HDFS metadata server-based distributed relational database according to claim 1, wherein each of the plurality of regional HDFS corresponds to an actual geographic area, or to a department or a central office within a system.
4. The HDFS metadata server-based distributed relational database according to claim 3, wherein the regional HDFS serve cross-region branches, each branch corresponds to its own regional HDFS, the data and computing resources within each regional HDFS are different, and the unique identifiers of the corresponding child nodes have the same structure but different specific values.
5. The HDFS metadata server-based distributed relational database according to any one of claims 1 to 4, wherein, when users in a region store data, the data is stored directly in the regional HDFS through a storage interface.
6. The HDFS metadata server-based distributed relational database according to any one of claims 1 to 4, wherein the regional HDFS adopts a cloud storage mode.
CN202011224970.1A 2020-11-05 2020-11-05 Distributed relational database based on HDFS metadata server and construction method Active CN112395354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224970.1A CN112395354B (en) 2020-11-05 2020-11-05 Distributed relational database based on HDFS metadata server and construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011224970.1A CN112395354B (en) 2020-11-05 2020-11-05 Distributed relational database based on HDFS metadata server and construction method

Publications (2)

Publication Number Publication Date
CN112395354A CN112395354A (en) 2021-02-23
CN112395354B CN112395354B (en) 2022-08-02

Family

ID=74598398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011224970.1A Active CN112395354B (en) 2020-11-05 2020-11-05 Distributed relational database based on HDFS metadata server and construction method

Country Status (1)

Country Link
CN (1) CN112395354B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103067461A (en) * 2012-12-18 2013-04-24 曙光信息产业(北京)有限公司 Metadata management system of document and metadata management method thereof
CN103647797A (en) * 2013-11-15 2014-03-19 北京邮电大学 Distributed file system and data access method thereof
CN104113597A (en) * 2014-07-18 2014-10-22 西安交通大学 Multi- data-centre hadoop distributed file system (HDFS) data read-write system and method
US10318491B1 (en) * 2015-03-31 2019-06-11 EMC IP Holding Company LLC Object metadata query with distributed processing systems

Also Published As

Publication number Publication date
CN112395354A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
CN109492040B (en) System suitable for processing mass short message data in data center
CN102244685B (en) Distributed type dynamic cache expanding method and system for supporting load balancing
CN102567495B (en) Mass information storage system and implementation method
CN102467570B (en) Connection query system and method for distributed data warehouse
CN111327681A (en) Cloud computing data platform construction method based on Kubernetes
US9996552B2 (en) Method for generating a dataset structure for location-based services and method and system for providing location-based services to a mobile device
US20160012118A1 (en) System and methods for mapping and searching objects in multidimensional space
CN104461740A (en) Cross-domain colony computing resource gathering and distributing method
CN105554123B (en) Large capacity perceives cloud computing platform system
CN102164184A (en) Computer entity access and management method for cloud computing network and cloud computing network
CN109933631A (en) Distributed parallel database system and data processing method based on Infiniband network
CN102033938A (en) Secondary mapping-based cluster dynamic expansion method
CN114647716B (en) System suitable for generalized data warehouse
CN108073696A (en) GIS application processes based on distributed memory database
CN105339899A (en) Method and controller for clustering applications in a software-defined network
CN108460072A (en) With electricity consumption data retrieval method and system
CN102103657A (en) Virtual world system and method for realizing virtual world
CN1326363C (en) Network management configuration method and apparatus thereof
CN103034650A (en) System and method for processing data
CN112231351A (en) Real-time query method and device for PB-level mass data
CN113127526A (en) Distributed data storage and retrieval system based on Kubernetes
CN108228725A (en) GIS application systems based on distributed data base
CN112395354B (en) Distributed relational database based on HDFS metadata server and construction method
CN112463755B (en) System and method for storing and reading big data of heterogeneous Internet of things based on HDFS
CN107908713B (en) Distributed dynamic rhododendron filtering system based on Redis cluster and filtering method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant