CN108984635A - HDFS storage system and data storage method - Google Patents

Info

Publication number: CN108984635A
Authority: CN (China)
Prior art keywords: management node; storage; metadata; metadata management; data
Legal status: Pending (as listed by Google; the legal status is an assumption and not a legal conclusion)
Application number: CN201810643546.7A
Other languages: Chinese (zh)
Inventors: 白学余, 海鑫, 高四辈
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd (as listed by Google; may be inaccurate)
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2018-06-21 (as listed by Google; an assumption, not a legal conclusion)
Filing date: 2018-06-21
Publication date: 2018-12-11
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201810643546.7A; published as CN108984635A

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F11/00: Error detection; error correction; monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an HDFS storage system comprising a plurality of metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool. The metadata management nodes receive and process storage requests for data to be stored. When the metadata management node corresponding to a given high-availability component goes down, that component transfers the storage requests sent to the failed node to another metadata management node. The metadata storage pool stores the data to be stored, and all of the metadata management nodes establish communication links with it. The HDFS storage system provided by the invention ensures data consistency during service switchover, so that data to be stored is not lost during the switchover. The invention also provides a data storage method with the same beneficial effects.

Description

HDFS storage system and data storage method
Technical field
The present invention relates to the technical field of data storage, and in particular to an HDFS storage system and a data storage method.
Background art
HDFS is the storage component of the Hadoop big-data platform and is responsible for storing all of its data. The NameNode is the metadata management module of HDFS; if the NameNode fails, the entire HDFS storage system becomes unavailable. To address this, HDFS introduced a high-availability solution based on an active/standby mode: at any given time the active NameNode serves the big-data storage requests, and if it fails, the standby NameNode takes over and continues to provide storage service for the whole platform.
In the active/standby NameNode architecture of a traditional HDFS storage system, only the active NameNode is in the Active state at any given time and can accept data storage requests; the standby NameNode is in the Standby state. The two NameNodes share a common storage area, and during a switchover the standby NameNode reads this shared area to obtain the latest state and then becomes the active NameNode. This high-availability scheme can cause data loss and data inconsistency during switchover; moreover, because only one NameNode serves requests at any moment, the load on that node is heavy.
Because the existing HDFS storage system manages metadata through logs that are exported at time intervals, when the active NameNode fails the standby NameNode reads the log and takes over the service. Since the log is exported only at certain intervals, any data that has not been synchronized to the standby NameNode before the active NameNode fails is lost. In addition, only one NameNode serves external requests at a time, which leads to overload.
In summary, how to guarantee data consistency during service switchover when the active metadata management node fails is a problem that currently needs to be solved.
Summary of the invention
An object of the present invention is to provide an HDFS storage system that, when one metadata management node fails, automatically switches to another metadata management node and guarantees data consistency during the service switchover. The present invention also provides a data storage method with the same beneficial effects.
To solve the above technical problem, the present invention provides an HDFS storage system comprising a plurality of metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool. The metadata management nodes receive and process storage requests for data to be stored. When the metadata management node corresponding to the current high-availability component goes down, that component transfers the storage requests sent to the failed node to another metadata management node. The metadata storage pool stores the data to be stored, and the metadata management nodes establish communication links with it.
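The composition described above can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary in place of the metadata storage pool and a hypothetical HAComponent class standing in for the distributed-system high-availability component (CTDB in the embodiments). All class and method names are illustrative and are not taken from the patent or from CTDB.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStoragePool:
    """Stand-in for the shared metadata storage pool (an in-memory dict here)."""
    records: dict = field(default_factory=dict)


@dataclass
class MetadataNode:
    """Hypothetical metadata management node."""
    name: str
    pool: MetadataStoragePool
    up: bool = True

    def handle(self, path: str, data: bytes) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        # Every node writes into the same shared pool.
        self.pool.records[path] = data
        return self.name


class HAComponent:
    """Hypothetical per-node HA component: forwards requests when its node is down."""

    def __init__(self, node: MetadataNode, cluster: list[MetadataNode]):
        self.node = node
        self.cluster = cluster

    def submit(self, path: str, data: bytes) -> str:
        if self.node.up:
            return self.node.handle(path, data)
        # The local node is down: hand the request to the first healthy peer.
        for peer in self.cluster:
            if peer is not self.node and peer.up:
                return peer.handle(path, data)
        raise RuntimeError("no metadata management node is available")


pool = MetadataStoragePool()
nodes = [MetadataNode(f"mds{i}", pool) for i in range(3)]
ha = [HAComponent(n, nodes) for n in nodes]

nodes[0].up = False  # mds0 goes down
served_by = ha[0].submit("/logs/a.txt", b"hello")
```

In this run the request addressed to mds0 is served by mds1, and the record lands in the same shared pool that every node uses, so no separate synchronization step is modeled.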
Preferably, the system further comprises a client that establishes communication connections with the plurality of metadata management nodes; the metadata management nodes provide multiple virtual IP addresses to the client.
Preferably, the client sends storage requests for data to be stored to the virtual IP address of a metadata management node; when the metadata management node corresponding to a virtual IP address goes down, the distributed-system high-availability component transfers that virtual IP address to another metadata management node.
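The virtual IP mechanism can be sketched as a table from virtual IP addresses to nodes. This is a simplified model under stated assumptions: the addresses and node names are hypothetical, orphaned addresses are handed to survivors round-robin, and real CTDB performs IP takeover through its own daemons and configuration rather than through anything like this class.

```python
class VirtualIPTable:
    """Hypothetical map from virtual IP addresses to metadata management nodes."""

    def __init__(self, assignments: dict[str, str]):
        self.owner = dict(assignments)            # VIP -> node currently holding it
        self.healthy = set(assignments.values())

    def mark_down(self, node: str) -> None:
        self.healthy.discard(node)
        survivors = sorted(self.healthy)
        if not survivors:
            raise RuntimeError("no healthy metadata management node")
        # Move each VIP held by the failed node to a surviving node, round-robin.
        orphaned = sorted(vip for vip, n in self.owner.items() if n == node)
        for i, vip in enumerate(orphaned):
            self.owner[vip] = survivors[i % len(survivors)]

    def send(self, vip: str, request: str) -> str:
        # The client only knows the VIP; the table decides which node answers.
        return f"{self.owner[vip]} handled {request}"


table = VirtualIPTable({
    "10.0.0.101": "mds0",
    "10.0.0.102": "mds1",
    "10.0.0.103": "mds2",
})
table.mark_down("mds1")
reply = table.send("10.0.0.102", "create /a")
```

The client keeps using 10.0.0.102 after mds1 fails; only the owner behind that address changes, which is why no client-side reconfiguration is needed.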
Preferably, the metadata management nodes are specifically configured to: receive and process storage requests for data to be stored sent by the client, maintain the file directory tree of the entire file system, and maintain the mapping between files and their lists of data blocks.
Preferably, the metadata storage pool is a distributed storage pool, and the metadata management nodes maintain communication with the metadata cluster in the metadata storage pool.
Preferably, the system further comprises data nodes connected to the metadata storage pool, configured to store and retrieve data as scheduled by the client or the metadata management nodes, and to send the metadata management nodes, at preset time intervals, the list of blocks stored on each data node.
The present invention also provides a data storage method, comprising:
receiving and processing storage requests for data to be stored using a plurality of metadata management nodes;
wherein each metadata management node is connected to a distributed-system high-availability component;
when the current metadata management node corresponding to the current high-availability component goes down, transferring the storage requests sent to that node to another metadata management node;
establishing communication connections between the metadata management nodes and the metadata storage pool, so that the data to be stored is stored in the metadata storage pool.
Preferably, receiving and processing storage requests for data to be stored using the plurality of metadata management nodes comprises:
receiving and processing, using the plurality of metadata management nodes, storage requests for data to be stored sent by a client, wherein the client establishes communication connections with the metadata management nodes and the metadata management nodes provide multiple virtual IP addresses to the client.
Preferably, the metadata management nodes providing multiple virtual IP addresses to the client comprises: the client sending storage requests for data to be stored to the virtual IP address of a metadata management node; and, when the metadata management node corresponding to the virtual IP address goes down, the distributed-system high-availability component transferring the virtual IP address to another metadata management node.
Preferably, the metadata storage pool is a distributed storage pool, and the metadata management nodes maintain communication with the metadata cluster in the metadata storage pool.
The HDFS storage system provided by the present invention includes multiple metadata management nodes, each connected to a distributed-system high-availability component, and each high-availability component is connected to the metadata storage pool. The metadata management nodes receive and process data storage requests sent by the client. When the current metadata management node fails, the high-availability component connected to it switches the storage requests to another metadata management node for processing. The metadata management nodes are independent of one another: they do not need to synchronize data, and they jointly process the same single copy of the data. Because a request addressed to a failed node is simply sent to another node for processing, the service of the storage system is not interrupted, the overall high availability of the service is ensured, data consistency is preserved during service switchover, and the data to be stored is not lost during the switchover.
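The consistency argument rests on the nodes holding no private metadata: whatever one node has acknowledged is already in the shared pool, so a takeover node has nothing to synchronize. The following is a minimal sketch that assumes, although the patent does not say so explicitly, that a node acknowledges a request only after writing it to the pool; SharedPool and StatelessMDS are hypothetical names.

```python
class SharedPool:
    """Stand-in for the metadata storage pool shared by all nodes."""

    def __init__(self) -> None:
        self._meta: dict[str, list[int]] = {}

    def put(self, key: str, value: list[int]) -> None:
        self._meta[key] = value

    def get(self, key: str):
        return self._meta.get(key)


class StatelessMDS:
    """Hypothetical metadata node that keeps no private copy of the metadata."""

    def __init__(self, name: str, pool: SharedPool) -> None:
        self.name, self.pool = name, pool

    def create(self, path: str, blocks: list[int]) -> str:
        self.pool.put(path, list(blocks))  # persist before acknowledging
        return "ack"

    def lookup(self, path: str):
        return self.pool.get(path)


pool = SharedPool()
mds0, mds1 = StatelessMDS("mds0", pool), StatelessMDS("mds1", pool)
acks = [mds0.create(f"/f{i}", [i]) for i in range(3)]
# mds0 fails at this point; mds1 answers without any synchronization step.
recovered = [mds1.lookup(f"/f{i}") for i in range(3)]
```

The design presumably shifts the availability burden onto the pool itself, since every metadata write passes through it, which fits the embodiments' choice of a distributed storage pool.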
Description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural block diagram of an HDFS storage system provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a specific embodiment of the data storage method provided by an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide an HDFS storage system that ensures data consistency during service switchover, so that data to be stored is not lost during the switchover. The present invention also provides a data storage method with the same beneficial effects.
The active NameNode of an existing HDFS storage system synchronizes metadata through an image file. The active NameNode records the operations performed on the system by writing a journal (log) file, and writes the journal information into the image file at regular time intervals. When a NameNode switchover occurs, the standby NameNode actively reads the image file to obtain the state of the active NameNode, thereby completing the switchover. If the service is interrupted before the journal entries have been written into the image, data is lost or becomes inconsistent.
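The loss window characterized in the paragraph above can be reproduced with a toy model. It is a sketch under stated assumptions: the journal is flushed into the image after a fixed number of operations instead of after a fixed time, and ActiveNameNode and standby_takeover are hypothetical names, not actual HDFS APIs.

```python
class ActiveNameNode:
    """Toy model of the prior-art scheme: journal first, image later."""

    def __init__(self, checkpoint_every: int) -> None:
        self.journal: list[str] = []  # operations not yet written to the image
        self.image: list[str] = []    # operations the standby can read
        self.checkpoint_every = checkpoint_every

    def apply(self, op: str) -> None:
        self.journal.append(op)
        if len(self.journal) >= self.checkpoint_every:
            # Periodic flush of journal entries into the image file.
            self.image.extend(self.journal)
            self.journal.clear()


def standby_takeover(image: list[str]) -> list[str]:
    # The standby rebuilds its state from the image only.
    return list(image)


active = ActiveNameNode(checkpoint_every=4)
ops = [f"mkdir /d{i}" for i in range(6)]
for op in ops:
    active.apply(op)
# The active NameNode fails here, before the next flush.
state = standby_takeover(active.image)
lost = ops[len(state):]
```

With a flush every four operations, the two operations applied after the last flush never reach the image, so the standby comes up without them.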
To overcome these shortcomings of the prior art, the present invention provides an HDFS storage system with multiple metadata management nodes that uses the high-availability component CTDB so that, when one metadata node fails, client storage requests are transferred to another metadata management node. This ensures the high availability of the storage system and the consistency of data during service switchover.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which is a structural block diagram of the HDFS storage system provided by an embodiment of the present invention, the HDFS storage system provided in this embodiment comprises a plurality of metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool. The metadata management nodes receive and process storage requests for data to be stored. When the metadata management node corresponding to the current high-availability component goes down, that component transfers the storage requests sent to the failed node to another metadata management node. The metadata storage pool stores the data to be stored, and all of the metadata management nodes establish communication links with it.
In this embodiment, multiple metadata management nodes, for example three or four, can provide services externally at the same time. These nodes jointly process a single copy of the metadata, which solves the problem in existing storage systems of excessive load on a single metadata management node and considerably increases the working efficiency of the HDFS storage system.
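The load-sharing effect can be illustrated by counting how many clients land on each node when clients are spread over several virtual IP addresses instead of all using one active node. The VIP names, node names, and round-robin hand-out are assumptions for illustration; the patent does not specify how a client chooses a virtual IP address.

```python
from collections import Counter
from itertools import cycle

# Hypothetical: one virtual IP per metadata management node.
vips = {"vip-a": "mds0", "vip-b": "mds1", "vip-c": "mds2"}
clients = [f"client{i}" for i in range(9)]

# Hand out VIPs to clients round-robin.
assignment = dict(zip(clients, cycle(vips)))

# Clients per node when several nodes are active at once.
shared_load = Counter(vips[assignment[c]] for c in clients)
# For comparison: a single active NameNode serving every client.
single_active_load = Counter({"mds0": len(clients)})
```

Nine clients split evenly as three per node, whereas the single-active layout puts all nine on one node.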
In this embodiment, the HDFS storage system further includes a client that establishes communication connections with the plurality of metadata management nodes, and the metadata management nodes provide multiple virtual IP addresses to the client. The client sends storage requests to these virtual IP addresses. When the metadata management node corresponding to a virtual IP address fails, the distributed-system high-availability component CTDB transfers that virtual IP address to another metadata management node, which continues to provide the service.
The metadata management nodes receive and process storage requests for data to be stored. They also maintain the file directory tree of the entire file system and the mapping between files and their lists of data blocks. The metadata management nodes jointly maintain and share a single copy of the metadata.
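The two kinds of metadata named here, the directory tree and the file-to-block mapping, can be sketched as follows. This is a simplified in-memory model with hypothetical names (Directory, Namespace, the blk_ identifiers); in the described system this state would live in the shared metadata storage pool rather than in one node's memory, and real HDFS keeps a considerably richer inode structure.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    # Child name -> Directory, or the string "file" for a regular file.
    children: dict = field(default_factory=dict)


class Namespace:
    """Hypothetical namespace: directory tree plus file -> block-list mapping."""

    def __init__(self) -> None:
        self.root = Directory()
        self.blocks: dict[str, list[str]] = {}

    def _walk(self, parts: list[str], create: bool) -> Directory:
        node = self.root
        for part in parts:
            child = node.children.get(part)
            if child is None:
                if not create:
                    raise FileNotFoundError("/" + "/".join(parts))
                child = node.children[part] = Directory()
            elif not isinstance(child, Directory):
                raise NotADirectoryError(part)
            node = child
        return node

    def create_file(self, path: str, block_ids: list[str]) -> None:
        *dirs, name = path.strip("/").split("/")
        parent = self._walk(dirs, create=True)
        parent.children[name] = "file"
        self.blocks[path] = list(block_ids)

    def listdir(self, path: str) -> list[str]:
        parts = [p for p in path.strip("/").split("/") if p]
        return sorted(self._walk(parts, create=False).children)


ns = Namespace()
ns.create_file("/user/logs/a.log", ["blk_1", "blk_2"])
ns.create_file("/user/logs/b.log", ["blk_3"])
```

Listing /user/logs returns both files, and looking up /user/logs/a.log in ns.blocks returns its two block identifiers.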
The metadata storage pool is a distributed storage pool, and the metadata management nodes maintain communication with the metadata cluster in the metadata storage pool.
The HDFS storage system provided in this embodiment further includes data nodes (DataNode) connected to the metadata storage pool. The data nodes store and retrieve data as scheduled by the client or the metadata management nodes, and at preset time intervals send the metadata management nodes the list of blocks stored on each data node.
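The periodic block report can be sketched with simulated time. The interval, the node names, and the rule that a node missing more than two report intervals is treated as stale are illustrative assumptions. The patent only states that reports are sent at a preset interval, and HDFS itself tracks DataNode liveness through separate heartbeats.

```python
class MetadataView:
    """Hypothetical metadata-side record of where blocks live."""

    def __init__(self) -> None:
        self.locations: dict[str, set[str]] = {}  # block id -> data nodes holding it
        self.last_report: dict[str, float] = {}   # data node -> time of last report

    def receive_block_report(self, datanode: str, block_ids: list[str], now: float) -> None:
        # Replace this node's previous entries with the fresh list.
        for holders in self.locations.values():
            holders.discard(datanode)
        for blk in block_ids:
            self.locations.setdefault(blk, set()).add(datanode)
        self.last_report[datanode] = now

    def stale_nodes(self, now: float, interval: float) -> list[str]:
        # Illustrative rule: more than two missed intervals means stale.
        return sorted(n for n, t in self.last_report.items() if now - t > 2 * interval)


class DataNode:
    """Hypothetical data node that reports its block list every `interval` seconds."""

    def __init__(self, name: str, interval: float) -> None:
        self.name, self.interval = name, interval
        self.blocks: set[str] = set()
        self.next_report = 0.0

    def tick(self, meta: MetadataView, now: float) -> None:
        if now >= self.next_report:
            meta.receive_block_report(self.name, sorted(self.blocks), now)
            self.next_report = now + self.interval


meta = MetadataView()
dn1, dn2 = DataNode("dn1", 3.0), DataNode("dn2", 3.0)
dn1.blocks |= {"blk_1", "blk_2"}
dn2.blocks |= {"blk_2"}
for t in range(10):
    dn1.tick(meta, float(t))
    if t < 3:  # dn2 stops reporting after t = 2
        dn2.tick(meta, float(t))
```

dn1 reports at t = 0, 3, 6 and 9, while dn2's only report is at t = 0, so at t = 9 dn2 is flagged as stale.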
In this embodiment, the distributed-system high-availability component CTDB is used to solve the problem of distributed high availability, changing how high availability is achieved in traditional storage systems. Through CTDB, the metadata management nodes jointly maintain a single copy of the metadata; each node works independently while sharing the same data. When any one of the metadata management nodes fails, CTDB restarts the service according to the cluster state at that moment and switches the storage requests on the failed node to other metadata management nodes. This keeps the overall service of the HDFS storage system from being interrupted, ensures the overall high availability of the service, and guarantees data consistency during service switchover.
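Restarting the service according to the cluster state can be sketched in two parts: deciding which nodes are healthy from their last heartbeat, and redistributing the virtual IP addresses of failed nodes across the healthy ones. The functions healthy_nodes and rebalance are hypothetical, and both the heartbeat timeout and the fewest-addresses-first placement policy are assumptions for illustration. CTDB has its own recovery and IP-allocation logic, which this does not reproduce.

```python
def healthy_nodes(last_seen: dict[str, float], now: float, timeout: float) -> set[str]:
    """Nodes whose last heartbeat is no older than `timeout` seconds."""
    return {node for node, t in last_seen.items() if now - t <= timeout}


def rebalance(owner: dict[str, str], healthy: set[str]) -> dict[str, str]:
    """Give every virtual IP a healthy owner while keeping per-node counts even.

    VIPs already on healthy nodes stay where they are; orphaned VIPs go to the
    healthy node that currently holds the fewest (ties broken by name).
    """
    if not healthy:
        raise RuntimeError("cluster has no healthy metadata management node")
    new_owner = {vip: n for vip, n in owner.items() if n in healthy}
    counts = {n: 0 for n in healthy}
    for n in new_owner.values():
        counts[n] += 1
    for vip in sorted(v for v in owner if v not in new_owner):
        target = min(sorted(counts), key=counts.__getitem__)
        new_owner[vip] = target
        counts[target] += 1
    return new_owner


last_seen = {"mds0": 0.0, "mds1": 9.5, "mds2": 9.8}
alive = healthy_nodes(last_seen, now=10.0, timeout=3.0)
owner = {"vip1": "mds0", "vip2": "mds0", "vip3": "mds1", "vip4": "mds2"}
new_owner = rebalance(owner, alive)
```

mds0 has been silent for 10 seconds and is dropped. Its two addresses are split between mds1 and mds2, leaving each survivor with two.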
Moreover, in the HDFS storage system provided by the embodiment of the present invention, multiple metadata management nodes can serve the client at the same time. This solves the problem of excessive load on the single metadata management node in prior-art storage systems and improves the efficiency with which the storage system processes data.
Referring to Fig. 2, which is a flowchart of a specific embodiment of the data storage method provided by the embodiment of the present invention, the specific steps are as follows:
Step S201: receive and process storage requests for data to be stored using a plurality of metadata management nodes, wherein each metadata management node is connected to a distributed-system high-availability component;
Step S202: when the current metadata management node corresponding to the current high-availability component goes down, transfer the storage requests sent to that node to another metadata management node;
Step S203: establish communication connections between the metadata management nodes and the metadata storage pool, so that the data to be stored is stored in the metadata storage pool.
The data storage method provided in this embodiment is used to implement the HDFS storage system described above; for its specific implementation, refer to the embodiments of the HDFS storage system above, which are not repeated here.
The HDFS storage system and data storage method provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention; the description of the above embodiments is only intended to help understand the method of the invention and its core idea. A person skilled in the art can make several improvements and modifications to the invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (10)

1. An HDFS storage system, characterized by comprising:
a plurality of metadata management nodes, a distributed-system high-availability component connected to each metadata management node, and a metadata storage pool;
wherein the metadata management nodes are configured to receive and process storage requests for data to be stored;
the distributed-system high-availability component is configured to, when the current metadata management node corresponding to the current distributed-system high-availability component goes down, transfer the storage requests sent to the current metadata management node to another metadata management node;
the metadata storage pool is configured to store the data to be stored, and the plurality of metadata management nodes establish communication links with the metadata storage pool.
2. The HDFS storage system according to claim 1, characterized by further comprising: a client, the client establishing communication connections with the plurality of metadata management nodes, and the plurality of metadata management nodes providing multiple virtual IP addresses to the client.
3. The HDFS storage system according to claim 2, characterized in that the client sends storage requests for data to be stored to the virtual IP address of a metadata management node; and when the metadata management node corresponding to the virtual IP address goes down, the distributed-system high-availability component transfers the virtual IP address to another metadata management node.
4. The HDFS storage system according to claim 3, characterized in that the plurality of metadata management nodes are specifically configured to: receive and process storage requests for data to be stored sent by the client, maintain the file directory tree of the entire file system, and maintain the mapping between files and their lists of data blocks.
5. The HDFS storage system according to claim 1, characterized in that the metadata storage pool is a distributed storage pool, and the plurality of metadata management nodes maintain communication with the metadata cluster in the metadata storage pool.
6. The HDFS storage system according to claim 1, characterized by further comprising: data nodes connected to the metadata storage pool, configured to store and retrieve data as scheduled by the client or the metadata management nodes, and to send, at preset time intervals, the list of blocks stored on each data node to the metadata management nodes.
7. A data storage method, characterized by being applied to an HDFS storage system and comprising:
receiving and processing storage requests for data to be stored using a plurality of metadata management nodes;
wherein each metadata management node is connected to a distributed-system high-availability component;
when the current metadata management node corresponding to the current distributed-system high-availability component goes down, transferring the storage requests sent to the current metadata management node to another metadata management node;
establishing communication connections between the plurality of metadata management nodes and the metadata storage pool, so that the data to be stored is stored in the metadata storage pool.
8. The data storage method according to claim 7, characterized in that receiving and processing storage requests for data to be stored using a plurality of metadata management nodes comprises:
receiving and processing, using the plurality of metadata management nodes, storage requests for data to be stored sent by a client;
the client establishing communication connections with the plurality of metadata management nodes, and the plurality of metadata management nodes providing multiple virtual IP addresses to the client.
9. The data storage method according to claim 8, characterized in that the plurality of metadata management nodes providing multiple virtual IP addresses to the client comprises:
the client sending storage requests for data to be stored to the virtual IP address of a metadata management node; and, when the metadata management node corresponding to the virtual IP address goes down, the distributed-system high-availability component transferring the virtual IP address to another metadata management node.
10. The data storage method according to claim 1, characterized in that the metadata storage pool is a distributed storage pool, and the plurality of metadata management nodes maintain communication with the metadata cluster in the metadata storage pool.
Priority Applications (1)

CN201810643546.7A (published as CN108984635A, pending): priority date 2018-06-21; filing date 2018-06-21; title: HDFS storage system and data storage method

Publications (1)

CN108984635A: publication date 2018-12-11

Family ID: 64541664

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104994168A (en) * 2015-07-14 2015-10-21 苏州科达科技股份有限公司 distributed storage method and distributed storage system
CN107181608A (en) * 2016-03-11 2017-09-19 阿里巴巴集团控股有限公司 A kind of method and operation management system for recovering service and performance boost
CN107920131A (en) * 2017-12-08 2018-04-17 郑州云海信息技术有限公司 A kind of metadata management method and device of HDFS storage systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MINGFEI10: "分布式高可用CTDB方案" (Distributed high-availability CTDB scheme), ChinaUnix *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111338647A (en) * 2018-12-18 2020-06-26 杭州海康威视数字技术股份有限公司 Big data cluster management method and device
CN111338647B (en) * 2018-12-18 2023-09-12 杭州海康威视数字技术股份有限公司 Big data cluster management method and device
CN111143027A (en) * 2019-12-06 2020-05-12 北京浪潮数据技术有限公司 Cloud platform management method, system, equipment and computer readable storage medium
CN113824812A (en) * 2021-08-27 2021-12-21 济南浪潮数据技术有限公司 Method, device and storage medium for HDFS service to acquire service node IP
CN113824812B (en) * 2021-08-27 2023-02-28 济南浪潮数据技术有限公司 Method, device and storage medium for HDFS service to acquire service node IP

Similar Documents

Publication Publication Date Title
CN107590182B (en) Distributed log collection method
EP3039549B1 (en) Distributed file system using consensus nodes
CN104679772B (en) Method, apparatus, equipment and the system of file are deleted in Distributed Data Warehouse
CN108984635A (en) A kind of HDFS storage system and date storage method
CN100375093C (en) Processing of multiroute processing element data
CN103905537A (en) System for managing industry real-time data storage in distributed environment
CN108964948A (en) Principal and subordinate's service system, host node fault recovery method and device
CN104320401A (en) Big data storage and access system and method based on distributed file system
US11068499B2 (en) Method, device, and system for peer-to-peer data replication and method, device, and system for master node switching
CN103237046A (en) Distributed file system supporting mixed cloud storage application and realization method thereof
CN105095317A (en) Distributive database service management system
CN102143237A (en) Grid-based Internet content delivery method and system
CN108881512A (en) Virtual IP address equilibrium assignment method, apparatus, equipment and the medium of CTDB
CN109446178A (en) A kind of Hadoop object storage high availability method, system, device and readable storage medium storing program for executing
CN102546776A (en) Method for realizing off-line reading files in SAN (Storage Area Networking) shared file system
CN109992373A (en) Resource regulating method, approaches to IM and device and task deployment system
CN103491192A (en) Namenode switching method and system of distributed system
CN109871365A (en) A kind of distributed file system
CN107682411A (en) A kind of extensive SDN controllers cluster and network system
CN105262640A (en) System and method for improving reliability of cloud platform server, and disposition framework of system
CN102831038B (en) The disaster recovery method and ENUM-DNS of ENUM-DNS
CN102710438A (en) Node management method, device and system
CN101621535B (en) Network communication method and device of real-time monitoring system
CN111475537B (en) Global data synchronization system based on pulsar
CN105302817B (en) Distributed file system management method and device

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2018-12-11)