CN103023995B - Hadoop-based distributed cloud storage automatic tiered data management system
- Publication number
- CN103023995B (application CN201210499413.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- message
- module
- staging
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a Hadoop-based distributed cloud storage system for automatic tiered data management, comprising node servers and a central server. The node servers collect server status messages and data temperature distribution messages and send the collected messages to the data tiering management module of the central server. By deploying the data tiering management module on the central server, the invention receives in one place the messages sent by the DataNodes and the NameNode of the HDFS cluster, processes them into out-of-band data tiering instructions, and sends these to the NameNode of the HDFS cluster, which is responsible for the final redistribution of data blocks. This achieves automatic tiered data management for a Hadoop-based distributed cloud storage system and improves the utilization of storage resources.
Description
Technical field
The invention belongs to the field of computer technology and specifically relates to a Hadoop-based distributed cloud storage system for automatic tiered data management.
Background
With the rapid development of cloud computing at home and abroad, cloud storage technology based on the Hadoop Distributed File System (HDFS) is widely used. Large-scale HDFS clusters are built by reusing existing PC servers and adding new ones, and the local disks of those servers provide a high-performance, cost-effective, elastically scalable distributed cloud storage service.
Because the server nodes that make up an HDFS cluster differ from one another, nodes in the cluster are likely to have different storage performance and storage capacity. How to take full account of the differences between nodes and optimize the allocation of storage resources is therefore an urgent problem in building a Hadoop-based distributed cloud storage system.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a Hadoop-based distributed cloud storage system for automatic tiered data management, which achieves automatic data tiering and improves the utilization of storage resources.
To achieve this object, the invention adopts the following technical solution:
A Hadoop-based distributed cloud storage system for automatic tiered data management is provided. The system comprises node servers and a central server; the node servers collect server status messages and data temperature distribution messages and send the collected messages to the data tiering management module of the central server.
The node server comprises a server information collection module, a data temperature collection module and a data tiering agent module. The server information collection module and the data temperature collection module collect the server status messages and the data temperature distribution messages respectively, and each sends the messages it collects to the data tiering management module.
The server information collection module is deployed on the DataNodes of the HDFS cluster, the data temperature collection module and the data tiering agent module are deployed on the NameNode of the HDFS cluster, and the data tiering management module is deployed on the central server.
The server status message comprises the hardware configuration and running-state information of a DataNode; the data temperature distribution message comprises a data temperature message and a data distribution message.
Each message comprises a message header and a message body. The message header comprises the central server name, IP address, node ID, encryption method, checksum and timestamp. The message body comprises the node hardware configuration, node running state, data temperature and data distribution, encrypted with the encryption method specified in the message header.
The data tiering agent module receives an out-of-band data tiering command, parses it to obtain the information on the data blocks to be moved, and passes that information to the NameNode. On receiving the information, the NameNode copies each target data block to its destination node and deletes the block from the source DataNode. Once all the block-move information has been processed, the NameNode sends a success message to the data tiering agent module, and the data tiering agent module sends an acknowledgement message to the data tiering management module.
The information on the data blocks to be moved comprises the ID of the target data block to be moved, the source DataNode ID and the destination node ID.
The central server comprises a message receiving module, an information persistence module, an information cache module, a data tiering management module, an instruction processing module, an instruction sending module, an analysis engine module and a node registration module.
The message receiving module receives the server status messages and data temperature distribution messages sent by the DataNodes and the NameNode respectively, parses the messages and sends them to the data tiering management module;
The data tiering management module generates out-of-band data tiering commands and periodically sends them to the NameNode;
The information cache module receives messages from the message receiving module, processes them into valid information and stores it in the information cache area. It also manages and maintains the content of the information cache area, and classifies the information before sending it to the data tiering management module. Maintenance of the cache content comprises creating, updating and deleting information;
When the information cache area exceeds its capacity, the timer expires, or the service stops, the information persistence module writes the messages processed by the information cache module to disk; the data tiering management module reads information from disk and loads it into the information cache area;
The analysis engine module receives the server status messages and data temperature distribution messages sent by the information cache module, builds a data temperature distribution vector map and keeps the state of that map up to date. It forms out-of-band data tiering instructions from the data temperature distribution vector map and sends them to the instruction processing module;
The instruction processing module processes the output of the analysis engine module into instruction encodings that the instruction sending module can send to a specific node;
The instruction sending module receives the out-of-band data tiering commands generated by the data tiering management module and, according to the received commands, sends instructions to the destination node;
The node registration module receives registration information from the information cache area and registers or updates the information of the specified node.
Compared with the prior art, the beneficial effects of the invention are:
1. Other tiered storage methods distribute data across different storage media (memory, solid-state disks, magnetic disks, SAN storage, tape). In contrast, the Hadoop-based automatic tiered data management system provided by the invention targets the case where data is stored on the local disks (SATA or SCSI interface) of x86 servers. By comparing static node server configuration information (disk capacity, number of disks, interface type, read/write speed) and combining it with server running-state messages (dynamic information such as disk capacity, CPU and network bandwidth) and data temperature distribution messages (the number, time and frequency of data accesses), data of different temperatures is stored on servers of different performance, optimizing the use of server storage resources.
2. Rather than considering a rational layout of data blocks when the data is first stored, the invention uses an offline block redistribution strategy: it sends out-of-band data tiering instructions so that data blocks are moved when the load on the whole HDFS cluster is lightest or at another suitable time. This allows the temperature information of the data to be calculated more reasonably while reducing the impact on normal use of the storage.
3. The invention is loosely coupled with the Hadoop Distributed File System it targets: only two places in HDFS need to be modified in the design. It can therefore be ported quickly to other cloud storage platforms that use a distributed file system with centralized metadata management, providing them with a tiered data storage scheme, and is highly portable.
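The placement idea in the first benefit (hotter data on better-performing servers) can be illustrated with a small sketch. The patent does not specify a placement algorithm, so the greedy rule below, the single replica per block, the per-node block capacity and every name (`plan_moves`, `node_score`, `node_capacity`) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def plan_moves(
    block_temp: Dict[str, float],    # block ID -> temperature (higher is hotter)
    block_node: Dict[str, str],      # block ID -> node currently holding it
    node_score: Dict[str, float],    # node ID -> performance score (assumed metric)
    node_capacity: Dict[str, int],   # node ID -> number of blocks it may hold
) -> List[Tuple[str, str, str]]:
    """Return (block ID, source node, destination node) moves.

    Greedy assumption: fill the best-performing node with the hottest
    blocks, then the next node with the next-hottest blocks, and so on.
    """
    nodes = sorted(node_score, key=lambda n: node_score[n], reverse=True)
    blocks = sorted(block_temp, key=lambda b: block_temp[b], reverse=True)
    moves: List[Tuple[str, str, str]] = []
    cursor = 0
    for node in nodes:
        for block in blocks[cursor:cursor + node_capacity[node]]:
            if block_node[block] != node:  # only emit a move if the block is elsewhere
                moves.append((block, block_node[block], node))
        cursor += node_capacity[node]
    return moves
```

Each resulting tuple carries the three fields the description lists for a block move: the block ID, the source node ID and the destination node ID.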
Brief description of the drawings
Fig. 1 is a diagram of the data management scheme for Hadoop-based distributed cloud storage;
Fig. 2 is a schematic diagram of the logical architecture of the Hadoop-based distributed cloud storage automatic tiered data management system;
Fig. 3 is a schematic diagram of the modules that make up the central server.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1 and Fig. 2, a Hadoop-based distributed cloud storage system for automatic tiered data management is provided. The system comprises node servers and a central server; the node servers collect server status messages and data temperature distribution messages and send the collected messages to the data tiering management module of the central server.
The node server comprises a server information collection module, a data temperature collection module and a data tiering agent module;
The server information collection module collects the hardware configuration information of a DataNode (static information on the configuration of CPU, memory, hard disks, network and so on), generates a server status message from it and sends the message to the central server. Thereafter, the service periodically collects the running-state information of the DataNode (dynamic information on the usage of CPU, memory, hard disks, network and so on), and after analysis and processing forms a server status message and sends it to the central server.
The data temperature collection module records the number and frequency of accesses to the data (either directly, by modifying the NameNode source code in the HDFS cluster, or indirectly, by embedding a counter update in the read request messages of the distributed cloud storage client) and calculates the temperature of each piece of data in storage (data exists in the form of files). By parsing the metadata on the NameNode, it obtains the distribution information of the files. From this periodically updated information it generates data temperature distribution messages and sends them to the central server.
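The description says that temperature is derived from access count and frequency but gives no formula. As a hedged illustration only, the sketch below assumes an exponentially decayed access counter; the decay model, the one-hour half-life and the names `AccessStats` and `record_access` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Per-file access record kept by a hypothetical temperature collector."""
    temperature: float = 0.0  # decayed access score; higher means hotter
    last_access: float = 0.0  # time of the most recent access, in seconds


def record_access(stats: AccessStats, now: float, half_life: float = 3600.0) -> AccessStats:
    """Decay the old score by the elapsed time, then count the new access.

    Assumption: the score halves every `half_life` seconds without access,
    so both the number of accesses and how recent they are affect the result.
    """
    elapsed = max(0.0, now - stats.last_access)
    decay = 0.5 ** (elapsed / half_life)
    return AccessStats(temperature=stats.temperature * decay + 1.0, last_access=now)
```

A file read twice in quick succession scores higher than one read twice an hour apart, which is one simple way to combine count and frequency into a single value.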
The server information collection module is deployed on the DataNodes of the HDFS cluster, the data temperature collection module and the data tiering agent module are deployed on the NameNode of the HDFS cluster, and the data tiering management module is deployed on the central server.
The server status message comprises the hardware configuration and running-state information of a DataNode; the data temperature distribution message comprises a data temperature message and a data distribution message.
Each message comprises a message header and a message body. The message header comprises the central server name, IP address, node ID, encryption method, checksum and timestamp. The message body comprises the node hardware configuration, node running state, data temperature and data distribution, encrypted with the encryption method specified in the message header.
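The header and body layout described above could be modelled as follows. This is a sketch under stated assumptions: the patent names the header fields but not the wire format, the checksum algorithm or the cipher, so JSON encoding, SHA-256 and the pluggable `ciphers` table (with only a pass-through `"none"` entry) are illustrative choices, and all function names are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional


@dataclass
class MessageHeader:
    central_server_name: str
    ip_address: str
    node_id: str
    encryption_method: str
    checksum: str      # assumed here: SHA-256 of the encrypted body
    timestamp: float


# Placeholder cipher table; "none" passes data through unchanged.
# A real deployment would register an actual cipher here (not specified in the source).
CIPHERS: Dict[str, Callable[[bytes], bytes]] = {"none": lambda data: data}


def build_message(server_name: str, ip: str, node_id: str, method: str,
                  body: dict, ciphers: Dict[str, Callable[[bytes], bytes]],
                  now: Optional[float] = None) -> dict:
    """Serialize the body, encrypt it with the header-specified method, add a checksum."""
    plaintext = json.dumps(body, sort_keys=True).encode("utf-8")
    payload = ciphers[method](plaintext)
    header = MessageHeader(server_name, ip, node_id, method,
                           hashlib.sha256(payload).hexdigest(),
                           time.time() if now is None else now)
    return {"header": asdict(header), "body": payload.hex()}


def verify_message(message: dict) -> bool:
    """Recompute the checksum over the body and compare it with the header value."""
    payload = bytes.fromhex(message["body"])
    return hashlib.sha256(payload).hexdigest() == message["header"]["checksum"]
```

The receiver can check integrity with `verify_message` before it looks up the cipher named in the header to decrypt the body.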
The data tiering agent module receives an out-of-band data tiering command, parses it to obtain the information on the data blocks to be moved, and passes that information to the NameNode. On receiving the information, the NameNode copies each target data block to its destination node and deletes the block from the source DataNode. Once all the block-move information has been processed, the NameNode sends a success message to the data tiering agent module, and the data tiering agent module sends an acknowledgement message to the data tiering management module.
The information on the data blocks to be moved comprises the ID of the target data block to be moved, the source DataNode ID and the destination node ID.
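The copy-then-delete sequence can be sketched with an in-memory placement table standing in for the cluster. This is not HDFS code: `BlockMove`, `apply_moves` and the dictionary model are hypothetical, and skipping stale or no-op entries is an assumption the source does not address.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class BlockMove:
    block_id: str
    source_node: str
    target_node: str


def apply_moves(placement: Dict[str, Set[str]], moves: List[BlockMove]) -> List[str]:
    """Apply moves to `placement` (node ID -> set of block IDs); return moved block IDs.

    Each block is first added to the destination and only then removed
    from the source, mirroring the copy-then-delete order in the text.
    """
    moved: List[str] = []
    for move in moves:
        if move.source_node == move.target_node:
            continue  # nothing to do; also avoids deleting the only copy
        if move.block_id not in placement.get(move.source_node, set()):
            continue  # assumed behaviour: ignore entries whose source no longer holds the block
        placement.setdefault(move.target_node, set()).add(move.block_id)  # copy
        placement[move.source_node].discard(move.block_id)                # delete
        moved.append(move.block_id)
    return moved
```

The returned list plays the role of the success report that would be passed back to the agent module once every entry has been handled.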
As shown in Fig. 3, the central server comprises a message receiving module, an information persistence module, an information cache module, a data tiering management module, an instruction processing module, an instruction sending module, an analysis engine module and a node registration module.
The message receiving module receives the server status messages and data temperature distribution messages sent by the DataNodes and the NameNode respectively, parses the messages and sends them to the data tiering management module;
The data tiering management module generates out-of-band data tiering commands and periodically sends them to the NameNode;
The information cache module receives messages from the message receiving module, processes them into valid information and stores it in the information cache area. It also manages and maintains the content of the information cache area, and classifies the information before sending it to the data tiering management module. Maintenance of the cache content comprises creating, updating and deleting information;
When the information cache area exceeds its capacity, the timer expires, or the service stops, the information persistence module writes the messages processed by the information cache module to disk; the data tiering management module reads information from disk and loads it into the information cache area;
The analysis engine module receives the server status messages and data temperature distribution messages sent by the information cache module, builds a data temperature distribution vector map and keeps the state of that map up to date. It forms out-of-band data tiering instructions from the data temperature distribution vector map and sends them to the instruction processing module;
The instruction processing module processes the output of the analysis engine module into instruction encodings that the instruction sending module can send to a specific node;
The instruction sending module receives the out-of-band data tiering commands generated by the data tiering management module and, according to the received commands, sends instructions to the destination node;
The node registration module receives registration information from the information cache area and registers or updates the information of the specified node (forming the node IDs that the analysis engine uses for operations such as completing state information, parsing data temperature distribution information, identifying nodes and generating instructions).
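The interaction between the information cache module and the information persistence module (write to disk when capacity is exceeded, when the timer expires, or when the service stops) can be sketched as below. The class name `InfoCache`, the JSON-lines file format, the default limits and the choice to empty the cache after each write are all assumptions for illustration; the patent does not describe these details.

```python
import json
import time
from typing import Any, Callable, Dict


class InfoCache:
    """In-memory cache that persists its entries to a JSON-lines file."""

    def __init__(self, path: str, capacity: int = 1000,
                 flush_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.path = path
        self.capacity = capacity
        self.flush_interval = flush_interval
        self.clock = clock
        self.entries: Dict[str, Any] = {}
        self.last_flush = clock()

    def put(self, key: str, value: Any) -> None:
        """Create or update an entry, then flush if a trigger condition holds."""
        self.entries[key] = value
        over_capacity = len(self.entries) > self.capacity
        timer_expired = self.clock() - self.last_flush >= self.flush_interval
        if over_capacity or timer_expired:
            self.flush()

    def delete(self, key: str) -> None:
        self.entries.pop(key, None)

    def flush(self) -> None:
        """Append all entries to disk and empty the cache (assumed policy)."""
        with open(self.path, "a", encoding="utf-8") as handle:
            for key, value in self.entries.items():
                handle.write(json.dumps({"key": key, "value": value}) + "\n")
        self.entries.clear()
        self.last_flush = self.clock()

    def close(self) -> None:
        """Service stop: persist whatever is still cached."""
        self.flush()
```

Passing `clock` as a parameter keeps the timer trigger easy to exercise in isolation, since a fixed or stepped clock can be substituted for `time.monotonic`.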
Finally, it should be noted that the above embodiment is intended only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or replaced with equivalents, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.
Claims (7)
1. A Hadoop-based distributed cloud storage system for automatic tiered data management, characterized in that: the system comprises node servers and a central server, the node servers collect server status messages and data temperature distribution messages and send the collected messages to the data tiering management module of the central server;
The node server comprises a server information collection module, a data temperature collection module and a data tiering agent module; the server information collection module and the data temperature collection module collect the server status messages and the data temperature distribution messages respectively, and each sends the messages it collects to the data tiering management module;
The data tiering agent module receives an out-of-band data tiering command, parses it to obtain the information on the data blocks to be moved, and passes that information to the NameNode; on receiving the information, the NameNode copies each target data block to its destination node and deletes the block from the source DataNode; once all the block-move information has been processed, the NameNode sends a success message to the data tiering agent module, and the data tiering agent module sends an acknowledgement message to the data tiering management module.
2. The Hadoop-based distributed cloud storage system for automatic tiered data management according to claim 1, characterized in that: the server information collection module is deployed on the DataNodes of the HDFS cluster, the data temperature collection module and the data tiering agent module are deployed on the NameNode of the HDFS cluster, and the data tiering management module is deployed on the central server.
3. The Hadoop-based distributed cloud storage system for automatic tiered data management according to claim 1, characterized in that: the server status message comprises the hardware configuration and running-state information of a DataNode, and the data temperature distribution message comprises a data temperature message and a data distribution message.
4. The Hadoop-based distributed cloud storage system for automatic tiered data management according to claim 3, characterized in that: the server status message and the data temperature distribution message each comprise a message header and a message body; the message header comprises the central server name, IP address, node ID, encryption method, checksum and timestamp; the message body comprises the node hardware configuration, node running state, data temperature and data distribution, encrypted with the encryption method specified in the message header.
5. The Hadoop-based distributed cloud storage system for automatic tiered data management according to claim 1, characterized in that: the information on the data blocks to be moved comprises the ID of the target data block to be moved, the source DataNode ID and the destination node ID.
6. The Hadoop-based distributed cloud storage system for automatic tiered data management according to claim 1 or 2, characterized in that: the central server comprises a message receiving module, an information persistence module, an information cache module, a data tiering management module, an instruction processing module, an instruction sending module, an analysis engine module and a node registration module.
7. The Hadoop-based distributed cloud storage system for automatic tiered data management according to claim 6, characterized in that: the message receiving module receives the server status messages and data temperature distribution messages sent by the DataNodes and the NameNode respectively, parses the messages and sends them to the data tiering management module;
The data tiering management module generates out-of-band data tiering commands and periodically sends them to the NameNode;
The information cache module receives messages from the message receiving module, processes them into valid information and stores it in the information cache area, manages and maintains the content of the information cache area, and classifies the information before sending it to the data tiering management module; maintenance of the cache content comprises creating, updating and deleting information;
When the information cache area exceeds its capacity, the timer expires, or the service stops, the information persistence module writes the messages processed by the information cache module to disk, and the data tiering management module reads information from disk and loads it into the information cache area;
The analysis engine module receives the server status messages and data temperature distribution messages sent by the information cache module, builds a data temperature distribution vector map and keeps the state of that map up to date; it forms out-of-band data tiering instructions from the data temperature distribution vector map and sends them to the instruction processing module;
The instruction processing module processes the output of the analysis engine module into instruction encodings that the instruction sending module can send to a specific node;
The instruction sending module receives the out-of-band data tiering commands generated by the data tiering management module and, according to the received commands, sends instructions to the destination node;
The node registration module receives registration information from the information cache area and registers or updates the information of the specified node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210499413.XA CN103023995B (en) | 2012-11-29 | 2012-11-29 | Hadoop-based distributed cloud storage automatic tiered data management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210499413.XA CN103023995B (en) | 2012-11-29 | 2012-11-29 | Hadoop-based distributed cloud storage automatic tiered data management system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103023995A (en) | 2013-04-03 |
CN103023995B (en) | 2015-09-09 |
Family
ID=47972119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210499413.XA Active CN103023995B (en) | Hadoop-based distributed cloud storage automatic tiered data management system | 2012-11-29 | 2012-11-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103023995B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103336670B (en) * | 2013-06-04 | 2016-11-23 | 华为技术有限公司 | A kind of method and apparatus data block being distributed automatically based on data temperature |
CN103780622B (en) * | 2014-01-24 | 2016-09-28 | 华中科技大学 | A kind of data classification encryption method of facing cloud storage |
CN104021503A (en) * | 2014-05-08 | 2014-09-03 | 国家电网公司 | Relaying cloud establishing method based on virtualized Hadoop cluster |
CN104135516B (en) * | 2014-07-29 | 2017-04-05 | 浪潮软件集团有限公司 | Distributed cloud storage method based on industry data acquisition |
CN104462577B (en) * | 2014-12-29 | 2018-04-13 | 北京奇艺世纪科技有限公司 | A kind of date storage method and device |
CN105930102B (en) * | 2016-05-06 | 2018-12-21 | 歌尔股份有限公司 | A kind of micro electronmechanical product test data synchronous safety transfer method and system |
CN106470242B (en) * | 2016-09-07 | 2019-07-19 | 东南大学 | A kind of large scale scale heterogeneous clustered node fast quantification stage division of cloud data center |
CN108600281B (en) * | 2017-03-16 | 2021-12-31 | 杭州海康威视数字技术股份有限公司 | Cloud storage system, media data storage method and system |
CN107135274A (en) * | 2017-06-20 | 2017-09-05 | 郑州云海信息技术有限公司 | The memory management method and device of a kind of distributed cluster system |
CN109361560A (en) * | 2018-01-24 | 2019-02-19 | 广州Tcl智能家居科技有限公司 | A kind of clustered node Communication processing method, system, storage medium and server |
CN111177486B (en) * | 2019-12-19 | 2020-09-08 | 四川蜀天梦图数据科技有限公司 | Message transmission method and device in distributed graph calculation process |
CN113407620B (en) * | 2020-03-17 | 2023-04-21 | 北京信息科技大学 | Data block placement method and system based on heterogeneous Hadoop cluster environment |
CN115190168B (en) * | 2022-07-08 | 2023-08-04 | 苏州浪潮智能科技有限公司 | Edge server management system and server cluster |
CN118193503B (en) * | 2024-05-17 | 2024-07-12 | 维沃多科技(北京)有限公司 | Hierarchical management system for server center data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101957863A (en) * | 2010-10-14 | 2011-01-26 | 广州从兴电子开发有限公司 | Data parallel processing method, device and system |
CN102263822A (en) * | 2011-07-22 | 2011-11-30 | 北京星网锐捷网络技术有限公司 | Distributed cache control method, system and device |
CN102638566A (en) * | 2012-02-28 | 2012-08-15 | 山东大学 | BLOG system running method based on cloud storage |
CN102646121A (en) * | 2012-02-23 | 2012-08-22 | 武汉大学 | Two-stage storage method combined with RDBMS (relational database management system) and Hadoop cloud storage |
Non-Patent Citations (1)
Title |
---|
Wang Feng, Lei Baohua. Model Analysis of the Hadoop Distributed File System. 《研究与开发》 (Research and Development). 2010, (No. 12), full text. * |
Also Published As
Publication number | Publication date |
---|---|
CN103023995A (en) | 2013-04-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |