CN111708738B - Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data - Google Patents
Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data
- Publication number
- CN111708738B
- Authority
- CN
- China
- Prior art keywords
- hdfs
- object storage
- data
- hadoop
- file system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/164—File meta data generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and a system for mutual data access between the hadoop file system hdfs and object storage s3. The method comprises the following steps: configuring a hadoop big data environment containing the file system hdfs and a distributed storage software ceph environment containing the object storage s3, the two environments communicating with each other through the namenode node and the ceph-mon node; docking the file system hdfs through the namenode node, and docking the object storage s3 through the ceph-mon node; acquiring an external data access instruction; and performing data access between the file system hdfs and the object storage s3 according to the data access instruction. The invention removes the barrier that isolates file access between different file systems, enables hdfs and s3 object storage data to be accessed mutually and to coexist, lets the complementary strengths of the two jointly promote the development of big data, and broadens the range of big data applications.
Description
Technical Field
The invention relates to the field of computer data interaction, and in particular to a method and a system for mutual data access between the hadoop file system hdfs and object storage s3.
Background
Hadoop is an open-source big data framework developed by the Apache Foundation and a software platform for developing and running large-scale data processing. Its three core components are the distributed file system hdfs, the job scheduling and cluster resource management framework yarn, and the distributed computing programming framework MapReduce. Together they address the three core problems of a big data framework: how to store massive data, how to schedule computing resources and tasks, and how to compute over massive data. hdfs is a distributed file system designed for streaming data access and the handling of large files, and it does not strictly adhere to the posix standard. Because of its high fault tolerance and high bandwidth, it is well suited to deployment on large numbers of inexpensive hardware devices and to large-scale hadoop big data workloads.
hdfs relaxes full posix compatibility in order to stream massive large files, and by moving computation to the data it supports large data volumes, highly concurrent file access, and large numbers of nodes, which has made it a benchmark in the field of big data computing. Its disadvantages are also evident. First, it is not suitable for low-latency (for example, millisecond-level) data access: the hdfs file system suits large concurrent IO access, but its support for applications with high IOPS requirements is insufficient. Second, its read-write support for large numbers of small files is poor, so it is unsuitable for scenarios such as image processing and other computations over many small files.
Among current mainstream distributed storage systems, the object storage system s3 can make up for these shortcomings of hdfs. Object storage uses its own storage model, which differs from both file storage systems and block storage systems. Its file interface is rest style, and the underlying data layout is a flat structure of key-value pairs. This flat organization can handle massive, highly concurrent small-file access, removes the dependence on metadata, can provide high iops, and fits the characteristics of the current big data era. With the rapid development of the internet, data volume is growing exponentially; whatever the data pattern, data size, or degree of structure, using object storage s3 together with the hdfs file system so that the two coexist can accelerate the development of big data.
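The flat key-value layout described above can be illustrated with a small in-memory model. This is only a sketch for intuition, not an S3 or ceph client; the class name `FlatObjectStore` and its methods are hypothetical.

```python
from typing import Dict, List


class FlatObjectStore:
    """Toy in-memory model of a flat object namespace (hypothetical, not a real S3 client)."""

    def __init__(self) -> None:
        # One flat dictionary: there is no directory tree to maintain.
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # The whole key, slashes included, is a single opaque name.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_prefix(self, prefix: str) -> List[str]:
        # "Directories" are emulated by filtering keys on a shared prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

In this model a key such as `images/2020/a.png` is just a name, so writing a new small object never touches a hierarchical metadata structure.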
However, hdfs and s3 follow completely different designs and use completely different file access interfaces, so a barrier exists to reading files across these two kinds of system. The prior art therefore has a problem and needs further improvement.
Disclosure of Invention
To remedy the shortcomings of the prior art, the invention provides a method and a system for mutual data access between the hadoop file system hdfs and object storage s3.
To achieve the above purpose, the specific technical scheme of the invention is as follows:
a method for mutual data access between the hadoop file system hdfs and object storage s3 comprises the following steps:
configuring a hadoop big data environment containing the file system hdfs and a distributed storage software ceph environment containing the object storage s3; the hadoop big data environment and the distributed storage software ceph environment communicate with each other through the namenode node and the ceph-mon node;
docking the file system hdfs through the namenode node, and docking the object storage s3 through the ceph-mon node;
acquiring an external data access instruction;
and performing data access between the corresponding file system hdfs and the object storage s3 according to the data access instruction.
Preferably, acquiring the external data access instruction includes: when the client hadoop-client of the hadoop big data environment writes a large file, the file is written to a datanode node after calculation by the namenode node and is then stored in the hdfs file system.
Preferably, acquiring the external data access instruction includes: when the client hadoop-client of the hadoop big data environment writes a small file, the ceph-mon information is invoked after calculation by the namenode node, and the file is then written to the object storage through the s3 interface.
Preferably, acquiring the external data access instruction includes: when MapReduce is executed, the metadata information of the file is calculated by the namenode node, and the file is copied between the file system hdfs and the s3 object storage.
Further, when MapReduce is executed, the calculation result can be stored in hdfs or in the object storage s3 according to the user's selection.
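The size-based choice between the two stores can be sketched as follows. This is a minimal client-side illustration under stated assumptions: the text does not define what counts as a small file, so the 1 MiB threshold is arbitrary, and the function name, the hdfs root, and the bucket name are hypothetical. In the scheme described above the decision involves calculation by the namenode node, which this sketch does not model.

```python
# Assumption: the source does not define "small"; 1 MiB is an arbitrary cut-off.
SMALL_FILE_THRESHOLD_BYTES = 1024 * 1024


def choose_target_uri(
    file_name: str,
    size_bytes: int,
    hdfs_root: str = "hdfs://namenode.example:8020/data",  # hypothetical address
    s3_root: str = "s3a://small-files",                     # hypothetical bucket
    threshold: int = SMALL_FILE_THRESHOLD_BYTES,
) -> str:
    """Return the URI a file would be written to, based only on its size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Small files go to object storage; everything else goes to hdfs.
    root = s3_root if size_bytes < threshold else hdfs_root
    return f"{root.rstrip('/')}/{file_name.lstrip('/')}"
```

For example, a 20 KB thumbnail would be routed to the `s3a://` bucket, while a 512 MiB data file would be routed to the `hdfs://` path.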
The invention also provides a system for mutual data access between the hadoop file system hdfs and object storage s3, which comprises: a hadoop big data environment containing the file system hdfs and a distributed storage software ceph environment containing the object storage s3.
The hadoop big data environment and the distributed storage software ceph environment communicate with each other through the namenode node and the ceph-mon node; the file system hdfs is docked through the namenode node, and the object storage s3 is docked through the ceph-mon node.
The invention removes the barrier that isolates file access between different file systems, enables hdfs and s3 object storage data to be accessed mutually and to coexist, lets the complementary strengths of the two jointly promote the development of big data, and broadens the range of big data applications.
Drawings
FIG. 1 is a flow chart of a method for mutual data access between the hadoop file system hdfs and object storage s3 according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a system for mutual data access between the hadoop file system hdfs and object storage s3 according to an embodiment of the present invention.
Detailed Description
In order that those of ordinary skill in the art will readily understand and practice the invention, embodiments of the invention will be further described with reference to the drawings.
Referring to fig. 1, the invention provides a method for mutual data access between the hadoop file system hdfs and object storage s3, which comprises the following steps:
S11, configuring a hadoop big data environment containing the file system hdfs and a distributed storage software ceph environment containing the object storage s3; the hadoop big data environment and the distributed storage software ceph environment communicate with each other through the namenode node and the ceph-mon node;
S12, docking the file system hdfs through the namenode node, and docking the object storage s3 through the ceph-mon node;
S13, acquiring an external data access instruction;
S14, performing data access between the corresponding file system hdfs and the object storage s3 according to the data access instruction.
Step S13, acquiring the external data access instruction, covers the following cases:
(1) When the client hadoop-client of the hadoop big data environment writes a large file, the file is written to a datanode node after calculation by the namenode node and is then stored in the hdfs file system;
(2) When the client hadoop-client of the hadoop big data environment writes a small file, the ceph-mon information is invoked after calculation by the namenode node, and the file is then written to the object storage through the s3 interface;
(3) When MapReduce is executed, the metadata information of the file is calculated by the namenode node, and the file is copied between the file system hdfs and the s3 object storage. The calculation result can be stored in hdfs or in the object storage s3 service according to the user's selection.
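One common way to copy data between hdfs and an s3-compatible store with a MapReduce job is hadoop's DistCp tool. The text does not name DistCp, so the sketch below swaps it in purely as an illustration of case (3); the helper name `build_distcp_command` and the example URIs are hypothetical, and the function only builds the argument list without running anything.

```python
from typing import List

# URI schemes this sketch accepts; s3a is the scheme used by the hadoop-aws connector.
ALLOWED_SCHEMES = ("hdfs://", "s3a://")


def build_distcp_command(src_uri: str, dst_uri: str) -> List[str]:
    """Build the argument list for `hadoop distcp <src> <dst>` (illustration only)."""
    for uri in (src_uri, dst_uri):
        if not uri.startswith(ALLOWED_SCHEMES):
            raise ValueError(f"unsupported URI scheme: {uri}")
    # DistCp runs as a MapReduce job that copies files in parallel.
    return ["hadoop", "distcp", src_uri, dst_uri]
```

The same call works in either direction: passing an `hdfs://` source with an `s3a://` destination moves data into object storage, and swapping the two arguments copies it back. On a configured cluster the returned list could be handed to `subprocess.run`.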
Referring to fig. 2, the invention further provides a system for mutual data access between the hadoop file system hdfs and object storage s3, comprising: a hadoop big data environment containing the file system hdfs and a distributed storage software ceph environment containing the object storage s3.
The hadoop big data environment and the distributed storage software ceph environment communicate with each other through the namenode node and the ceph-mon node; the file system hdfs is docked through the namenode node, and the object storage s3 is docked through the ceph-mon node.
For this mutual access system, consider a client hadoop-client of the hadoop big data environment writing a large number of small files. For every block, regardless of how much data it holds, the hadoop cluster keeps a record of about 150 bytes in the memory of the metadata node. When hundreds of millions of small files must be written at the same time and each file occupies one block, the metadata node requires approximately 20 GB of memory. This severely constrains hadoop cluster performance.
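The memory pressure can be checked with simple arithmetic. The sketch below assumes the per-block record is about 150 bytes and that each small file occupies exactly one block; both the function names and the chosen file count are illustrative.

```python
def namenode_metadata_bytes(
    num_files: int,
    blocks_per_file: int = 1,     # each small file is assumed to occupy one block
    bytes_per_record: int = 150,  # assumed size of one metadata record
) -> int:
    """Estimate metadata-node memory needed to track the given files."""
    return num_files * blocks_per_file * bytes_per_record


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9
```

Under these assumptions, one hundred million single-block files need 15 GB of metadata memory, which is the same order of magnitude as the roughly 20G figure given in the text for hundreds of millions of files.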
With the above method, however, when a large number of small files are written, the client writes them directly into the object storage s3. The flat, key-value-pair-based storage model of object storage removes the dependence on metadata. Because the barrier between the hdfs file system and the object storage system is broken, the many small files stored previously can also be copied directly into the object storage, which frees space in the original hadoop cluster without losing any files, a win-win result.
In the invention, a hadoop big data environment and a distributed storage software ceph environment are actually deployed, and ceph provides the object storage service through an s3 interface. Because the hadoop-aws module of hadoop supports AWS integration by default and the s3 object storage interface provided by ceph is compatible with AWS, communication between the hadoop big data environment and the distributed storage software ceph environment is possible.
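Pointing the hadoop-aws connector at an s3-compatible endpoint is usually done through `fs.s3a.*` properties in `core-site.xml`. The following sketch generates such a fragment; the property names are standard hadoop-aws settings, while the endpoint URL, the credentials, and the helper names are placeholders that are not taken from the text.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def s3a_properties(endpoint: str, access_key: str, secret_key: str,
                   use_ssl: bool = False) -> Dict[str, str]:
    """Collect the s3a settings needed to reach an s3-compatible endpoint."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style addressing is commonly needed for non-AWS endpoints.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }


def render_core_site(props: Dict[str, str]) -> str:
    """Render the properties as a core-site.xml style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `render_core_site(s3a_properties("http://ceph-s3.example:7480", "ACCESS", "SECRET"))` yields a fragment that could be merged into the cluster configuration, after which `s3a://bucket/path` URIs become usable from hadoop clients. In a real deployment the secret key would be kept in a credential store rather than in plain text.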
With this technical scheme, hadoop big data applications can write files to both hdfs and the object storage, hdfs files can be transferred to the object storage, and object storage data can be transferred to hdfs. Different file systems thus coexist and their files are mutually accessible, which meets the differing storage needs of different file sizes under hadoop big data applications as well as the need for high bandwidth and high io to coexist, broadens the big data application industry, and accelerates the development of big data.
The foregoing examples describe only a few embodiments of the invention in detail and do not thereby limit the scope of the invention. Those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the invention is determined by the appended claims.
Claims (1)
1. A method for mutual data access between the hadoop file system hdfs and object storage s3, characterized by comprising the following steps:
configuring a hadoop big data environment containing the file system hdfs and a distributed storage software ceph environment containing the object storage s3; the hadoop big data environment and the distributed storage software ceph environment communicate with each other through the namenode node and the ceph-mon node;
docking the file system hdfs through the namenode node, and docking the object storage s3 through the ceph-mon node;
acquiring an external data access instruction;
performing data access between the corresponding file system hdfs and the object storage s3 according to the data access instruction;
wherein acquiring the external data access instruction includes:
when the client hadoop-client of the hadoop big data environment writes a large file, writing the file to a datanode node after calculation by the namenode node, and then storing the file in the hdfs file system;
when the client hadoop-client of the hadoop big data environment writes a small file, invoking the ceph-mon information after calculation by the namenode node, and then writing the file to the object storage through the s3 interface;
when MapReduce is executed, calculating the metadata information of the file through the namenode node, and copying the file between the file system hdfs and the s3 object storage;
and when MapReduce is executed, the calculation result can be stored in hdfs or in the object storage s3 according to the user's selection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010482343.1A CN111708738B (en) | 2020-05-29 | 2020-05-29 | Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010482343.1A CN111708738B (en) | 2020-05-29 | 2020-05-29 | Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111708738A (en) | 2020-09-25 |
CN111708738B (en) | 2023-11-03 |
Family
ID=72538444
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010482343.1A | Active | CN111708738B (en) | 2020-05-29 | 2020-05-29 | Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111708738B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11928228B2 (en) * | 2020-09-25 | 2024-03-12 | EMC IP Holding Company LLC | Facilitating an object protocol based access of data within a multiprotocol environment |
CN114610690A (en) * | 2020-12-04 | 2022-06-10 | 中兴通讯股份有限公司 | Data storage system and construction method thereof |
CN112965950A (en) * | 2021-03-09 | 2021-06-15 | 浪潮云信息技术股份公司 | Method for realizing storage of stream data write-in object |
CN113127420B (en) * | 2021-03-30 | 2023-03-14 | 山东英信计算机技术有限公司 | Metadata request processing method, device, equipment and medium |
CN114185490A (en) * | 2021-12-06 | 2022-03-15 | 深圳市瑞驰信息技术有限公司 | Method for realizing data exchange between glusterfs file system and object storage s3 |
CN114500485A (en) * | 2022-01-28 | 2022-05-13 | 北京沃东天骏信息技术有限公司 | Data processing method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103581332A (en) * | 2013-11-15 | 2014-02-12 | 武汉理工大学 | HDFS framework and pressure decomposition method for NameNodes in HDFS framework |
US9928203B1 (en) * | 2015-07-15 | 2018-03-27 | Western Digital | Object storage monitoring |
CN109033250A (en) * | 2018-07-06 | 2018-12-18 | 内蒙古大学 | A kind of high availability object storage method for supporting large data files access service |
CN109783438A (en) * | 2018-12-05 | 2019-05-21 | 南京华讯方舟通信设备有限公司 | Distributed NFS system and its construction method based on librados |
CN110287150A (en) * | 2019-05-16 | 2019-09-27 | 中国科学院信息工程研究所 | A kind of large-scale storage systems meta-data distribution formula management method and system |
CN110688674A (en) * | 2019-09-23 | 2020-01-14 | ***股份有限公司 | Access butt-joint device, system and method and device applying access butt-joint device |
CN110750458A (en) * | 2019-10-22 | 2020-02-04 | 恩亿科(北京)数据科技有限公司 | Big data platform testing method and device, readable storage medium and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI461929B (en) * | 2011-12-09 | 2014-11-21 | Promise Tecnnology Inc | Cloud data storage system |
US11106625B2 (en) * | 2015-11-30 | 2021-08-31 | International Business Machines Corporation | Enabling a Hadoop file system with POSIX compliance |
Non-Patent Citations (1)
Title |
---|
UMStor Hadapter: A Way Forward for Big Data and Object Storage; Chen Tao; https://mp.weixin.qq.com/s/-Nsd9wOg5bNoV0RTWGeaBw, UMCloud (优铭云); pp. 1-11 * |
Also Published As
Publication number | Publication date |
---|---|
CN111708738A (en) | 2020-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111708738B (en) | | Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data |
US9304815B1 (en) | | Dynamic replica failure detection and healing |
US10540119B2 (en) | | Distributed shared log storage system having an adapter for heterogenous big data workloads |
CN110413685B (en) | | Database service switching method, device, readable storage medium and computer equipment |
CN109564502B (en) | | Processing method and device applied to access request in storage device |
CN110716845B (en) | | Log information reading method of Android system |
JP2009237826A (en) | | Storage system and volume management method therefor |
CN110018878B (en) | | Distributed system data loading method and device |
CN112800026A (en) | | Data transfer node, method, system and computer readable storage medium |
US11157456B2 (en) | | Replication of data in a distributed file system using an arbiter |
CN109347936B (en) | | Redis proxy client implementation method, system, storage medium and electronic device |
CN105808451B (en) | | Data caching method and related device |
CN111090782A (en) | | Graph data storage method, device, equipment and storage medium |
CN116302605A (en) | | Message engine-based message transmission method |
CN106790521B (en) | | System and method for distributed networking by using node equipment based on FTP |
US11467777B1 (en) | | Method and system for storing data in portable storage devices |
US11501014B2 (en) | | Secure data replication in distributed data storage environments |
CN114185815A (en) | | Method, equipment and system for realizing memory key value storage |
CN109343928B (en) | | Virtual memory file redirection method and system for virtual machine in virtualization cluster |
CN107102898B (en) | | Memory management and data structure construction method and device based on NUMA (non Uniform memory Access) architecture |
US11435948B2 (en) | | Methods and systems for user space storage management |
CN115604290B (en) | | Kafka message execution method, device, equipment and storage medium |
US11086853B1 (en) | | Method and system for streaming data from portable storage devices |
US11966637B1 (en) | | Method and system for storing data in portable storage devices |
US11693578B2 (en) | | Method and system for handoff with portable storage devices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||