CN107066205A - Data storage system - Google Patents
- Publication number
- CN107066205A
- Authority
- CN
- China
- Prior art keywords
- map
- reduce
- data
- storage
- tasks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/183—Provision of network file services by network file servers, e.g. by using NFS, CIFS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data storage system comprising a Hadoop cluster, components arranged in the Hadoop cluster, and an NFS server module. The components include a Map-Reduce framework that executes a Map-Reduce flow consisting of Map tasks and Reduce tasks. A disk array module is arranged in the NFS server module, and the disk array module and the NFS server module together form a shared storage device that provides storage for the Hadoop cluster. The result of each Map task is stored on the shared storage device so that the shuffle process can be removed, which optimizes the flow of Map tasks and Reduce tasks. The components also cut each file used in the Hadoop cluster into multiple blocks and send the blocks to different compute nodes, thereby achieving load balancing. With this data storage system, cost-effectiveness, reliability, maintainability, and performance are considerably improved.
Description
Technical field
The present invention relates to the field of communications, and in particular to a data storage system.
Background
In recent years the open-source big data project Hadoop has matured and now offers workable solutions to every industry that applies big data. Map-Reduce, the parallel processing framework of a Hadoop cluster, can process both structured and semi-structured data in parallel across many nodes, which greatly increases the speed of data analysis.
By default, a Hadoop cluster stores its data in its built-in distributed file system, HDFS, which by default keeps three replicas of the data. For big data applications, this default multi-replica storage has several drawbacks:
A big data application system usually does more than big data analysis; it also holds many other types of business data. HDFS struggles to meet the needs of such varied scenarios, small-file storage in particular, so the data must be imported into HDFS before each analysis, which is very inconvenient;
With three replicas, the storage space utilization of HDFS is 33.3%, which makes storing and analyzing big data quite expensive;
HDFS is an open-source project, and its file system has many problems in terms of reliability and maintainability, so it is not suitable for storing critical data in a production environment.
No effective solution to these problems in the related art has yet been proposed.
Summary of the invention
In view of the above problems in the related art, the present invention proposes a data storage system that combines a RAID disk array with a second protocol in place of HDFS three-replica storage, so that the reliability and maintainability of the system are considerably improved, thereby solving the cost, reliability, and ease-of-use problems of the distributed file system HDFS in the prior art.
The technical solution of the present invention is realized as follows:
According to one aspect of the invention, a data storage system is provided.
The data storage system comprises a Hadoop cluster, components arranged in the Hadoop cluster, and an NFS server module. The components include a Map-Reduce framework, which executes a Map-Reduce flow consisting of Map tasks and Reduce tasks. A disk array module is arranged in the NFS server module, and the disk array module and the NFS server module form a shared storage device that provides storage for the Hadoop cluster. The result of each Map task is stored on the shared storage device so as to remove the shuffle process, thereby optimizing the flow of Map tasks and Reduce tasks. The components also cut each file used in the Hadoop cluster into multiple blocks and send the blocks to different compute nodes, thereby achieving load balancing.
According to one embodiment of the invention, the components further comprise: an NFS shared storage module, an HDFS storage protocol conversion module, a Map-Reduce flow Shuffle-stage removal module, and a Map-Reduce task scheduling module.
According to one embodiment of the invention, the disk array uses the RAID5 or RAID6 storage mode, and the files used in the Hadoop cluster are cut into 64 MB blocks.
The beneficial effects of the present invention are as follows:
The invention forms shared storage by combining an NFS server with a disk array, replacing the HDFS three-replica storage of the prior art, which reduces cost and improves the cost-effectiveness of the system. The files used by Hadoop are cut into multiple blocks and distributed evenly to the compute nodes, which achieves load balancing. The Map-Reduce flow is also optimized: the shuffle process is eliminated, which reduces data exchange, shortens task processing time, and in turn improves system performance.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a data storage system according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the layout mechanism of the data storage system according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the Map-Reduce task execution process in the prior art;
Fig. 4 is a schematic diagram of the Map-Reduce task execution process according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains on the basis of these embodiments fall within the scope of protection of the invention.
According to an embodiment of the invention, a data storage system is provided.
As shown in Figs. 1 to 4, the data storage system according to an embodiment of the invention comprises a Hadoop cluster, components arranged in the Hadoop cluster, and an NFS server module. The components include a Map-Reduce framework, which executes a Map-Reduce flow consisting of Map tasks and Reduce tasks. A disk array module is arranged in the NFS server module, and the disk array module and the NFS server module form a shared storage device that provides storage for the Hadoop cluster. The result of each Map task is stored on the shared storage device so as to remove the shuffle process, thereby optimizing the flow of Map tasks and Reduce tasks. The components also cut each file used in the Hadoop cluster into multiple blocks and send the blocks to different compute nodes, thereby achieving load balancing.
In this embodiment, as shown in Fig. 1, the RAID disk array is arranged in the NFS server, and the combination of the disk array module and the NFS server module forms shared storage that provides storage for the Hadoop cluster. As shown in Fig. 2, the components cut each file used in the Hadoop cluster into multiple blocks (or segments) and send each block to 3 different compute nodes, thereby achieving load balancing. As shown in Figs. 3 and 4, the result of each Map task is stored on the shared storage device so as to remove the shuffle process, thereby optimizing the flow of Map tasks and Reduce tasks. It should be understood that the block size and the compute nodes to which blocks are distributed can be configured according to actual needs; the invention does not limit them.
With the above scheme, shared storage is formed by combining an NFS server with a disk array, replacing the HDFS three-replica storage of the prior art, which reduces cost and improves the cost-effectiveness of the system. The files used by Hadoop are cut into multiple blocks and distributed evenly to the compute nodes, which achieves load balancing. The Map-Reduce flow is also optimized: the shuffle process is eliminated, which reduces data exchange, shortens task processing time, and in turn improves system performance.
According to one embodiment of the invention, the components further comprise an NFS shared storage module, an HDFS storage protocol conversion module, a Map-Reduce flow Shuffle-stage removal module, and a Map-Reduce task scheduling module. The NFS shared storage module sets up the NFS server and the disk array as shared storage. The HDFS storage protocol conversion module converts HDFS protocol data into NFS protocol data, which gives access to the disk array. The Map-Reduce flow Shuffle-stage removal module removes the Shuffle flow. The Map-Reduce task scheduling module schedules the Map tasks and Reduce tasks.
According to one embodiment of the invention, the disk array uses the RAID5 (independent disks with distributed parity) or RAID6 (independent disks with two independently stored parity codes) storage mode, and the files used in the Hadoop cluster are cut into 64 MB blocks.
To describe the invention more clearly, a specific embodiment is described in detail below.
To solve the cost, reliability, and ease-of-use problems of the distributed file system HDFS in the prior art, this document proposes replacing the three-replica storage of HDFS with RAID disk array storage. On the one hand, this eliminates the process of importing and exporting data. On the other hand, the disk array can be configured as conventional RAID5 or RAID6, as in a traditional disk array, which improves storage space utilization and reduces cost.
As shown in Fig. 1, the disk array can be accessed through the NFS protocol (NFS, Network File System). A protocol conversion module is therefore added to the Hadoop cluster to convert the HDFS application-layer protocol into the NFS access protocol, so that Hadoop storage accesses become accesses to the disk array in the NFS server. Specifically, when computer node 1 accesses the Hadoop cluster, protocol conversion module 1 converts the data of the Hadoop application-layer protocol (or Hadoop cluster access protocol) of component 1 into NFS-protocol access data, and the RAID disk array of the NFS server is accessed with it. Components 2 and 3 work in the same way and are not described in detail here.
In addition, data in a Hadoop cluster is by default stored in HDFS with three replicas, and the Hadoop components, such as the MapReduce framework and HBase, rely on this replica mechanism for fault tolerance. For example, when the node holding the first replica fails, a Hadoop component automatically accesses the data on another replica. Once the disk array replaces HDFS, however, files no longer have replicas. The RAID mechanism of the disk array guarantees the reliability of the data itself, but it cannot guarantee that the automatic replica-switching logic inside Hadoop's fault-tolerance mechanism keeps working. Because the disk array storage is exported over the NFS protocol, all compute nodes see exactly the same data, so any node can be regarded as a storage node for a replica of a file. As shown in Fig. 2, following the principle of Hadoop file storage, a file to be stored is cut into fixed-length objects, for example 64 MB blocks. To ensure that MapReduce tasks can be distributed to different compute nodes, 3 compute nodes are designated for each block (or segment) as the nodes where its replicas nominally reside. After a Hadoop component obtains the data layout of a file, it distributes tasks with its default algorithm and needs no changes. In this way the shared nature of NFS is used to give the stored data a pseudo-random distribution, which keeps Map-Reduce task scheduling balanced. It should be understood that the block size after cutting and the number of compute nodes holding replicas can be set according to actual needs; the invention does not limit them.
In addition, the compute nodes for the replicas of each object are selected with a pseudo-random algorithm, so that every compute node is chosen with roughly the same probability. This ensures that Hadoop task scheduling can make full use of every compute node in the system and that no node is starved of work.
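The block layout described above can be sketched as follows. This is an illustration under stated assumptions, not the patented algorithm: the source does not specify which pseudo-random algorithm is used, so the per-block seeding with CRC32 and the function name `block_layout` are choices made here only for the example.

```python
import random
import zlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size named in the text
REPLICAS = 3                   # nominal replica hosts per block


def block_layout(file_name: str, file_size: int, nodes: List[str],
                 block_size: int = BLOCK_SIZE,
                 replicas: int = REPLICAS) -> Dict[int, Tuple[int, int, List[str]]]:
    """Return {block index: (offset, length, nominal hosts)} for one file."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    layout = {}
    index = 0
    for offset in range(0, file_size, block_size):
        length = min(block_size, file_size - offset)
        # Seed per (file, block) so every node computes the same layout
        # without coordination (an assumption of this sketch).
        seed = zlib.crc32(f"{file_name}:{index}".encode())
        hosts = random.Random(seed).sample(nodes, replicas)
        layout[index] = (offset, length, hosts)
        index += 1
    return layout
```

No data is actually copied: the hosts are only reported to the scheduler as if they held replicas, which is possible because every node reads the same bytes from the shared export.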
In addition, Fig. 3 shows how a MapReduce task executes in a Hadoop cluster. A MapReduce task consists of a Map phase and a Reduce phase. The Map phase splits and processes the input file, and its output is then collected, regrouped, and handed to the Reduce phase, which gives efficient distributed computation. Before each Map task finishes, it must merge the multiple result files it has written to disk into a single result file. Before the Reduce phase starts, each Reduce task must pull the result files from every Map task and merge all Map results into a final result file, and only then does the Reduce computation begin. The whole of this processing, from the end of the Map computation to the start of the Reduce computation, is called the Shuffle process. For MapReduce jobs with a large workload, the Shuffle process involves a large number of I/O (input/output) operations, especially in the data pull stage, where Reduce nodes must fetch data from Map nodes over the network; this step accounts for more than 10% of the total job time. As shown in Fig. 4, with the disk array architecture proposed here all data resides on shared storage, so this network transfer can be omitted entirely and no data pull is needed. The shared nature of the NFS file system is thus used to optimize the shuffle process of the Hadoop cluster, avoiding data transfer and shortening task processing time.
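The effect of shared storage on the data pull step can be illustrated with a small word count. This is a sketch under stated assumptions rather than Hadoop code: a temporary directory stands in for the NFS mount that all nodes share, the file naming scheme and the functions `run_map`, `run_reduce`, and `word_count` are hypothetical, and the tasks run sequentially in one process.

```python
import os
import tempfile
from collections import Counter
from typing import List


def run_map(task_id: int, text: str, shared_dir: str, num_reducers: int) -> None:
    """Count words and write one partition file per reducer to shared storage."""
    partitions = [Counter() for _ in range(num_reducers)]
    for word in text.split():
        # Deterministic partitioner; built-in hash() of str is salted per process.
        partitions[sum(word.encode()) % num_reducers][word] += 1
    for r, counts in enumerate(partitions):
        path = os.path.join(shared_dir, f"map{task_id}-part{r}.txt")
        with open(path, "w", encoding="utf-8") as out:
            for word, n in counts.items():
                out.write(f"{word}\t{n}\n")


def run_reduce(reducer_id: int, num_maps: int, shared_dir: str) -> Counter:
    """Merge this reducer's partition from every map task, read in place."""
    total = Counter()
    for task_id in range(num_maps):
        path = os.path.join(shared_dir, f"map{task_id}-part{reducer_id}.txt")
        with open(path, encoding="utf-8") as src:
            for line in src:
                word, n = line.rstrip("\n").split("\t")
                total[word] += int(n)
    return total


def word_count(splits: List[str], num_reducers: int = 2) -> Counter:
    # A temporary directory stands in for the NFS mount shared by all nodes.
    with tempfile.TemporaryDirectory() as shared_dir:
        for task_id, text in enumerate(splits):
            run_map(task_id, text, shared_dir, num_reducers)
        result = Counter()
        for r in range(num_reducers):
            result.update(run_reduce(r, len(splits), shared_dir))
        return result
```

The reducer opens the map output by path instead of fetching it from the map node over the network; partitioning and merging still happen, so the sketch shows only the removal of the transfer step.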
To sum up, this document proposes, for big data storage and analysis systems, combining a disk array with the NFS network file sharing protocol in place of Hadoop's three-replica HDFS, so that the system is considerably improved in cost, data reliability, maintainability, and performance. Specifically: first, space utilization and cost are compared in Table 1 below, where the cost of 1 TB of raw capacity is assumed to be P.
Table 1
As Table 1 shows, the storage space utilization of HDFS is 33.3%: if 300 TB of storage is purchased, only 100 TB is actually usable. With the approach proposed here, combining a RAID disk array with NFS network file sharing, storage space utilization reaches 90% and storage cost is reduced by about 67%.
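The utilization figures can be checked with a short calculation. The helper names below are hypothetical, and the disk counts are assumptions chosen to reproduce 90%: RAID5 spends one disk's worth of capacity on parity and RAID6 two, so 90% corresponds to, for example, a 10-disk RAID5 group or a 20-disk RAID6 group.

```python
def replication_utilization(copies: int) -> float:
    """Usable fraction of raw capacity when every byte is stored `copies` times."""
    return 1.0 / copies


def raid_utilization(disks: int, parity_disks: int) -> float:
    """Usable fraction of a RAID group: parity_disks is 1 for RAID5, 2 for RAID6."""
    if disks <= parity_disks:
        raise ValueError("group needs more disks than parity disks")
    return (disks - parity_disks) / disks


def raw_capacity_needed(usable_tb: float, utilization: float) -> float:
    """Raw capacity that must be bought to obtain usable_tb of usable space."""
    return usable_tb / utilization


hdfs_raw = raw_capacity_needed(100, replication_utilization(3))  # about 300 TB
raid_raw = raw_capacity_needed(100, raid_utilization(10, 1))     # about 111 TB
# Relative saving if both kinds of raw capacity cost the same P per TB
# (an assumption of this sketch).
saving = 1 - raid_raw / hdfs_raw
```

With the same price P per raw TB on both sides, this calculation gives a saving of roughly 63%. The figure of about 67% stated in the text presumably rests on the cost entries of Table 1, which are not reproduced in this copy.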
Second, because HDFS is not a standard storage interface, data to be analyzed must be imported and exported, which considerably reduces analysis efficiency. With the scheme proposed here, production data generated at the front end can be processed by Hadoop directly, with no import or export, which greatly simplifies data transfer. In addition, with the disk array combined with the NFS protocol, the Shuffle process of MapReduce tasks is locally optimized to reduce the data transferred over HTTP between Map task nodes and Reduce task nodes, which improves the efficiency of the whole process.
Third, open-source software aims primarily at implementing functionality, so its engineering falls short of enterprise-grade products in many respects, and the stability and reliability of the system often carry hidden risks. The HDFS storage system of Hadoop is under constant modification, so its stability is also at risk, and storing production data directly on HDFS is very risky. RAID disk array storage, by contrast, has decades of development behind it and meets enterprise-grade reliability standards, making it suitable for production data. Replacing HDFS three-replica storage with a RAID disk array combined with a second protocol therefore gives far higher data reliability than the distributed file system HDFS.
In summary, with the above technical solution, shared storage is formed by combining an NFS server with a disk array, replacing the HDFS three-replica storage of the prior art, which reduces cost and improves the cost-effectiveness of the system. The files used by Hadoop are cut into multiple blocks and distributed evenly to the compute nodes, which achieves load balancing. The Map-Reduce flow is also optimized: the shuffle process is eliminated, which reduces data exchange, shortens task processing time, and in turn improves system performance.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (3)
1. A data storage system comprising a Hadoop cluster, components arranged in the Hadoop cluster, and an NFS server module, wherein the components include a Map-Reduce framework, the Map-Reduce framework is used to execute a Map-Reduce flow, and the Map-Reduce flow includes Map tasks and Reduce tasks, characterized in that:
a disk array module is arranged in the NFS server module, and the disk array module and the NFS server module form a shared storage device that provides storage for the Hadoop cluster, the result of each Map task being stored on the shared storage device so as to remove the shuffle process and thereby optimize the flow of Map tasks and Reduce tasks; and
the components cut each file used in the Hadoop cluster into multiple blocks and send the blocks to different compute nodes, thereby achieving load balancing.
2. The storage system according to claim 1, characterized in that the components further comprise: an NFS shared storage module, an HDFS storage protocol conversion module, a Map-Reduce flow Shuffle-stage removal module, and a Map-Reduce task scheduling module.
3. The storage system according to claim 1, characterized in that the disk array uses the RAID5 or RAID6 storage mode, and the files used in the Hadoop cluster are cut into 64 MB blocks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611257420.3A CN107066205B (en) | 2016-12-30 | 2016-12-30 | Data storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611257420.3A CN107066205B (en) | 2016-12-30 | 2016-12-30 | Data storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107066205A true CN107066205A (en) | 2017-08-18 |
CN107066205B CN107066205B (en) | 2020-06-05 |
Family
ID=59624054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611257420.3A Active CN107066205B (en) | 2016-12-30 | 2016-12-30 | Data storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107066205B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108776690A (en) * | 2018-06-05 | 2018-11-09 | 上海孚典智能科技有限公司 | The method of HDFS Distribution and Centralization blended data storage systems based on separated layer handling |
CN110297812A (en) * | 2019-06-13 | 2019-10-01 | 深圳市比比赞科技有限公司 | File memory method, the method for file synchronization, computer equipment and storage medium |
CN112328176A (en) * | 2020-11-04 | 2021-02-05 | 北京计算机技术及应用研究所 | Intelligent scheduling method based on multi-control disk array NFS sharing |
WO2022116766A1 (en) * | 2020-12-04 | 2022-06-09 | 中兴通讯股份有限公司 | Data storage system and construction method therefor |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101873342A (en) * | 2010-06-02 | 2010-10-27 | 深圳市迪菲特科技股份有限公司 | Data access method, data access system and disk array storage system |
CN102521687A (en) * | 2011-12-01 | 2012-06-27 | 中国资源卫星应用中心 | Miniaturized universal platform for preprocessing remote-sensing satellite data |
CN102915257A (en) * | 2012-09-28 | 2013-02-06 | 曙光信息产业(北京)有限公司 | TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method |
CN103747060A (en) * | 2013-12-26 | 2014-04-23 | 惠州华阳通用电子有限公司 | Distributed monitor system and method based on streaming media service cluster |
US20140358977A1 (en) * | 2013-06-03 | 2014-12-04 | Zettaset, Inc. | Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job |
-
2016
- 2016-12-30 CN CN201611257420.3A patent/CN107066205B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101873342A (en) * | 2010-06-02 | 2010-10-27 | 深圳市迪菲特科技股份有限公司 | Data access method, data access system and disk array storage system |
CN102521687A (en) * | 2011-12-01 | 2012-06-27 | 中国资源卫星应用中心 | Miniaturized universal platform for preprocessing remote-sensing satellite data |
CN102915257A (en) * | 2012-09-28 | 2013-02-06 | 曙光信息产业(北京)有限公司 | TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method |
US20140358977A1 (en) * | 2013-06-03 | 2014-12-04 | Zettaset, Inc. | Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job |
CN103747060A (en) * | 2013-12-26 | 2014-04-23 | 惠州华阳通用电子有限公司 | Distributed monitor system and method based on streaming media service cluster |
Non-Patent Citations (1)
Title |
---|
何文婷,等: "《支持Hadoop大数据访问的pNFS框架研究与实现》", 《计算机应用研究》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108776690A (en) * | 2018-06-05 | 2018-11-09 | 上海孚典智能科技有限公司 | The method of HDFS Distribution and Centralization blended data storage systems based on separated layer handling |
CN108776690B (en) * | 2018-06-05 | 2020-07-07 | 上海孚典智能科技有限公司 | Method for HDFS distributed and centralized mixed data storage system based on hierarchical governance |
CN110297812A (en) * | 2019-06-13 | 2019-10-01 | 深圳市比比赞科技有限公司 | File memory method, the method for file synchronization, computer equipment and storage medium |
CN112328176A (en) * | 2020-11-04 | 2021-02-05 | 北京计算机技术及应用研究所 | Intelligent scheduling method based on multi-control disk array NFS sharing |
CN112328176B (en) * | 2020-11-04 | 2024-01-30 | 北京计算机技术及应用研究所 | Intelligent scheduling method based on NFS sharing of multi-control disk array |
WO2022116766A1 (en) * | 2020-12-04 | 2022-06-09 | 中兴通讯股份有限公司 | Data storage system and construction method therefor |
Also Published As
Publication number | Publication date |
---|---|
CN107066205B (en) | 2020-06-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Das et al. | Big data analytics: A framework for unstructured data analysis | |
US8677366B2 (en) | Systems and methods for processing hierarchical data in a map-reduce framework | |
CN107066205A (en) | A kind of data-storage system | |
CN103106249B (en) | A kind of parallel data processing system based on Cassandra | |
Singh et al. | Hadoop: addressing challenges of big data | |
US20130227379A1 (en) | Efficient checksums for shared nothing clustered filesystems | |
CN106790572A (en) | The system and method that a kind of distributed information log is collected | |
Ngu et al. | B+-tree construction on massive data with Hadoop | |
Saxena et al. | Practical real-time data processing and analytics: distributed computing and event processing using Apache Spark, Flink, Storm, and Kafka | |
Khan et al. | Data model for big data in cloud environment | |
Li et al. | The overview of big data storage and management | |
CN107632780A (en) | A kind of roll of strip implementation method and its storage architecture based on distributed memory system | |
CN106156049A (en) | A kind of method and system of digital independent | |
CN102880832B (en) | A kind of implementation method of the system of the data magnanimity management under cluster | |
Tomar et al. | Integration of cloud computing and big data technology for smart generation | |
Feng et al. | Review of hadoop performance optimization | |
CN107395446A (en) | Daily record real time processing system | |
Bokhari et al. | An effective model for big data analytics | |
Li et al. | Design of the mass multimedia files storage architecture based on Hadoop | |
Kaur | Big data: A review of challenges, tools and techniques | |
Liu et al. | Research on it architecture of heterogeneous big data | |
Chakraborty et al. | A proposal for high availability of HDFS architecture based on threshold limit and saturation limit of the namenode | |
CN108062311A (en) | A kind of method and system of access service device web data | |
Wang et al. | Research on the architecture of Open Education based on cloud computing | |
US9639630B1 (en) | System for business intelligence data integration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20211011 Address after: 100089 building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing Patentee after: Dawning Information Industry (Beijing) Co.,Ltd. Patentee after: ZHONGKE SUGON INFORMATION INDUSTRY CHENGDU Co.,Ltd. Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing Patentee before: Dawning Information Industry (Beijing) Co.,Ltd. |