CN103617211A - HBase loaded data importing method - Google Patents

Info

Publication number
CN103617211A
Authority
CN
China
Prior art keywords
data
hbase
region
file
importing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310584702.4A
Other languages
Chinese (zh)
Inventor
郭美思
王秀娟
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201310584702.4A
Publication of CN103617211A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for importing data loaded into HBase. During Region pre-allocation, the environment and configuration parameters of the cluster are set, and an HBase table is created according to a function written to determine the number of Regions. After Region pre-allocation is finished, a MapReduce program is written, exploiting the analysis and processing capability and the parallelism of the distributed computing framework, to generate HFile files from the source data. The completebulkload command is then used to complete the import, loading the data into the HBase table in a predetermined format. With this method, the generated HFile files can be loaded directly into a running HBase cluster, which reduces the network traffic produced by data transmission and HBase loading during data migration, improves import efficiency, and saves CPU (central processing unit) and network resources.

Description

A method for importing data loaded into HBase
Technical field
The present invention relates to a method for importing data loaded into HBase.
Technical background
With the rapid development of network technology, data volumes have grown quickly. Traditional techniques have run into serious obstacles in analyzing and exploiting these huge data resources and are no longer adequate for big data analysis. To meet the requirements of big data analysis, Google proposed MapReduce, a programming model for large-scale data analysis, processing, and parallel computation. Among the technologies that big data requires, distributed file systems and distributed databases are well suited to large data sets. HBase is a scalable distributed database that supports large-scale data and uses Hadoop HDFS as its file storage system. Because it offers good scalability, fault tolerance, and random read capability, and supports MapReduce parallel computation, it has been adopted by more and more companies. Study shows, however, that the data import tools shipped with HBase have certain limitations: they do not give the user full control over the loading process, and the expected format of the loaded data cannot be customized. A method for importing data into HBase in a specific format is therefore very important.
The Bulk Load feature currently shipped with HBase supports loading massive data into HBase efficiently. Bulk Load is implemented as a MapReduce Job: the Job directly generates files in HFile format, the internal file format of HBase, which together make up the data of an HBase table, and the data files are then loaded directly into the running cluster. The simplest way to use the bulk load feature is the importtsv tool. importtsv is a built-in tool that loads content from TSV files directly into HBase. It runs a MapReduce Job that either writes the data from the TSV files directly into an HBase table or writes it out as files in HBase's own internal format.
Although the importtsv tool is very useful when text data needs to be imported into HBase, there are situations, such as importing data in other formats, in which the user may want to generate the data programmatically, and MapReduce is the most effective way to process massive data. It may also be the only feasible way to load massive data into HBase. MapReduce can of course be used to write data into HBase directly, but a massive data set makes the MapReduce Job very heavy, and if it is handled improperly the throughput of the running MapReduce job may be very low.
Summary of the invention
The technical problem to be solved by the present invention is as follows: design the number of Regions in the HBase table rationally, so that the data to be imported is evenly distributed across the cluster; then use the programming model of the distributed computing framework MapReduce, re-implementing the Map interface and the Reduce interface, to obtain HFile files in the expected specific format; and finally use the completebulkload tool to load the files into HBase in the expected format.
In HBase, data merging (compaction) is a task triggered frequently by write operations, unless the internal data files of HBase are generated and loaded directly. Although HBase writes are normally very fast, write operations may often be blocked if the merging process is not suitably configured. Another problem that a write-heavy task may cause is that all the data is written to the same region server; this often happens when massive data is imported into a newly created HBase table. Once the data is concentrated on the same server, the whole cluster becomes unbalanced and the write rate drops significantly. Therefore, Regions are pre-allocated first; the main purpose is to lay out the table across the cluster before data is imported into HBase, so that the imported data is evenly distributed in the cluster. A MapReduce program then produces data files in the specific format. Finally, the HFile files are loaded directly into HBase. This approach requires that the Region pre-allocation be reasonable and that the MapReduce program be designed and written reasonably. The method improves import efficiency and supports parallel computation, and is therefore more efficient.
The technical solution adopted in the present invention is:
A method for importing data loaded into HBase: first, during Region pre-allocation, the environment and configuration parameters of the cluster are set; an HBase table is then created according to a function written to determine the number of Regions. Once Region pre-allocation is finished, a MapReduce program is written, exploiting the analysis and processing capability and the parallelism of the distributed computing framework, to generate HFile files from the source data. Finally, the import is completed with the completebulkload command, which loads the data into the HBase table in the predetermined format. This improves import efficiency. The import method is implemented mainly by the MapReduce module.
Every row of data in HBase belongs to one specific Region. A Region contains HBase rows sorted by row key and is managed by a RegionServer. After the HBase table is created, the table starts with a single Region, and all inserted data first goes into this Region. When the data reaches a threshold, the Region is split into two, and the resulting Regions can be distributed to other RegionServers to balance the load in the cluster. Therefore, when importing data, Regions are pre-allocated first: a suitable algorithm distributes the data across the whole cluster to achieve load balancing and speed up data loading.
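The effect of split points on where rows land can be sketched in a few lines of Python. This is an illustrative model only, not HBase code: the function name `region_index`, the split keys, and the sample row keys are all hypothetical, and the sketch assumes that each Region covers a half-open, byte-ordered row key range.

```python
from bisect import bisect_right
from collections import Counter

def region_index(split_keys, row_key):
    """Index of the Region holding row_key.

    split_keys is the sorted list of Region boundaries. Region i covers
    [split_keys[i-1], split_keys[i]); the first Region has no lower bound
    and the last has no upper bound.
    """
    return bisect_right(split_keys, row_key)

# Hypothetical row keys, for illustration only.
rows = [b"A01-0001", b"H03-0002", b"P07-0003", b"Z12-0004"]

# No split keys: the table is a single Region that receives every write.
single = Counter(region_index([], r) for r in rows)

# Three hypothetical split keys give four Regions, one sample row in each.
presplit = Counter(region_index([b"H", b"P", b"X"], r) for r in rows)

print(single)
print(presplit)
```

With no split keys every row maps to Region 0, which is the hot-spot case described above; with split keys chosen to match the key distribution, the same rows spread across all Regions from the first write.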
In writing the MapReduce program, the MapReduce framework is responsible for splitting the data: one storage block (Block) of a file is taken as one split, and the set of key-value pairs <K1, V1> of the records in the split is extracted as the Map input. In the specified mapper class, the input row data is converted into the specified format: the Map module converts the row data according to the key-value pair, generates the row key, and specifies the column family name and the column name. In the map method, a Put object is created, the converted data is added to the Put object with the Put.add() function, and the context.write() method is called to write the data to an intermediate file. An intermediate key-value pair <rowkey, put> is thus generated from the rowkey and the Put object, and the intermediate result is written to local disk. The Reduce module obtains the location of the intermediate results from the Master, reads the data through a remote interface from the disk of the TaskTracker that executed the Map task, and sorts and arranges the data so that it meets the expected output format, thereby producing the final output, the HFile files.
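The Map step can be modelled in plain Python to make the <rowkey, put> pairing concrete. This is a sketch under stated assumptions, not the patented mapper: the column family name `cf`, the qualifiers `c1`, `c2`, and the helper names `map_record` and `map_phase` are hypothetical, and a Put is represented simply as a list of (family, qualifier, value) tuples.

```python
def map_record(line, family="cf"):
    """Convert one comma-separated input line into (row_key, put).

    The first field is used as the row key. The remaining fields become
    cells under hypothetical qualifiers c1, c2, ... in one column family.
    """
    fields = line.strip().split(",")
    row_key = fields[0]
    put = [(family, f"c{i}", value)
           for i, value in enumerate(fields[1:], start=1)]
    return row_key, put

def map_phase(lines):
    """Emit the intermediate <rowkey, put> pairs for one split of input lines."""
    return [map_record(line) for line in lines if line.strip()]

# Sample record in the format given in the description.
pairs = map_phase(["A75566620131107,121212,33333\n"])
print(pairs)
```

In a real job the pairs would be written with context.write() and shuffled to the reducers by row key; here they are just returned as a list.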
In the Reduce class, the reduce method iterates over these records in the map output format specified by the user program; the processed result is output to the specified file, and the output path is set with the setOutputPath function.
Region pre-allocation determines the number of Regions according to the volume of data to be imported into HBase and the scale of the distributed cluster. Regions are then designed and allocated in advance according to the data volume, which can significantly reduce the number of Region splits, or even avoid splits altogether, and thus achieves load balancing during data import.
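The text gives no formula for deriving the Region count from data volume and cluster scale, so the following is only one plausible heuristic, stated as an assumption: divide the expected data size by a target Region size, then round up so that every RegionServer hosts the same number of Regions. The function name and all figures are hypothetical.

```python
def estimate_region_count(total_bytes, target_region_bytes, region_servers):
    """Hypothetical heuristic for the number of Regions to pre-create.

    The result is at least one Region per server and always a multiple of
    the server count, so Regions can be spread evenly.
    """
    # Integer ceiling division avoids floating-point rounding on large sizes.
    by_size = -(-total_bytes // target_region_bytes)
    per_server = max(1, -(-by_size // region_servers))
    return per_server * region_servers

GIB = 2 ** 30
# Assumed figures: 300 GiB of data, 1 GiB target Region size, 8 servers.
regions = estimate_region_count(300 * GIB, 1 * GIB, 8)
print(regions)
```

A target Region size well below the configured split threshold leaves headroom, which is what allows the import to finish with few or no splits.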
The MapReduce programming framework carries out the processing that produces the files in HFile format. The Map module designs a reasonable rowkey according to the data format and then processes the data to obtain the intermediate results; the Reducer module organizes them into a reasonable data layout, and the HFile files are finally output to the specified output path. Multiple map tasks can be configured for this process, which improves processing efficiency and greatly improves performance. The MapReduce framework simplifies the writing of parallel programs and provides a convenient programming interface.
The import method is implemented by writing a MapReduce program: first, a Job instance is created in the main function, and its input path, output path, mapper class, reducer class, and the key and value types of the map output are set. The HBase configuration is then set, for example the ZooKeeper nodes of the specific cluster. Next, the HBase table is set up according to this configuration. Finally, the output format is set to HFileOutputFormat so that HFile files can be generated.
Beneficial effect of the present invention is:
The present invention pre-allocates the number of Regions before the data is transferred to HBase in order to balance the cluster load, and then generates HFile files with a MapReduce program, which provides parallel computing capability. The method can load the generated HFile files directly into a running HBase cluster. This reduces the network traffic produced by data transmission and HBase loading during data migration. The method also improves data import efficiency and saves CPU and network resources.
Brief description of the drawings
Fig. 1 is a flowchart of HBase data import;
Fig. 2 is a sequence diagram of Region pre-allocation in HBase data import.
Embodiments
The invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Embodiment 1:
In this embodiment, the number of pre-allocated Regions is first calculated according to the data to be imported and the cluster environment. The existing cluster consists of 8 servers with 96 GB of memory, running CentOS 6.3; cluster components such as HDFS, MapReduce, and HBase are installed according to their installation steps. The data file to be imported contains 10 billion records in the following format: A75566620131107,121212,33333. The flow of HBase data import is shown in Fig. 1: first the cluster parameters are set reasonably, then the Region pre-allocation algorithm is designed to guarantee a balanced data load. Next, according to the number of pre-allocated Regions, the data is turned into HFile files by the MapReduce program. Finally, the completebulkload command loads the HFile files of predetermined format into the HBase table, completing the import.
Embodiment 2:
This embodiment covers Region pre-allocation, as shown in Fig. 2: the relevant parameters are configured first, and the HBase table is then created with the number of Regions designed by the algorithm in the Getsplit function. The algorithm relies on the fact that the data to be imported contains a letter and a date, and also takes the cluster environment and the data scale into account when designing the number of pre-allocated Regions. Combinations of a letter and a number delimit the Region ranges: the letter ranges over A-Z, and the number ranges over 01-12, representing the 12 months. This gives 24*12=288 Regions as stated (with all 26 letters A-Z the count would be 26*12=312) to hold the data evenly. Because the data is uniform across letters and months, the data in each Region is also uniform. This ensures that the load across Regions is balanced, with no Region loaded especially heavily or especially lightly.
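The letter-and-month partitioning can be sketched as a split key generator. This is not the Getsplit function of the embodiment, whose code is not given; `build_split_keys` is a hypothetical stand-in that assumes each Region is identified by a three-character prefix (one letter plus a two-digit month). Enumerating every letter A-Z with every month yields 26*12=312 prefixes, which differs from the 24*12=288 figure stated in the text.

```python
from string import ascii_uppercase

def build_split_keys(letters=ascii_uppercase, months=range(1, 13)):
    """Return sorted split keys for one Region per (letter, month) prefix.

    N Regions need N-1 split keys, so the first prefix is dropped: it is
    the implicit lower bound of the first Region.
    """
    prefixes = [f"{letter}{month:02d}" for letter in letters for month in months]
    return prefixes[1:]

split_keys = build_split_keys()
region_count = len(split_keys) + 1
print(region_count, split_keys[:3], split_keys[-1])
```

Zero-padding the month keeps the keys in lexicographic order, which matters because Region boundaries are compared as byte strings. The resulting list would be passed as the split keys when the table is created.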
Embodiment 3:
The method for importing data loaded into HBase is implemented by writing a MapReduce program. In the main function, a configuration object is first created with new Configuration(), and a Job instance is then created from conf with new Job(). The input path, output path, mapper class, reducer class, and the key and value types of the map output are set on this instance. The HBase configuration is then set with the set() function, for example the ZooKeeper nodes of the specific cluster. Next, the HBase table is set up according to this configuration, and finally the HFileOutputFormat.configureIncrementalLoad(job, htable) method configures the output to generate HFile files, where htable is created with new HTable().
Embodiment 4:
In the specified mapper class, the input row data is converted into the specified format. A reasonable rowkey is designed by operating on the row data: the letter and the number are arranged together appropriately so that the rowkey corresponds to the pre-allocated Regions, and the column family name and column name are specified at the same time. In the map method, a Put object is created and the converted data is added to it with the Put.add() method. The context.write() method is then called to write the data to an intermediate file. In the Reduce class, the reduce method iterates over these records in the map output format specified by the user program: the values are iterated with Iterator<Put> iter = puts.iterator() and added to map, where TreeSet<KeyValue> map = new TreeSet<KeyValue>(KeyValue.COMPARATOR). Each row and kv is then written with context.write(row, kv). Finally, the processed result is output to the specified file; the output path can be set with the setOutputPath function. The completebulkload command loads the HFile files of predetermined format into the HBase table, completing the import.
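A plain-Python model of the two ideas in this embodiment, offered as a sketch rather than the actual mapper and reducer. It assumes, based only on the sample record A75566620131107, that the first field is one letter, six digits, and a yyyymmdd date; the row key layout (letter, month, then the remaining characters), the `cf` family, and both function names are hypothetical. Sorting a set of tuples stands in for the TreeSet ordered by KeyValue.COMPARATOR.

```python
def build_row_key(record_id):
    """Move the letter and month to the front so the key matches a
    letter+month Region prefix (assumed layout: L dddddd yyyymmdd)."""
    letter = record_id[0]
    month = record_id[11:13]
    return f"{letter}{month}{record_id[1:]}"

def reduce_row(row_key, puts):
    """Collect the cells of all Puts for one row and emit them in sorted order.

    Each Put is modelled as a list of (family, qualifier, value) tuples;
    the sorted set plays the role of the TreeSet in the text.
    """
    cells = set()
    for put in puts:
        cells.update(put)
    return [(row_key, family, qualifier, value)
            for family, qualifier, value in sorted(cells)]

row_key = build_row_key("A75566620131107")
out = reduce_row(row_key, [[("cf", "c2", "33333")], [("cf", "c1", "121212")]])
print(row_key)
print(out)
```

Emitting cells in sorted order matters because an HFile stores its entries sorted; the real comparator also orders by timestamp and type, which this sketch leaves out.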
While the program is running, bottlenecks and obstacles in the MapReduce data load can be monitored through the monitoring interface and the logs generated by Hadoop or HBase. Based on the messages shown in the logs, the corresponding configuration parameters and cluster environment parameters can be adjusted, for example changing the number of map and reduce tasks to make the run more efficient, or adjusting the JVM stack size and memory size. Suitable changes to the configuration parameters can improve CPU utilization and parallel computing capability, improve data import efficiency, and save network resources.

Claims (6)

1. A method for importing data loaded into HBase, characterized in that: first, during Region pre-allocation, the environment and configuration parameters of the cluster are set, and an HBase table is then created according to a function written to determine the number of Regions; once Region pre-allocation is finished, a MapReduce program is written, exploiting the analysis and processing capability and the parallelism of the distributed computing framework, to generate HFile files from the source data; finally, the import is completed with the completebulkload command, which loads the data into the HBase table in a predetermined format.
2. The method for importing data loaded into HBase according to claim 1, characterized in that: after the HBase table is created, the table starts with a single Region, and all inserted data first goes into this Region; when the data reaches a threshold the Region is split into two, and the resulting Regions are distributed to other RegionServers to balance the load in the cluster.
3. The method for importing data loaded into HBase according to claim 1, characterized in that: in writing the MapReduce program, the MapReduce framework is responsible for splitting the data, one storage block (Block) of a file being taken as one split, and the set of key-value pairs <K1, V1> of the records in the split is extracted as the Map input; in the specified mapper class the input row data is converted into the specified format, the Map module converting the row data according to the key-value pair, generating the row key, and specifying the column family name and the column name; in the map method a Put object is created, the converted data is added to the Put object with the Put.add() function, and the context.write() method is called to write the data to an intermediate file; an intermediate key-value pair <rowkey, put> is then generated from the rowkey and the Put object, and the intermediate result is written to local disk; the Reduce module obtains the location of the intermediate results from the Master, reads the data through a remote interface from the disk of the TaskTracker that executed the Map task, and sorts and arranges the data so that it meets the expected output format, thereby producing the final output, the HFile files.
4. The method for importing data loaded into HBase according to claim 1, characterized in that: the Region pre-allocation determines the number of Regions according to the volume of data to be imported into HBase and the scale of the distributed cluster, and Regions are then designed and allocated in advance according to the data volume, which can significantly reduce the number of Region splits, or even avoid splits altogether, achieving load balancing during data import.
5. The method for importing data loaded into HBase according to claim 3, characterized in that: the MapReduce programming framework carries out the processing that produces the files in HFile format; the Map module designs a reasonable rowkey according to the data format and then processes the data to obtain the intermediate results; the Reducer module organizes them into a reasonable data layout, and the HFile files are finally output to the specified output path.
6. The method for importing data loaded into HBase according to any one of the preceding claims, characterized in that the import method is implemented by writing a MapReduce program: first, a Job instance is created in the main function, and its input path, output path, mapper class, reducer class, and the key and value types of the map output are set; the HBase configuration is then set; next, the HBase table is set up according to this configuration; finally, the output format is set to HFileOutputFormat so that HFile files can be generated.
CN201310584702.4A 2013-11-20 2013-11-20 HBase loaded data importing method Pending CN103617211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310584702.4A CN103617211A (en) 2013-11-20 2013-11-20 HBase loaded data importing method

Publications (1)

Publication Number Publication Date
CN103617211A true CN103617211A (en) 2014-03-05

Family

ID=50167914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310584702.4A Pending CN103617211A (en) 2013-11-20 2013-11-20 HBase loaded data importing method

Country Status (1)

Country Link
CN (1) CN103617211A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077420A (en) * 2014-07-21 2014-10-01 北京京东尚科信息技术有限公司 Method and device for importing data into HBase database
CN104239542A (en) * 2014-09-22 2014-12-24 浪潮(北京)电子信息产业有限公司 System and method for capturing data from source distributed database
CN104252535A (en) * 2014-09-16 2014-12-31 福建新大陆软件工程有限公司 Hbase-based data hash processing method and device
CN104516985A (en) * 2015-01-15 2015-04-15 浪潮(北京)电子信息产业有限公司 Rapid mass data importing method based on HBase database
CN104598562A (en) * 2015-01-08 2015-05-06 浪潮软件股份有限公司 XML file processing method and device based on MapReduce parallel computing model
CN105205154A (en) * 2015-09-24 2015-12-30 浙江宇视科技有限公司 Data migration method and device
CN105550296A (en) * 2015-12-10 2016-05-04 深圳市华讯方舟软件技术有限公司 Data importing method based on spark-SQL big data processing platform
CN105630896A (en) * 2015-12-21 2016-06-01 浪潮集团有限公司 Method for quickly importing mass data
CN105808577A (en) * 2014-12-29 2016-07-27 北京神州泰岳软件股份有限公司 HBase database-based data batch loading method and device
CN105893521A (en) * 2016-03-31 2016-08-24 南京烽火软件科技有限公司 Reading-and-writing separation HBase warehousing method
CN105988995A (en) * 2015-01-27 2016-10-05 杭州海康威视数字技术股份有限公司 HFile based data batch loading method
CN106055678A (en) * 2016-06-07 2016-10-26 国网河南省电力公司电力科学研究院 Hadoop-based panoramic big data distributed storage method
CN106648934A (en) * 2016-12-27 2017-05-10 中科天玑数据科技股份有限公司 Method and system for high-efficiency data transmission between Impala and HBase
CN106897450A (en) * 2017-03-03 2017-06-27 郑州云海信息技术有限公司 A kind of method that HBase is quickly introduced based on HDFS mass datas
CN107016039A (en) * 2017-01-06 2017-08-04 阿里巴巴集团控股有限公司 The method and Database Systems of database write-in
CN108255966A (en) * 2017-12-25 2018-07-06 太极计算机股份有限公司 A kind of data migration method and storage medium
CN108494589A (en) * 2018-03-14 2018-09-04 北京思特奇信息技术股份有限公司 A kind of management method and system of distribution Nginx servers
CN109271365A (en) * 2018-09-19 2019-01-25 浪潮软件股份有限公司 A method of based on Spark memory techniques to HBase database acceleration reading/writing
CN109445795A (en) * 2018-09-14 2019-03-08 厦门天锐科技股份有限公司 Data processing method when multiple asynchronous call same request of data in call back function
CN109614140A (en) * 2018-12-17 2019-04-12 泰康保险集团股份有限公司 Configuration data processing method and device, electronic equipment, storage medium
CN109657009A (en) * 2018-12-21 2019-04-19 北京锐安科技有限公司 The pre- partitioned storage periodic table creation method of data, device, equipment and storage medium
CN109918425A (en) * 2017-12-14 2019-06-21 北京京东尚科信息技术有限公司 A kind of method and system realized data and import non-relational database
CN110990394A (en) * 2018-09-28 2020-04-10 杭州海康威视数字技术股份有限公司 Distributed column database table-oriented line number statistical method and device and storage medium
CN112667593A (en) * 2020-12-27 2021-04-16 武汉达梦数据库股份有限公司 Method and device for ETL (extract transform and load) flow to execute hbase fast loading

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521246A (en) * 2011-11-11 2012-06-27 国网信息通信有限公司 Cloud data warehouse system
CN102685221A (en) * 2012-04-29 2012-09-19 华北电力大学(保定) Distributed storage and parallel mining method for state monitoring data
CN102750367A (en) * 2011-12-29 2012-10-24 中华电信股份有限公司 Big data checking system and method thereof on cloud platform
CN103049556A (en) * 2012-12-28 2013-04-17 中国科学院深圳先进技术研究院 Fast statistical query method for mass medical data
US20130185337A1 (en) * 2012-01-18 2013-07-18 Cloudera, Inc. Memory allocation buffer for reduction of heap fragmentation
US20130282668A1 (en) * 2012-04-20 2013-10-24 Cloudera, Inc. Automatic repair of corrupt hbases

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JOSHUASABRINA: "提升HBase写性能", 《JOSHUASABRINA.ITEYE.COM/BLOG/1798239》 *
NUOLINE: "hbase的预分配region", 《BLOG.CSDN.NET/NUOLINE/ARTICLE/DETAILS/8610794》 *
学步: "HBase自动分区(Auto-Sharding)", 《BLOG.SINA.COM.CN/S/BLOG_9CEE0FD901018VU2.HTML》 *
程佳: "一种基于Hadoop的RDF数据划分与存储研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140305