CN103617211A - HBase loaded data importing method - Google Patents
- Publication number: CN103617211A (application CN201310584702.4A)
- Authority: CN (China)
- Prior art keywords: data, hbase, region, file, importing
- Prior art date: 2013-11-20
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/2255 — Electric digital data processing; information retrieval of structured data; indexing structures; hash tables
- G06F16/258 — Electric digital data processing; integrating or interfacing systems involving database management systems; data format conversion from or to a database
Abstract
The invention discloses a method for importing data into HBase by bulk loading. The method comprises: during Region pre-allocation, setting the environment and configuration parameters in the cluster; creating an HBase table with a pre-written function that determines the number of Regions; after Region pre-allocation is complete, using the analytical processing capability and parallel-computation characteristics of a distributed computing framework to write a MapReduce program that converts the source data into HFile files; and finally completing the import with the completebulkload command, which loads the data into the HBase table in a predetermined format. With this method, generated HFile files can be loaded directly into a running HBase cluster, which reduces the network traffic produced by data transmission and HBase writes during data migration, improves data import efficiency, and saves CPU and network resources.
Description
Technical field
The present invention relates to a method for importing data into HBase by bulk loading.
Technical background
With the rapid development of network technology, data volumes have grown explosively. Traditional techniques have run into serious obstacles in analyzing and exploiting these huge data resources and are no longer adequate for big-data analysis. To meet the requirements of big-data analysis, Google proposed MapReduce, a programming model for large-scale data processing and parallel computation. Among the technologies required for big data, distributed file systems and distributed databases are both well suited. HBase is a scalable distributed database that supports very large tables; it uses Hadoop HDFS as its underlying file storage system. Because of its good scalability, fault tolerance, and random read/write capability, and its support for MapReduce parallel computation, HBase has been adopted by a growing number of companies. Research shows, however, that the data import tools bundled with HBase have certain limitations: they do not give the user full control over the data-loading process, and the expected format of the loaded data cannot be customized. A method for loading data into HBase in a specific format is therefore very important.
The Bulk Load facility currently shipped with HBase supports loading massive amounts of data into HBase efficiently. Bulk Load is implemented as a MapReduce job: the job directly generates files in HBase's internal HFile format, which together form a complete HBase data table, and the data files are then loaded directly into a running cluster. The simplest way to use the bulk-load function is the importtsv tool, a built-in utility that loads content from TSV files into HBase. It runs a MapReduce job that either writes the data from the TSV file directly into an HBase table or writes it into HBase's own formatted data files.
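For reference, the importtsv workflow described above is typically driven from the command line roughly as follows. The table name, column mapping, and paths below are invented placeholders, and the exact tool class names vary between HBase releases:

```shell
# Step 1 (sketch): run the ImportTsv MapReduce job, writing HFiles to HDFS
# instead of writing through the region servers.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /input/data.tsv

# Step 2 (sketch): the "completebulkload" step, which moves the generated
# HFiles into the running cluster's regions.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable
```

These commands require a running HBase cluster and are shown only to illustrate the two-phase generate-then-load flow the patent builds on.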
Although the importtsv tool is very useful when text data needs to be imported into HBase, there are situations, such as importing data in other formats, where you will want to generate the data programmatically, and MapReduce is the most effective way to process massive data. It may also be the only feasible way to load massive data into HBase. Of course, MapReduce can be used to write data into HBase directly, but a massive dataset can make the MapReduce job very heavy, and if handled improperly the throughput of the running job may be very low.
Summary of the invention
The technical problem to be solved by the present invention is: to design a reasonable number of Regions for the HBase table so that the data to be imported is distributed evenly across the cluster; to implement the Map and Reduce interfaces of the MapReduce distributed-computing programming model to obtain HFile files in the expected specific format; and then to load those files into HBase in the expected format with the completebulkload tool.
In HBase, compaction is a frequently executed write-side task, unless the internal HBase data files can be generated directly and loaded as-is. Although HBase writes are normally fast, write operations may still be blocked frequently if compaction is not configured properly. Another problem caused by write-heavy workloads is that all data may be written to the same region server, a situation that often arises when massive data is imported into a newly created HBase table. Once the data is concentrated on one server, the whole cluster becomes unbalanced and the write rate drops significantly. Therefore, Regions are pre-allocated first; the main purpose is to build out the cluster before the data is imported into HBase, so that the imported data can be distributed evenly across the cluster. A MapReduce program then produces data files in the specific format, and finally the HFile files are loaded directly into HBase. This approach ensures both the soundness of the Region pre-allocation and the soundness of the MapReduce program design and implementation. The method improves import efficiency and supports parallel computation, and is therefore more efficient.
The technical solution adopted in the present invention is:
A method for importing data into HBase by bulk loading: first, during Region pre-allocation, set the environment and configuration parameters in the cluster; then create the HBase table with a pre-written function that determines the number of Regions; once Region pre-allocation is finished, use the analytical processing capability and parallel-computation characteristics of a distributed computing framework to write a MapReduce program that converts the source data into HFile files; finally, complete the import with the completebulkload command, which loads the data into the HBase table in the predetermined format. This improves import efficiency. The import method is implemented mainly by the MapReduce module.
Every row of data in HBase belongs to exactly one Region; a Region holds a range of HBase rows sorted by row key and is managed by a RegionServer. After the HBase table is created, it starts with a single Region, and all inserted data initially lands in that Region. When a size limit is reached, the Region is split into two Regions, and the split Regions are distributed to other RegionServers to balance the load across the cluster. Therefore, when importing data, Regions are pre-allocated first, and a suitable algorithm distributes the data across the whole cluster to achieve load balancing and speed up data loading.
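To illustrate how pre-split Region boundaries spread the incoming rows, the sketch below (plain Java, no HBase dependency; the split keys are invented for illustration) assigns a rowkey to the index of the region whose [startKey, endKey) range contains it, which is how HBase routes a row to a region:

```java
import java.util.Arrays;

public class RegionRouting {
    // Pre-split boundaries (sorted). Region 0 covers keys < "B",
    // region 1 covers ["B", "C"), region 2 covers ["C", "D"), region 3 the rest.
    static final String[] SPLIT_KEYS = {"B", "C", "D"};

    // Index of the region whose key range contains the rowkey.
    static int regionFor(String rowkey) {
        int pos = Arrays.binarySearch(SPLIT_KEYS, rowkey);
        // binarySearch returns -(insertionPoint) - 1 when the key is not found;
        // an exact match on a split key belongs to the region starting at it.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        for (String key : new String[]{"A01", "B05", "C11", "Z12"}) {
            System.out.println(key + " -> region " + regionFor(key));
        }
    }
}
```

With evenly chosen boundaries, keys beginning with different letters land in different regions, which is exactly the load-balancing effect the pre-allocation aims for.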
In the MapReduce program, the MapReduce framework is responsible for splitting the data, using one storage block (Block) of the file as one split, and extracting the records in each split as key-value pairs <K1, V1> that serve as the Map input. In the specified mapper class, each input row is converted to the specified format: the Map module converts the row data according to the key-value pair, generates the row key, and specifies the column family name and column names. In the map method, a Put object is created, the converted data is added to it with the Put.add() function, and the context.write() method writes the data to an intermediate file. An intermediate key-value pair <rowkey, put> is then generated from the rowkey and the Put object, and the intermediate results are written to local disk. The Reduce module obtains the locations of the intermediate results from the Master, reads the data over a remote interface from the disks of the TaskTrackers that ran the Map tasks, and writes the data out sorted so that it meets the expected output format, yielding the final output: HFile files.
In the Reduce class, the reduce method iterates over these records and writes the processed results, in the map output file format specified by the user program, to the specified file; the output path is set with the setOutputPath function.
The Region pre-allocation determines the number of Regions from the volume of data to be imported into HBase and the scale of the distributed cluster, and then designs and allocates the Regions in advance according to the data volume. This can significantly reduce the number of Region splits, or even avoid splits entirely, achieving load balancing during the data import.
The MapReduce programming framework implements the process that produces the HFile format files: the Map module derives a well-designed rowkey from the data format and then processes the data to obtain intermediate results; the Reducer module organizes them into a proper data format and finally writes the HFile files to the specified output path. Multiple map tasks can be configured for this process, which improves processing efficiency and greatly boosts performance. The MapReduce framework removes the difficulty of concurrent programming and provides traversal programming interfaces.
The import method is implemented by writing a MapReduce program: first create a Job instance in the main function and set its input path, output path, mapper class, reducer class, and the key and value types of the map output; then set the HBase configuration, such as specifying the zookeeper nodes of the cluster; next create the HBase table according to this configuration; finally set the output format to HFileOutputFormat so that HFile files are generated.
The beneficial effects of the present invention are:
The present invention pre-allocates the number of Regions before the data is transferred to HBase, achieving cluster load balancing, and then uses MapReduce programming to generate the HFile files, providing parallel-computation capability. The method can load the generated HFile files directly into a running HBase cluster, which reduces the network traffic produced by data transmission and HBase writes during data migration. At the same time, the method improves data import efficiency and saves CPU and network resources.
Brief description of the drawings
Fig. 1 is a flowchart of the HBase data import process;
Fig. 2 is a sequence diagram of Region pre-allocation in the HBase data import.
Embodiments
The invention is described in detail below with reference to the accompanying drawings and the embodiments.
Embodiment 1:
In this embodiment, the number of pre-allocated Regions is first computed reasonably from the data to be imported and the cluster environment. The existing cluster consists of 8 servers with 96 GB of memory each, running CentOS 6.3; the cluster components such as HDFS, MapReduce, and HBase are installed following the installation steps. The data file to be imported contains 10 billion records, in the format: A75566620131107,121212,33333. The flow of the HBase data import is shown in Fig. 1: first the cluster parameters are set reasonably, then the Region pre-allocation algorithm is designed to guarantee balanced data loading. The data is then converted into HFile files by the MapReduce program according to the number of pre-allocated Regions, and finally the completebulkload command loads the HFile files, in the predetermined format, into the HBase table, completing the import.
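The patent does not spell out the rowkey layout, but given the sample record A75566620131107,121212,33333 (a first field beginning with a letter and ending in a yyyyMMdd date), one plausible reading of "arranging the letter and the digits together" is to prefix the key with letter + month so that each row lands in the matching pre-split region. The following is a hypothetical sketch under that assumption; the field structure and the "-" separator are inventions for illustration:

```java
public class RowKeyBuilder {
    // Build a rowkey from a source record like "A75566620131107,121212,33333".
    // Assumption (not stated in the patent): the first field starts with a
    // letter and ends with a yyyyMMdd date; the rowkey prefix is letter+month,
    // matching the letter/month Region boundaries of Embodiment 2.
    static String rowKey(String record) {
        String first = record.split(",")[0];               // e.g. "A75566620131107"
        char letter = first.charAt(0);                      // 'A'
        String date = first.substring(first.length() - 8);  // "20131107"
        String month = date.substring(4, 6);                // "11"
        return letter + month + "-" + first;                // "A11-A75566620131107"
    }

    public static void main(String[] args) {
        System.out.println(rowKey("A75566620131107,121212,33333"));
        // prints: A11-A75566620131107
    }
}
```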
Embodiment 2:
This embodiment pre-allocates the Regions, as shown in Fig. 2: first the relevant parameters are configured, then the HBase table is created with the number of Regions produced by the Getsplit function of the designed algorithm. Because the format of the data to be imported contains a letter and a date, the algorithm designs the number of pre-allocated Regions from the cluster environment and the data scale: combinations of a letter and digits are used to divide the Region ranges, where the letter ranges over A-Z and the digits 01-12 represent the 12 months. In this way 24*12=288 regions load the data uniformly, and because the data is uniform over each letter and each month, the data in each region is also uniform. This guarantees that the load across Regions is balanced, with no Region loaded especially heavily and no Region loaded especially lightly.
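The split-key generation described above can be sketched in plain Java. Note that the patent quotes the letter range A-Z but counts 24*12 = 288 regions, so this sketch takes the letter count as a parameter and uses 24 letters (A through X) to reproduce the patent's arithmetic; the exact letter subset is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeys {
    // Generate letter+month split keys, e.g. "A01", "A02", ..., "X12".
    // These strings would serve as the Region boundary keys when the
    // pre-split table is created.
    static List<String> generate(char firstLetter, int letterCount) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < letterCount; i++) {
            for (int month = 1; month <= 12; month++) {
                keys.add(String.format("%c%02d", (char) (firstLetter + i), month));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = generate('A', 24);
        System.out.println(keys.size() + " regions, first=" + keys.get(0)
                + ", last=" + keys.get(keys.size() - 1));
        // prints: 288 regions, first=A01, last=X12
    }
}
```

Because every (letter, month) combination is equally likely in the source data, each of the 288 key ranges receives roughly the same share of rows.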
Embodiment 3:
The method for importing data into HBase by bulk loading is implemented by writing a MapReduce program. First a Configuration is created in the main function with new Configuration(), and then a Job instance is created from conf with new Job(). The input path, output path, mapper class, reducer class, and the key and value types of the map output are set on this instance. The HBase configuration is then set with the set() function, for example specifying the zookeeper nodes of the cluster. Next the HBase table is created according to this configuration, and finally the output is configured to generate HFile files with the HFileOutputFormat.configureIncrementalLoad(job, htable) method, where htable is created with new HTable().
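The driver steps listed above might be assembled roughly as follows. This is a sketch, not the patent's actual code: it assumes a 2013-era HBase client API (HFileOutputFormat, HTable) on the classpath and will not compile standalone; the table name, paths, and the MyMapper class are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HFileDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // wraps new Configuration()
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // zookeeper nodes (placeholder)

        Job job = new Job(conf, "hfile-generate");         // Job instance from conf
        job.setJarByClass(HFileDriver.class);
        job.setMapperClass(MyMapper.class);                // hypothetical mapper emitting Put
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HFile output path

        HTable htable = new HTable(conf, "mytable");       // pre-split table (placeholder)
        // Configures the partitioner, reducer, and output format so the
        // generated HFiles match the table's pre-allocated Region boundaries.
        HFileOutputFormat.configureIncrementalLoad(job, htable);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

configureIncrementalLoad is the step that ties the job back to the pre-split Regions: it installs a TotalOrderPartitioner built from the table's current region start keys.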
Embodiment 4:
In the specified mapper class, each input row is converted to the specified format. A well-designed rowkey is derived from the row data by arranging the letter and the digits together so that the key corresponds to the pre-allocated Regions, and the column family name and column names are specified at the same time. In the map method, a Put object is created and the converted data is added to it with the Put.add() method; the context.write() method then writes the data to an intermediate file. In the Reduce class, the reduce method iterates over these records in the map output file format specified by the user program; the values are added to a map with Iterator<Put> iter = puts.iterator(), where the map is TreeSet<KeyValue> map = new TreeSet<KeyValue>(KeyValue.COMPARATOR). The row and kv are then written with context.write(row, kv). Finally the processed results are written to the specified file, whose path can be set with the setOutputPath function. The completebulkload command loads the HFile files, in the predetermined format, into the HBase table, completing the import.
While the program is running, the monitoring interface and the logs generated by Hadoop or HBase can be used to watch for bottlenecks and obstacles in the MapReduce data-loading job in the cluster. Based on the hints shown in the logs, the corresponding configuration parameters and cluster environment parameters can be adjusted, for example changing the number of map and reduce tasks so the job runs more efficiently, or adjusting the JVM stack size and memory size. Appropriate changes to the configuration parameters can raise CPU utilization and parallel-computation capability, improve data import efficiency, and save network resources.
Claims (6)
1. A method for importing data into HBase by bulk loading, characterized in that: first, during Region pre-allocation, the environment and configuration parameters in the cluster are set; then the HBase table is created with a pre-written function that determines the number of Regions; once Region pre-allocation is finished, the analytical processing capability and parallel-computation characteristics of a distributed computing framework are used to write a MapReduce program that converts the source data into HFile files; finally, the import is completed with the completebulkload command, which loads the data into the HBase table in a predetermined format.
2. The method for importing data into HBase by bulk loading according to claim 1, characterized in that: after the HBase table is created, the table starts with a single Region; all inserted data initially enters this Region; when a size limit is reached, the data is split into two Regions; and the split Regions are distributed to other RegionServers to achieve load balancing in the cluster.
3. The method for importing data into HBase by bulk loading according to claim 1, characterized in that: in the MapReduce program, the MapReduce framework is responsible for splitting the data, using one storage block (Block) of the file as one split; the records in each split are extracted as key-value pairs <K1, V1> that serve as the Map input; in the specified mapper class, each input row is converted to the specified format; the Map module converts the row data according to the key-value pair, generates the row key, and specifies the column family name and column names; in the map method, a Put object is created, the converted data is added to it with the Put.add() function, and the context.write() method writes the data to an intermediate file; an intermediate key-value pair <rowkey, put> is then generated from the rowkey and the Put object, and the intermediate results are written to local disk; the Reduce module obtains the locations of the intermediate results from the Master, reads the data over a remote interface from the disks of the TaskTrackers that ran the Map tasks, and writes the data out sorted to meet the expected output format, yielding the final output: HFile files.
4. The method for importing data into HBase by bulk loading according to claim 1, characterized in that: the Region pre-allocation determines the number of Regions from the volume of data to be imported into HBase and the scale of the distributed cluster, and then designs and allocates the Regions in advance according to the data volume, which can significantly reduce the number of Region splits, or even avoid splits entirely, achieving load balancing during the data import.
5. The method for importing data into HBase by bulk loading according to claim 3, characterized in that: the MapReduce programming framework implements the process that produces the HFile format files; the Map module derives a well-designed rowkey from the data format and then processes the data to obtain intermediate results; and the Reducer module organizes them into a proper data format and finally writes the HFile files to the specified output path.
6. The method for importing data into HBase by bulk loading according to any one of the preceding claims, characterized in that the import method is implemented by writing a MapReduce program: first a Job instance is created in the main function; its input path, output path, mapper class, reducer class, and the key and value types of the map output are set; then the HBase configuration is set; next the HBase table is created according to this configuration; finally the output is set to HFileOutputFormat so that HFile files are generated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310584702.4A CN103617211A (en) | 2013-11-20 | 2013-11-20 | HBase loaded data importing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103617211A true CN103617211A (en) | 2014-03-05 |
Family ID: 50167914
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20140305 |