CN104572895B - MPP database and Hadoop enterprise data interoperability method, tool and implementation method - Google Patents

MPP database and Hadoop enterprise data interoperability method, tool and implementation method

Info

Publication number
CN104572895B
CN104572895B (application CN201410820059.5A)
Authority
CN
China
Prior art keywords
data
export
mpp
hadoop
sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410820059.5A
Other languages
Chinese (zh)
Other versions
CN104572895A (en)
Inventor
Chen Yu
Xia Xudong
Cui Weili
Wu Xin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Original Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority to CN201410820059.5A
Publication of CN104572895A
Application granted
Publication of CN104572895B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an MPP database and Hadoop enterprise data interoperability method, tool, and implementation method, covering both a method for direct data interchange between an MPP database and a Hadoop cluster using a data interchange tool and a method for data interchange via TXT-file staging. Data is exported from (or imported into) the MPP database directly to (or from) the Hadoop cluster, without passing through any storage unit outside the MPP database and the Hadoop cluster, which makes the export process more efficient; when the Hadoop cluster needs to post-process the data before use, the TXT staging mode can be selected instead. The present invention solves the problem that enterprise data cannot interoperate between MPP databases and Hadoop, and enables the mashup of the two business platforms.

Description

MPP database and Hadoop enterprise data interoperability method, tool, and implementation method
Technical field
The present invention belongs to the field of distributed databases, and relates in particular to an MPP database and Hadoop enterprise data interoperability method, tool, and implementation method.
Background technology
Before the Internet appeared, data was mainly produced through human-computer interaction and was predominantly structured. For this kind of transactional data, end users mostly care about inserting, deleting, updating, and querying records; the corresponding data processing is called OLTP (Online Transaction Processing). Traditional relational databases (RDBMS) were designed and developed mainly for this demand and occupied a pivotal position over the past 30 years. During that period data grew slowly, systems were relatively isolated, and traditional databases could basically satisfy every kind of application demand.
With the appearance and rapid development of the Internet, and especially of the mobile Internet in recent years, data sources have changed qualitatively. Data is now generated automatically by devices, servers, and all kinds of applications; it is mostly unstructured or semi-structured, and it grows geometrically. For this category of data (collectively called big data), end users seldom perform insert, delete, or update operations; they care far more about retrieving data from the database as fast as possible, organizing it, analyzing it interactively, mining it deeply, and producing reports and predictions from it. The corresponding data processing is called OLAP (Online Analytical Processing).
Faced with such big data analysis demands, traditional databases are almost helpless in both technology and function. As data sources and processing needs have changed, it has become clear that no single platform can realistically satisfy all application demands, and users have begun to select the most suitable products and technologies according to application demand, data characteristics, and data volume. The technology landscape of the data processing field has moved from the dominance of traditional databases (OldSQL) toward segmented development, and at the present stage OldSQL, NewSQL, and NoSQL jointly support multiple classes of applications.
NewSQL databases mainly refer to advanced database clusters with an MPP (Massively Parallel Processing) architecture, aimed primarily at industry big data. Built on a shared-nothing architecture, they combine multiple big data processing technologies such as columnar storage and coarse-grained indexing with the efficient distributed computing model of the MPP architecture to support analytical applications. Their running environment is mostly low-cost PC servers; they are characterized by high performance and high scalability and are widely used in enterprise analytical applications.
NoSQL mainly refers to the technology extension and encapsulation of Hadoop, around which related big data technologies have been derived, used for the storage and computation of the semi-structured and unstructured data that traditional relational databases handle poorly. At present the most typical application scenario is to support big data storage and analysis in the Internet domain by extending and encapsulating Hadoop. Hadoop is better suited to processing semi-/unstructured data, complex ETL (Extract-Transform-Load) flows, and complex data mining and computation models.
In summary, to address the problem that enterprise data cannot interoperate between MPP databases and Hadoop, the present invention provides a method that supports data interchange between an MPP database and Hadoop. In the direct-interchange mode the data transmission efficiency is very high, which is one of the prerequisites for the mashup of the two business platforms.
Summary of the invention
The problem to be solved by the present invention is that enterprise data cannot interoperate between MPP databases and Hadoop; to solve it, an MPP database and Hadoop enterprise data interoperability method and a data interchange tool are proposed. The technical scheme adopted by the present invention is: an MPP database and Hadoop enterprise data interoperability method, comprising:
(1) using the data interchange tool, exporting data directly from the MPP database to the Hadoop cluster, or exporting data from the MPP database to the Hadoop cluster via TXT staging;
(2) using the data interchange tool, importing data directly from the Hadoop cluster into the MPP database, or importing data from the Hadoop cluster into the MPP database via TXT staging;
wherein the data interchange tool comprises a main control module, a parameter parsing module, a connector, an export/import scheduler, worker threads, a log module, an SQL building module, and a worker thread pool;
the main control module starts the other modules and receives their feedback;
the parameter parsing module parses the information entered by the user into information the program can recognize internally;
the connector establishes the connection to the MPP database;
the export/import scheduler completes the export and import of the data;
the worker threads process the export/import jobs;
the log module creates the global instance of the tool's running log;
the SQL building module builds the export/import SQL;
the worker thread pool supplies the worker threads.
Further, the steps for exporting data directly from the MPP database to the Hadoop cluster using the data interchange tool are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the designated Hadoop directory is writable, and checks the state of each of its own nodes and of each data shard;
(3) metadata export: the data interchange tool sends an export-metadata SQL command to the MPP database; on receiving it, the MPP database cluster exports the metadata to the designated directory in the Hadoop file system;
(4) the tables to be exported are obtained from the database;
(5) table-by-table data export: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-export SQL command, the MPP database cluster performs the data export, writing the data directly to the designated directory on the Hadoop cluster's data nodes;
(6) on success, the tool exits normally;
(7) on failure, execution is interrupted and the tool exits.
Further, the steps for exporting data from the MPP database to the Hadoop cluster via TXT staging are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster checks the state of each of its own nodes and of each data shard;
(3) metadata export: the data interchange tool sends an export-metadata SQL command to the MPP database; on receiving it, the MPP database cluster exports the metadata, in TXT format, to the designated directory on the external storage;
(4) the tables to be exported are obtained from the database;
(5) table-by-table data export: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-export SQL command, the MPP database cluster performs the data export, writing the data directly to the designated directory on the external storage;
(6) Hadoop imports the data: a Hadoop client installed on the physical machine that holds the external storage runs the Hadoop put command (hadoop fs -put) to import the TXT data files into the designated Hadoop directory;
(7) on success, the tool exits normally;
(8) on failure, execution is interrupted and the tool exits.
Further, the steps for importing data directly from the Hadoop cluster into the MPP database using the data interchange tool are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the designated Hadoop directory is readable, and at the same time checks the state of each of its own nodes;
(3) metadata import: the data interchange tool sends an import-metadata SQL command to the MPP database; on receiving it, the MPP database cluster imports the metadata from the designated directory in the Hadoop file system;
(4) the tables to be imported are obtained from the database;
(5) table-by-table data import: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-import SQL command, the MPP database cluster performs the data import, reading the Hadoop cluster's data nodes directly and loading the data into the MPP database;
(6) on success, the tool exits normally;
(7) on failure, execution is interrupted and the tool exits.
Further, the steps for importing data from the Hadoop cluster into the MPP database via TXT staging are:
(1) Hadoop exports the data: a Hadoop client installed on the physical machine that holds the external storage runs the Hadoop get command (hadoop fs -get) to export the TXT data files from the designated Hadoop directory into the designated directory on the external storage;
(2) the data interchange tool starts;
(3) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster checks the state of each of its own nodes;
(4) metadata import: the data interchange tool sends an import-metadata SQL command to the MPP database; on receiving it, the MPP database cluster imports the metadata from the designated directory on the external storage;
(5) all tables in the database are obtained;
(6) table-by-table data import: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-import SQL command, the MPP database cluster performs the data import, loading the data from the designated directory on the external storage;
(7) on success, the tool exits normally;
(8) on failure, execution is interrupted and the tool exits.
Further, screened export is supported when data is exported from the MPP database; the screening is expressed by entering an SQL statement with a where condition.
An MPP database and Hadoop enterprise data interchange tool comprises a main control module, a parameter parsing module, a connector, an export/import scheduler, worker threads, a log module, an SQL building module, and a worker thread pool. The main control module is connected to the parameter parsing module, the worker thread pool, and the export/import scheduler; the log module is connected to the parameter parsing module, the SQL building module, the worker threads, the worker thread pool, and the connector; the export/import scheduler is connected to the connector, the worker threads, the worker thread pool, and the SQL building module; the connector is connected to the worker threads; and the worker threads are connected to the SQL building module.
The main control module starts the other modules and receives their feedback;
the parameter parsing module parses the information entered by the user into information the program can recognize internally;
the connector establishes the connection to the MPP database;
the export/import scheduler completes the export and import of the data;
the worker threads process the export/import jobs;
the log module creates the global instance of the tool's running log;
the SQL building module builds the export/import SQL;
the worker thread pool supplies the worker threads. One way this decomposition could be wired together is sketched below.
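For illustration only, the following minimal Python sketch shows one way the module decomposition and wiring described above could look. Every class, method, and parameter name in it is hypothetical: the patent specifies which modules exist and how they are connected, not how they are coded.

    import logging
    import sys
    from concurrent.futures import ThreadPoolExecutor

    class Scheduler:
        """Export/import scheduler stub: in the real tool it would own the main
        connector, drive the SQL building module, and dispatch per-table jobs
        to the worker thread pool."""
        def __init__(self, config, pool, log):
            self.config, self.pool, self.log = config, pool, log

        def execute(self):
            self.log.info("status check, metadata, per-table jobs would run here")
            return "ok"

    class DataInterchangeTool:
        def __init__(self, argv):
            # Log module: the global running-log instance is created first.
            self.log = logging.getLogger("interchange-tool")
            # Parameter parsing module: parse the user's command line into
            # configuration the program can recognize internally.
            self.config = {"parallelism": int(argv[1]) if len(argv) > 1 else 4}
            # Worker thread pool: the scheduler obtains its worker threads here.
            self.pool = ThreadPoolExecutor(max_workers=self.config["parallelism"])
            # Export/import scheduler, wired to the pool and the log module.
            self.scheduler = Scheduler(self.config, self.pool, self.log)

        def run(self):
            # Main control module: start the other modules and receive feedback.
            return self.scheduler.execute()

    if __name__ == "__main__":
        logging.basicConfig(level=logging.INFO)
        print(DataInterchangeTool(sys.argv).run())

The order of construction mirrors the implementation method below: the log module comes up first so that every later module can report into the global running log.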
An implementation method of the MPP database and Hadoop enterprise data interchange tool comprises the following steps:
(1) the user starts the tool from the command line, supplying configuration information with it; the main control module starts along with the tool; after starting, the main control module first creates the global instance of the tool's running log through the log module and then completes the initialization of the other modules;
(2) the main control module receives the configuration information entered by the user as a character string and passes it to the parameter parsing module for further parsing;
(3) the parameter parsing module parses the user's character-string configuration into configuration information the program can recognize internally and returns it to the main control module;
(4) the main control module starts the export/import scheduler, which completes the export/import work;
(5) the export/import scheduler creates the main connector instance and connects to the MPP database through it;
(6) the export/import scheduler constructs the status-check SQL through the SQL building module and performs the status check through the main connector;
(7) the export/import scheduler constructs the export/import-metadata SQL through the SQL building module and exports/imports the metadata through the main connector;
(8) the export/import scheduler constructs, through the SQL building module, the SQL that queries which tables need to be exported/imported, executes the query through the main connector, and obtains the tables to be exported/imported;
(9) the export/import scheduler creates the global instance of the job-scheduling log through the log module;
(10) the export/import scheduler obtains worker threads from the thread pool module, their number equal to the configured export/import parallelism, creates the same number of working connectors, assigns one working connector to each worker thread, and then starts all jobs, with the worker threads processing the export/import jobs in parallel; the export/import of a single table is called a job, and a job consists of, first, connecting to the MPP database through the working connector, second, constructing the export/import SQL through the SQL building module, and third, performing the export/import through the working connector;
(11) the export/import scheduler collects the execution status of each worker thread's export/import jobs, consolidates them into the export/import execution result, and returns it to the main control module, which returns the final result to the user. A sketch of this flow follows.
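The step sequence above can be read as the following Python sketch. The connect and build_sql callables and the SQL labels are assumptions: the patent fixes the flow (main connector, status check, metadata, table query, one working connector per worker thread, per-table jobs, result collection) but discloses neither the connector API nor the SQL dialect.

    import queue
    from concurrent.futures import ThreadPoolExecutor

    def run_export_import(config, connect, build_sql, log):
        """Steps (4)-(11) of the implementation method, under the assumed
        interfaces described in the lead-in."""
        main_conn = connect(config)                         # (5) main connector
        main_conn.execute(build_sql("status_check"))        # (6) status check
        main_conn.execute(build_sql("metadata"))            # (7) metadata export/import
        tables = main_conn.query(build_sql("list_tables"))  # (8) tables to process

        todo = queue.Queue()
        for t in tables:
            todo.put(t)

        def worker():
            conn = connect(config)  # (10) one working connector per worker thread
            done = []
            while True:
                try:
                    table = todo.get_nowait()
                except queue.Empty:
                    return done
                # A "job" = export/import of one table: build its SQL, run it
                # on this worker's own connector, record the outcome.
                done.append((table, conn.execute(build_sql("per_table", table))))

        n = config["parallelism"]
        with ThreadPoolExecutor(max_workers=n) as pool:
            futures = [pool.submit(worker) for _ in range(n)]
            results = [r for f in futures for r in f.result()]

        log.info("job results: %s", results)                # (11) consolidate and return
        return results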
The present invention has the following advantages and positive effects: it realizes data interchange between an MPP database and a Hadoop cluster, and the export/import mode can be chosen flexibly according to actual needs: when no Hadoop post-processing is required, the efficient direct mode can be selected; otherwise, the TXT staging mode can be selected.
Brief description of the drawings
Fig. 1 is a schematic diagram of an MPP database and Hadoop data interchange method;
Fig. 2 is a schematic diagram of data being exported directly from the MPP database to the Hadoop cluster;
Fig. 3 is a schematic diagram of the specific execution steps of exporting data directly from the MPP database to the Hadoop cluster;
Fig. 4 is a schematic diagram of data being exported from the MPP database to the Hadoop cluster via TXT staging;
Fig. 5 is a schematic diagram of the specific execution steps of exporting data from the MPP database to the Hadoop cluster via TXT staging;
Fig. 6 is a schematic diagram of data being imported directly from the Hadoop cluster into the MPP database;
Fig. 7 is a schematic diagram of the specific execution steps of importing data directly from the Hadoop cluster into the MPP database;
Fig. 8 is a schematic diagram of data being imported from the Hadoop cluster into the MPP database via TXT staging;
Fig. 9 is a schematic diagram of the specific execution steps of importing data from the Hadoop cluster into the MPP database via TXT staging;
Fig. 10 is a schematic diagram of the data interchange tool.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in combination with examples. It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with each other.
The present invention provides an MPP database and Hadoop enterprise data interchange tool and data interchange method, including a method for direct data interchange between the MPP database and the Hadoop cluster using the data interchange tool and a method for data interchange via TXT staging.
1. As shown in Fig. 2, data is exported directly from the MPP database to the Hadoop cluster: the compute nodes of the MPP database access the data nodes of the Hadoop cluster through the data interchange tool and export the data directly to the Hadoop cluster, without passing through any storage unit outside the MPP database and the Hadoop cluster, which makes the export process more efficient. The specific execution process is shown in Fig. 3 (an illustrative sketch follows the steps):
Step 301, the data interchange tool starts;
Step 302, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. On receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the designated Hadoop directory is writable, and checks the state of each of its own nodes and of each data shard. If the status check fails, step 307 is executed; otherwise step 303;
Step 303, metadata export. The data interchange tool sends an export-metadata SQL command to the MPP database; on receiving it, the MPP database cluster exports the metadata to the designated directory in the Hadoop file system. If this fails, step 307 is executed; otherwise step 304;
Step 304, the tables to be exported are obtained from the database. The data interchange tool sends a table-query SQL command with a where condition to the MPP database (no where condition means everything is exported). On receiving it, the MPP database cluster runs the query for the tables that satisfy the where condition; if this fails, step 307 is executed; otherwise the names of the qualifying tables are returned to the data interchange tool, and step 305 is executed;
Step 305, table-by-table data export. The data interchange tool sends export SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-export SQL command, the MPP database cluster performs the data export, writing the data directly to the designated directory on the Hadoop cluster's data nodes. If a single table fails to export, it is skipped and the next table is processed; if N consecutive tables (N is user-specified) fail to export, step 307 is executed; otherwise execution continues until all tables have been exported, and then step 306 is executed;
Step 306, the export succeeds and the tool exits normally;
Step 307, the export fails; execution is interrupted and the tool exits.
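A minimal sketch of steps 304 to 307, assuming a generic DB-API-style connection; the system_tables catalog and the EXPORT TABLE ... INTO statement are invented stand-ins, since the patent does not disclose the MPP database's actual catalog or export syntax:

    def export_tables(conn, hdfs_dir, where_clause=None, max_consecutive=3):
        """Steps 304-307: list the tables to export (optionally screened by a
        where condition), export each one directly to the Hadoop data nodes,
        skip single failures, abort after N consecutive failures."""
        # Step 304: query the tables to export; no condition means export all.
        sql = "SELECT table_name FROM system_tables"
        if where_clause:
            sql += " WHERE " + where_clause
        tables = [row[0] for row in conn.execute(sql).fetchall()]

        failures_in_a_row = 0
        for table in tables:
            try:
                # Step 305: per-table export straight to the Hadoop data nodes.
                conn.execute(f"EXPORT TABLE {table} INTO 'hdfs://{hdfs_dir}/{table}'")
                failures_in_a_row = 0
            except Exception as exc:
                failures_in_a_row += 1       # single failure: skip this table
                if failures_in_a_row >= max_consecutive:
                    # Step 307: N consecutive failures abort the whole export.
                    raise RuntimeError("export interrupted") from exc
        # Step 306: all tables processed; the caller exits normally.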
2. As shown in Fig. 4, data is exported from the MPP database to the Hadoop cluster via TXT staging: the data interchange tool exports the data from the MPP database to a storage unit outside the MPP database and the Hadoop cluster, and the TXT-format data is then imported from the external storage unit into the Hadoop cluster through the Hadoop client's put command, so that Hadoop can post-process the TXT data before the import. The specific execution process is shown in Fig. 5 (a staging sketch follows the steps):
Step 501, the data interchange tool starts;
Step 502, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. On receiving it, the MPP database cluster checks the state of each of its own nodes and of each data shard. If the status check fails, step 508 is executed; otherwise step 503;
Step 503, metadata export. The data interchange tool sends an export-metadata SQL command to the MPP database; on receiving it, the MPP database cluster exports the metadata, in TXT format, to the designated directory on the external storage. If this fails, step 508 is executed; otherwise step 504;
Step 504, the tables to be exported are obtained from the database (according to the specified condition). The export tool sends a table-query SQL command with a where condition to the MPP database (no where condition means everything is exported). On receiving it, the MPP database cluster runs the query for the tables that satisfy the where condition; if this fails, step 508 is executed; otherwise the names of the qualifying tables are returned to the export tool, and step 505 is executed;
Step 505, table-by-table data export. The export tool sends export SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-export SQL command, the MPP database cluster performs the data export, writing the data directly to the designated directory on the external storage. If a single table fails to export, it is skipped and the next table is processed; if N consecutive tables (N is user-specified) fail to export, step 508 is executed; otherwise execution continues until all tables have been exported, and then step 506 is executed;
Step 506, Hadoop imports the data. A Hadoop client installed on the physical machine that holds the external storage runs the Hadoop put command (hadoop fs -put) to import the TXT data files into the designated Hadoop directory. If the Hadoop import succeeds, the export of data from the MPP database to Hadoop terminates normally and step 507 is executed; otherwise step 508 is executed;
Step 507, the Hadoop data import succeeds and the tool exits normally;
Step 508, execution is interrupted and the tool exits.
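Step 506 relies on the standard Hadoop shell. A minimal sketch of the staging import, with both directory paths made up for illustration:

    import subprocess

    def stage_txt_into_hadoop(local_dir="/data/mpp_export", hdfs_dir="/user/mpp/staging"):
        """Step 506: copy the TXT files from the external storage into the
        designated Hadoop directory using the standard Hadoop shell."""
        subprocess.run(
            ["hadoop", "fs", "-put", local_dir, hdfs_dir],
            check=True,  # non-zero exit raises CalledProcessError -> step 508
        )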
3. As shown in Fig. 6, data is imported directly from the Hadoop cluster into the MPP database: the data does not pass through any storage unit outside the MPP database and the Hadoop cluster, and the compute nodes of the MPP database access the data nodes of the Hadoop cluster directly, which makes the import process more efficient. The specific execution process is shown in Fig. 7 (an illustrative sketch follows the steps):
Step 701, the data interchange tool starts;
Step 702, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. On receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the designated Hadoop directory is readable, and at the same time checks the state of each of its own nodes. If the status check fails, step 707 is executed; otherwise step 703;
Step 703, metadata import. The data interchange tool sends an import-metadata SQL command to the MPP database; on receiving it, the MPP database cluster imports the metadata from the designated directory in the Hadoop file system. If this fails, step 707 is executed; otherwise step 704;
Step 704, all tables in the database are obtained. The data interchange tool sends a table-query SQL command to the MPP database; on receiving it, the MPP database cluster runs the query for all tables; if this fails, step 707 is executed; otherwise all table names are returned to the data interchange tool, and step 705 is executed;
Step 705, table-by-table data import. The data interchange tool sends import SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-import SQL command, the MPP database cluster performs the data import, reading the Hadoop cluster's data nodes directly and loading the data into the MPP database. If a single table fails to import, it is skipped and the next table is processed; if N consecutive tables (N is user-specified) fail to import, step 707 is executed; otherwise execution continues until all tables have been imported, and then step 706 is executed;
Step 706, the import succeeds and the tool exits normally;
Step 707, the import fails; execution is interrupted and the tool exits.
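Steps 704 to 707 mirror the export sketch after Fig. 3's steps; a minimal version under the same assumptions (generic DB-API-style connection, invented LOAD TABLE ... FROM syntax standing in for the MPP database's real bulk-load command):

    def import_tables(conn, hdfs_dir, max_consecutive=3):
        """Steps 704-707: list all tables, import each one directly from the
        Hadoop data nodes, skip single failures, abort after N in a row."""
        # Step 704: obtain all tables in the database.
        rows = conn.execute("SELECT table_name FROM system_tables").fetchall()
        tables = [r[0] for r in rows]

        failures_in_a_row = 0
        for table in tables:
            try:
                # Step 705: per-table import straight from the Hadoop data nodes.
                conn.execute(f"LOAD TABLE {table} FROM 'hdfs://{hdfs_dir}/{table}'")
                failures_in_a_row = 0
            except Exception as exc:
                failures_in_a_row += 1       # single failure: skip this table
                if failures_in_a_row >= max_consecutive:
                    raise RuntimeError("import interrupted") from exc  # step 707
        # Step 706: all tables processed; the caller exits normally.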
4. As shown in Fig. 8, data is imported from the Hadoop cluster into the MPP database via TXT staging: the Hadoop cluster exports the data as TXT text to a storage unit outside the MPP database and the Hadoop cluster, and the MPP database then imports the TXT text data into the MPP database. The specific execution process is shown in Fig. 9 (a staging sketch follows the steps):
Step 901, Hadoop exports the data. A Hadoop client installed on the physical machine that holds the external storage runs the Hadoop get command (hadoop fs -get) to export the TXT data files from the designated Hadoop directory into the designated directory on the external storage. If the Hadoop export fails, step 908 is executed; otherwise step 902;
Step 902, the data interchange tool starts; step 903 is executed;
Step 903, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. On receiving it, the MPP database cluster checks the state of each of its own nodes. If the status check fails, step 908 is executed; otherwise step 904;
Step 904, metadata import. The data interchange tool sends an import-metadata SQL command to the MPP database; on receiving it, the MPP database cluster imports the metadata from the designated directory on the external storage. If this fails, step 908 is executed; otherwise step 905;
Step 905, all tables in the database are obtained. The data interchange tool sends a table-query SQL command to the MPP database; on receiving it, the MPP database cluster runs the query for all tables; if this fails, step 908 is executed; otherwise all table names are returned to the data interchange tool, and step 906 is executed;
Step 906, table-by-table data import. The data interchange tool sends import SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-import SQL command, the MPP database cluster performs the data import, loading the data from the designated directory on the external storage. If a single table fails to import, it is skipped and the next table is processed; if N consecutive tables (N is user-specified) fail to import, step 908 is executed; otherwise execution continues until all tables have been imported, and then step 907 is executed;
Step 907, the import succeeds and the tool exits normally;
Step 908, execution is interrupted and the tool exits.
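Step 901 is the mirror image of the put-based staging sketch after Fig. 5's steps; a minimal version, again with made-up paths:

    import subprocess

    def stage_txt_out_of_hadoop(hdfs_dir="/user/mpp/staging", local_dir="/data/mpp_import"):
        """Step 901: copy the TXT files from the designated Hadoop directory
        onto the external storage before the MPP-side import begins."""
        subprocess.run(
            ["hadoop", "fs", "-get", hdfs_dir, local_dir],
            check=True,  # non-zero exit raises CalledProcessError -> step 908
        )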
The embodiments of the present invention have been described in detail above, but the content described is merely a preferred embodiment of the present invention and shall not be regarded as limiting its scope of practice. All equivalent changes and improvements made within the scope of the present invention shall remain within the coverage of this patent.

Claims (8)

1. An MPP database and Hadoop enterprise data interoperability method, characterized by comprising:
(1) using a data interchange tool, exporting data directly from the MPP database to the Hadoop cluster, or exporting data from the MPP database to the Hadoop cluster via TXT staging;
(2) using the data interchange tool, importing data directly from the Hadoop cluster into the MPP database, or importing data from the Hadoop cluster into the MPP database via TXT staging;
wherein the data interchange tool comprises a main control module, a parameter parsing module, a connector, an export/import scheduler, worker threads, a log module, an SQL building module, and a worker thread pool;
the main control module starts the other modules and receives their feedback;
the parameter parsing module parses the information entered by the user into information the program can recognize internally;
the connector establishes the connection to the MPP database;
the export/import scheduler completes the export and import of the data;
the worker threads process the export/import jobs;
the log module creates the global instance of the tool's running log;
the SQL building module builds the export/import SQL;
the worker thread pool supplies the worker threads.
2. The MPP database and Hadoop enterprise data interoperability method according to claim 1, characterized in that the steps for exporting data directly from the MPP database to the Hadoop cluster using the data interchange tool are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the designated Hadoop directory is writable, and checks the state of each of its own nodes and of each data shard;
(3) metadata export: the data interchange tool sends an export-metadata SQL command to the MPP database; on receiving it, the MPP database cluster exports the metadata to the designated directory in the Hadoop file system;
(4) the tables to be exported are obtained from the database;
(5) table-by-table data export: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-export SQL command, the MPP database cluster performs the data export, writing the data directly to the designated directory on the Hadoop cluster's data nodes;
(6) on success, the tool exits normally;
(7) on failure, execution is interrupted and the tool exits.
3. The MPP database and Hadoop enterprise data interoperability method according to claim 1, characterized in that the steps for exporting data from the MPP database to the Hadoop cluster via TXT staging are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster checks the state of each of its own nodes and of each data shard;
(3) metadata export: the data interchange tool sends an export-metadata SQL command to the MPP database; on receiving it, the MPP database cluster exports the metadata, in TXT format, to the designated directory on the external storage;
(4) the tables to be exported are obtained from the database;
(5) table-by-table data export: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-export SQL command, the MPP database cluster performs the data export, writing the data directly to the designated directory on the external storage;
(6) Hadoop imports the data: a Hadoop client installed on the physical machine that holds the external storage runs the Hadoop put command (hadoop fs -put) to import the TXT data files into the designated Hadoop directory;
(7) on success, the tool exits normally;
(8) on failure, execution is interrupted and the tool exits.
4. The MPP database and Hadoop enterprise data interoperability method according to claim 1, characterized in that the steps for importing data directly from the Hadoop cluster into the MPP database using the data interchange tool are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the designated Hadoop directory is readable, and at the same time checks the state of each of its own nodes;
(3) metadata import: the data interchange tool sends an import-metadata SQL command to the MPP database; on receiving it, the MPP database cluster imports the metadata from the designated directory in the Hadoop file system;
(4) the tables to be imported are obtained from the database;
(5) table-by-table data import: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-import SQL command, the MPP database cluster performs the data import, reading the Hadoop cluster's data nodes directly and loading the data into the MPP database;
(6) on success, the tool exits normally;
(7) on failure, execution is interrupted and the tool exits.
5. The MPP database and Hadoop enterprise data interoperability method according to claim 1, characterized in that the steps for importing data from the Hadoop cluster into the MPP database via TXT staging are:
(1) Hadoop exports the data: a Hadoop client installed on the physical machine that holds the external storage runs the Hadoop get command (hadoop fs -get) to export the TXT data files from the designated Hadoop directory into the designated directory on the external storage;
(2) the data interchange tool starts;
(3) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; on receiving it, the MPP database cluster checks the state of each of its own nodes;
(4) metadata import: the data interchange tool sends an import-metadata SQL command to the MPP database; on receiving it, the MPP database cluster imports the metadata from the designated directory on the external storage;
(5) all tables in the database are obtained;
(6) table-by-table data import: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one per table; on receiving a table-import SQL command, the MPP database cluster performs the data import, loading the data from the designated directory on the external storage;
(7) on success, the tool exits normally;
(8) on failure, execution is interrupted and the tool exits.
6. The MPP database and Hadoop enterprise data interoperability method according to claim 1, characterized in that screened export is supported when data is exported from the MPP database, the screening being expressed by entering an SQL statement with a where condition.
7. An MPP database and Hadoop enterprise data interchange tool, comprising a main control module, a parameter parsing module, a connector, an export/import scheduler, worker threads, a log module, an SQL building module, and a worker thread pool; the main control module is connected to the parameter parsing module, the worker thread pool, and the export/import scheduler; the log module is connected to the parameter parsing module, the SQL building module, the worker threads, the worker thread pool, and the connector; the export/import scheduler is connected to the connector, the worker threads, the worker thread pool, and the SQL building module; the connector is connected to the worker threads; the worker threads are connected to the SQL building module;
the main control module starts the other modules and receives their feedback;
the parameter parsing module parses the information entered by the user into information the program can recognize internally;
the connector establishes the connection to the MPP database;
the export/import scheduler completes the export and import of the data;
the worker threads process the export/import jobs;
the log module creates the global instance of the tool's running log;
the SQL building module builds the export/import SQL;
the worker thread pool supplies the worker threads.
8. An implementation method of an MPP database and Hadoop enterprise data interchange tool, characterized by comprising the following steps:
(1) the user starts the tool from the command line, supplying configuration information with it; the main control module starts along with the tool; after starting, the main control module first creates the global instance of the tool's running log through the log module and then completes the initialization of the other modules;
(2) the main control module receives the configuration information entered by the user as a character string and passes it to the parameter parsing module for further parsing;
(3) the parameter parsing module parses the user's character-string configuration into configuration information the program can recognize internally and returns it to the main control module;
(4) the main control module starts the export/import scheduler, which completes the export/import work;
(5) the export/import scheduler creates the main connector instance and connects to the MPP database through it;
(6) the export/import scheduler constructs the status-check SQL through the SQL building module and performs the status check through the main connector;
(7) the export/import scheduler constructs the export/import-metadata SQL through the SQL building module and exports/imports the metadata through the main connector;
(8) the export/import scheduler constructs, through the SQL building module, the SQL that queries which tables need to be exported/imported, executes the query through the main connector, and obtains the tables to be exported/imported;
(9) the export/import scheduler creates the global instance of the job-scheduling log through the log module;
(10) the export/import scheduler obtains worker threads from the thread pool module, their number equal to the configured export/import parallelism, creates the same number of working connectors, assigns one working connector to each worker thread, and then starts all jobs, with the worker threads processing the export/import jobs in parallel; the export/import of a single table is called a job, and a job consists of, first, connecting to the MPP database through the working connector, second, constructing the export/import SQL through the SQL building module, and third, performing the export/import through the working connector;
(11) the export/import scheduler collects the execution status of each worker thread's export/import jobs, consolidates them into the export/import execution result, and returns it to the main control module, which returns the final result to the user.
CN201410820059.5A 2014-12-24 2014-12-24 MPP database and Hadoop enterprise data interoperability method, tool and implementation method Active CN104572895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410820059.5A CN104572895B (en) 2014-12-24 2014-12-24 MPP database and Hadoop enterprise data interoperability method, tool and implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410820059.5A CN104572895B (en) 2014-12-24 2014-12-24 MPP database and Hadoop enterprise data interoperability method, tool and implementation method

Publications (2)

Publication Number Publication Date
CN104572895A CN104572895A (en) 2015-04-29
CN104572895B 2018-02-23

Family

ID=53088957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410820059.5A Active CN104572895B (en) 2014-12-24 2014-12-24 MPP database and Hadoop enterprise data interoperability method, tool and implementation method

Country Status (1)

Country Link
CN (1) CN104572895B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320755A (en) * 2015-10-14 2016-02-10 夏君 Secure high-speed data transmission method
CN105933446A (en) * 2016-06-28 2016-09-07 中国农业银行股份有限公司 Service dual-active implementation method and system of big data platform
CN106227862A (en) * 2016-07-29 2016-12-14 浪潮软件集团有限公司 E-commerce data integration method based on distribution
CN106446153A (en) * 2016-09-21 2017-02-22 广州特道信息科技有限公司 Distributed newSQL database system and method
CN107622094A (en) * 2017-08-30 2018-01-23 苏州朗动网络科技有限公司 A kind of high-volume data guiding system and method based on search engine
CN107679192B (en) * 2017-10-09 2020-09-22 中国工商银行股份有限公司 Multi-cluster cooperative data processing method, system, storage medium and equipment
CN110019469B (en) * 2017-12-07 2022-06-21 金篆信科有限责任公司 Distributed database data processing method and device, storage medium and electronic device
CN108446145A (en) * 2018-03-21 2018-08-24 苏州提点信息科技有限公司 A kind of distributed document loads MPP data base methods automatically
CN112632114B (en) * 2019-10-08 2024-03-19 ***通信集团辽宁有限公司 Method, device and computing equipment for fast reading data by MPP database
CN110716802B (en) * 2019-10-11 2022-05-17 恩亿科(北京)数据科技有限公司 Cross-cluster task scheduling system and method
CN111143403B (en) * 2019-12-10 2021-05-14 跬云(上海)信息科技有限公司 SQL conversion method and device and storage medium
CN111416861B (en) * 2020-03-20 2022-07-26 中国建设银行股份有限公司 Communication management system and method
CN114138750B (en) * 2021-12-03 2022-10-18 无锡星凝互动科技有限公司 AI consultation database based cluster building method and system
CN116010337B (en) * 2022-12-05 2023-07-21 广州海量数据库技术有限公司 Method for accessing ORC data by openGauss

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101187937A (en) * 2007-10-30 2008-05-28 北京航空航天大学 Mode multiplexing isomerous database access and integration method under gridding environment
CN101944128A (en) * 2010-09-25 2011-01-12 中兴通讯股份有限公司 Data export and import method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050044086A1 (en) * 2003-08-18 2005-02-24 Cheng-Hwa Liu Symmetry database system and method for data processing
US20130110799A1 (en) * 2011-10-31 2013-05-02 Sally Blue Hoppe Access to heterogeneous data sources

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101187937A (en) * 2007-10-30 2008-05-28 北京航空航天大学 Mode multiplexing isomerous database access and integration method under gridding environment
CN101944128A (en) * 2010-09-25 2011-01-12 中兴通讯股份有限公司 Data export and import method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a telecom operator network data sharing platform based on the Hadoop+MPP architecture; Xin Huang et al.; Telecommunications Science (电信科学); April 2014 (No. 04); 135-145 *

Also Published As

Publication number Publication date
CN104572895A (en) 2015-04-29


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant