CN106250429A - Data extraction method based on sqoop


Info

Publication number
CN106250429A
Authority
CN
China
Prior art keywords
sqoop
data
task
method based
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610592714.5A
Other languages
Chinese (zh)
Inventor
付迅
周庆勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-07-26
Filing date
2016-07-26
Publication date
2016-12-21
2016-07-26: Application filed by Inspur Software Co Ltd
2016-07-26: Priority to CN201610592714.5A
2016-12-21: Publication of CN106250429A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data extraction method based on Sqoop. The method comprises: first configuring a Sqoop task through a web interface; executing the Sqoop task immediately or on a schedule; logging in remotely through Ganymed to the Linux server where Sqoop is installed and executing the Sqoop command; reading the execution log from memory once every 1 second through a task execution interface and displaying it on the interface in real time; and, after the Sqoop task finishes, storing the log in a database for later viewing. With this method, Sqoop tasks can be configured, scheduled and monitored in a unified way through a web interface, bidirectional data import and export between relational databases and big data clusters is achieved, and the problem of managing and maintaining multiple data sources as cloud computing technology continues to develop is solved. The method is of significance for the development and adoption of cloud computing and is suitable for wide application.

Description

Data extraction method based on Sqoop
Technical field
The present invention relates to the field of data information technology, and in particular to a data extraction method based on Sqoop.
Background
With the development of cloud computing, the technology is increasingly put into practice and has become a mainstay supporting information technology development across industries. Traditional business systems are mostly built on relational databases; their data must be extracted to a big data cluster for computation and analysis, and the result data is then imported back into a relational database for display. In addition, a complex business system usually contains multiple data sources. Relying on Sqoop alone to extract data from multiple data sources is functionally feasible, but management and maintenance are very difficult. A method or platform that configures, schedules and monitors Sqoop tasks in a unified way therefore has good application value.
To address the above problems, the present invention provides a data extraction method based on Sqoop.
Sqoop commands are executed by logging in remotely to the Sqoop server through Ganymed SSH-2 for Java. Ganymed is an open-source library that implements the SSH-2 protocol in Java. It lets a Java program connect to an SSH server and supports remote command execution, shell access, SCP and SFTP.
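Ganymed is used to run the Sqoop command remotely and to capture its execution log in memory. The sketch below illustrates only the capture side, under stated assumptions: it is written in Python for brevity (the patent's system is Java-based), it substitutes a local `subprocess.Popen` for the remote Ganymed SSH session so that it stays self-contained, and `InMemoryLog` and `run_and_capture` are hypothetical names. Stderr is merged into stdout because Hadoop tools such as Sqoop usually write their progress messages to stderr.

```python
import subprocess
import threading
from typing import List, Optional


class InMemoryLog:
    """Hypothetical thread-safe buffer holding one task's log lines in memory."""

    def __init__(self) -> None:
        self._lines: List[str] = []
        self._lock = threading.Lock()
        self.finished = threading.Event()   # set once the command has ended
        self.exit_code: Optional[int] = None

    def append(self, line: str) -> None:
        with self._lock:
            self._lines.append(line)

    def snapshot(self) -> List[str]:
        # Return a copy so readers never see the list change under them.
        with self._lock:
            return list(self._lines)


def run_and_capture(argv: List[str], log: InMemoryLog) -> threading.Thread:
    """Run argv in a background thread and stream its output into `log`.

    A local subprocess stands in for the remote Ganymed SSH session here.
    """
    def worker() -> None:
        try:
            proc = subprocess.Popen(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,   # merge stderr, where Hadoop tools log progress
                text=True,
            )
            for line in proc.stdout:        # yields lines as the child flushes them
                log.append(line.rstrip("\n"))
            log.exit_code = proc.wait()
        except OSError as exc:              # e.g. the executable was not found
            log.append(f"failed to start: {exc}")
        finally:
            log.finished.set()              # readers can rely on this even on failure

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

With a real SSH session the same loop would read from the session's output stream instead of a local pipe; the rest of the pattern (append under a lock, set a flag at the end) would not need to change.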
Scheduled triggering is implemented with Quartz. Quartz is an open-source job scheduling framework written entirely in Java. It can be integrated with J2EE and J2SE applications or used on its own, and it can schedule tens, hundreds or even tens of thousands of simple or complex jobs, which may be standard Java components or EJBs.
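At its core, a Quartz trigger decides when a job fires next. As an illustration only, the sketch below computes the next run time for the three rule types the interface offers (every day, a day of the week, a day of the month) using Python's standard library instead of Quartz. The function name `next_fire` and the weekday numbering (0 = Monday, as in `datetime.weekday()`) are assumptions. For a day-of-month rule it skips months that lack that day, which is how a fixed day-of-month cron field is generally evaluated.

```python
import calendar
from datetime import date, datetime, time, timedelta
from typing import Optional


def next_fire(now: datetime, frequency: str, at: time,
              day: Optional[int] = None) -> datetime:
    """Return the first run time strictly after `now` for a simple schedule rule.

    frequency: "daily", "weekly" (day = 0..6, Monday = 0) or
               "monthly" (day = 1..31).
    """
    if frequency == "daily":
        candidate = datetime.combine(now.date(), at)
        if candidate <= now:                 # today's slot has already passed
            candidate += timedelta(days=1)
        return candidate

    if frequency == "weekly":
        if day is None or not 0 <= day <= 6:
            raise ValueError("weekly rule needs day in 0..6")
        ahead = (day - now.weekday()) % 7
        candidate = datetime.combine(now.date() + timedelta(days=ahead), at)
        if candidate <= now:                 # right weekday, but the time has passed
            candidate += timedelta(days=7)
        return candidate

    if frequency == "monthly":
        if day is None or not 1 <= day <= 31:
            raise ValueError("monthly rule needs day in 1..31")
        year, month = now.year, now.month
        while True:
            # Months shorter than `day` are skipped rather than clamped.
            if day <= calendar.monthrange(year, month)[1]:
                candidate = datetime.combine(date(year, month, day), at)
                if candidate > now:
                    return candidate
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    raise ValueError(f"unknown frequency: {frequency}")
```

In the patented system this calculation is left to Quartz itself; the sketch only makes the trigger semantics concrete.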
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a simple and efficient data extraction method based on Sqoop.
The present invention is achieved through the following technical solutions:
A data extraction method based on Sqoop, characterized by comprising the following steps:
(1) first, a Sqoop task is configured through a web interface; the system automatically generates the Sqoop command and stores it in a database;
(2) after the Sqoop task is configured there are two options: clicking a button to execute the Sqoop task immediately, or configuring a timer through the interface, whereupon the system generates a timing rule and enables a Quartz timer;
(3) when the Sqoop task is executed directly or the timer fires, the system first reads the Sqoop command from the database, then logs in remotely through Ganymed to the Linux server where Sqoop resides and executes the Sqoop command; the execution log of the Sqoop command is captured by Ganymed and stored in memory;
(4) the task execution interface reads the execution log from memory every 1 second and displays it on the interface in real time, consistent with the output seen on the Linux command line;
(5) after the Sqoop task finishes, the log is stored in the database for later viewing.
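Step (4) has the interface read the in-memory log once per second. A minimal sketch of that polling loop, assuming the log is exposed as a callable that returns the current list of lines plus a `threading.Event` that is set when the command ends (both hypothetical interfaces, and the writer is assumed to append its last line before setting the event). The reader keeps an offset so that each poll shows only lines it has not shown yet, and it checks the finished flag before reading so the final lines are never dropped.

```python
import threading
import time
from typing import Callable, List, Tuple


def read_since(lines: List[str], offset: int) -> Tuple[List[str], int]:
    """Return the lines appended after `offset` and the offset to use next time."""
    new_lines = lines[offset:]
    return new_lines, offset + len(new_lines)


def poll_log(snapshot: Callable[[], List[str]],
             finished: threading.Event,
             show: Callable[[str], None],
             interval: float = 1.0) -> None:
    """Push new log lines to `show` every `interval` seconds until the task ends."""
    offset = 0
    while True:
        # Read the flag first: if it was already set, this read sees every line.
        done = finished.is_set()
        new_lines, offset = read_since(snapshot(), offset)
        for line in new_lines:
            show(line)
        if done:
            return
        time.sleep(interval)
```

In a web deployment the offset could instead be held by the browser and sent with each 1-second request, so the server stays stateless per viewer; that design is an assumption, not something the patent specifies.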
In step (1), a relational data source and a data table are selected first, and a specific SQL statement is configured to clean and filter the data; a big data source is then selected and its corresponding parameters are configured. The system automatically generates the corresponding Sqoop command from the interface configuration and stores it in the database.
The big data source is one of three types: HDFS, HBase and Hive.
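Step (1) turns form fields into a Sqoop command. The sketch below shows one way to do that for the three targets above, assuming a hypothetical `SqoopTaskConfig` form model and using Sqoop 1 import options (`--connect`, `--username`, `--password-file`, `--table`, `--where`, `--num-mappers`, `--target-dir`, `--hive-import`/`--hive-table`, `--hbase-table`/`--column-family`/`--hbase-row-key`). Mapping the optional SQL cleaning step to a `--where` row filter is an assumption; a free-form `--query` import is another possible reading. Only the import direction is shown, and flag details should be checked against the Sqoop version actually deployed.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SqoopTaskConfig:
    """Hypothetical model of the fields the web form collects for one import task."""
    jdbc_url: str                       # e.g. "jdbc:mysql://db-host:3306/sales"
    username: str
    password_file: str                  # file holding the DB password (kept off the CLI)
    table: str                          # source table in the relational database
    target: str                         # "hdfs", "hive" or "hbase"
    where: Optional[str] = None         # optional row filter for the cleaning step
    hdfs_dir: Optional[str] = None      # HDFS target: output directory
    hive_table: Optional[str] = None    # Hive target: table name
    hbase_table: Optional[str] = None   # HBase target: table name
    column_family: Optional[str] = None
    row_key: Optional[str] = None
    mappers: int = 1                    # 1 avoids needing a primary key or --split-by


def build_sqoop_import(cfg: SqoopTaskConfig) -> str:
    """Return a shell-quoted `sqoop import` command for the given configuration."""
    args: List[str] = [
        "sqoop", "import",
        "--connect", cfg.jdbc_url,
        "--username", cfg.username,
        "--password-file", cfg.password_file,
        "--table", cfg.table,
        "--num-mappers", str(cfg.mappers),
    ]
    if cfg.where:
        args += ["--where", cfg.where]

    if cfg.target == "hdfs":
        if not cfg.hdfs_dir:
            raise ValueError("HDFS target needs hdfs_dir")
        args += ["--target-dir", cfg.hdfs_dir]
    elif cfg.target == "hive":
        if not cfg.hive_table:
            raise ValueError("Hive target needs hive_table")
        args += ["--hive-import", "--hive-table", cfg.hive_table]
    elif cfg.target == "hbase":
        if not (cfg.hbase_table and cfg.column_family):
            raise ValueError("HBase target needs hbase_table and column_family")
        args += ["--hbase-table", cfg.hbase_table, "--column-family", cfg.column_family]
        if cfg.row_key:
            args += ["--hbase-row-key", cfg.row_key]
    else:
        raise ValueError(f"unknown target: {cfg.target}")

    # shlex.join (Python 3.8+) quotes each argument so the string can be stored
    # in the database and later run as-is in a POSIX shell.
    return shlex.join(args)
```

Quoting at generation time matters because a filter such as `status = 'PAID'` contains spaces and quotes that would otherwise break the stored command when it is replayed over SSH.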
In step (2), every day, a day of the week, or a day of the month is selected through the web interface and the execution time is then filled in; the system automatically generates a corresponding cron expression that conforms to the Quartz timer, and the Quartz timer executes the Sqoop task on schedule according to the cron expression.
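Quartz cron expressions have the fields seconds, minutes, hours, day of month, month and day of week (plus an optional year), and exactly one of the two day fields must be `?`. A sketch of the conversion step (2) describes, assuming the interface supplies a frequency, a time of day and a day number (0 = Monday for weekly rules, 1 to 31 for monthly ones); `to_quartz_cron` is a hypothetical name:

```python
from datetime import time
from typing import Optional

# Quartz accepts day-of-week names; index 0 = Monday to match datetime.weekday().
WEEKDAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def to_quartz_cron(frequency: str, at: time, day: Optional[int] = None) -> str:
    """Build 'sec min hour day-of-month month day-of-week' for a simple rule."""
    hms = f"{at.second} {at.minute} {at.hour}"
    if frequency == "daily":
        return f"{hms} * * ?"               # any day of month, day of week unused
    if frequency == "weekly":
        if day is None or not 0 <= day <= 6:
            raise ValueError("weekly rule needs day in 0..6")
        return f"{hms} ? * {WEEKDAY_NAMES[day]}"
    if frequency == "monthly":
        if day is None or not 1 <= day <= 31:
            raise ValueError("monthly rule needs day in 1..31")
        return f"{hms} {day} * ?"
    raise ValueError(f"unknown frequency: {frequency}")
```

A daily 02:30 run yields `0 30 2 * * ?`. In the Java system such a string would be handed to Quartz, for example through `CronScheduleBuilder.cronSchedule(...)`.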
The beneficial effects of the invention are as follows: the Sqoop-based data extraction method allows Sqoop tasks to be configured, scheduled and monitored in a unified way through a web interface, provides bidirectional data import and export between relational databases and big data clusters, and solves the problem of managing and maintaining data from multiple data sources as cloud computing technology develops. It is of significance for the development and adoption of cloud computing and is suitable for wide application.
Brief description of the drawings
Figure 1 is a schematic diagram of the Sqoop-based data extraction method of the present invention.
Detailed description of the invention
To make the technical problem to be solved, the technical solution and the beneficial effects clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be noted that the specific embodiments described here are only intended to explain the present invention, not to limit it.
The data extraction method based on Sqoop comprises the following steps:
(1) First, configure a Sqoop task through the web interface:
1. select a relational data source (e.g. Oracle, MySQL) and a data table;
2. configure a specific SQL statement to clean and filter the data (optional);
3. select a big data source (one of HDFS, HBase and Hive);
4. configure the corresponding parameters of the big data source, such as the HDFS path or the Hive table name.
The system automatically generates the corresponding Sqoop command from the interface configuration and stores it in the database, so users can work with it without knowing Sqoop. Entering a hand-written Sqoop command directly in the interface is also supported.
(2) Once the Sqoop task is configured there are two options: execute the task directly, or configure a timer that triggers the Sqoop task on schedule. In the web interface the user selects every day, a day of the week, or a day of the month, then fills in the execution time; the system automatically generates a matching Quartz cron expression, and Quartz executes the Sqoop task on schedule according to that expression.
(3) When the Sqoop task is executed directly or the timer fires, the system first reads the Sqoop command from the database, then logs in remotely through Ganymed to the Linux server where Sqoop resides and executes the corresponding Sqoop command. The execution log of the Sqoop command is captured by Ganymed and stored in memory.
(4) The task execution interface reads the execution log from memory every 1 second and displays it on the interface in real time, consistent with the output seen on the Linux command line.
(5) After the Sqoop task finishes, the log is stored in the database for later viewing.
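Step (5) stores the finished log for later viewing. A minimal sketch using Python's built-in `sqlite3` as a stand-in for whatever database the system actually uses; the table `sqoop_task_log`, its columns, and the helper names `init_schema`, `save_log` and `load_logs` are all hypothetical. Storing the whole log as one text value keeps each run to a single row.

```python
import sqlite3
from datetime import datetime
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS sqoop_task_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id     TEXT    NOT NULL,
    finished_at TEXT    NOT NULL,   -- ISO-8601 timestamp
    exit_code   INTEGER,            -- NULL if the command never started
    log_text    TEXT    NOT NULL
)
"""


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def save_log(conn: sqlite3.Connection, task_id: str, exit_code: Optional[int],
             lines: List[str], finished_at: Optional[datetime] = None) -> int:
    """Store one finished run's log as a single row and return its id."""
    finished_at = finished_at or datetime.now()
    with conn:  # commits on success, rolls back if the insert fails
        cur = conn.execute(
            "INSERT INTO sqoop_task_log (task_id, finished_at, exit_code, log_text) "
            "VALUES (?, ?, ?, ?)",
            (task_id, finished_at.isoformat(timespec="seconds"), exit_code, "\n".join(lines)),
        )
    return cur.lastrowid


def load_logs(conn: sqlite3.Connection, task_id: str) -> List[Tuple[str, Optional[int], str]]:
    """Return (finished_at, exit_code, log_text) for each stored run, oldest first."""
    return conn.execute(
        "SELECT finished_at, exit_code, log_text FROM sqoop_task_log "
        "WHERE task_id = ? ORDER BY id",
        (task_id,),
    ).fetchall()
```

Parameter placeholders (`?`) are used throughout so that log text containing quotes cannot break the SQL.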

Claims (4)

1. A data extraction method based on Sqoop, characterized by comprising the following steps:
(1) first, a Sqoop task is configured through a web interface; the system automatically generates the Sqoop command and stores it in a database;
(2) after the Sqoop task is configured there are two options: clicking a button to execute the Sqoop task immediately, or configuring a timer through the interface, whereupon the system generates a timing rule and enables a Quartz timer;
(3) when the Sqoop task is executed directly or the timer fires, the system first reads the Sqoop command from the database, then logs in remotely through Ganymed to the Linux server where Sqoop resides and executes the Sqoop command; the execution log of the Sqoop command is captured by Ganymed and stored in memory;
(4) the task execution interface reads the execution log from memory every 1 second and displays it on the interface in real time, consistent with the output seen on the Linux command line;
(5) after the Sqoop task finishes, the log is stored in the database for later viewing.
2. The data extraction method based on Sqoop according to claim 1, characterized in that: in step (1), a relational data source and a data table are selected first, and a specific SQL statement is configured to clean and filter the data; a big data source is then selected and its corresponding parameters are configured; the system automatically generates the corresponding Sqoop command from the interface configuration and stores it in the database.
3. The data extraction method based on Sqoop according to claim 2, characterized in that: the big data source is one of three types: HDFS, HBase and Hive.
4. The data extraction method based on Sqoop according to claim 1, characterized in that: in step (2), every day, a day of the week, or a day of the month is selected through the web interface and the execution time is then filled in; the system automatically generates a corresponding cron expression that conforms to the Quartz timer, and the Quartz timer executes the Sqoop task on schedule according to the cron expression.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610592714.5A CN106250429A (en) 2016-07-26 2016-07-26 Data extraction method based on sqoop

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610592714.5A CN106250429A (en) 2016-07-26 2016-07-26 Data extraction method based on sqoop

Publications (1)

Publication Number Publication Date
CN106250429A true CN106250429A (en) 2016-12-21

Family

ID=57604819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610592714.5A Pending CN106250429A (en) 2016-07-26 2016-07-26 Data extraction method based on sqoop

Country Status (1)

Country Link
CN (1) CN106250429A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363782A (en) * 2018-02-11 2018-08-03 中国联合网络通信集团有限公司 Data cleaning method and data cleaning system
CN108664657A (en) * 2018-05-20 2018-10-16 湖北九州云仓科技发展有限公司 Big data task scheduling method, electronic device, storage medium and platform
CN108875017A (en) * 2018-06-20 2018-11-23 山东浪潮商用系统有限公司 Massive data synchronization system and method based on Sqoop technology
CN110308964A (en) * 2019-07-08 2019-10-08 北京瑞福缘动网络科技有限公司 Method and electronic device for bidirectional parsing of cron expressions
CN113392343A (en) * 2021-08-17 2021-09-14 深圳市信润富联数字科技有限公司 Data extraction method, device, medium and computer program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838617A (en) * 2014-02-18 2014-06-04 河海大学 Method for constructing data mining platform in big data environment
CN103856565A (en) * 2014-03-18 2014-06-11 浪潮集团有限公司 E-commerce tax source management cloud collection monitoring method
CN104077552A (en) * 2014-07-07 2014-10-01 北京泰乐德信息技术有限公司 Rail traffic signal comprehensive operation and maintenance method and system based on cloud computing
CN104408167A (en) * 2014-12-09 2015-03-11 浪潮电子信息产业股份有限公司 Method for expanding sqoop function in Hue based on django
CN105069142A (en) * 2015-08-18 2015-11-18 山大地纬软件股份有限公司 System and method for extraction, transformation and distribution of data increments


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ABRAHAM ELMAHREK: "Sqooping Data with Hue", 《HTTPS://BLOG.CLOUDERA.COM/BLOG/2013/11/SQOOPING-DATA-WITH-HUE》 *


Similar Documents

Publication Publication Date Title
CN106250429A (en) Data extraction method based on sqoop
CN108536761B (en) Report data query method and server
CN102831052B (en) Test exemple automation generating apparatus and method
CN103441900B (en) Centralized cross-platform automatization test system and control method thereof
CN102982085B (en) Data mover system and method
CN108280023B (en) Task execution method and device and server
CN103309904A (en) Method and device for generating data warehouse ETL (Extraction, Transformation and Loading) codes
CN109448100B (en) Three-dimensional model format conversion method, system, computer device and storage medium
CN103677973A (en) Distributed multi-task scheduling management system
CN104778124A (en) Automatic testing method for software application
CN106293891B (en) Multidimensional investment index monitoring method
CN104461671A (en) Method and system for periodically managing code modification report
CN106844682A (en) Method for interchanging data, apparatus and system
CN110471754A (en) Method for exhibiting data, device, equipment and storage medium in job scheduling
CN102306122A (en) Automated testing method and equipment
CN110764747B (en) Airflow-based data calculation scheduling method
CN106484624A (en) The method of testing of interface automatic test
CN105589739B (en) A kind of process control system and method
CN104156198A (en) Method and device for automatically generating software integration version updating description
CN110442651A (en) A method of it is uploaded automatically based on kettle realization excel data and triggers scheduling
CN108664657A (en) A kind of big data method for scheduling task, electronic equipment, storage medium and platform
CN105279092A (en) Software testing method and apparatus
CN103699478A (en) Test case generation system and test case generation method
CN111190814A (en) Software test case generation method and device, storage medium and terminal
WO2016165461A1 (en) Automated testing method and apparatus for network management system software of telecommunications network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2016-12-21)