CN115048205B - ETL scheduling platform, deployment method thereof and computer-readable storage medium - Google Patents

ETL scheduling platform, deployment method thereof and computer-readable storage medium

Info

Publication number
CN115048205B
Authority
CN
China
Prior art keywords
carte
task
node
scheduling
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210971604.5A
Other languages
Chinese (zh)
Other versions
CN115048205A (en
Inventor
赵耀坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuexin Semiconductor Technology Co.,Ltd.
Original Assignee
Guangzhou Yuexin Semiconductor Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yuexin Semiconductor Technology Co Ltd filed Critical Guangzhou Yuexin Semiconductor Technology Co Ltd
Priority to CN202210971604.5A priority Critical patent/CN115048205B/en
Publication of CN115048205A publication Critical patent/CN115048205A/en
Application granted granted Critical
Publication of CN115048205B publication Critical patent/CN115048205B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2028Failover techniques eliminating a faulty processor or activating a spare
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/203Failover techniques using migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3017Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is implementing multitasking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/32Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324Display of status information
    • G06F11/327Alarm or error message display
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/41User authentication where a single sign-on provides access to a plurality of computers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034Reaction to server failures by a load balancer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80Database-specific techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/508Monitor
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/16Implementing security features at a particular protocol layer
    • H04L63/168Implementing security features at a particular protocol layer above the transport layer
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • General Factory Administration (AREA)

Abstract

The invention provides an ETL (extract-transform-load) scheduling platform, a deployment method thereof and a computer readable storage medium, wherein the ETL scheduling platform is provided with at least two scheduling nodes which are communicated with each other, each scheduling node is used as a master node of any Carte cluster and is used for configuring the Carte cluster and configuring each Carte node in the Carte cluster, and each scheduling node can manage all Carte nodes which are deployed; after receiving the corresponding task to be scheduled, any scheduling node automatically balances the task load to each scheduling node. The ETL scheduling platform directly calls each Carte node to execute tasks, high availability and load balancing functions of the Kettle cluster are achieved, and a set of high availability cluster task scheduling method is designed on the ETL scheduling platform.

Description

ETL scheduling platform, deployment method thereof and computer-readable storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a Kettle-based ETL scheduling platform, a deployment method thereof and a computer-readable storage medium.
Background
ETL (Extract, Transform, Load) describes the process of extracting data from a source, transforming it, and loading it into a destination. Kettle (Kettle Extraction, Transportation, Transformation and Loading Environment) is an open-source ETL tool developed abroad. When Kettle is put into a production environment, hundreds or even thousands of tasks (jobs) are often generated, and an ETL (Kettle) scheduling platform such as Kettle Manager is generally used to manage them. The greatest advantage of Kettle Manager is that it is simple and easy to use, but it also has the following defects:
1) Kettle Manager has only been developed (or upgraded) up to kettle-manager 0.4.0, which supports only old versions of Kettle and no longer supports the latest version.
2) Kettle Manager does not support HA (High Availability) and has a single point of failure (SPOF) problem, that is, a single component whose failure stops the whole system from operating. In other words, one failed point can cause an overall failure.
3) Kettle Manager currently works by including the Kettle core package as one of its dependencies and calling the Kettle API data interface directly to implement the ETL scheduling function.
4) The current timed scheduling function is Kettle's native scheduling mode. Its scheduling granularity is coarse: it supports only a fixed time or a fixed interval, which does not meet production requirements. It cannot specify multiple run times, cannot fix a job to run on given days of each month, and is insufficiently flexible.
5) Fig. 1 shows the architecture of Kettle Manager in the prior art, and Fig. 2 shows the control diagram of a prior-art Kettle system (FDC: machine failure analysis system; BPM: business process management system; MES: manufacturing execution system; SAP: an enterprise resource management software system). Most scheduling platforms in the industry use the Kettle core API to call ETL functions directly, which ties the scheduling platform tightly to a Kettle version: Kettle Manager depends on a specific Kettle version and must be upgraded whenever Kettle is upgraded.
6) In use, tasks often stop automatically and the reason cannot be determined. Kettle Manager also has logic errors that produce error alarms in the system log and degrade working performance, and it does not support mail alarms.
Disclosure of Invention
The invention aims to provide an ETL scheduling platform, a deployment method thereof and a computer-readable storage medium, so as to solve the task scheduling problems of existing Kettle scheduling platforms such as Kettle Manager.
In order to solve the above technical problem, the present invention provides an ETL scheduling platform, which has at least two scheduling nodes that communicate with each other, where each scheduling node is used as a master node of any Carte cluster and is used to configure Carte clusters and configure each Carte node in the Carte clusters, and each scheduling node can manage all Carte nodes that are deployed;
after receiving a corresponding task to be scheduled, any scheduling node automatically balances the load of the task to each scheduling node, so that when the Carte node designated by the task is available, the designated Carte node executes the task, and when the Carte node designated by the task is unavailable, a new Carte node is searched in all the Carte clusters to execute the task.
Preferably, the scheduling node comprises:
the Carte node management module is used for adding and deleting a Carte cluster server, setting an SSH remote login server and managing an sftp file, and adding, deleting, starting and stopping a corresponding Carte node on the corresponding Carte cluster server through a data interface so as to configure a Carte cluster and each Carte node in the Carte cluster;
the task scheduling module is used for adding, deleting, starting, stopping or immediately executing a task to be scheduled, assigning a corresponding Carte node for the task to be scheduled to execute, and searching a new Carte node in all the Carte clusters to execute the task when the Carte node assigned by the task is unavailable;
and the operation monitoring module is used for monitoring the basic state of the server where the scheduling node is positioned and the basic state of each Carte cluster server, monitoring the execution states of all tasks and sending a system alarm mail.
Preferably, each scheduling node is arranged on the corresponding server in a one-to-one correspondence manner, and has a task management interface for displaying each task, the status of each Carte node for executing the task, a log of the executed task, and an execution flow chart of the task, so as to manage the tasks and the Carte nodes in batch.
Preferably, the scheduling node further comprises:
the resource library management module is used for creating and deleting a management data resource library and/or a file resource library, and synchronizing data of the file resource libraries on different Carte cluster servers;
the conversion and operation module is used for uploading, downloading, inquiring, deleting and renaming the operation and the conversion in the resource library;
and the system setting module is used for carrying out user management and adding or deleting users logging in the ETL scheduling platform.
Preferably, the Carte node management module includes a Carte node management unit, where the Carte node management unit is configured to add a Carte node, specify a server where the Carte node is located, a deployment location in the server, and a processed item type, and set at least one item tag for the added Carte node, where the Carte nodes having the same item tag form a Carte cluster to process a task in the same item type, and the Carte nodes in the Carte cluster have no master-slave nodes and are all task execution nodes.
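The label-based grouping described above can be sketched as follows. This is a minimal illustration only: the `CarteNode` record, its field names, the host addresses and the label values are hypothetical and are not taken from the platform.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CarteNode:
    """Hypothetical record for one Carte node; field names are illustrative."""
    name: str
    server: str        # server the node is located on
    deploy_path: str   # deployment location on that server
    labels: frozenset = field(default_factory=frozenset)  # project labels


def clusters_by_label(nodes):
    """Group node names into clusters keyed by project label.

    A node carrying several labels belongs to several clusters. No node is
    treated as a master: every member is a plain task execution node.
    """
    clusters = defaultdict(list)
    for node in nodes:
        for label in node.labels:
            clusters[label].append(node.name)
    return {label: sorted(names) for label, names in clusters.items()}


nodes = [
    CarteNode("carte-1", "10.0.0.1", "/opt/kettle", frozenset({"MES"})),
    CarteNode("carte-2", "10.0.0.2", "/opt/kettle", frozenset({"MES", "SAP"})),
    CarteNode("carte-3", "10.0.0.3", "/opt/kettle", frozenset({"SAP"})),
]
clusters = clusters_by_label(nodes)
```

Here `carte-2` carries two labels, so it appears in both the `MES` and the `SAP` cluster.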
Preferably, the task scheduling module is further configured to set the newly added task to be scheduled to be started at a fixed time, and set at least one of a fixed time starting time, a name of a project type to which the task belongs, the number of times of automatic restart after the task fails to be executed, a log level of task execution, and a task starting parameter for the newly added task to be scheduled.
Preferably, the task scheduling module is further configured to:
acquiring the item type of the added task to be scheduled, distributing the task to be scheduled to an item Carte cluster with a corresponding item label, and further appointing a default Carte node for processing the task to be scheduled;
checking whether the previous execution of the task has finished;
if it has not finished, skipping this scheduling of the task, namely directly ending the current scheduling round;
if the execution is finished, further judging whether the default Carte node is available, if so, using the default Carte node as a designated execution node of the task to be scheduled so as to execute the task to be scheduled;
and if the default node is unavailable, eliminating the unavailable default Carte node, communicating with each scheduling node, acquiring a keyword value of the task by using a consistent Hash algorithm, acquiring Carte nodes with item labels corresponding to the task to be scheduled from each Carte cluster according to the keyword value, and executing the task to be scheduled by using the Carte nodes as new Carte nodes.
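The failover step above names a consistent hash algorithm but does not give its details. The following is a minimal sketch of one common form, a hash ring with virtual nodes. The class `HashRing`, the function `pick_node`, the choice of MD5 and the replica count are all assumptions made for illustration, not the platform's actual implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash; MD5 is used only for its even spread, not security."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring over Carte node names, with virtual nodes."""

    def __init__(self, node_names, replicas=100):
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in node_names
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, task_key: str):
        """Return the first node clockwise from the task's hash, or None."""
        if not self._ring:
            return None
        idx = bisect.bisect_right(self._points, _hash(task_key)) % len(self._ring)
        return self._ring[idx][1]


def pick_node(task_key, default_node, labelled_nodes, is_available):
    """Use the default node when it is up; otherwise hash onto the survivors."""
    if is_available(default_node):
        return default_node
    survivors = [n for n in labelled_nodes
                 if n != default_node and is_available(n)]
    return HashRing(survivors).lookup(task_key)
```

With a ring like this, removing one node only remaps the tasks that were mapped to that node, which is consistent with the stated goal of keeping tasks from drifting between nodes of the same project cluster.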
Preferably, the task scheduling module adopts an integrated Quartz and Springboot framework to realize the timed starting of the tasks, and the timed scheduling of each task is completed by executing the timed starting time saved in the task scheduling module.
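The platform delegates timed starting to Quartz, so the following standard-library sketch is not the platform's mechanism. It only illustrates the kind of schedule the text calls for, several run times on fixed days of each month, by computing the next fire time. The function name and parameters are hypothetical. In Quartz itself, a comparable schedule would typically be written as a cron expression such as `0 0 2 1,15 * ?` (02:00:00 on the 1st and 15th of every month).

```python
from datetime import datetime, time, timedelta


def next_fire_time(after, days_of_month, times_of_day):
    """Return the first datetime strictly after `after` that falls on one of
    `days_of_month` at one of `times_of_day`."""
    day = after.date()
    # Bounded search: any valid day of month recurs well within a year.
    for _ in range(366):
        if day.day in days_of_month:
            for t in sorted(times_of_day):
                candidate = datetime.combine(day, t)
                if candidate > after:
                    return candidate
        day += timedelta(days=1)
    raise ValueError("schedule never fires: check days_of_month and times_of_day")


# Run at 02:00 and 14:30 on the 1st and 15th of each month.
upcoming = next_fire_time(
    datetime(2024, 1, 15, 3, 0), {1, 15}, [time(2, 0), time(14, 30)]
)
```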
The invention also provides a deployment method of the ETL scheduling platform, which comprises the following steps:
deploying basic components, wherein the basic components comprise a MySQL database and a JAVA language development kit;
deploying at least one scheduling node;
logging in to any scheduling node to deploy a Carte cluster, which comprises configuring the SSH remote login information of the server, the sftp files, and each Carte node under the Carte cluster;
configuring the Kettle resource library information;
configuring a task to be scheduled to a corresponding Carte cluster;
and starting a corresponding scheduling task and waiting for the task to execute.
Preferably, the step of configuring the task to be scheduled to the corresponding Carte cluster includes:
newly adding a corresponding task and designating the task as a timing task;
setting a timing starting time for the newly added task;
setting starting parameters for the newly added task;
setting a project name and a corresponding relation between the project name and a corresponding project label of a Carte node for the newly added task;
setting the automatic restart times of task execution failure for the newly added task;
and setting the log level of task execution for the newly added task.
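The per-task settings listed in the steps above can be pictured as a single configuration record. The sketch below is an assumption-laden illustration: the class `ScheduledTask`, its field names and the cron-style start time are hypothetical, and the log level names are Kettle's usual logging levels, which the source does not enumerate.

```python
from dataclasses import dataclass, field

# Kettle's usual logging levels (assumed; not listed in the source text).
KETTLE_LOG_LEVELS = (
    "Nothing", "Error", "Minimal", "Basic", "Detailed", "Debug", "Rowlevel",
)


@dataclass
class ScheduledTask:
    """Hypothetical settings for one newly added timed task."""
    name: str
    start_time: str                 # timed start, e.g. a cron-style expression
    project: str                    # must match a project label on Carte nodes
    max_restarts: int = 0           # automatic restarts after a failed run
    log_level: str = "Basic"
    params: dict = field(default_factory=dict)  # optional start parameters

    def __post_init__(self):
        if self.max_restarts < 0:
            raise ValueError("max_restarts must be zero or greater")
        if self.log_level not in KETTLE_LOG_LEVELS:
            raise ValueError(f"unknown log level: {self.log_level}")


task = ScheduledTask(
    name="load_wafer_stats",
    start_time="0 0 2 1,15 * ?",
    project="MES",
    max_restarts=3,
    params={"batch_date": "2024-01-15"},
)
```

Validating the record when it is created keeps a mistyped log level or a negative restart count from reaching the scheduler.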
Preferably, in the step of configuring the task to be scheduled to the corresponding Carte cluster, a default Carte node for processing the task to be scheduled is set according to a preset project type processed by the Carte node, and whether the default Carte node is available is checked;
in the step of starting the corresponding scheduling task, the corresponding Carte node executes the task on schedule according to the preset timed start time by adopting an integrated Quartz and Springboot framework, and each task, the execution status of each Carte node, the execution log and the execution flow chart are displayed through a task management interface arranged on each server, so as to manage the tasks and the Carte nodes in batches.
Preferably, the step of configuring the Kettle resource library information comprises: creating and deleting data resource libraries and/or file resource libraries for jobs and transformations, synchronizing file resource libraries on different servers, and uploading, downloading, querying, deleting and renaming jobs and transformations.
Preferably, the step of logging in any scheduling node to deploy the Carte cluster comprises:
adding at least one project label to each Carte node, wherein Carte nodes with the same project label form a Carte cluster for processing tasks under the same project type, and the Carte nodes in the Carte cluster have no master-slave division and are all task execution nodes; the step of further configuring the task to be scheduled to the corresponding Carte cluster according to the deployed Carte cluster comprises:
adding a new task to be scheduled, setting the task to be started at a fixed time, and setting at least one of fixed time starting time, the name of a project type to which the task belongs, the number of times of automatic restart after the task fails to be executed, the log level of task execution and task starting parameters for the newly added task to be scheduled;
acquiring the item type of the added task to be scheduled, distributing the task to be scheduled to an item Carte cluster with a corresponding item label, and further appointing a default Carte node for processing the task to be scheduled;
checking whether the previous execution of the task has finished;
if it has not finished, skipping this scheduling of the task, namely directly ending the current scheduling round;
if the execution is finished, further judging whether the default Carte node is available, if so, using the default Carte node as a designated execution node of the task to be scheduled so as to execute the task to be scheduled;
and if the default Carte node is unavailable, eliminating the unavailable default Carte node, communicating with each scheduling node, acquiring a keyword value of the task by using a consistent hash algorithm, acquiring the Carte node with a project label corresponding to the task to be scheduled from each Carte cluster according to the keyword value, and executing the task to be scheduled by using the Carte node as a new Carte node.
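The decision flow in the steps above (skip while the previous run is unfinished, prefer the default node, otherwise fail over) can be sketched as one function. The names `Decision` and `schedule_once` and the dictionary layout of a task are hypothetical. The `choose_fallback` callable stands in for the consistent hash lookup, whose details the text does not specify.

```python
from enum import Enum


class Decision(Enum):
    SKIPPED = "skipped"     # previous run still in progress
    DEFAULT = "default"     # default Carte node is available
    FAILOVER = "failover"   # a replacement node was selected
    NO_NODE = "no_node"     # no labelled node is available


def schedule_once(task, last_run_finished, is_available, choose_fallback):
    """Decide where one scheduling round of `task` should run."""
    if not last_run_finished(task["name"]):
        return Decision.SKIPPED, None
    default = task["default_node"]
    if is_available(default):
        return Decision.DEFAULT, default
    # Exclude the unavailable default node and keep only live labelled nodes.
    candidates = [n for n in task["labelled_nodes"]
                  if n != default and is_available(n)]
    if not candidates:
        return Decision.NO_NODE, None
    return Decision.FAILOVER, choose_fallback(task["name"], candidates)


demo_task = {
    "name": "load_wafer_stats",
    "default_node": "carte-1",
    "labelled_nodes": ["carte-1", "carte-2", "carte-3"],
}
```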
The present invention also provides a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the deployment method of the ETL scheduling platform described above.
The ETL scheduling platform and the deployment method thereof provided by the invention have the following advantages:
1. The method implements the Kettle cluster itself: the ETL scheduling platform directly calls each Carte node to execute tasks, providing the high availability and load balancing functions of the Kettle cluster. The Carte nodes in a Carte cluster are not divided into master and slave nodes, and every Carte node is a task execution node, which avoids the whole system becoming unavailable when a Carte master node fails.
2. Because the cluster natively supported by Carte has a single point of failure, a highly available cluster task scheduling method is designed on the ETL scheduling platform. Project labels are attached to Carte nodes, and each task under a project is preset with a default Carte node that carries the project label corresponding to the task. If the default Carte node is unavailable, it is excluded and a new Carte node is found to process the task. A Carte node can carry several project labels, so a task can be executed normally as long as one surviving node exists in the cluster. When the default Carte node is unavailable, the new Carte node is obtained with a consistent hash algorithm, which prevents tasks under the same project from drifting arbitrarily between nodes when executed in the project's Carte cluster.
3. When a Carte node is deployed, its deployment location on the server is also configured. The ETL scheduling platform issues the resource library configuration to the Carte node, so the Carte node can execute directly, and the Carte cluster is called directly through an API (application program interface) data interface to execute transformations and jobs. By contrast, stock Kettle can only read the resource library defined in its own repository configuration.
4. The ETL scheduling platform is highly available. Tasks are executed on schedule through the Springboot + Quartz framework, which gives the scheduling platform accurate and flexible scheduling granularity, and the platform further provides load balancing and high availability for its tasks.
5. The ETL scheduling platform realizes the host management function of the Carte, can realize SSH login and sftp uploading and downloading functions of the corresponding host, can monitor and manage the execution status of each task on any server, is convenient for finding errors in the execution of the tasks in time, and is convenient for a user to manage host resources.
Drawings
FIG. 1 is a schematic diagram of a prior art Kettle system architecture;
FIG. 2 is a schematic diagram of prior art Kettle system control;
FIG. 3 is a schematic diagram of ETL dispatch platform control;
FIG. 4 is a system architecture diagram of a deployment method of an ETL scheduling platform;
FIG. 5 is a block architecture diagram of a scheduling node;
FIG. 6 is a schematic diagram of a server node interface of one embodiment of an ETL scheduling platform;
FIG. 7 is a schematic diagram of a server add interface of one embodiment of an ETL scheduling platform;
FIG. 8 is a schematic diagram of a Carte configuration interface for one embodiment of an ETL scheduling platform;
FIG. 9 is a schematic diagram of a Carte node base information setup interface according to an embodiment of the ETL scheduling platform;
FIG. 10 is a schematic diagram of a Carte node parameter setup interface of an embodiment of an ETL scheduling platform;
FIG. 11 is a schematic diagram of a Carte node deployment setup interface of an embodiment of an ETL scheduling platform;
FIG. 12 is a schematic diagram of a document library management interface of one embodiment of an ETL scheduling platform;
FIG. 13 is a schematic diagram of an add file repository interface of an embodiment of an ETL scheduling platform;
FIG. 14 is a database management interface diagram of one embodiment of an ETL scheduling platform;
FIG. 15 is a schematic diagram of an add data repository interface of an embodiment of an ETL scheduling platform;
FIG. 16 is a transformation & job interface diagram of one embodiment of an ETL scheduling platform;
FIG. 17 is a schematic diagram of an embodiment of an ETL scheduling platform illustrating a job flow diagram at the convert & job interface;
FIG. 18 is a schematic diagram of a task orchestration interface of one embodiment of an ETL scheduling platform;
FIG. 19 is a schematic diagram of an Add task scheduling interface of one embodiment of an ETL scheduling platform.
Detailed Description
The ETL scheduling platform and the deployment method thereof proposed by the present invention are further described in detail below with reference to the accompanying drawings and specific embodiments. Advantages and features of the present invention will become apparent from the following description and from the claims. It is to be noted that the drawings are in a very simplified form and are not to precise scale; they serve only to illustrate the embodiments of the present invention conveniently and clearly.
Specifically, please refer to fig. 3 and fig. 5, which are schematic diagrams illustrating an ETL scheduling platform according to an embodiment of the present invention. The ETL scheduling platform (named "tank" in practical use, meaning a water tank: data of all kinds is put into the tank and flows out in a specified format) is provided with at least two scheduling nodes that communicate with each other to perform task scheduling. Each scheduling node is used as a master node of any Carte cluster and is used for configuring the Carte cluster and each Carte node in the Carte cluster, and each scheduling node can manage all the Carte nodes that are deployed.
After receiving the corresponding task to be scheduled, any scheduling node automatically balances the load of the task to each scheduling node, so that when the Carte node designated by the task is available, the designated Carte node executes the task, and when the Carte node designated by the task is unavailable, a new Carte node is searched in all Carte clusters to execute the task.
The Carte sub-server is a component module of Kettle that executes transformations and jobs remotely. Carte is a lightweight service process that supports remote monitoring and provides clustering capability for transformations. The sub-server is the smallest component module of a cluster and is also a small HTTP server that receives commands from remote clients for deploying, managing and monitoring jobs and transformations.
Each scheduling node is correspondingly arranged on a corresponding server one by one, and is provided with a task management interface for displaying each task, the task execution state of each Carte node, the task execution log and the task execution flow chart so as to manage the tasks and the Carte nodes in batches.
As shown in fig. 5, the scheduling node as a part of the ETL scheduling platform includes: the system comprises a Carte node management module, a task scheduling module, an operation monitoring module, a resource library management module, a conversion and operation module and a system setting module.
The Carte node management module is used for adding and deleting a Carte cluster server, setting SSH remote login to the server and managing sftp files, and adding, deleting, starting and stopping a corresponding Carte node on the corresponding Carte cluster server through a data interface, so as to configure a Carte cluster and each Carte node in the Carte cluster. The module comprises a Carte node management unit, which is used for adding a Carte node, specifying the server where the Carte node is located, its deployment position in the server and the item type it processes, and setting at least one item label for the added Carte node. Carte nodes with the same item label form a Carte cluster that processes tasks under the same item type; the Carte nodes in the Carte cluster have no master-slave division and are all task execution nodes.
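The grouping rule described above (nodes sharing an item label form one cluster, and a node may carry several labels) can be sketched as follows. This is a minimal illustration only: the `CarteNode` class, its fields and the sample node names and addresses are hypothetical and are not taken from the platform's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CarteNode:
    # Hypothetical record for one Carte node; field names are assumptions.
    name: str
    host: str
    port: int
    project_tags: frozenset


def clusters_by_tag(nodes):
    """Group node names by project tag.

    A node with several tags joins several clusters, and no node in a
    cluster is marked as master: every member is an execution node.
    """
    clusters = defaultdict(list)
    for node in nodes:
        for tag in sorted(node.project_tags):
            clusters[tag].append(node.name)
    return dict(clusters)


nodes = [
    CarteNode("carte-a", "10.0.0.1", 8081, frozenset({"sales"})),
    CarteNode("carte-b", "10.0.0.2", 8081, frozenset({"sales", "finance"})),
]
clusters = clusters_by_tag(nodes)
```

Here `carte-b` belongs to both the `sales` and `finance` clusters, which is what allows a task to keep running as long as one node carrying its label survives.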
The task scheduling module is used for adding, deleting, starting, stopping or immediately executing a task to be scheduled, assigning a corresponding Carte node to execute the task to be scheduled, and searching for a new Carte node in all Carte clusters to execute the task when the Carte node assigned to the task is unavailable. It is further used for setting a newly added task to be scheduled to start at a fixed time, and for setting, for the newly added task, at least one of the timed start time, the name of the project type to which the task belongs, the number of automatic restarts after the task fails to execute, the log level of task execution, and, optionally, the task start parameters.
In addition, the task scheduling module is further configured to allocate the tasks to the nodes for processing, and sequentially execute the following steps:
acquiring the item type of the added task to be scheduled, distributing the task to be scheduled to an item Carte cluster with a corresponding item label, and further appointing a default Carte node for processing the task to be scheduled;
checking whether the previous execution of the task has finished;
if the previous execution has not finished, the default Carte node is considered to be unavailable and the scheduling of the task is skipped, that is, the scheduling of the task ends directly;
if the previous execution has finished, further judging whether the default Carte node is available, and if so, using the default Carte node as the designated execution node of the task to be scheduled to execute the task;
if the default Carte node is unavailable, removing the unavailable default Carte node, communicating with each scheduling node, obtaining the keyword value of the task by using a consistent Hash algorithm, and obtaining, from the Carte clusters according to the keyword value, a Carte node carrying the project label corresponding to the task to be scheduled, which serves as the new Carte node to execute the task. Obtaining the Carte node with the consistent Hash algorithm effectively prevents the task to be scheduled from drifting between nodes in the Carte cluster.
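The fallback selection in the last step can be illustrated with a small consistent-hash ring. The patent does not disclose the hash function, the number of virtual nodes or the key format, so MD5, 100 replicas and the task-name key below are assumptions chosen only to make the sketch concrete.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is an assumed choice; any stable hash would serve.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (replicas)."""

    def __init__(self, nodes, replicas=100):
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    def pick(self, task_key):
        """Return the node owning the first ring point after the key's hash."""
        if not self._ring:
            return None
        idx = bisect.bisect_right(self._points, _hash(task_key)) % len(self._ring)
        return self._ring[idx][1]


def pick_fallback(task_key, tagged_nodes, unavailable):
    """Pick a replacement among nodes carrying the task's project label."""
    candidates = [n for n in tagged_nodes if n not in unavailable]
    return HashRing(candidates).pick(task_key)


tagged = ["carte-a", "carte-b", "carte-c", "carte-d"]
chosen = pick_fallback("job-42", tagged, unavailable={"carte-a"})
```

Because each node owns fixed points on the ring, removing some other node does not move a task whose chosen node is still present; this is the property that keeps the same task from drifting between nodes on successive runs.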
It should be noted that the task scheduling module integrates the Quartz and Spring Boot frameworks to start tasks at fixed times; the timed scheduling of each task is completed by executing the timed start time and the timed start rule pre-stored in the task scheduling module. Spring Boot is designed on the basis of Spring 4.0, so it inherits the strengths of the Spring framework and further simplifies the whole process of building and developing a Spring application by simplifying configuration. In addition, Spring Boot integrates a large number of frameworks, so that problems such as dependency version conflicts and unstable references are well resolved. Quartz is used to implement the timed tasks. By integrating the Quartz and Spring Boot frameworks, accurate and flexible scheduling granularity of the ETL scheduling platform is achieved, together with load balancing and high availability of its tasks.
The operation monitoring module is used for monitoring the basic state of the server where the scheduling node is located and the basic state of each Carte cluster server, monitoring the execution states of all tasks and sending a system alarm mail.
The resource library management module is used for creating and deleting a management data resource library and/or a file resource library, and synchronizing data of the file resource libraries on different Carte cluster servers.
The conversion and operation module is used for uploading, downloading, querying, deleting and renaming the jobs and conversions in the resource library. In the ETL platform, one data extraction process mainly consists of creating one job, and each job may include a plurality of conversion operations. Conversion (transformation) is one of the important components of the ETL solution; it is mainly used for extracting, transforming and loading data, and is essentially the logical structure of a set of graphical data conversion configurations. Execution in the Kettle tool is divided into two layers, conversion and job, which differ mainly in how data is passed and how execution proceeds. A job contains one or more job items, and these job items are executed in a certain order. The order of job execution is determined by the hops between job items (Job Hop) and by the execution result of each job item. Jobs can also send mail and send data to a data warehouse, such as a data warehouse configured in the ETL scheduling platform.
And the system setting module is used for carrying out user management, adding or deleting users logging in the ETL scheduling platform, or further setting the operation authority of the users.
Specifically, referring to fig. 4, the present invention further provides a schematic diagram of an embodiment of a deployment method of an ETL scheduling platform, where the deployment method of the ETL scheduling platform includes the following steps:
S1, deploying basic components, wherein the basic components comprise a MySQL database and a JAVA language development toolkit (JDK 1.8).
S2, deploying at least one scheduling node. Taking the deployment of two scheduling nodes on two different IP addresses (or servers) as an example, the steps are as follows:
S2.1, uploading the Tank software to /data/etl on the server, wherein the resulting directory structure is as follows:
tank
├── config
├── lib
├── libdb
├── init-sql
├── run.sh
├── start.sh
├── stop.sh
├── tank-1.0.0.1-SNAPSHOT.jar
├── lib
└── Kettle.tar
S2.2, modifying the configuration file /data/etl/tank/config/application-pro.yml, wherein the relevant configuration is as follows:
# storage platform configuration
primary:
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://172.16.154.76:3306/cs_tank_dev?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=true&serverTimezone=GMT%2B8
  username: ********
  password: ********
# stores the scheduling task execution logs
second:
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://172.16.154.76:3306/cs_tank-log_dev?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=true&serverTimezone=GMT%2B8
  username: *******
  password: *******
S2.3, initializing the database: connecting to the database by using Navicat for MySQL and creating two databases, cs_tank_dev and cs_tank-log_dev, with the character set designated as utf8mb4; then connecting to the cs_tank_dev database and executing the init-sql/cs_tank_dev.sql and init-sql/Quartz2.3.2.SQL scripts; and connecting to the cs_tank-log_dev database and executing the corresponding init-sql/cs_tank-log_dev script.
And respectively starting the scheduling nodes on the two IP addresses, and then performing subsequent operation on a task management page of any one scheduling node, wherein the scheduling nodes can automatically balance the load of the scheduling task to each scheduling node for execution.
Wherein, the code for starting Tank:
cd /data/tank/
./start.sh
Code for stopping Tank:
cd /data/tank/
./stop.sh
S3, logging in to any scheduling node to deploy a Carte cluster, which includes configuring the SSH remote login information and sftp files of the servers and each Carte node under the Carte cluster. The Carte cluster nodes of Kettle are configured onto the cluster hosts by using the Carte node management module of the ETL scheduling platform, and the specific operation steps are as follows:
first, a scheduling node is opened on any one of the servers and a default account password is entered for login.
Then, a server node is configured through the Carte node management module: on the server node interface of the task management interface shown in fig. 6, Add server is clicked, which jumps to the server adding interface shown in fig. 7, where the SSH login information of the server is filled in and the connection is tested; the server is saved after the test passes. In one example, the name, IP address or host name, connection port, user name, password and remarks of the server need to be entered.
The SSH remote login information of the two servers with different IP addresses is added to the server management list in this way; after configuration is finished, SSH and sftp login management of the servers can be performed on the task management interface.
Finally, a Carte cluster is deployed to the servers. The Add Carte service function is opened on the Carte configuration page in fig. 8, which jumps to the Add Carte node interface. In one example, the basic information of the Carte node is configured first, as shown in fig. 9: the Project Tags of the Carte node are filled in to specify which projects the Carte node can run, and a login user name and a login password are set for the Carte service; because Carte is a lightweight Web service, the port of the service also needs to be specified. Then the node parameters are configured, as shown in fig. 10: the maximum number of log lines that Carte retains in memory, the log retention time and the call object retention time are set, and the default values can be used if no tuning is required. Then Carte is deployed, as shown in fig. 11: a deployment path (the deployment position of Carte on the server) is designated and saved, and the scheduling node of Tank automatically deploys Kettle.
At least one project label is added to each Carte node, the Carte node can have a plurality of project labels, the Carte nodes with the same project label form a Carte cluster for processing tasks under the same project type, therefore, the Carte nodes in the Carte cluster have no master-slave division and are all task execution nodes, and the condition that the whole system is unavailable due to the fault of a Carte master node is avoided.
The ETL scheduling platform directly calls each Carte node to execute tasks, high availability and load balancing functions of the Carte cluster are achieved, the Carte nodes in the Carte cluster have no master nodes or slave nodes, each Carte node is a task execution node, and the condition that the whole system is unavailable due to failure of the Carte master nodes is avoided.
When a Carte node is deployed, its deployment position in the server is also configured. The configuration information of the resource library is issued to the Carte node through the ETL scheduling platform, so that the Carte node can execute directly, and the Carte cluster is called directly through the API data interface to execute conversions and jobs. By contrast, the original version of Kettle can only read the resource library specified in its own local configuration.
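As a rough illustration of calling a Carte node over HTTP, the sketch below only builds a request URL and sends nothing. The `/kettle/executeJob/` path and the `rep`, `job` and `level` query parameters reflect the Carte servlet interface as commonly documented for Kettle; they are not stated in this patent and should be checked against the Kettle version actually deployed. The host, port, repository name and job path are hypothetical.

```python
from urllib.parse import urlencode


def carte_execute_job_url(host, port, repository, job_path, log_level="Basic"):
    """Build the URL that asks a Carte node to run a job from a repository.

    Assumption: endpoint and parameter names follow the commonly documented
    Carte servlet API. Authentication (Carte uses the user name and password
    set for the node) is left to the HTTP client and is not shown here.
    """
    query = urlencode({"rep": repository, "job": job_path, "level": log_level})
    return f"http://{host}:{port}/kettle/executeJob/?{query}"


url = carte_execute_job_url("10.0.0.1", 8081, "tank_repo", "/etl/daily_load")
```

Keeping URL construction separate from the HTTP call makes it easy to point the same task at a different Carte node when the default node is unavailable.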
In one example, after deployment is completed, the Carte start button is clicked on the management interface shown in fig. 6 to start the Carte node, so that Carte nodes having the same Project Tag automatically form a Carte cluster to process tasks under the same type of project; the cluster has no master-slave division, and every node is an execution node. In addition, to manage a Carte node started on this page, clicking the open button opens the native management interface of the Carte node, and Carte can be logged in to and managed by entering the account and password filled in during Carte configuration. By default, the Carte log is retained for 120 minutes, and the execution record of a task is automatically cleared after Carte has executed it successfully, so as to optimize the performance of Carte.
S4, configuring the Kettle resource library information. Kettle supports two types of resource library, namely a file resource library and a database resource library. The step of configuring the Kettle resource library information includes: creating and deleting a data resource library and/or a file resource library for jobs and conversions, synchronizing the file resource libraries on different servers, and uploading, downloading, querying, deleting and renaming the jobs and conversions; this is implemented through the resource library management module of the ETL scheduling platform.
In one example, the specific execution interface and execution steps for configuring the Kettle resource library information are as follows:
creating a file resource library, as shown in fig. 12-13: resource library management is selected in fig. 12, then file library management is selected and the add button is clicked, which jumps to the interface shown in fig. 13; the parameters of the file resource library, such as the storage path and the file system type, are entered, the connection is tested and the resource library is saved. The file system type supports File, FTP, FTPS, HDFS, SFTP and similar file transfer protocols, of which File performs best.
When the task is started, the file resource library does not need to be connected with a database to read the task in advance, so that the starting speed is high, the deployment mode is simple, the file can be directly uploaded, and the version management can be performed by using software such as git and svn. However, the file resource libraries are stored locally, so that high availability is not supported, and the file resource libraries on different servers are synchronized in the application, so that information scheduling in the file resource libraries is facilitated.
Creating a data resource library, as shown in fig. 14-15: resource library management is selected in fig. 14, then database management is selected and the add button is clicked, which jumps to the interface shown in fig. 15; the relevant database connection information is filled in, Test connection is clicked, and the resource library is saved after the connection is confirmed to be error-free. The database resource library supports connections to mainstream database types such as MySQL and Oracle. Since the data is stored in a database, the resource library is highly available as long as the database is highly available, and the resource library is simple for multiple Carte nodes to use.
S5, configuring the tasks to be scheduled to the corresponding Carte clusters. In this step, a default Carte node for processing each task to be scheduled is set according to the preset project types processed by the Carte nodes, and whether the default Carte node is available is checked. In the step of starting the corresponding scheduling task, the integrated Quartz and Spring Boot frameworks cause the corresponding Carte node to execute the task at the preset timed start time, and the task management interface provided on each server displays each task, the execution status of each Carte node, the execution log and the execution flow chart, so that tasks and Carte nodes can be managed in batches. Integrating the Quartz and Spring Boot frameworks gives the scheduling platform accurate and flexible scheduling granularity and further provides load balancing and high availability for its tasks.
In one example, as shown in fig. 18 to 19, this step is executed by the task scheduling module of the ETL scheduling platform. The task scheduling interface is first entered on the task management interface shown in fig. 18; on this interface, tasks can be started, stopped, deleted or immediately executed in batches, and the execution state, execution log and execution flow chart of a given task can be viewed. Clicking Add on the task scheduling interface then jumps to the interface for configuring the task to be scheduled, namely the interface shown in fig. 19. The step of configuring the task to be scheduled to the corresponding Carte cluster includes:
newly adding a corresponding task and designating the task as a timed task; setting the timed start time for the newly added task, wherein the timing format is the Java Quartz cron expression format, and checking whether the previous execution of the task has finished;
setting a project name (one task can only belong to one project) and a corresponding relation between the project name and a corresponding project label of a Carte node for the newly added task;
setting the automatic restart times of task execution failure for the newly added task;
and setting the log level of task execution for the newly added task.
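The settings listed above can be pictured as one task record with a few sanity checks. The sketch below is illustrative only: the `ScheduledTask` class and its field names are hypothetical, and the only check taken from Quartz itself is that a Quartz cron expression has six fields (seconds, minutes, hours, day of month, month, day of week) plus an optional seventh for the year. Full cron validation is left to Quartz.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduledTask:
    # Hypothetical task record; field names are assumptions for illustration.
    name: str
    cron: str                 # Quartz cron expression, e.g. "0 0 2 * * ?"
    project: str              # a task belongs to exactly one project
    max_restarts: int = 0     # automatic restarts after a failed execution
    log_level: str = "Basic"
    start_params: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of problems; an empty list means the record looks sane."""
        errors = []
        # Quartz expressions carry 6 fields, or 7 with the optional year.
        if len(self.cron.split()) not in (6, 7):
            errors.append("cron must have 6 or 7 fields")
        if not self.project:
            errors.append("project is required")
        if self.max_restarts < 0:
            errors.append("max_restarts must be >= 0")
        return errors


nightly = ScheduledTask(name="daily_load", cron="0 0 2 * * ?", project="sales", max_restarts=3)
unix_style = ScheduledTask(name="bad", cron="0 2 * * *", project="sales")
```

The second record uses a five-field Unix crontab expression, which the field-count check rejects; this is a common mistake when moving jobs from Crontab to Quartz.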
The logic for realizing task scheduling through the task scheduling module is as follows:
firstly, adding a new task to be scheduled, setting the new task to be scheduled to be started at a fixed time, and setting at least one of fixed time starting time, the name of a project type to which the task belongs, the number of times of automatic restart after the task fails to execute, the log level of task execution and task starting parameters for the newly added task to be scheduled.
Secondly, acquiring the item type to which the added task to be scheduled belongs, allocating the task to be scheduled to the item Carte cluster with the corresponding item label, and further assigning a default Carte node for processing the task to be scheduled; as shown in fig. 4, the task is allocated, through the first scheduling node or the second scheduling node, to the Carte cluster in the first server or the Carte cluster in the second server for processing;
then, checking whether the previous execution of the task has finished;
if not, skipping the scheduling of the task, namely directly finishing the scheduling of the task;
if the execution is finished, further judging whether the default Carte node is available, if so, using the default Carte node as a specified execution node of the task to be scheduled to execute the task to be scheduled;
and if the default Carte node is unavailable, eliminating the unavailable default Carte node, communicating with each scheduling node, acquiring a keyword value of the task by using a consistent Hash algorithm, acquiring, from the Carte clusters according to the keyword value, the Carte node with the project label corresponding to the task to be scheduled, and executing the task to be scheduled with that node as the new Carte node. The advantage of using the consistent Hash algorithm to obtain the Carte node is that the same task is prevented from drifting between nodes when executed in the Carte cluster.
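The decision sequence above (skip if the previous run is still in progress, otherwise prefer the default node, otherwise fall back) can be sketched as a single function. The names, the `Decision` enum and the callback signatures are hypothetical; the fallback picker is passed in as a parameter, and the `first_sorted` picker used in the example is a simple placeholder rather than a consistent-hash implementation.

```python
from enum import Enum


class Decision(Enum):
    SKIPPED = "skipped"
    DEFAULT = "default node"
    FALLBACK = "fallback node"
    NO_NODE = "no node available"


def schedule(task_key, default_node, last_run_finished, is_available,
             tagged_nodes, pick_fallback):
    """Return (decision, node) for one trigger of a scheduled task."""
    # Step 1: a run that has not finished means this trigger is skipped.
    if not last_run_finished:
        return Decision.SKIPPED, None
    # Step 2: the default node is used whenever it is available.
    if is_available(default_node):
        return Decision.DEFAULT, default_node
    # Step 3: drop the default node and choose among the remaining
    # available nodes that carry the task's project label.
    candidates = [n for n in tagged_nodes
                  if n != default_node and is_available(n)]
    if not candidates:
        return Decision.NO_NODE, None
    return Decision.FALLBACK, pick_fallback(task_key, candidates)


def first_sorted(task_key, candidates):
    """Placeholder picker for the example; NOT a consistent hash."""
    return sorted(candidates)[0]


up = {"carte-b", "carte-c"}
result = schedule("job-42", "carte-a", True, lambda n: n in up,
                  ["carte-a", "carte-b", "carte-c"], first_sorted)
```

In this example the default node `carte-a` is down, so the task falls back to another node with the same label. The `NO_NODE` outcome is an addition of this sketch for the case the text does not cover, where no labelled node survives.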
Because a cluster natively supported by Carte has a single point of failure, a highly available cluster task scheduling method is designed on the ETL scheduling platform. Project labels are attached to Carte nodes, and a task under a given project is preset with a default Carte node that carries the project label corresponding to the task. If the default Carte node is unavailable, it is removed and a new Carte node is searched for to process the task. Because a Carte node can carry a plurality of project labels, the task can be executed normally as long as one surviving node exists in the cluster. When the default Carte node is unavailable, the new Carte node is obtained with a consistent Hash algorithm, which prevents tasks under the same project from drifting in a disorderly way when executed in the project Carte cluster.
And S6, starting the corresponding scheduling task, waiting for the task to be executed, and viewing and starting and stopping the task through a task scheduling interface as can be seen in FIG. 18.
The task scheduling function executed by the task scheduling module is similar to that of Linux Crontab. Compared with Crontab, however, the Web management mode of the ETL scheduling platform provided by the invention is more convenient than the SSH mode and avoids the single-point-of-failure problem of Crontab. The storage of logs during task execution is also optimized: the logs produced during execution are stored uniformly in a log database for subsequent analysis and monitoring. Carte itself has no log storage function. The log of a task executed by Carte is kept in memory and is deleted once the retention conditions set by the Carte parameters are no longer met, so if Carte is called through Crontab to execute a task, the log of the task execution process cannot be recorded. The ETL scheduling platform provided by the invention can therefore monitor and manage the execution of each task on any server, which makes it easy to find errors in task execution and convenient for a user to manage host resources.
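The idea of copying task logs out of Carte's memory into a log database can be sketched as follows. This is an illustration under stated assumptions: the platform described here uses MySQL, whereas the sketch uses Python's built-in `sqlite3` with an in-memory database so that it runs without a server, and the `task_log` table and its columns are hypothetical.

```python
import sqlite3


def open_log_store(path=":memory:"):
    """Open the log store and create the (hypothetical) task_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_log ("
        "task TEXT, node TEXT, logged_at TEXT, line TEXT)"
    )
    return conn


def persist_log(conn, task, node, lines, logged_at):
    """Store log lines fetched from a Carte node before Carte discards them."""
    conn.executemany(
        "INSERT INTO task_log VALUES (?, ?, ?, ?)",
        [(task, node, logged_at, line) for line in lines],
    )
    conn.commit()


def read_log(conn, task):
    """Return the stored log lines of a task in insertion order."""
    rows = conn.execute(
        "SELECT line FROM task_log WHERE task = ? ORDER BY rowid", (task,)
    )
    return [row[0] for row in rows]


store = open_log_store()
persist_log(store, "daily_load", "carte-b",
            ["job started", "job finished"], "2022-08-15T02:00:00")
stored_lines = read_log(store, "daily_load")
```

Once the lines are in a database they survive both the Carte retention window and a restart of the Carte node, which is what makes later analysis and monitoring possible.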
As shown in fig. 16-17, the conversion & job interface on the task management interface is used to manage the jobs and conversions in all the resource libraries configured on Tank, including adding, deleting, renaming and viewing them. This is implemented by the conversion and job module of the ETL scheduling platform, which is used to upload, download, query, delete and rename the jobs and conversions in the resource libraries.
The operation monitoring module of the ETL scheduling platform, that is, the corresponding operation monitoring interface (not shown), is configured to monitor the basic state of the server where the scheduling node is located and the basic state of each Carte cluster server, monitor the execution states of all tasks, and send system alarm mails. It further has the function of managing all executed tasks: querying which tasks have been executed, querying their execution status, viewing or downloading the execution log of a task, stopping a task that is executing, cleaning up the task execution history, cleaning up all tasks whose status is skipped, and cleaning up all tasks older than one week.
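The two clean-up operations mentioned last (removing skipped runs and removing runs older than one week) amount to filtering the execution history. The sketch below assumes a hypothetical in-memory record format with `status` and `finished_at` keys; the real platform would apply the same conditions to its own history store.

```python
from datetime import datetime, timedelta


def clean_history(records, now, keep_days=7, drop_skipped=True):
    """Return the execution records that survive a clean-up pass.

    records: list of dicts with hypothetical keys "task", "status" and
    "finished_at" (a datetime).
    """
    cutoff = now - timedelta(days=keep_days)
    kept = []
    for record in records:
        if drop_skipped and record["status"] == "skipped":
            continue  # clean up all runs whose status is skipped
        if record["finished_at"] < cutoff:
            continue  # clean up all runs older than the retention window
        kept.append(record)
    return kept


now = datetime(2022, 8, 15)
history = [
    {"task": "daily_load", "status": "success", "finished_at": datetime(2022, 8, 14)},
    {"task": "daily_load", "status": "success", "finished_at": datetime(2022, 8, 1)},
    {"task": "daily_load", "status": "skipped", "finished_at": datetime(2022, 8, 14)},
]
remaining = clean_history(history, now)
```

Only the recent successful run remains; the run from two weeks earlier and the skipped run are removed.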
The present invention also provides a computer readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, performing a method for deploying an ETL platform.
In summary, in the Kettle-based ETL scheduling platform and the deployment method thereof provided by the embodiments of the present invention, the Kettle cluster is implemented by the present application: the ETL scheduling platform directly calls each Carte node to execute tasks, so that the high availability and load balancing functions of the Kettle cluster are realized. The Carte nodes in the Carte cluster have no master-slave division, and each Carte node is a task execution node, thereby avoiding the unavailability of the whole system due to the failure of a Carte master node.
Because the cluster supported by the Carte has single point of failure, a set of high-availability cluster task scheduling method is designed on the ETL scheduling platform, tasks can be normally executed as long as one surviving node exists in the cluster, and a consistent hash algorithm is used for obtaining a new Carte node when the default Carte node is unavailable, so that the tasks under the same project are prevented from being disorderly drifted when being executed in the project Carte cluster.
The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art based on the above disclosure are within the scope of the appended claims.

Claims (14)

1. An ETL (extract transform load) scheduling platform is characterized by comprising at least two scheduling nodes which are communicated with each other, wherein each scheduling node is correspondingly arranged on a corresponding server one by one, each scheduling node is used as a scheduling node of any Carte cluster and is used for configuring the Carte cluster and each Carte node in the Carte cluster, each scheduling node can manage all Carte nodes which are arranged, and the Carte nodes in the Carte cluster have no master-slave division and are all task execution nodes;
after any scheduling node receives a corresponding task to be scheduled, the task is automatically load-balanced to each scheduling node, so that when the Carte node specified by the task is available, the specified Carte node executes the task, and when the Carte node specified by the task is unavailable, a new Carte node is searched in all the Carte nodes according to the project labels on the Carte nodes to execute the task.
2. The ETL scheduling platform of claim 1, wherein the scheduling node comprises:
the Carte node management module is used for adding and deleting a Carte cluster server, setting SSH remote login to the server and managing sftp files, and adding, deleting, starting and stopping a corresponding Carte node on the corresponding Carte cluster server through a data interface so as to configure the Carte cluster and each Carte node in the Carte cluster;
the task scheduling module is used for adding, deleting, starting, stopping or immediately executing a task to be scheduled, assigning a corresponding Carte node for the task to be scheduled to execute, and searching a new Carte node in all the Carte clusters to execute the task when the Carte node assigned by the task is unavailable;
and the operation monitoring module is used for monitoring the basic state of the server where the scheduling node is positioned and the basic state of each Carte cluster server, monitoring the execution states of all tasks and sending a system alarm mail.
3. The ETL scheduling platform of claim 2, wherein each of said scheduling nodes has a task management interface for displaying the status of each task, the status of each Carte node executing the task, the log of executing the task, the execution flow chart of the task, so as to manage the tasks and Carte nodes in bulk.
4. The ETL scheduling platform of claim 2, wherein said scheduling node further comprises:
the resource library management module is used for creating and deleting a management data resource library and/or a file resource library, and synchronizing data of the file resource libraries on different Carte cluster servers;
the conversion and operation module is used for uploading, downloading, inquiring, deleting and renaming the operation and the conversion in the resource library;
and the system setting module is used for carrying out user management and adding or deleting users logging in the ETL scheduling platform.
5. The ETL scheduling platform of claim 2, wherein the Carte node management module comprises a Carte node management unit, and the Carte node management unit is configured to add a Carte node, specify a server where the Carte node is located and a deployment location in the server, and a processed item type, and set at least one item tag for the added Carte node, where Carte nodes with the same item tag form a Carte cluster to process tasks under the same item type.
6. The ETL scheduling platform of claim 2, wherein said task scheduling module is further configured to set a newly added task to be scheduled to be started at a fixed time, and to set at least one of a fixed time start time, a name of a project type to which the task belongs, a number of automatic restarts after a task execution failure, a log level of task execution, and a task start parameter for the newly added task to be scheduled.
7. The ETL scheduling platform of claim 2, wherein said task scheduling module is further to:
acquiring the item type of the added task to be scheduled, distributing the task to be scheduled to an item Carte cluster with a corresponding item label, and further appointing a default Carte node for processing the task to be scheduled;
checking whether the previous execution of the task has finished;
if not, skipping the scheduling of the task, namely directly finishing the scheduling of the task;
if the execution is finished, further judging whether the default Carte node is available, if so, using the default Carte node as a designated execution node of the task to be scheduled so as to execute the task to be scheduled;
if the default Carte node is unavailable, eliminating the unavailable default Carte node, communicating with each scheduling node, acquiring a keyword value of the task by using a consistent Hash algorithm, acquiring a Carte node with an item label corresponding to the task to be scheduled from the Carte nodes according to the keyword value, and executing the task to be scheduled by using that Carte node as a new Carte node.
8. The ETL scheduling platform of claim 6, wherein said task scheduling module employs an integrated Quartz and Springboot framework to implement a timed start of tasks, the timed scheduling of each task being accomplished by executing the timed start time saved in said task scheduling module.
9. A deployment method of an ETL scheduling platform, which employs the ETL scheduling platform according to any one of claims 1 to 8, further comprising the following steps:
deploying basic components, wherein the basic components comprise a MySQL database and a JAVA language development toolkit;
deploying at least one scheduling node;
logging in any scheduling node to deploy the Carte cluster, wherein the Carte cluster comprises SSH remote login information of a configuration server, an sftp file and each Carte node under the Carte cluster;
configuring the Kettle repository information;
configuring a task to be scheduled to a corresponding Carte cluster;
and starting the corresponding scheduling task and waiting for the task to execute.
10. The method for deploying an ETL scheduling platform according to claim 9, wherein the step of configuring the task to be scheduled to the corresponding Carte cluster comprises:
adding a new task and designating the task as a timed task;
setting a timed start time for the newly added task;
setting start parameters for the newly added task;
setting, for the newly added task, a project name and a correspondence between the project name and the project label of the corresponding Carte nodes;
setting, for the newly added task, the number of automatic restarts after a task execution failure;
and setting the log level of task execution for the newly added task.
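The per-task settings listed in claim 10 can be pictured as a single configuration record. The sketch below is an assumption-laden illustration: the class `ScheduledTask`, its field names, and the list of accepted log levels (taken here to match the level names commonly used by Kettle) are hypothetical and are not defined by the claims.

```python
from dataclasses import dataclass, field

# Assumed set of log level names; the claims do not enumerate them.
LOG_LEVELS = ("Nothing", "Error", "Minimal", "Basic", "Detailed", "Debug", "Rowlevel")


@dataclass
class ScheduledTask:
    name: str
    start_time: str                 # timed start time, e.g. "02:00"
    project_name: str               # project the task belongs to
    project_label: str              # label of the Carte nodes serving that project
    start_params: dict = field(default_factory=dict)
    max_restarts: int = 0           # automatic restarts after a failed execution
    log_level: str = "Basic"
    timed: bool = True              # the task is designated as a timed task

    def __post_init__(self):
        # Reject settings that could never be scheduled sensibly.
        if self.max_restarts < 0:
            raise ValueError("max_restarts must be zero or positive")
        if self.log_level not in LOG_LEVELS:
            raise ValueError(f"unknown log level: {self.log_level}")
```

Validating in `__post_init__` keeps malformed settings out of the scheduler before any Carte node is contacted.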
11. The deployment method of the ETL scheduling platform according to any one of claims 9 to 10, wherein in the step of configuring the task to be scheduled to the corresponding Carte cluster, a default Carte node for processing the task to be scheduled is further set according to the preset project type processed by each Carte node, and whether the default Carte node is available is checked;
in the step of starting the corresponding scheduled task, an integrated Quartz and Spring Boot framework is used to make the corresponding Carte node execute the task on schedule according to the preset timed start time, and each task, the execution state of each Carte node, the execution log, and the execution flow chart of the task are displayed through a task management interface provided on each server, so that the tasks and the Carte nodes can be managed in batches.
12. The deployment method of the ETL scheduling platform of claim 9, wherein the step of configuring the Kettle repository information comprises: creating and deleting database repositories and/or file repositories for jobs and transformations, synchronizing file repositories on different servers, and uploading, downloading, querying, deleting and renaming the jobs and transformations.
13. The deployment method of the ETL scheduling platform according to any one of claims 9 to 10, wherein the step of logging in to any scheduling node to deploy the Carte cluster comprises:
adding at least one project label to each Carte node, wherein Carte nodes with the same project label form a Carte cluster for processing tasks of the same project type, and the Carte nodes in the Carte cluster have no master-slave distinction and are all task execution nodes;
and wherein the step of configuring the task to be scheduled to the corresponding Carte cluster according to the deployed Carte cluster comprises:
adding a new task to be scheduled, setting the new task to start on a timer, and setting, for the newly added task to be scheduled, at least one of: the timed start time, the name of the project type to which the task belongs, the number of automatic restarts after the task fails to execute, the log level of task execution, and the task start parameters;
acquiring the project type of the added task to be scheduled, assigning the task to be scheduled to the project Carte cluster with the corresponding project label, and further designating a default Carte node for processing the task to be scheduled;
checking whether the previous execution of the task has finished;
if not, skipping the scheduling of the task, that is, ending the scheduling of the task directly;
if the previous execution has finished, further determining whether the default Carte node is available and, if so, using the default Carte node as the designated execution node of the task to be scheduled so as to execute the task to be scheduled;
and if the default Carte node is unavailable, excluding the unavailable default Carte node, communicating with each scheduling node, obtaining a key value of the task by using a consistent hashing algorithm, obtaining, from the Carte clusters according to the key value, a Carte node having the project label corresponding to the task to be scheduled, and executing the task to be scheduled using that Carte node as the new Carte node.
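The label-based cluster formation in claim 13 (nodes sharing a project label form one cluster, with no master node) can be sketched as a simple grouping step. The function name `build_clusters` and the input shape are assumptions for illustration only.

```python
from collections import defaultdict


def build_clusters(node_labels):
    """Group Carte nodes into clusters keyed by project label.

    node_labels: node name -> iterable of project labels on that node.
    Every node must carry at least one label; a node with several labels
    belongs to several clusters. All members are peers (no master node).
    """
    clusters = defaultdict(set)
    for node, labels in node_labels.items():
        labels = list(labels)
        if not labels:
            raise ValueError(f"node {node} has no project label")
        for label in labels:
            clusters[label].add(node)
    # Sorted member lists give a stable, reproducible view of each cluster.
    return {label: sorted(members) for label, members in clusters.items()}
```

For example, a node labeled both `sales` and `hr` appears in both clusters, so it can execute tasks of either project type.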
14. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the deployment method of the ETL scheduling platform of any one of claims 9 to 13.
CN202210971604.5A 2022-08-15 2022-08-15 ETL scheduling platform, deployment method thereof and computer-readable storage medium Active CN115048205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210971604.5A CN115048205B (en) 2022-08-15 2022-08-15 ETL scheduling platform, deployment method thereof and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210971604.5A CN115048205B (en) 2022-08-15 2022-08-15 ETL scheduling platform, deployment method thereof and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN115048205A (en) 2022-09-13
CN115048205B (en) 2023-02-07

Family

ID=83166880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210971604.5A Active CN115048205B (en) 2022-08-15 2022-08-15 ETL scheduling platform, deployment method thereof and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115048205B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640968A (en) * 2022-10-18 2023-01-24 中电金信软件有限公司 Job scheduling method and device, electronic equipment and storage medium
CN115357657B (en) * 2022-10-24 2023-03-24 成都数联云算科技有限公司 Data processing method and device, computer equipment and storage medium
CN115687486B (en) * 2022-11-14 2023-06-13 浪潮智慧科技有限公司 Light-weight data acquisition method and device based on Kettle
CN116383295A (en) * 2023-06-06 2023-07-04 工业富联(佛山)创新中心有限公司 Data processing method, device, electronic equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888636A (en) * 2019-12-03 2020-03-17 中电工业互联网有限公司 ETL Web application system architecture method based on button
CN114064240A (en) * 2021-11-15 2022-02-18 国泰君安证券股份有限公司 Platform system, method, apparatus, processor and computer storage medium for implementing low code configurability ETL data transformation
CN114372105A (en) * 2022-01-13 2022-04-19 中电福富信息科技有限公司 ETL tool based method for realizing system automatic inspection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112491606B (en) * 2020-11-20 2022-05-24 湖南麒麟信安科技股份有限公司 Method for automatically deploying high-availability cluster of service system based on infrastructure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
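Design and Implementation of a Business Intelligence Analysis System for a Telecom Company; 谢迎凤 (Xie Yingfeng); China Master's Theses Full-text Database, Information Science and Technology; 2021-03-15 (No. 3); p. I138-131 *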

Also Published As

Publication number Publication date
CN115048205A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN115048205B (en) ETL scheduling platform, deployment method thereof and computer-readable storage medium
US11106455B2 (en) Integration of containers with external elements
US20230308525A1 (en) Embedded database as a microservice for distributed container cloud platform
US20210406079A1 (en) Persistent Non-Homogeneous Worker Pools
EP3428811B1 (en) Database interface agent for a tenant-based upgrade system
US9529613B2 (en) Methods and apparatus to reclaim resources in virtual computing environments
RU2429529C2 (en) Dynamic configuration, allocation and deployment of computer systems
US20180143856A1 (en) Flexible job management for distributed container cloud platform
CN108696372B (en) Method and system for keeping system configuration consistency
CN111125444A (en) Big data task scheduling management method, device, equipment and storage medium
US20170063986A1 (en) Target-driven tenant identity synchronization
CN113204353B (en) Big data platform assembly deployment method and device
CN115357198B (en) Mounting method and device of storage volume, storage medium and electronic equipment
US11449350B2 (en) Systems and methods for automatically updating compute resources
US11750451B2 (en) Batch manager for complex workflows
US20230342183A1 (en) Management method and apparatus for container cluster
CN114253562A (en) Management and deployment method and system of server software package
EP4162649B1 (en) Stable references for network function life cycle management automation
CN117076096A (en) Task flow execution method and device, computer readable medium and electronic equipment
CN115309558A (en) Resource scheduling management system, method, computer equipment and storage medium
CN115543491A (en) Microservice processing method and device
US8516091B2 (en) Mass configuring technical systems
US11743188B2 (en) Check-in monitoring for workflows
EP4148570A1 (en) Content processing management system and method
CN117389582A (en) Method, device and system for updating container application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 510000 No. 28, Fenghuang fifth road, Huangpu District, Guangzhou, Guangdong

Patentee after: Yuexin Semiconductor Technology Co.,Ltd.

Address before: 510000 No. 28, Fenghuang fifth road, Huangpu District, Guangzhou, Guangdong

Patentee before: Guangzhou Yuexin Semiconductor Technology Co.,Ltd.