CN118034887A

CN118034887A - Big data platform task management method and system

Info

Publication number: CN118034887A
Application number: CN202410230292.1A
Authority: CN
Inventors: 郭慧蓉; 吴广; 屈春花; 柏世豪
Original assignee: Chongqing Fumin Bank Co Ltd
Current assignee: Chongqing Fumin Bank Co Ltd
Priority date: 2024-02-29
Filing date: 2024-02-29
Publication date: 2024-05-14

Abstract

The invention relates to the technical field of data processing, and in particular to a task management method and system for a big data platform. The big data platform task management system comprises a scheduling service module, a task management module, a scheduling instance management module and a task instance management module. The scheduling service module executes scheduling operations at a specified frequency, manages the whole scheduling life cycle and ensures that tasks are executed as planned. The task management module is used for creating and managing a series of tasks, supports various task types and can declare the dependency relationships among tasks. The scheduling instance management module is used for managing the instance of scheduling content generated by each scheduling run, provides a Gantt chart view and instance operation functions, and visualizes the scheduling process. The task instance management module is used for managing the specific task instances within each instance of scheduling content, so that fine-grained monitoring and management of task execution are realized. Through the combination of these modules, the invention can improve the efficiency of task scheduling management.

Description

Big data platform task management method and system

Technical Field

The invention relates to the technical field of data processing, in particular to a task management method and system for a big data platform.

Background

With the rapid development of big data technology, the data volume in enterprises has expanded rapidly, the complexity of data processing keeps increasing, and data scheduling and data management have become increasingly important.

In large enterprises, task management has become increasingly complex, involving multiple departments, different types of data sources and various kinds of data processing logic. When an enterprise processes multi-step, multi-level data tasks, it often encounters low efficiency and insufficient accuracy; in a dynamically changing business environment in particular, the fixed configuration of a task management tool struggles to adapt quickly to new requirements. To cope with these complex business scenarios, enterprises have to integrate multiple task management tools, which increases the complexity of operations and reduces the efficiency of management.

Disclosure of Invention

The invention aims to provide a task management system of a big data platform, which can improve the efficiency of task management.

The basic scheme provided by the invention is as follows: a big data platform task management system comprises a scheduling service module, a task management module, a scheduling instance management module and a task instance management module; the scheduling service module executes scheduling operations at a specified frequency and manages the scheduling life cycle; the task management module is used for creating and managing a series of tasks, supporting various task types and declaring the dependency relationships among tasks; the scheduling instance management module is used for managing each instance of scheduling content generated according to the scheduling frequency, and includes functions for viewing Gantt charts and performing instance operations; the task instance management module is used for managing the specific task instances in each scheduling content.

The working principle and beneficial effects of the invention are as follows: the scheduling service module executes scheduling operations at the configured frequency, manages the whole scheduling life cycle and ensures that tasks are executed on time. The task management module provides functions for creating and managing multiple types of tasks, including declaring dependencies between tasks, which makes task execution more efficient. The scheduling instance management module manages each scheduling instance and provides Gantt chart views and instance operation functions, which improves the visibility and operability of the scheduling process and the efficiency of scheduling management. The task instance management module is used for managing the specific task instances in each scheduling content, and can monitor the execution state and performance of each task. By combining these modules, the efficiency of task scheduling management can be improved.

Further, the scheduling service module comprises a scheduling frequency configurator, an instance concurrency configurator and a policy manager; the scheduling frequency configurator is used for setting a CRONTAB expression, a fixed interval, one-time scheduling at a specified date and time, and manually triggered scheduling; the instance concurrency configurator is used for setting the concurrency of each scheduling instance; the policy manager is used for managing the priority policy of the scheduling hierarchy and for configuring the handling policy applied when a task fails.

The beneficial effect of this scheme is: by supporting several scheduling modes, such as CRONTAB expressions, fixed intervals, one-time scheduling at a specified date and time, and manual triggering, the scheduling frequency configurator provides flexible scheduling options for different types of tasks and business scenarios. The instance concurrency configurator can control the number of concurrently executing scheduling instances, which optimizes resource use and prevents overload. By managing the priority policy of the scheduling hierarchy and defining the handling policy for task failure, the policy manager improves the reliability of task execution and the system's ability to cope with abnormal situations.

Further, the task management module comprises a task type manager, a task dependency configurator, a task attribute configurator and an alarm manager; the task type manager is used for supporting and extending various task types; the task dependency configurator is used for declaring the dependency relationships between tasks; the task attribute configurator is used for configuring the fixed attributes of tasks; the alarm manager is used for configuring and triggering alarms in different scenarios.

The beneficial effect of this scheme is: the task type manager supports and extends multiple task types, so that the system can handle a variety of data processing tasks. The task dependency configurator lets a user explicitly declare the dependency relationships between tasks, which ensures the correct order of task execution and data consistency. The task attribute configurator is used for configuring the fixed attributes of tasks, so that task configuration is standardized and easy to manage. The alarm manager provides functions for configuring and triggering alarms in different scenarios, which improves the system's ability to respond promptly.

Further, the scheduling instance management module includes a viewer and an instance operator; the viewer uses a Gantt chart to display the data lineage dependencies and the critical path of the tasks; the instance operator is used for terminating, suspending, resuming and re-running scheduling instances.

The beneficial effect of this scheme is: the viewer uses the Gantt chart to present the data lineage dependencies and the critical path of the tasks, so that the user can understand the relationships and execution order of the tasks at a glance and can identify potential scheduling problems, thereby optimizing the scheduling policy. The instance operator provides the ability to terminate, suspend, resume and re-run scheduling instances, which enables the user to cope flexibly with various runtime situations and improves responsiveness to abnormal situations.

Further, the task instance management module comprises a log viewer, a link analyzer and a time consumption analyzer; the log viewer is used for viewing log information of the task instance; the link analyzer is used for analyzing an upstream link, a downstream link and a slowest link of the task instance; the time consumption analyzer is used for analyzing the execution time consumption of the historical task.

The beneficial effect of this scheme is: the log viewer enables a user to access and review detailed log information for task instances. The link analyzer provides the ability to analyze the upstream and downstream dependencies of task instances and to identify the slowest link, which helps optimize the overall workflow and improve data processing efficiency. The time consumption analyzer allows the user to analyze the execution time of historical tasks and helps the user evaluate and optimize system performance.

Further, the alarm manager sends alarm notifications by mail, SMS and DingTalk.

The beneficial effect of this scheme is: by the multi-channel notification method, the system can quickly notify relevant personnel when a key problem or a situation requiring urgent treatment occurs, so that the problem can be timely noticed and treated.

Drawings

FIG. 1 is a schematic diagram of the propagation of baseline alarm notifications for critical tasks in the big data platform task management system;

FIG. 2 is a Gantt chart of a big data platform task management system;

FIG. 3 is a schematic diagram of actual task execution in the big data platform task management system.

Detailed Description

The following is a further detailed description of the embodiments:

Example 1

The big data platform task management system shown in FIG. 1 comprises a scheduling service module, a task management module, a scheduling instance management module and a task instance management module.

The scheduling service module in this embodiment performs scheduling operations at a specified frequency and manages the scheduling life cycle. The scheduling service module includes a scheduling frequency configurator, an instance concurrency configurator and a policy manager.

The scheduling frequency configurator is used for setting a CRONTAB expression, a fixed interval, one-time scheduling at a specified date and time, and manually triggered scheduling. The CRONTAB expression works like the crontab function of a Linux server, and lets the user define precisely when a task is executed, such as a specific time of each day, week or month. Fixed-interval scheduling is suitable for tasks that must run at a fixed time interval, for example automatically starting the next run a fixed interval after the previous scheduled run completes. One-time scheduling at a specified date and time is similar to the at command of a Linux server, and runs a task once at a specific point in time. Manual triggering lets the user start a task immediately, and is suitable for online testing or for use in emergencies. With these diversified scheduling options, the system can process periodic and predictable tasks and can also cope flexibly with sudden and special situations, which improves the efficiency and responsiveness of the whole system.
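
The four scheduling modes can be pictured as interchangeable trigger objects that each answer the question "when is the next run?". The following is only a minimal sketch under stated assumptions: the class names and the `next_run` method are hypothetical and do not come from the patent, and `DailyCron` is a deliberately simplified stand-in for a full CRONTAB expression parser.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DailyCron:
    """Simplified stand-in for a crontab rule: fire every day at hour:minute."""
    hour: int
    minute: int

    def next_run(self, now: datetime) -> Optional[datetime]:
        candidate = now.replace(hour=self.hour, minute=self.minute,
                                second=0, microsecond=0)
        # If today's slot has already passed, move to tomorrow.
        return candidate if candidate > now else candidate + timedelta(days=1)


@dataclass
class FixedInterval:
    """Start the next run a fixed interval after the previous run completed."""
    interval: timedelta

    def next_run(self, last_finished: datetime) -> Optional[datetime]:
        return last_finished + self.interval


@dataclass
class OneTime:
    """Run exactly once at a specified date and time."""
    run_at: datetime

    def next_run(self, now: datetime) -> Optional[datetime]:
        # Once the moment has passed there is nothing left to schedule.
        return self.run_at if self.run_at > now else None


class Manual:
    """Never fires on its own; a user starts the run explicitly."""

    def next_run(self, now: datetime) -> Optional[datetime]:
        return None
```

A scheduler loop built on such objects would only need to call `next_run` and would not have to know which scheduling mode a given schedule uses.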

The instance concurrency configurator is used for setting the concurrency of each scheduling instance. In this embodiment, instance concurrency is dynamically adjusted based on task dependencies and resource limits, and the user can set the concurrency of scheduling instances, namely the number of scheduling instances that can be executed in parallel at the same time. A scheduling instance is each specific scheduling run generated according to the scheduling rule. For example, a schedule set with the CRONTAB expression "0 0 * * *" generates a new scheduling instance every day. The aim of instance concurrency configuration is to control the number of tasks running simultaneously, ensure that system resources are used effectively, and avoid performance problems caused by resource overload or unfinished task dependencies. By configuring instance concurrency appropriately, the next scheduling instance can be prevented from starting before the previous one has finished executing.
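
The effect of an instance concurrency limit can be illustrated with a small admission gate. This is a hedged sketch, not the patented implementation: `InstanceGate`, `try_start` and `finish` are hypothetical names, and the dynamic adjustment based on dependencies and resources mentioned above is not modelled.

```python
class InstanceGate:
    """Admits at most `max_concurrency` running instances of one schedule."""

    def __init__(self, max_concurrency: int) -> None:
        if max_concurrency < 1:
            raise ValueError("max_concurrency must be at least 1")
        self.max_concurrency = max_concurrency
        self.running = set()  # ids of instances currently executing

    def try_start(self, instance_id: str) -> bool:
        """Return True and record the instance if a slot is free."""
        if len(self.running) >= self.max_concurrency:
            return False  # the caller should queue the instance and retry later
        self.running.add(instance_id)
        return True

    def finish(self, instance_id: str) -> None:
        """Release the slot held by a finished instance."""
        self.running.discard(instance_id)
```

With a limit of 1, the instance for the next day is refused until the instance for the previous day has called `finish`, which is the overlap the paragraph above aims to prevent.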

The policy manager is used for managing the priority policy of the scheduling hierarchy and for configuring the handling policy applied when a task fails. The priority policy can automatically adjust the execution priority of tasks based on the number of downstream dependencies of each task: the more downstream tasks a task has in total, the higher its priority. This "downstream priority" strategy ensures that critical tasks are completed in time, which avoids delays in critical tasks affecting the efficiency of the overall workflow.
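
The "downstream priority" rule can be sketched as counting, for every task, all tasks reachable from it in the dependency graph and sorting by that count. The function names and the tie-breaking by task name are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def downstream_counts(edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Count all direct and indirect downstream tasks of each task in a DAG.

    `edges` maps a task to the tasks that directly depend on it.
    """
    nodes = set(edges) | {d for ds in edges.values() for d in ds}
    cache: Dict[str, Set[str]] = {}

    def descendants(node: str) -> Set[str]:
        if node not in cache:
            found: Set[str] = set()
            for child in edges.get(node, []):
                found.add(child)
                found |= descendants(child)
            cache[node] = found
        return cache[node]

    return {node: len(descendants(node)) for node in nodes}


def by_downstream_priority(edges: Dict[str, List[str]]) -> List[str]:
    """Order tasks so that those with more downstream tasks come first."""
    counts = downstream_counts(edges)
    # Ties are broken alphabetically only to make the order deterministic.
    return sorted(counts, key=lambda node: (-counts[node], node))
```

For a graph where `extract` feeds `clean` and `audit`, and `clean` feeds `report`, `extract` has three downstream tasks and is therefore ranked first.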

In other embodiments, the priority policy considers not only the number of downstream dependencies of each task, but also evaluates the urgency of each task and its impact on the overall workflow based on past log data.

The execution pattern and time sensitivity of each task are determined by analyzing the history log files. For example, certain tasks may be executed frequently within a particular time period, or may need to be executed urgently after a particular event. A machine learning model is used to analyze the log data and assign an "importance score" to each task based on the task's historical execution frequency, the impact of its execution results, and the correlation between tasks. A composite priority score is then calculated by combining the downstream dependency count and the importance score. Thus, when prioritizing tasks, not only the number of downstream dependencies is considered, but also the urgency of the task and its potential impact on the business process. As new logs are generated and historical data accumulates, the machine learning model is updated periodically to reflect the latest business situation and task importance.
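
The text does not state how the downstream dependency count and the importance score are combined. One plausible reading, shown purely as an assumption, is a weighted sum of the normalized downstream count and an importance score in the range 0 to 1; the function name, the normalization and the default weight are all hypothetical, and the machine learning model that would produce the importance score is not shown.

```python
def composite_priority(downstream_count: int,
                       max_downstream: int,
                       importance: float,
                       weight_downstream: float = 0.5) -> float:
    """Blend a normalized downstream count with an importance score.

    `importance` is assumed to be the model's output scaled to [0, 1].
    `weight_downstream` is an illustrative tuning knob, not a value from the source.
    """
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be in [0, 1]")
    if not 0.0 <= weight_downstream <= 1.0:
        raise ValueError("weight_downstream must be in [0, 1]")
    # Normalize so that the task with the most downstream tasks scores 1.0.
    normalized = downstream_count / max_downstream if max_downstream > 0 else 0.0
    return weight_downstream * normalized + (1.0 - weight_downstream) * importance
```

Under this reading, a task with few downstream tasks can still rank highly if the log analysis marks it as urgent.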

Task failure policy management provides the handling policy applied when a task fails. For example, when a task fails, the system can be configured to ignore the error and continue executing the downstream tasks, or to terminate all related upstream and downstream tasks to prevent further error propagation. In some cases, it is also possible to suspend the entire scheduling process or to retry the failed task. In addition, the system can be configured to notify the relevant responsible persons when a task fails, so as to ensure timely intervention and problem solving.
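
The failure handling options listed above map naturally onto an enumeration plus a small decision function. The sketch below is illustrative only: the names, the returned action strings and the fallback to termination once retries are exhausted are assumptions, not behaviour stated in the source.

```python
from enum import Enum


class FailurePolicy(Enum):
    IGNORE = "ignore"                        # continue with downstream tasks
    TERMINATE_RELATED = "terminate_related"  # stop related upstream and downstream tasks
    PAUSE_SCHEDULE = "pause_schedule"        # suspend the whole scheduling process
    RETRY = "retry"                          # run the failed task again


def next_action(policy: FailurePolicy, attempts: int, max_retries: int = 3) -> str:
    """Decide what the scheduler does after a task has failed `attempts` times."""
    if policy is FailurePolicy.IGNORE:
        return "run_downstream"
    if policy is FailurePolicy.TERMINATE_RELATED:
        return "terminate_related_tasks"
    if policy is FailurePolicy.PAUSE_SCHEDULE:
        return "pause_schedule"
    # RETRY: try again until the retry budget is used up (assumed fallback afterwards).
    return "retry_task" if attempts < max_retries else "terminate_related_tasks"
```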

In addition, the scheduling service module further comprises a backtracking unit and a parameter configuration unit. The backtracking unit backtracks tasks over a specified time period; the parameter configuration unit is used for configuring static variables and dynamic variables, where the value of a dynamic variable is calculated dynamically based on the current scheduling state.

The backtracking unit enables the system to revisit and re-execute tasks for a past point in time, so that past errors can be repaired promptly and data changes handled. For example, if a problem is found in the data processing of the past week, the user simply sets that week as the backtracking period, and the system automatically re-processes all tasks within the period. For parallel backtracking, the user can also specify the concurrency, i.e. the number of tasks that can be backtracked at the same time. Backtracking helps ensure the consistency and accuracy of data.
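
As a rough illustration of backtracking with a concurrency limit, the dates in the backtracking period can be split into batches whose size equals the configured concurrency. The function name and the assumption of one scheduling instance per calendar day are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def backtrack_batches(start: date, end: date, concurrency: int) -> List[List[date]]:
    """Split the days in [start, end] into batches of at most `concurrency` days.

    Each batch can be re-run in parallel; batches are processed in date order.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [days[i:i + concurrency] for i in range(0, len(days), concurrency)]
```

Backtracking one week with a concurrency of 3 yields three batches covering three, three and one day.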

In terms of variable and parameter management, the parameter configuration unit supports static variable configuration and dynamic variable configuration. Static variables are suitable for settings that remain unchanged during the scheduling process; common examples include, but are not limited to, the T-1 date (i.e. yesterday's date), month end and month start, week end and week start, and other specific times. The values of these variables are typically calculated from the scheduled time, so that tasks can be adjusted or executed for these specific points in time. Dynamic variables are calculated and adjusted dynamically according to the state of the current schedule. Similar to Spring expressions in Spring Boot, the user can define complex expressions to calculate variable values. The parameter configuration unit can set various variables and parameters and apply these settings to all tasks under the schedule. In practical application scenarios, these parameters and variables are typically used where subordinate tasks need variable values, such as connecting to databases or adjusting the quantity of specific products.
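
Static date variables of this kind can be derived from the scheduled time with the standard library alone. In the sketch below the variable names are hypothetical, and treating Monday as the start of the week is an assumption; the source does not say which weekday begins a week.

```python
import calendar
from datetime import date, datetime, timedelta
from typing import Dict


def static_variables(schedule_time: datetime) -> Dict[str, date]:
    """Derive common date variables from the time a schedule fires."""
    day = schedule_time.date()
    last_day_of_month = calendar.monthrange(day.year, day.month)[1]
    week_start = day - timedelta(days=day.weekday())  # Monday of the same week
    return {
        "t_minus_1": day - timedelta(days=1),          # yesterday's date
        "month_start": day.replace(day=1),
        "month_end": day.replace(day=last_day_of_month),
        "week_start": week_start,
        "week_end": week_start + timedelta(days=6),    # Sunday of the same week
    }
```

Every task under the schedule could then substitute these values into its script or SQL before execution.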

The task management module in this embodiment is used to create and manage a series of tasks, support multiple task types, and declare dependency relationships between tasks. The task management module comprises a task type manager, a task dependency configurator, a task attribute configurator and an alarm manager. The task type manager is used for supporting and extending multiple task types. The task dependency configurator is used for declaring the dependency relationships between tasks. The task attribute configurator is used for configuring the fixed attributes of tasks. The alarm manager is used for configuring and triggering alarms in different scenarios and for sending alarm notifications by mail, SMS and DingTalk.

The task type manager supports a variety of task types, including but not limited to shell, sqoop, hive2, spark, datax, stored procedures, branching tasks, quality checking tasks, file listening tasks and database listening tasks. In addition, an interface is reserved to support new task types that may appear in the future, which ensures the extensibility and adaptability of the system. For the Sqoop task type, the content is a script or command executed by Sqoop; for the Hive task type, the content is a script or statement executed by Hive. A quality checking task is defined by its rule type (hierarchical detection, single-SQL detection or double-SQL detection), data source, custom SQL and check criteria (consistency, accuracy, etc.).

Dependencies between tasks are declared by configuration or by drag and drop, which improves the flexibility and user-friendliness of task management. Task dependencies follow the lineage of the data, i.e. the order of task execution conforms to the logic of the data flow, which ensures that downstream tasks can correctly obtain the data they need. In this embodiment, whether to configure the scheduling dependencies of tasks based on the lineage of the data tables can be chosen according to business requirements.
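
Once dependencies are declared, an execution order that respects them can be obtained with a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function name and the task names in the usage note are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task runs after all tasks it depends on.

    `depends_on` maps a task to the set of upstream tasks it needs.
    Raises ValueError if the declared dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        raise ValueError(f"cyclic dependency declared: {exc.args[1]}") from exc
```

For example, if `load_dw` depends on `extract_orders` and `extract_users`, and `report` depends on `load_dw`, both extract tasks are placed before `load_dw`, and `report` comes last.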

Each task also carries a series of fixed attributes, such as task name, description, failure retry, responsible person, developer, the resource pool to which it belongs, the submitting server, the path of the content to be executed (for example the path of a script to be executed; the file protocol includes but is not limited to a local file protocol, the hdfs file protocol and a shared file protocol), priority setting, notification management (covering failure, success, delay, etc.), task resource monitoring, task searching and task relationship display.

The rule for setting task priority in this embodiment is that task-level priority takes precedence over scheduling-level priority, so as to ensure that critical tasks are executed first. Alarms in the task management module cover various conditions, such as success alarms, failure alarms, delay alarms, timeout alarms and alarms for tasks that were expected to run but did not. Multiple alarm channels are available, including DingTalk, WeChat, telephone, SMS and mail; alarm content can even be pushed to the operation and maintenance monitoring dashboard, and an extensible notification type interface is provided. The propagation of baseline alarm notifications for critical tasks (those with the highest priority, e.g. tasks related to regulatory supervision), as shown in FIG. 1, mainly comprises three parts:

1. Creating a baseline: the tasks added to the baseline are specified, and the baseline priority and alarm policy parameters are set.

2. Determining the monitoring range according to the baseline task K: all upstream nodes of the baseline task, i.e. the nodes that affect the output of task K, are included in the monitoring range, such as A, B, E, F and I; downstream nodes of the baseline task, such as M, C, D, G, H, J and L, are not within the monitoring range; the critical path is defined as the most time-consuming of all paths affecting task K, such as the path ABFIK in the figure.

3. Triggering a baseline alarm or an event alarm according to the actual running condition of the tasks within the monitoring range.
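
The monitoring range and the critical path of a baseline task can both be computed by walking the dependency graph upstream. The sketch below is illustrative: the function names are hypothetical, and since the figure is not reproduced here, the edges and durations in the usage note are invented so that the result matches the path ABFIK named in the text.

```python
from typing import Dict, List, Set, Tuple


def monitoring_range(upstream: Dict[str, List[str]], baseline_task: str) -> Set[str]:
    """Collect every direct or indirect upstream task of the baseline task."""
    seen: Set[str] = set()
    stack = list(upstream.get(baseline_task, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(upstream.get(node, []))
    return seen


def critical_path(upstream: Dict[str, List[str]],
                  durations: Dict[str, float],
                  baseline_task: str) -> Tuple[float, List[str]]:
    """Return (total duration, path) of the longest path ending at the baseline task."""
    cache: Dict[str, Tuple[float, List[str]]] = {}

    def longest(node: str) -> Tuple[float, List[str]]:
        if node not in cache:
            best: Tuple[float, List[str]] = (0.0, [])
            for parent in upstream.get(node, []):
                candidate = longest(parent)
                if candidate[0] > best[0]:
                    best = candidate
            cache[node] = (best[0] + durations[node], best[1] + [node])
        return cache[node]

    return longest(baseline_task)
```

With assumed edges A to B, A to E, B to F, F to I, E to I, I to K and K to M, and assumed durations of 10, 20, 15, 30, 5 and 10 minutes for A, B, E, F, I and K, the monitoring range of K is A, B, E, F and I, and the critical path is A, B, F, I, K with a total of 75 minutes.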

The big data platform task management system in this embodiment further comprises a task resource monitoring module for monitoring the usage of various resources, such as CPU, memory, disk reads and writes, and IO occupation, which helps users optimize resource allocation and scheduling policies.

The scheduling instance management module in this embodiment is used for managing each instance of scheduling content generated according to the scheduling frequency, and includes a viewer and an instance operator. The viewer uses a Gantt chart to display the data lineage dependencies and the critical path of the tasks, as shown in FIG. 2. Through the Gantt chart, the user can clearly see the start time and end time of each task and the dependencies among tasks. In addition, the Gantt chart makes abnormal conditions during monitoring, such as task delays or overly long execution times, visible at a glance. The instance operator is used for terminating, suspending, resuming and re-running scheduling instances.

The task instance management module in this embodiment is configured to manage the specific task instances in each scheduling content. The task instance management module includes a log viewer, a link analyzer and a time consumption analyzer. The log viewer is used for viewing log information of task instances. The link analyzer is used for analyzing the upstream link, the downstream link and the slowest link of a task instance. The time consumption analyzer is used for analyzing the execution time of historical tasks.

The log viewer displays the basic running log and provides error logs, warning information and other key runtime data. With this detailed log information, the user can effectively diagnose problems, understand how tasks executed and evaluate task performance. The link analyzer analyzes the upstream and downstream links of a task instance and provides a visual representation of the interdependencies between tasks. With this tool the user can clearly see which tasks are predecessors or successors of other tasks, which helps the user understand the data flow of the overall task flow. The time consumption analyzer analyzes the time consumed by historical task executions.
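
The source does not describe how historical execution times are analyzed. One simple possibility, offered only as an assumption, is to summarize past durations and flag a run that is much slower than the historical mean; the function names and the factor of 1.5 are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def summarize(history_seconds: List[float]) -> Dict[str, float]:
    """Basic statistics over the past execution times of one task."""
    if not history_seconds:
        raise ValueError("no execution history available")
    return {
        "min": min(history_seconds),
        "max": max(history_seconds),
        "mean": mean(history_seconds),
    }


def is_unusually_slow(history_seconds: List[float],
                      latest_seconds: float,
                      factor: float = 1.5) -> bool:
    """Flag a run that took more than `factor` times the historical mean."""
    return latest_seconds > factor * summarize(history_seconds)["mean"]
```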

In this embodiment, the user may also force the status of a task instance to successful in some cases; re-execute failed or problematic tasks, including re-running a single task or its associated upstream and downstream tasks; perform a backtracking operation on a single task instance in order to reprocess or analyze past data; and, if necessary, terminate an executing task instance to prevent error propagation or resource waste.

In addition, when a task is actually executed, its start is affected not only by the defined scheduled time but also by several other factors, such as the scheduled time of the upstream tasks, the actual completion time of the upstream tasks, and the remaining resources of the task execution resource group, as shown in FIG. 3.
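
The interaction of these factors can be sketched as follows: a task can start no earlier than the latest of its own scheduled time and the completion times of its upstream tasks, and only if the resource group still has a free slot. The function name and the slot-based resource model are assumptions for illustration.

```python
from datetime import datetime
from typing import List, Optional


def actual_start(scheduled: datetime,
                 upstream_finished: List[datetime],
                 free_slots: int) -> Optional[datetime]:
    """Earliest moment the task can start, or None while no resources are free."""
    if free_slots < 1:
        return None  # the task keeps waiting for the resource group
    # The task waits for its own schedule and for every upstream task to finish.
    return max([scheduled] + upstream_finished)
```

A task scheduled for 02:00 whose upstream task finishes at 02:40 therefore starts at 02:40 at the earliest, even though its own schedule fired earlier.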

The foregoing is merely an embodiment of the present application; specific structures and features that are common knowledge in the art are not described in detail here. A person of ordinary skill in the art knows the general technical knowledge of the field to which the application pertains, is able to learn of the prior art in that field, and can, in light of the teaching given in the present application, complete and practice this scheme. It should be noted that a person skilled in the art can make modifications and improvements without departing from the structure of the present application; these should also be regarded as falling within the scope of protection of the present application, and they do not affect the effect of implementing the present application or the utility of the patent. The scope of protection of the present application is defined by the content of the claims, and the description of the specific embodiments and the like in the specification can be used to interpret the content of the claims.

Claims

1. A big data platform task management system, characterized by comprising a scheduling service module, a task management module, a scheduling instance management module and a task instance management module; the scheduling service module executes scheduling operations at a specified frequency and manages the scheduling life cycle; the task management module is used for creating and managing a series of tasks; the scheduling instance management module is used for managing each instance of scheduling content generated according to the scheduling frequency; the task instance management module is used for managing the specific task instances in each scheduling content.

2. The big data platform task management system of claim 1, wherein the scheduling service module includes a scheduling frequency configurator, an instance concurrency configurator and a policy manager; the scheduling frequency configurator is used for setting a CRONTAB expression, a fixed interval, one-time scheduling at a specified date and time, and manually triggered scheduling; the instance concurrency configurator is used for setting the concurrency of each scheduling instance; the policy manager is used for managing the priority policy of the scheduling hierarchy and for configuring the handling policy applied when a task fails.

3. The big data platform task management system of claim 2, wherein the task management module comprises a task type manager, a task dependency configurator, a task attribute configurator and an alarm manager; the task type manager is used for supporting and extending various task types; the task dependency configurator is used for declaring the dependency relationships between tasks; the task attribute configurator is used for configuring the fixed attributes of tasks; the alarm manager is used for configuring and triggering alarms in different scenarios.

4. The big data platform task management system of claim 3, wherein the scheduling instance management module comprises a viewer and an instance operator; the viewer uses a Gantt chart to display the data lineage dependencies and the critical path of the tasks; the instance operator is used for terminating, suspending, resuming and re-running scheduling instances.

5. The big data platform task management system of claim 4, wherein the task instance management module includes a log viewer, a link analyzer and a time consumption analyzer; the log viewer is used for viewing log information of task instances; the link analyzer is used for analyzing the upstream link, the downstream link and the slowest link of a task instance; the time consumption analyzer is used for analyzing the execution time of historical tasks.

6. The big data platform task management system of claim 3, wherein the alarm manager sends alarm notifications using mail, SMS or DingTalk.

7. A big data platform task management method, characterized in that the method uses the big data platform task management system according to any of claims 1-6.