CN113342532B - Zookeeper-based distributed task scheduling method and system - Google Patents

Info

Publication number
CN113342532B
Authority
CN
China
Prior art keywords
task
target
node
resource scheduler
candidate
Prior art date
Legal status
Active
Application number
CN202110716212.XA
Other languages
Chinese (zh)
Other versions
CN113342532A (en)
Inventor
晏宁
万磊
李毅
钱进
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202110716212.XA
Publication of CN113342532A
Application granted
Publication of CN113342532B
Status: Active

Classifications

    • G06F9/5083 — Techniques for rebalancing the load in a distributed system
    • G06F9/5016 — Allocation of resources to service a request, the resource being the memory
    • G06F9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F2209/508 — Monitor (indexing scheme relating to G06F9/50)
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the field of financial technology (Fintech) and discloses a Zookeeper-based distributed task scheduling method and system. On one hand, the Zookeeper node-creation mechanism is used to distribute tasks in an orderly manner: nodes are created in a Zookeeper directory and assigned node identification information. On the other hand, the Zookeeper event-monitoring mechanism keeps each machine watching for node-creation events in Zookeeper, so that the machine whose identification information matches that of a newly created node is automatically triggered to execute the corresponding task. This avoids the poor system stability caused by a large number of machines contending for a lock at the same time.

Description

Zookeeper-based distributed task scheduling method and system
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a Zookeeper-based distributed task scheduling method and a Zookeeper-based distributed task scheduling system.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, and so on) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on task scheduling technology.
Traditional distributed task scheduling issues tasks through the distributed functions of a database, and each task executor then contends for task execution through a database lock. To avoid a Single Point of Failure (SPOF), multiple identical task executors are deployed; if too many task executors contend for the lock at the same time, the database and the server may stall.
Disclosure of Invention
The invention mainly aims to provide a Zookeeper-based distributed task scheduling method and system, in order to solve the technical problem that existing lock-based task scheduling systems have poor stability.
In order to achieve the above object, the present invention provides a Zookeeper-based distributed task scheduling method, which is applied to a Zookeeper-based distributed task scheduling system, where the distributed task scheduling system includes a uniform resource scheduler, a type resource scheduler, and a task executor, and the Zookeeper-based distributed task scheduling method includes:
when it is monitored that a Task node associated with a target Task in a Zookeeper directory is created, the uniform resource scheduler determines a target type resource scheduler in a candidate type resource scheduler, creates a Master node under the Task node, and creates an Engine node corresponding to the target type resource scheduler under the Master node, wherein node information of the Engine node comprises identification information of the target type resource scheduler;
when the target type resource scheduler monitors that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler, the target type resource scheduler splits the target task to obtain subtasks, determines a target task executor in candidate task executors, creates an Agent node corresponding to the target task executor under the Engine node and associates the corresponding subtasks for the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor;
and when the target task executor monitors that the Agent node is created and the identification information of the Agent node is consistent with the identification information of the target task executor, the target task executor executes a corresponding subtask.
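As a rough illustration (not the patented implementation), the three-tier flow above can be simulated with an in-memory dict tree; the tree layout, identifier values, and the subtask-splitting rule here are all assumptions:

```python
# Toy simulation of the Task -> Master -> Engine -> Agent dispatch flow.
# Node names follow the text; everything else is illustrative.

def dispatch(task, engine_id, agent_ids):
    """Build the node hierarchy a scheduling round would create."""
    # Uniform resource scheduler: create the Master node and an Engine node
    # carrying the target type resource scheduler's identification info.
    tree = {"Task": {"id": task["id"], "status": "running",
                     "Master": {"Engine": {"id": engine_id, "Agent": []}}}}
    # Target type resource scheduler: split the task into subtasks and
    # create one Agent node per target task executor.
    subtasks = [f"{task['id']}-sub{i}" for i in range(len(agent_ids))]
    for agent_id, sub in zip(agent_ids, subtasks):
        tree["Task"]["Master"]["Engine"]["Agent"].append(
            {"id": agent_id, "subtask": sub, "status": "pending"})
    return tree

tree = dispatch({"id": "task42"}, engine_id="engine-1",
                agent_ids=["agent-a", "agent-b"])
# Each task executor runs only the subtask whose Agent node carries its id.
for node in tree["Task"]["Master"]["Engine"]["Agent"]:
    assert node["id"] in ("agent-a", "agent-b")
```

The point of the hierarchy is that each machine reacts only to nodes bearing its own identification information, so no two machines ever contend for the same work item.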
Optionally, the distributed task scheduling system further includes a resource manager;
the step of the uniform resource scheduler determining the target type resource scheduler among the candidate type resource schedulers comprises:
the uniform resource scheduler sends a candidate type resource scheduler acquisition request to the resource manager based on the task type of the target task;
and the uniform resource scheduler determines a weight value of each candidate type resource scheduler based on the machine parameter of the candidate type resource scheduler corresponding to the task type fed back by the resource manager, and determines a target type resource scheduler in the candidate type resource schedulers according to the weight value.
Optionally, the machine parameters include the number of available CPU cores and available memory;
the uniform resource scheduler determines a weight value of each candidate type resource scheduler based on a machine parameter of the candidate type resource scheduler corresponding to the task type and fed back by the resource manager, and the step of determining a target type resource scheduler in the candidate type resource schedulers according to the weight value comprises the following steps:
the uniform resource scheduler determines the weight value of each candidate type resource scheduler based on the available CPU core number, the available memory, a first preset CPU intensive coefficient and a first preset IO intensive coefficient of each candidate type resource scheduler corresponding to the task type and fed back by the resource manager; the first preset CPU intensive coefficient is smaller than the first preset IO intensive coefficient;
and determining a target type resource scheduler in the candidate type resource schedulers according to the weight values.
Optionally, the target type resource scheduler splits the target task to obtain subtasks, and the step of determining a target task executor from the candidate task executors includes:
the target type resource scheduler splits the target task to obtain subtasks based on the task data volume of the corresponding Engine node, and sends a candidate task executor acquisition request to the resource manager;
and the target type resource scheduler determines a weight value of each candidate task executor based on machine parameters of the candidate task executors fed back by the resource manager, and determines target task executors corresponding to the number of the subtasks in the candidate task executors according to the weight values.
Optionally, the machine parameters include the number of available CPU cores and available memory;
the target type resource scheduler determines a weight value of each candidate task executor according to machine parameters of the candidate task executor fed back by the resource manager, and the step of determining the target task executor corresponding to the number of the subtasks in the candidate task executor according to the weight value includes:
the target type resource scheduler determines a weighted value of each candidate task executor based on the available CPU core number, the available memory, a second preset CPU intensive coefficient and a second preset IO intensive coefficient of each candidate task executor fed back by the resource manager; the second preset CPU intensive coefficient is larger than the second preset IO intensive coefficient;
and determining target task executors corresponding to the quantity of the subtasks in the candidate task executors according to the weight values.
Optionally, the step of determining, by the target-type resource scheduler, a weighted value of each candidate task executor based on the available CPU core number, the available memory, the second preset CPU intensive coefficient, and the second preset IO intensive coefficient of each candidate task executor fed back by the resource manager includes:
the target type resource scheduler determines the weight value of each candidate task executor based on a second preset formula, the available CPU core number, the available memory, a second preset CPU intensive coefficient and a second preset IO intensive coefficient of each candidate task executor fed back by the resource manager;
the second preset is:
Figure BDA0003133394900000031
wherein, W 2 The weight value of the candidate task executor;
c is the number of available CPU cores;
m is available memory;
T c2 setting a second preset CPU intensive coefficient;
T m2 and setting the IO intensive coefficient for the second preset IO intensive coefficient.
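A minimal sketch of this weight computation, assuming the weight is a coefficient-scaled linear combination of available CPU cores and available memory; the coefficient values below are made up for illustration:

```python
# Hypothetical weight computation for candidate task executors. The linear
# form and the example coefficients are assumptions, not the filed formula.

def executor_weight(cpu_cores, mem_gb, t_c2=0.7, t_m2=0.3):
    """W2 = C * Tc2 + M * Tm2; Tc2 > Tm2 because Agents are CPU-intensive."""
    return cpu_cores * t_c2 + mem_gb * t_m2

# With Tc2 > Tm2, a machine with more free CPU cores outranks one with
# more free memory.
assert executor_weight(8, 4) > executor_weight(4, 8)
```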
Optionally, after the step of executing the corresponding subtask by the target task executor, the method further includes:
if the subtask fails to be executed, updating the state information of the corresponding Agent node into a failure state by a target task executor corresponding to the subtask in the Zookeeper directory;
when monitoring that the state information of the Agent node is in a failure state, the target type resource scheduler sends a candidate task executor acquisition request to the resource manager again, determines a first target task executor in the candidate task executor on the basis of machine parameters of the candidate task executor fed back by the resource manager, and creates a new Agent node under the Engine node corresponding to the Agent node of which the state information is in the failure state, wherein the node information of the new Agent node comprises identification information of the first target task executor, and the new Agent node is associated with a subtask corresponding to the Agent node of the failure state;
and when the first target task executor monitors that the new Agent node is created and the identification information of the new Agent node is consistent with the identification information of the first target task executor, the first target task executor executes the corresponding subtask again.
Optionally, after the step of executing the corresponding subtask by the target task executor, the method further includes:
if the subtask is successfully executed, the target task executor corresponding to the subtask updates the state information of the corresponding Agent node into a successful state in the Zookeeper directory;
when the target type resource scheduler monitors that the state information of all Agent nodes under the corresponding Engine node is in a successful state, updating the state information of the Engine node to be in a successful state;
and when the uniform resource scheduler monitors that the state information of all Engine nodes under the corresponding Master node is in a success state, updating the state information of the Master node into a success state, and updating the state information of the Task node into a success state to indicate that the target Task is successfully executed.
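The bottom-up status roll-up described above can be sketched as follows; the dict-based tree and field names are illustrative assumptions, not the patented data model:

```python
# An Engine node succeeds only when all of its Agent nodes succeed; the
# Master node succeeds only when all Engine nodes succeed; the Task node
# mirrors the Master.

def roll_up(task_node):
    master = task_node["Master"]
    for engine in master["Engines"]:
        if all(a["status"] == "success" for a in engine["Agents"]):
            engine["status"] = "success"
    if all(e["status"] == "success" for e in master["Engines"]):
        master["status"] = "success"
        task_node["status"] = "success"
    return task_node

task = {"status": "running", "Master": {"status": "running", "Engines": [
    {"status": "running", "Agents": [{"status": "success"},
                                     {"status": "success"}]}]}}
roll_up(task)
assert task["status"] == "success"
```

In the real system each level is driven by Zookeeper watch events on its child nodes rather than by a single synchronous pass, but the success condition at each level is the same.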
Optionally, the Zookeeper-based distributed task scheduling method further includes:
constructing a visual task scheduling graph based on the Zookeeper directory, and synchronizing the updating condition of the Zookeeper directory to the visual task scheduling graph;
and when a modification instruction for modifying the nodes in the visual task scheduling graph is received, executing modification operation corresponding to the modification instruction in the Zookeeper directory.
In addition, in order to achieve the above object, the present invention further provides a Zookeeper-based distributed task scheduling system, which includes a uniform resource scheduler, a type resource scheduler, and a task executor;
the uniform resource scheduler is used for determining a target type resource scheduler in a candidate type resource scheduler when it is monitored that a Task node associated with a target Task in a Zookeeper directory is created, creating a Master node under the Task node, and creating an Engine node corresponding to the target type resource scheduler under the Master node, wherein node information of the Engine node comprises identification information of the target type resource scheduler;
the type resource scheduler is used for, when it monitors that the Engine node is created and the identification information of the Engine node is consistent with its own identification information, splitting the target task to obtain subtasks, determining a target task executor among candidate task executors, creating an Agent node corresponding to the target task executor under the Engine node, and associating the corresponding subtask with the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor;
and the task executor is used for executing corresponding subtasks when the Agent node is monitored to be created and the identification information of the Agent node is consistent with the identification information of the target task executor.
When a Task node associated with a target Task in a Zookeeper directory is created, a uniform resource scheduler creates a Master node under the Task node, determines a target type resource scheduler in a candidate type resource scheduler, and creates an Engine node corresponding to the target type resource scheduler under the Master node, wherein node information of the Engine node comprises identification information of the target type resource scheduler; when the target type resource scheduler monitors that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler, the target type resource scheduler splits the target task to obtain subtasks, determines a target task executor in candidate task executors, creates an Agent node corresponding to the target task executor under the Engine node and associates the corresponding subtasks for the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor; and when the target task executor monitors that the Agent node is created and the identification information of the Agent node is consistent with the identification information of the target task executor, the target task executor executes a corresponding subtask. On one hand, the method realizes the ordered distribution of tasks by utilizing a node creation mechanism of the Zookeeper and creating nodes and appointing the identification information of the nodes in a Zookeeper directory; on the other hand, the event monitoring mechanism of the Zookeeper is utilized to keep the machine monitoring the node creation event in the Zookeeper, the machine execution task consistent with the identification information of the node can be automatically triggered, and the problem of poor system stability caused by the fact that a large number of machines simultaneously perform lock snatching is avoided.
Drawings
FIG. 1 is a schematic structural diagram of a Zookeeper-based distributed task scheduling device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a Zookeeper-based distributed task scheduling method according to the present invention;
FIG. 3 is a schematic diagram of the Zookeeper directory and the visual task scheduling graph of the present invention;
FIG. 4 is another schematic diagram of the visual task scheduling graph of the present invention;
FIG. 5 is another schematic diagram of the visual task scheduling graph of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a Zookeeper-based distributed task scheduling device of a hardware operating environment according to an embodiment of the present invention.
The Zookeeper-based distributed task scheduling device in the embodiment of the invention can be a PC (personal computer) or a server device, and a virtual machine runs on the device.
As shown in fig. 1, the Zookeeper-based distributed task scheduling device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The memory 1005 may alternatively be a memory system separate from the processor 1001 described above.
Those skilled in the art will appreciate that the Zookeeper-based distributed task scheduling device architecture shown in fig. 1 does not constitute a limitation of the device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a Zookeeper-based distributed task scheduling program.
In the Zookeeper-based distributed task scheduling device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the Zookeeper-based distributed task scheduler stored in the memory 1005 and perform the operations in the Zookeeper-based distributed task scheduling method described below.
Based on the hardware structure, the embodiment of the Zookeeper-based distributed task scheduling method is provided.
Zookeeper is a distributed, open-source coordination service for distributed applications, mainly used to solve consistency problems in distributed cluster application systems. It provides data storage based on a directory-node tree similar to a file system, maintains and monitors state changes of the stored data, and achieves data-based cluster management by monitoring those state changes. Zookeeper maintains a data structure similar to a file system: each sub-directory entry is called a znode (directory node). As in a file system, znodes can be freely added and deleted, sub-znodes can be added and deleted under a znode, and a client can register a watch on any node it is interested in.
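The two mechanisms the scheme relies on, the znode tree and watches that fire on child-node creation, can be modeled with a toy in-memory class. This simulates the behavior only; it is not the real ZooKeeper client API:

```python
# Toy in-memory model of a znode tree with child-creation watches.

class ZnodeTree:
    def __init__(self):
        self.nodes = {"/": {}}   # path -> node data
        self.watches = {}        # parent path -> list of callbacks

    def watch_children(self, path, callback):
        """Register a callback fired when a child of `path` is created."""
        self.watches.setdefault(path, []).append(callback)

    def create(self, path, data):
        self.nodes[path] = data
        parent = path.rsplit("/", 1)[0] or "/"
        for cb in self.watches.get(parent, []):  # fire child-creation watches
            cb(path, data)

tree = ZnodeTree()
fired = []
# A machine watches for node creation and reacts only if the node's
# identification info matches its own, so no lock contention is needed.
tree.watch_children("/tasks", lambda p, d: fired.append(p)
                    if d.get("id") == "agent-a" else None)
tree.create("/tasks/sub0", {"id": "agent-a"})
tree.create("/tasks/sub1", {"id": "agent-b"})
assert fired == ["/tasks/sub0"]
```

In production code this role is played by a real client library (e.g. the Java ZooKeeper client or Apache Curator), whose watches are one-shot and must be re-registered after each event.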
Referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of a Zookeeper-based distributed task scheduling method according to the present invention, where the Zookeeper-based distributed task scheduling method is applied to a Zookeeper-based distributed task scheduling system, the distributed task scheduling system includes a uniform resource scheduler, a type resource scheduler, and a task executor, and the method includes:
step S10, when it is monitored that a Task node associated with a target Task in a Zookeeper directory is created, the uniform resource scheduler determines a target type resource scheduler in a candidate type resource scheduler, creates a Master node under the Task node, and creates an Engine node corresponding to the target type resource scheduler under the Master node, wherein node information of the Engine node comprises identification information of the target type resource scheduler;
the distributed task scheduling method based on the Zookeeper is applied to a distributed task scheduling system based on the Zookeeper, and the distributed task scheduling system comprises a uniform resource scheduler (Master), a type resource scheduler (Engine) and a task executor (Agent).
Traditional distributed task scheduling issues tasks through the distributed functions of a database, and each task executor then contends for task execution through a database lock. To avoid a Single Point of Failure (SPOF), multiple identical task executors are deployed; if too many task executors contend for the lock at the same time, the database and the server may stall.
In this context, the present embodiment provides a Zookeeper-based distributed task scheduling scheme. In this embodiment, the Zookeeper-based distributed task scheduling system includes a uniform resource scheduler (Master), a type resource scheduler (Engine), and a task executor (Agent). The uniform resource scheduler may be pre-specified by the distributed task scheduling system, or automatically elected in advance from candidate uniform resource schedulers through the Zookeeper election mechanism; the type resource scheduler and the task executor are determined during the task scheduling process according to task attributes and machine parameters. In addition, a Zookeeper directory is configured for the task scheduling system. As the name suggests, the Zookeeper directory is a directory-like data structure: the task scheduling system creates nodes in it according to the task scheduling situation, configures corresponding machine resources for the nodes, executes the tasks through those machine resources, monitors task execution, and updates the state information of the nodes accordingly.
In this embodiment, when a user needs to schedule a distributed Task (target Task), a Task node may be created in a Zookeeper directory, and it can be understood that node information of the Task node may include a Task identifier (Task id) and a Task execution state (status), and may also include related information such as Task creation time, task attributes, and the like. It should be noted that the Task node is only a naming method agreed with the uniform resource scheduler, and does not form a limitation on the present solution.
The task identifier is associated with the task execution condition, the execution link and the execution result of each machine, and the task execution condition, the execution link and the execution result of the task can be rapidly grasped according to the task identifier.
Referring to fig. 3, the uniform resource scheduler may monitor the Zookeeper directory, and when a Task node creation event is monitored, that is, when the Task node is monitored to be created, the uniform resource scheduler may create a Master node corresponding to the uniform resource scheduler under the Task node, where node information of the Master node may include identification information of the uniform resource scheduler, a Task execution state, a Task type of a target Task, and node creation time. The Task execution state of the Master node is determined by the Task execution state of the Engine node at the lower level of the Master node, when the Task execution states of all Engine nodes at the lower level are successful (success), the Task execution state of the Master node is updated to success, the Task execution state of the Task node is updated to success, if the Task execution state of at least one Engine node at the lower level is failed (failed), the Task execution state of the Master node is updated to failed, and the Task execution state of the Task node is updated to failed; the task type of the target task may be one or more, the task type of the target task may be determined by the uniform resource scheduler according to the service type related to the target task, and certainly may also be determined according to other preset task type division bases (for example, the type of the task initiating user, a task execution scenario, and the like), and this embodiment is not particularly limited; the identification information may be an ip address or other information capable of serving as a unique identification.
Further, the step of determining the target type resource scheduler among the candidate type resource schedulers may include:
a1, the uniform resource scheduler sends a candidate type resource scheduler acquisition request to the resource manager based on the task type of the target task;
step a2, the uniform resource scheduler determines a weight value of each candidate type resource scheduler based on a machine parameter of the candidate type resource scheduler corresponding to the task type and fed back by the resource manager, and determines a target type resource scheduler in the candidate type resource schedulers according to the weight value.
Specifically, the distributed task scheduling system further includes a resource manager. All machines in the system keep a heartbeat with the resource manager at a preset frequency to upload their machine parameters (the number of available CPU cores, available memory, and so on), so that the resource manager always holds the latest machine parameters of every machine. When a target type resource scheduler needs to be determined, the uniform resource scheduler sends the resource manager a candidate type resource scheduler acquisition request based on the task type of the target task. After receiving the request, the resource manager determines the type resource schedulers corresponding to that task type as the candidate type resource schedulers and feeds back their machine parameters to the uniform resource scheduler. After receiving the machine parameters, the uniform resource scheduler determines a weight value for each candidate type resource scheduler from those parameters, and then determines the target type resource scheduler among the candidates according to the weight values.
Further, a target type resource scheduler generally needs to be determined for each task type, so the number of target type resource schedulers determined among the candidate type resource schedulers equals the number of task types of the target task.
Further, the step a2 specifically includes:
step b1, the uniform resource scheduler determines the weight value of each candidate type resource scheduler based on the available CPU core number, the available memory, the first preset CPU intensive coefficient and the first preset IO intensive coefficient of each candidate type resource scheduler corresponding to the task type and fed back by the resource manager; the first preset CPU intensive coefficient is smaller than the first preset IO intensive coefficient;
and b2, determining a target type resource scheduler in the candidate type resource schedulers according to the weight values.
In this embodiment, the machine parameters include the number of available CPU cores (C) and the available memory (M), where the number of available CPU cores refers to the number of currently available CPU cores of the machine, and the available memory refers to the current maximum available memory of the machine.
In addition, computing a machine's weight value also requires intensive coefficients, which include a CPU-intensive coefficient (T_c) and an IO-intensive coefficient (T_m). The CPU-intensive coefficient reflects the degree to which a machine role depends on the CPU, and the IO-intensive coefficient reflects the degree to which the machine role depends on IO. The intensive coefficients are set in advance by an administrator for the different machine roles (including the type resource scheduler and the task executor). In general, because an Engine machine (i.e., a type resource scheduler) is IO-intensive, it tends to select a machine with larger available memory among the candidate type resource schedulers, so the Engine machine is set with T_m1 > T_c1; correspondingly, because an Agent machine (i.e., a task executor) is CPU-intensive, it tends to select a machine with a larger number of available CPU cores among the candidate task executors, so the Agent machine is set with T_m2 < T_c2.
After determining the number of available CPU cores, the available memory, the first preset CPU-intensive coefficient (T_c1), and the first preset IO-intensive coefficient (T_m1) of each candidate type resource scheduler, the weight value of each candidate type resource scheduler can be determined according to the following first preset formula.
W_1 = (T_c1 / T_m1) × C + (T_m1 / T_c1) × M
wherein W_1 is the weight value of the candidate type resource scheduler;
C is the number of available CPU cores;
M is the available memory;
T_c1 is the first preset CPU-intensive coefficient; and
T_m1 is the first preset IO-intensive coefficient.
It can be seen that setting T_m1 > T_c1 makes T_c1/T_m1 < 1 and T_m1/T_c1 > 1, so that when the weight value of a candidate type resource scheduler is calculated, the available memory contributes more to the weight value and the number of available CPU cores contributes relatively less. Therefore, when the candidate type resource schedulers are screened according to the weight values, the candidate with the larger weight value, that is, the candidate type resource scheduler with more available memory, is selected as the target type resource scheduler, so that the machine running condition of the target type resource scheduler matches its machine role, improving task execution efficiency while achieving the highest resource utilization.
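The first preset formula and the screening step can be sketched as follows. The coefficient values (T_c1 = 1.0, T_m1 = 2.0) are arbitrary examples chosen only to satisfy T_m1 > T_c1, and the function names are hypothetical:

```python
def weight(cpu_cores, memory, t_c, t_m):
    # First preset formula: W = (T_c / T_m) * C + (T_m / T_c) * M
    return (t_c / t_m) * cpu_cores + (t_m / t_c) * memory

def pick_engine(candidates, t_c1=1.0, t_m1=2.0):
    # IO-intensive role: T_m1 > T_c1, so available memory dominates the weight
    # (coefficient values here are illustrative, not from the patent).
    return max(candidates, key=lambda c: weight(c["cpu"], c["mem"], t_c1, t_m1))

engines = [
    {"id": "e1", "cpu": 16, "mem": 8},   # CPU-rich, memory-poor (GB)
    {"id": "e2", "cpu": 8,  "mem": 32},  # memory-rich
]
target = pick_engine(engines)
```

Under these coefficients the memory-rich machine `e2` wins (W = 0.5·8 + 2·32 = 68 versus W = 0.5·16 + 2·8 = 24 for `e1`), matching the selection behavior described above.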
Step S20, when the target type resource scheduler monitors that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler, the target type resource scheduler splits the target task to obtain a subtask, determines a target task executor in a candidate task executor, creates an Agent node corresponding to the target task executor under the Engine node, and associates the corresponding subtask with the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor;
In this embodiment, each candidate type resource scheduler may monitor the Zookeeper directory. When an Engine node creation event is monitored, that is, when it is monitored that an Engine node has been created, the candidate type resource scheduler compares the identification information of the Engine node with its own identification information to determine whether they are consistent. Only the candidate whose identification information is consistent with that of the Engine node, namely the target type resource scheduler, splits the target task to obtain subtasks, determines target task executors among the candidate task executors, creates Agent nodes corresponding to the target task executors under the Engine node, and associates the corresponding subtasks with the Agent nodes, so that each subtask is allocated to its corresponding target task executor. The node information of an Agent node includes the identification information of the target task executor and the task execution state. In addition, after the uniform resource scheduler determines the target type resource scheduler, that is, when the corresponding Engine node is created, the uniform resource scheduler associates and configures the target task of the corresponding type with the Engine node.
It should be noted that, when the target type resource scheduler splits the target task, the data size of each sub-task may be flexibly configured according to needs, which is not specifically limited in this embodiment.
It can be understood that each node has a corresponding task execution state, wherein the task execution state of the Engine node is determined by the task execution state of Agent nodes at the lower level, when the task execution states of all Agent nodes at the lower level are success (success), the task execution state of the Engine node is updated to success, and if the task execution state of at least one Agent node at the lower level is failure (failed), the task execution state of the Engine node is updated to failed. The task execution state of an Agent node is determined by the actual situation that the corresponding task executor executes the task, and it can be understood that if a lower node is further arranged below the Agent node, the task execution state of the Agent node is determined by the task execution state of the lower node, which is similar to the determination logic of the task execution state and is not described herein again.
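The state rollup rule described above (a parent node fails if any child fails, and succeeds only when every child succeeds) can be sketched as:

```python
def rollup(child_states):
    # One failed child fails the parent; the parent succeeds only
    # when every child has succeeded; otherwise it is still running.
    if any(s == "failed" for s in child_states):
        return "failed"
    if child_states and all(s == "success" for s in child_states):
        return "success"
    return "running"

# Engine state from its Agent children, then Master state from its Engines:
engine_state = rollup(["success", "failed", "success"])
master_state = rollup([engine_state, "success"])
```

The same function applies at every level of the hierarchy, which mirrors the remark that lower nodes under an Agent node follow similar determination logic.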
The target type resource scheduler can determine target task executors from the candidate task executors according to the task data volume of the corresponding task of the corresponding Engine node, wherein the number of the target task executors is related to the task data volume.
The target type resource scheduler splits the target task to obtain subtasks, and the step of determining the target task executor in the candidate task executors comprises the following steps:
step c1, the target type resource scheduler splits the target task based on the task data volume of the corresponding Engine node to obtain a subtask, and sends a candidate task executor acquisition request to the resource manager;
and c2, the target type resource scheduler determines the weight value of each candidate task executor according to the machine parameters of the candidate task executors fed back by the resource manager, and determines the target task executors corresponding to the number of the subtasks in the candidate task executors according to the weight values.
Specifically, when target task executors need to be determined, the target type resource scheduler may send an acquisition request for candidate task executor information to the resource manager; in addition, the target type resource scheduler may split the task according to the task data volume of the task of its corresponding Engine node to obtain a plurality of subtasks. After receiving the request, the resource manager feeds back the machine parameters of the candidate task executors to the target type resource scheduler. After receiving these machine parameters, the target type resource scheduler determines the weight value of each candidate task executor according to the machine parameters, and then determines, among the candidate task executors, target task executors corresponding to the number of subtasks according to the weight values. A candidate task executor may be a preset machine available as a task executor.
Further, the step c2 specifically includes:
step d1, the target type resource scheduler determines the weight value of each candidate task executor based on the available CPU core number, the available memory, a second preset CPU intensive coefficient and a second preset IO intensive coefficient of each candidate task executor fed back by the resource manager; the second preset CPU intensive coefficient is larger than the second preset IO intensive coefficient;
and d2, determining target task executors corresponding to the quantity of the subtasks in the candidate task executors according to the weight values.
In this embodiment, since Agent machines (i.e., task executors) are CPU-intensive, machines with a larger number of available CPU cores tend to be selected among the candidate task executors as target task executors, so the Agent machine is set with T_m2 < T_c2.
After determining the number of available CPU cores, the available memory, the second preset CPU-intensive coefficient (T_c2), and the second preset IO-intensive coefficient (T_m2) of each candidate task executor, the weight value of each candidate task executor can be determined according to the following second preset formula.
W_2 = (T_c2 / T_m2) × C + (T_m2 / T_c2) × M
wherein W_2 is the weight value of the candidate task executor;
C is the number of available CPU cores;
M is the available memory;
T_c2 is the second preset CPU-intensive coefficient; and
T_m2 is the second preset IO-intensive coefficient.
It can be seen that setting T_m2 < T_c2 makes T_c2/T_m2 > 1 and T_m2/T_c2 < 1, so that when the weight value of a candidate task executor is calculated, the number of available CPU cores contributes more to the weight value and the available memory contributes relatively less. Therefore, when the candidate task executors are screened according to the weight values, the candidates with the larger weight values, that is, the candidate task executors with more available CPU cores, are selected as the target task executors, so that the machine running condition of each target task executor matches its machine role, improving task execution efficiency while achieving the highest resource utilization.
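A minimal sketch of splitting a task and screening CPU-weighted executors, assuming illustrative coefficients T_c2 = 2.0 > T_m2 = 1.0 and hypothetical function names:

```python
def split_task(rows, chunk_size):
    # Subtask granularity is configurable; each chunk becomes one subtask.
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

def pick_agents(candidates, n, t_c2=2.0, t_m2=1.0):
    # Second preset formula with a CPU-intensive role (T_c2 > T_m2),
    # so the available CPU core count dominates W_2.
    scored = sorted(
        candidates,
        key=lambda c: (t_c2 / t_m2) * c["cpu"] + (t_m2 / t_c2) * c["mem"],
        reverse=True,
    )
    return scored[:n]  # one target executor per subtask

subtasks = split_task(list(range(10)), 4)  # yields 3 subtasks
agents = [
    {"id": "a1", "cpu": 32, "mem": 4},
    {"id": "a2", "cpu": 4,  "mem": 32},
    {"id": "a3", "cpu": 16, "mem": 16},
    {"id": "a4", "cpu": 2,  "mem": 8},
]
targets = pick_agents(agents, len(subtasks))
```

With these coefficients the scores are a1 = 66, a3 = 40, a2 = 24, a4 = 8, so the three CPU-richer machines are chosen, one per subtask.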
Step S30, when the target task executor monitors that the Agent node is created and the identification information of the Agent node is consistent with the identification information of the target task executor, the target task executor executes the corresponding subtask.
In this embodiment, each candidate task executor may monitor the Zookeeper directory. When an Agent node creation event is monitored, that is, when it is monitored that an Agent node has been created, the candidate task executor compares the identification information of the Agent node with its own identification information to determine whether they are consistent. Only the candidate whose identification information is consistent with that of the Agent node, namely the target task executor, executes the subtask associated with that Agent node.
In this embodiment, when it is monitored that a Task node associated with a target Task in a Zookeeper directory is created, a uniform resource scheduler creates a Master node under the Task node, determines a target type resource scheduler in a candidate type resource scheduler, and creates an Engine node corresponding to the target type resource scheduler under the Master node, where node information of the Engine node includes identification information of the target type resource scheduler; when the target type resource scheduler monitors that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler, the target type resource scheduler splits the target task to obtain subtasks, determines a target task executor in candidate task executors, creates an Agent node corresponding to the target task executor under the Engine node and associates the corresponding subtasks for the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor; and when the target task executor monitors that the Agent node is created and the identification information of the Agent node is consistent with the identification information of the target task executor, the target task executor executes a corresponding subtask. 
On one hand, the embodiment realizes the ordered distribution of tasks by utilizing a node creation mechanism of the Zookeeper and creating nodes and appointing the identification information of the nodes in a Zookeeper directory; on the other hand, the event monitoring mechanism of the Zookeeper is utilized to keep the machine monitoring the node creation event in the Zookeeper, the machine execution task consistent with the identification information of the node can be automatically triggered, and the problem of poor system stability caused by the fact that a large number of machines simultaneously perform lock snatching is avoided.
Further, based on the above embodiments, a second embodiment of the Zookeeper-based distributed task scheduling method of the present invention is provided.
After the step S30, the method further includes:
step d1, if the subtask fails to be executed, the target task executor corresponding to the subtask updates the state information of the corresponding Agent node into a failure state in the Zookeeper directory;
step d2, when monitoring that the state information of the Agent node is in a failure state, the target type resource scheduler sends a candidate task executor acquisition request to the resource manager again, determines a first target task executor in the candidate task executor on the basis of machine parameters of the candidate task executor fed back by the resource manager, and creates a new Agent node under the Engine node corresponding to the Agent node of which the state information is in the failure state, wherein the node information of the new Agent node comprises identification information of the first target task executor, and the new Agent node is associated with a subtask corresponding to the Agent node of which the state is in the failure state;
and d3, when the first target task executor monitors that the new Agent node is created and the identification information of the new Agent node is consistent with the identification information of the first target task executor, the first target task executor executes the corresponding subtask again.
This embodiment describes a retry mechanism for task execution failure. When a target task executor fails to execute its subtask, it updates the state information of its corresponding Agent node to a failure state in the Zookeeper directory. When the target type resource scheduler corresponding to the target Engine node above that Agent node monitors that the Agent node's state information is a failure state, it sends a candidate task executor acquisition request to the resource manager again. After receiving the request, the resource manager feeds back the machine parameters of the candidate task executors; the target type resource scheduler then determines the weight value of each candidate task executor according to the machine parameters and determines a new target task executor among the candidate task executors according to the weight values. It can be understood that when the target type resource scheduler determines a new target task executor, i.e., the first target task executor, the task executor that failed to execute the task is excluded, so that the failed executor is not chosen as the new target task executor again.
After determining a first target task executor, a target type resource scheduler creates a new Agent node under a target Engine node, wherein the node information of the new Agent node comprises identification information of the first target task executor, and the new Agent node is associated with a subtask corresponding to an Agent node in a failure state; and when the first target task executor monitors that a new Agent node is created and the identification information of the new Agent node is consistent with the identification information of the first target task executor, the first target task executor executes the subtask again.
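The re-selection step can be sketched as below; the weight coefficients and the retry cutoff are illustrative assumptions:

```python
def retry_assign(candidates, failed_ids, max_retries, attempt):
    # The executor(s) that already failed are excluded from re-selection,
    # so a bad machine is not chosen again for the same subtask.
    if attempt > max_retries:
        return None  # give up; the Engine node is marked failed instead
    remaining = [c for c in candidates if c["id"] not in failed_ids]
    if not remaining:
        return None
    # Reuse the CPU-weighted choice for the Agent role (T_c2 > T_m2).
    return max(remaining, key=lambda c: 2.0 * c["cpu"] + 0.5 * c["mem"])

agents = [
    {"id": "a1", "cpu": 32, "mem": 4},
    {"id": "a2", "cpu": 16, "mem": 16},
]
first_target = retry_assign(agents, failed_ids=set(), max_retries=2, attempt=1)
second_target = retry_assign(agents, failed_ids={"a1"}, max_retries=2, attempt=2)
```

Here `a1` is chosen first; after it fails and is excluded, the retry falls through to `a2`.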
Further, in an implementation scenario, considering that task execution logics corresponding to Agent nodes under the same Engine node are similar, if the first target task executor fails to execute the subtask again, which indicates that the reason for the task execution failure may not be the machine itself, the target type resource scheduler corresponding to the target Engine node does not re-determine a new target task executor and retry again, but updates the state information of the target Engine node to a failure state.
Of course, to further rule out the machine itself as the cause of the task failure, the target type resource scheduler may retry a preset number of times.
Further, after the step S30, the method further includes:
step e1, if the subtask is successfully executed, the target task executor corresponding to the subtask updates the state information of the corresponding Agent node into a successful state in the Zookeeper directory;
step e2, when the target type resource scheduler monitors that the state information of all Agent nodes under the corresponding Engine node is in a success state, updating the state information of the Engine node to be in a success state;
and e3, when the uniform resource scheduler monitors that the state information of all Engine nodes under the corresponding Master node is in a success state, updating the state information of the Master node into a success state, and updating the state information of the Task node into a success state to indicate that the target Task is successfully executed.
In this embodiment, after the target task executor successfully executes the subtask, the state information of the corresponding Agent node in the Zookeeper directory is updated to a successful state; when monitoring that the state information of all Agent nodes under the corresponding Engine node is a success state, the target type resource scheduler updates the state information of the Engine node to be the success state; similarly, when monitoring that the state information of all Engine nodes under the Master node corresponding to the uniform resource scheduler is in a successful state, the uniform resource scheduler updates the state information of the Master node to be in a successful state, and also updates the state information of the Task node to be in a successful state so as to indicate that the target Task is successfully executed.
Further, the Zookeeper-based distributed task scheduling method further includes:
step f1, constructing a visual task scheduling graph based on the Zookeeper directory, and synchronizing the updating condition of the Zookeeper directory to the visual task scheduling graph;
and f2, when a modification instruction for modifying the nodes in the visual task scheduling graph is received, executing modification operation corresponding to the modification instruction in the Zookeeper directory.
In this embodiment, in order to enable a user to intuitively understand the scheduling of a task, a visual task scheduling graph is constructed based on the Zookeeper directory. The graph is a hierarchical directory view, similar to the Zookeeper directory, constructed according to the association relationships between upper-level and lower-level nodes (see fig. 3); it can visually display the superior-subordinate relationships of each node, the machine identifier (ip) corresponding to each node, and the task execution state, and can also display the node creation time.
After the visual task scheduling graph is constructed based on the Zookeeper directory, the distributed task scheduling system monitors the Zookeeper directory, and when the Zookeeper directory is updated, the updating content of the Zookeeper directory is synchronized into the visual task scheduling graph so that a user can know the scheduling progress of a task in time.
Further, after learning the scheduling progress of a task, the user may modify a node in the visual task scheduling graph, for example by modifying the node's state information; after detecting the modification, the task scheduling system applies the corresponding modification in the Zookeeper directory, so as to suspend or start the task. Specifically, when the machine corresponding to a Master, Engine, or Agent node monitors that the state information of its node or of a superior node in the Zookeeper directory is a failure state, it suspends execution of the task; when it monitors that the state information of its node or of a superior node is an execution state, it starts executing the task.
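The pause/start decision driven by the watched state information can be sketched as follows; the state names and function name are illustrative:

```python
def machine_action(own_state, superior_states):
    # A failure state on the node or any superior node suspends the task;
    # an execution ("running") state starts/resumes it.
    if own_state == "failed" or "failed" in superior_states:
        return "pause"
    if own_state == "running":
        return "execute"
    return "wait"
```

Each Master, Engine, or Agent machine would evaluate this on every watched state change and act accordingly.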
Further, a visual role directory can be constructed according to role classification of each node based on the Zookeeper directory, as shown in fig. 4, which can visually display machine identifiers (ip) and machine parameters of role machines of various types, such as a CPU and a memory (memory).
Further, a DAG (Directed Acyclic Graph) view as shown in fig. 5 may be constructed according to the interaction situation of the nodes based on the Zookeeper directory; fig. 5 can intuitively show the interaction situation of each node.
The embodiment provides a task scheduling visualization mode, so that a user can manually pause and start a task, and the flexibility of task scheduling is improved.
Further, considering that a big-data task has many execution steps and a long execution time, if the execution machine corresponding to an Agent node fails, the type resource scheduler needs to apply for a new execution machine for the Agent node and re-execute the subtask associated with that Agent node through the new machine; if part of the task has already completed, re-executing it from the beginning wastes substantial resources and time.
To address this drawback, if the task scheduling system detects that the target task is a big-data task, the subtask associated with an Agent node can be further subdivided into several smaller pieces (hereinafter, sub-subtasks). Correspondingly, a sub-subtask node is created under the Agent node, and a corresponding sub-subtask executor is determined for each sub-subtask node. After successfully executing its sub-subtask, each executor uploads the result file obtained from execution. When a sub-subtask fails, the Agent node creates a new sub-subtask node for the failed piece, determines a new sub-subtask execution machine, and associates the new node with the successfully executed sub-subtasks, so that the newly determined machine can download the result files of the successful pieces and continue executing the failed piece based on them. This realizes breakpoint resume of the task, avoids re-executing the task from the beginning, and saves execution time and resources.
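A sketch of the breakpoint-resume bookkeeping, assuming hypothetical names and a made-up result-file path scheme:

```python
def resume_plan(sub_subtasks, uploaded_results):
    # Completed pieces already have result files uploaded; only the rest
    # are re-run, so execution continues from the breakpoint rather than
    # starting the whole subtask over.
    reusable = {s: uploaded_results[s] for s in sub_subtasks if s in uploaded_results}
    pending = [s for s in sub_subtasks if s not in uploaded_results]
    return reusable, pending

reusable, pending = resume_plan(
    ["part-0", "part-1", "part-2"],
    {"part-0": "hdfs://example/part-0.result"},  # hypothetical result-file path
)
```

A replacement execution machine would download the entries in `reusable` and execute only the `pending` pieces.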
Further, when there are multiple target tasks, multiple uniform resource schedulers or type resource schedulers may simultaneously send candidate-machine (candidate type resource scheduler or candidate task executor) acquisition requests to the resource manager for different target tasks. To allow more important and more urgent tasks to be scheduled first, after receiving the candidate-machine acquisition requests, the resource manager may determine the priority of the task corresponding to each request and preferentially feed back candidate machine parameters for the request whose task has the higher priority.
The task priority is pre-configured by task configuration personnel according to the importance degree and the urgency degree of the task.
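A priority-ordered request queue of this kind can be sketched with Python's `heapq`; lower numbers stand for more important/urgent tasks, and all names are illustrative:

```python
import heapq

class RequestQueue:
    """Priority queue for candidate-machine acquisition requests."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, request):
        # Priority is pre-configured per task (lower = more urgent).
        heapq.heappush(self._heap, (priority, self._seq, request))
        self._seq += 1

    def next_request(self):
        # The resource manager serves the most urgent request first.
        return heapq.heappop(self._heap)[2]

q = RequestQueue()
q.submit(2, "routine-report")
q.submit(0, "urgent-settlement")
q.submit(1, "daily-batch")
```

Requests are then fed back in priority order regardless of arrival order.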
The invention also provides a Zookeeper-based distributed task scheduling system, which comprises:
the system comprises a uniform resource scheduler, a Master node and an Engine node, wherein the uniform resource scheduler is used for determining a target type resource scheduler in a candidate type resource scheduler when a Task node associated with a target Task in a Zookeeper directory is monitored to be created, the Master node is created under the Task node, the Engine node corresponding to the target type resource scheduler is created under the Master node, and node information of the Engine node comprises identification information of the target type resource scheduler;
the type resource scheduling is used for splitting the target task to obtain a subtask by the type resource scheduler when the situation that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler is monitored, determining a target task executor in a candidate task executor, creating an Agent node corresponding to the target task executor under the Engine node, and associating the Agent node with the corresponding subtask, wherein the node information of the Agent node comprises the identification information of the target task executor;
and the task executor is used for executing corresponding subtasks when the Agent node is monitored to be created and the identification information of the Agent node is consistent with the identification information of the target task executor.
The method executed by each program unit may refer to each embodiment of the Zookeeper-based distributed task scheduling method of the present invention, and details are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. The Zookeeper-based distributed task scheduling method is characterized in that the Zookeeper-based distributed task scheduling method is applied to a Zookeeper-based distributed task scheduling system, the distributed task scheduling system comprises a uniform resource scheduler, a type resource scheduler and a task executor, and the Zookeeper-based distributed task scheduling method comprises the following steps:
when it is monitored that a Task node associated with a target Task in a Zookeeper directory is created, the uniform resource scheduler determines a target type resource scheduler in a candidate type resource scheduler, creates a Master node under the Task node, and creates an Engine node corresponding to the target type resource scheduler under the Master node, wherein node information of the Engine node comprises identification information of the target type resource scheduler, wherein the uniform resource scheduler is pre-specified by the distributed Task scheduling system, or is pre-automatically elected and determined by the candidate uniform resource scheduler based on a Zookeeper election mechanism, and the type resource scheduler and the Task executor need to be determined in a Task scheduling process according to Task attributes and machine parameters;
when the target type resource scheduler monitors that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler, the target type resource scheduler splits the target task to obtain subtasks, determines a target task executor in candidate task executors, creates an Agent node corresponding to the target task executor under the Engine node and associates the corresponding subtasks for the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor;
and when the target task executor monitors that the Agent node is created and the identification information of the Agent node is consistent with the identification information of the target task executor, the target task executor executes a corresponding subtask.
2. The Zookeeper-based distributed task scheduling method of claim 1, wherein the distributed task scheduling system further comprises a resource manager;
the step of the uniform resource scheduler determining the target type resource scheduler among the candidate type resource schedulers comprises:
the uniform resource scheduler sends a candidate type resource scheduler acquisition request to the resource manager based on the task type of the target task;
and the uniform resource scheduler determines a weight value of each candidate type resource scheduler based on the machine parameter of the candidate type resource scheduler corresponding to the task type fed back by the resource manager, and determines a target type resource scheduler in the candidate type resource schedulers according to the weight value.
3. The Zookeeper-based distributed task scheduling method of claim 2, wherein the machine parameters include an available CPU core count and an available memory;
the unified resource scheduler determines a weight value of each candidate type resource scheduler based on machine parameters of the candidate type resource scheduler corresponding to the task type and fed back by the resource manager, and the step of determining a target type resource scheduler in the candidate type resource schedulers according to the weight values comprises the following steps:
the uniform resource scheduler determines the weight value of each candidate type resource scheduler based on the available CPU core number, the available memory, a first preset CPU intensive coefficient and a first preset IO intensive coefficient of each candidate type resource scheduler corresponding to the task type and fed back by the resource manager; the first preset CPU intensive coefficient is smaller than the first preset IO intensive coefficient;
and determining a target type resource scheduler in the candidate type resource schedulers according to the weight values.
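The selection in claim 3 can be sketched as a linear weighting over available CPU cores and available memory. The linear form and the example coefficient values are assumptions (the patent's first preset formula is published only as an image); the one constraint the claim does state, that the first preset CPU intensive coefficient is smaller than the first preset IO intensive coefficient, is asserted explicitly.

```python
def weight(cpu_cores: float, memory_gb: float, t_cpu: float, t_io: float) -> float:
    # Assumed linear form W = c*t_cpu + m*t_io; the exact formula in the
    # patent is not reproduced in the text, so this is an illustration.
    return cpu_cores * t_cpu + memory_gb * t_io

def pick_type_scheduler(candidates, t_cpu=0.3, t_io=0.7):
    # Claim 3: for type resource schedulers the CPU intensive coefficient
    # must be smaller than the IO intensive coefficient.
    assert t_cpu < t_io
    return max(candidates,
               key=lambda c: weight(c["cpu"], c["mem"], t_cpu, t_io))
```

With a smaller CPU coefficient, a memory-rich candidate outranks a CPU-rich one, which matches weighting IO capacity more heavily at the scheduler-selection stage.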
4. The Zookeeper-based distributed task scheduling method of claim 2, wherein the target type resource scheduler splits the target task to obtain subtasks, and the step of determining the target task executor among the candidate task executors comprises:
the target type resource scheduler splits the target task to obtain subtasks based on the task data volume of the corresponding Engine node, and sends a candidate task executor acquisition request to the resource manager;
and the target type resource scheduler determines the weight value of each candidate task executor according to the machine parameters of the candidate task executor fed back by the resource manager, and determines the target task executor corresponding to the number of the subtasks in the candidate task executor according to the weight value.
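The split-then-assign step of claim 4 can be sketched as follows. The row-based chunking and the `score` callback are assumptions for illustration; the claim only states that the split depends on the task data volume and that target task executors are chosen to match the number of subtasks.

```python
def split_task(total_rows: int, rows_per_subtask: int):
    # Split by data volume into contiguous (start, end) ranges; the
    # row-range representation of a subtask is an assumption.
    return [(lo, min(lo + rows_per_subtask, total_rows))
            for lo in range(0, total_rows, rows_per_subtask)]

def pick_executors(candidates, n, score):
    # Rank candidate task executors by a caller-supplied weight function
    # and take one executor per subtask.
    return sorted(candidates, key=score, reverse=True)[:n]
```

A caller would pass `len(split_task(...))` as `n` so that each subtask gets its own Agent node and target task executor.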
5. The Zookeeper-based distributed task scheduling method of claim 4, wherein the machine parameters include an available CPU core count and an available memory;
the target type resource scheduler determines a weight value of each candidate task executor according to machine parameters of the candidate task executor fed back by the resource manager, and the step of determining the target task executor corresponding to the number of the subtasks in the candidate task executor according to the weight value includes:
the target type resource scheduler determines a weighted value of each candidate task executor based on the available CPU core number, the available memory, a second preset CPU intensive coefficient and a second preset IO intensive coefficient of each candidate task executor fed back by the resource manager; the second preset CPU intensive coefficient is larger than the second preset IO intensive coefficient;
and determining target task executors corresponding to the number of the subtasks in the candidate task executors according to the weight values.
6. The Zookeeper-based distributed task scheduling method of claim 5, wherein the step of the target-type resource scheduler determining the weight value of each candidate task executor based on the available CPU core number, the available memory, the second preset CPU intensive coefficient and the second preset IO intensive coefficient of each candidate task executor fed back by the resource manager comprises:
the target type resource scheduler determines the weight value of each candidate task executor based on a second preset formula, the available CPU core number, the available memory, a second preset CPU intensive coefficient and a second preset IO intensive coefficient of each candidate task executor fed back by the resource manager;
the second preset formula is as follows:
W2 = c × Tc2 + m × Tm2;
Tm2<Tc2;
wherein W2 is the weight value of the candidate task executor;
c is the number of available CPU cores;
m is available memory;
Tc2 is a second preset CPU intensive coefficient;
Tm2 is a second preset IO intensive coefficient.
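A hedged reading of the second preset formula, taking it as the linear form W2 = c·Tc2 + m·Tm2 (an assumption, since the formula is published only as an image). The coefficient values below are placeholders; the only stated constraint, Tm2 < Tc2, is asserted.

```python
def executor_weight(c: float, m: float, tc2: float = 0.7, tm2: float = 0.3) -> float:
    # Assumed linear reading of the second preset formula:
    #   W2 = c * Tc2 + m * Tm2
    # Claims 5-6 require the CPU intensive coefficient to dominate here,
    # i.e. Tm2 < Tc2 (the opposite of the first preset formula).
    assert tm2 < tc2
    return c * tc2 + m * tm2
```

Weighting CPU cores more heavily at the executor stage means subtasks land on compute-rich machines, while the scheduler stage (claim 3) favors IO capacity.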
7. The Zookeeper-based distributed task scheduling method of claim 2, wherein after the step of the target task executor executing the corresponding subtask, further comprising:
if the subtask fails to be executed, updating the state information of the corresponding Agent node into a failure state by a target task executor corresponding to the subtask in the Zookeeper directory;
when the target type resource scheduler monitors that the state information of an Agent node is in the failure state, the target type resource scheduler sends a candidate task executor acquisition request to the resource manager again, determines a first target task executor among the candidate task executors based on the machine parameters of the candidate task executors fed back by the resource manager, and creates a new Agent node under the Engine node corresponding to the failed Agent node, wherein the node information of the new Agent node comprises the identification information of the first target task executor, and the new Agent node is associated with the subtask of the failed Agent node;
and when the first target task executor monitors that the new Agent node is created and the identification information of the new Agent node is consistent with the identification information of the first target task executor, the first target task executor executes the corresponding subtask again.
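The failure handling of claim 7 can be sketched as a loop that records one Agent per attempt: a failed Agent stays in the failure state and a fresh Agent (with a newly chosen executor) is created for the same subtask. The in-memory dict records stand in for Zookeeper znodes, and `attempt_fn` is a hypothetical execution callback; neither appears in the patent.

```python
def run_with_retry(subtask, executors, attempt_fn):
    # Each attempt gets its own Agent record; earlier FAILED records are
    # kept, mirroring how the failed Agent node remains in the directory
    # while a new Agent node is created for the same subtask.
    agents = []
    for executor in executors:
        agent = {"executor": executor, "subtask": subtask, "state": "RUNNING"}
        agents.append(agent)
        if attempt_fn(executor, subtask):
            agent["state"] = "SUCCESS"
            break
        agent["state"] = "FAILED"
    return agents
```

The claim does not bound the number of re-creations; here the retry budget is simply the list of candidate executors supplied by the caller.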
8. The Zookeeper-based distributed task scheduling method of claim 1, wherein after the step of the target task executor executing the corresponding subtask, further comprising:
if the subtask is successfully executed, the target task executor corresponding to the subtask updates the state information of the corresponding Agent node into a successful state in the Zookeeper directory;
when the target type resource scheduler monitors that the state information of all Agent nodes under the corresponding Engine node is in a successful state, updating the state information of the Engine node into a successful state;
and when the uniform resource scheduler monitors that the state information of all Engine nodes under the corresponding Master node is in a success state, updating the state information of the Master node into a success state, and updating the state information of the Task node into a success state to indicate that the target Task is successfully executed.
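The bottom-up status propagation of claim 8 can be sketched as a recursive roll-up over an in-memory tree standing in for the Zookeeper directory: a parent node turns to the success state only when every one of its children reports success, applied Agent → Engine → Master → Task. The dict-based tree shape is an assumption for illustration.

```python
def rollup(node):
    # Leaf nodes (Agents) keep the state set by their executors; an
    # internal node (Engine, Master, Task) becomes SUCCESS only when all
    # of its children roll up to SUCCESS.
    children = node.get("children", [])
    if children:
        states = [rollup(c) for c in children]
        if all(s == "SUCCESS" for s in states):
            node["state"] = "SUCCESS"
    return node["state"]
```

In the real system each level's watcher performs one step of this roll-up when it observes its children's state changes, rather than one process walking the whole tree.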
9. The Zookeeper-based distributed task scheduling method of claim 1, wherein the Zookeeper-based distributed task scheduling method further comprises:
constructing a visual task scheduling graph based on the Zookeeper directory, and synchronizing the updating condition of the Zookeeper directory to the visual task scheduling graph;
and when a modification instruction for modifying the nodes in the visual task scheduling graph is received, executing modification operation corresponding to the modification instruction in the Zookeeper directory.
10. A Zookeeper-based distributed task scheduling system is characterized in that the distributed task scheduling system comprises a uniform resource scheduler, a type resource scheduler and a task executor;
the uniform resource scheduler is configured to, when it monitors that a Task node associated with a target Task has been created in the Zookeeper directory, determine a target type resource scheduler among the candidate type resource schedulers, create a Master node under the Task node, and create, under the Master node, an Engine node corresponding to the target type resource scheduler, wherein the node information of the Engine node comprises the identification information of the target type resource scheduler; the uniform resource scheduler is either pre-specified by the distributed task scheduling system or elected in advance from the candidate uniform resource schedulers through the Zookeeper election mechanism, and the type resource scheduler and the task executor are determined during the task scheduling flow according to task attributes and machine parameters;
the type resource scheduler is configured to, when it monitors that the Engine node is created and the identification information of the Engine node is consistent with the identification information of the target type resource scheduler, split the target task to obtain subtasks, determine a target task executor among the candidate task executors, create an Agent node corresponding to the target task executor under the Engine node, and associate the corresponding subtask with the Agent node, wherein the node information of the Agent node comprises the identification information of the target task executor;
and the task executor is configured to execute the corresponding subtask when it monitors that the Agent node is created and the identification information of the Agent node is consistent with the identification information of the target task executor.
CN202110716212.XA 2021-06-25 2021-06-25 Zookeeper-based distributed task scheduling method and system Active CN113342532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110716212.XA CN113342532B (en) 2021-06-25 2021-06-25 Zookeeper-based distributed task scheduling method and system


Publications (2)

Publication Number Publication Date
CN113342532A CN113342532A (en) 2021-09-03
CN113342532B true CN113342532B (en) 2023-03-21

Family

ID=77479021


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568731B (en) * 2021-09-24 2021-12-28 苏州浪潮智能科技有限公司 Task scheduling method, chip and electronic equipment
CN114979141B (en) * 2022-05-13 2024-04-26 北京百度网讯科技有限公司 Task processing method, device, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
EP3637685A1 (en) * 2018-10-12 2020-04-15 Nokia Solutions and Networks Oy Orchestration multi scheduler for multi slice network
CN111359205A (en) * 2020-02-25 2020-07-03 平安科技(深圳)有限公司 Operation method and device of cloud game, computer equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN106534266B (en) * 2016-10-19 2018-04-03 南京邮电大学 A kind of multi-environment application based on Agent parallel cloud platform and its method of work
CN109492017B (en) * 2018-09-18 2024-01-12 平安科技(深圳)有限公司 Service information query processing method, system, computer equipment and storage medium
CN112486648A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Task scheduling method, device, system, electronic equipment and storage medium
CN112948077A (en) * 2021-02-06 2021-06-11 中国建设银行股份有限公司 Batch processing method, device, equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant