CN112764907A - Task processing method and device, electronic equipment and storage medium


Info

Publication number
CN112764907A
Authority
CN
China
Prior art keywords
task
tasks
freezing
source
downstream
Prior art date
Legal status
Granted
Application number
CN202110101553.6A
Other languages
Chinese (zh)
Other versions
CN112764907B (en)
Inventor
余利华
郭忆
李卓豪
陈苏安
汪源
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202110101553.6A
Publication of CN112764907A
Application granted
Publication of CN112764907B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/485: Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/48: Indexing scheme relating to G06F9/48
    • G06F 2209/483: Multiprocessing


Abstract

An embodiment of the invention provides a task processing method, a task processing apparatus, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring at least one initial source task; generating corresponding dependency relationship information for each initial source task in the at least one initial source task; merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task; and freezing the at least one final source task and all corresponding downstream tasks to stop the operation of the at least one final source task and all corresponding downstream tasks. The invention assists in quickly recovering from data failures, greatly reduces the time needed to repair data, and at the same time efficiently guarantees the correctness of the repaired data.

Description

Task processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a task processing method and apparatus, an electronic device, and a storage medium.
Background
Currently, data processing is performed based on the Extract-Transform-Load (ETL) technology, and a data processing flow is generally divided into a plurality of task steps. Complex dependency relationships exist among these tasks: a task can run only after all of its upstream tasks have run successfully and its own scheduled running time has been reached.
In the related art, when a large number of data errors occur, the one or more most-upstream tasks with erroneous data are found, a Directed Acyclic Graph (DAG) is built from those tasks and the downstream tasks that have already run, and the data is repaired by rerunning the tasks from upstream to downstream according to the DAG.
However, in the above scheme, when a downstream task that has not yet run reaches its scheduled running time during the data repair, it runs on upstream data that may not have been repaired yet, resulting in new data errors.
Disclosure of Invention
The embodiment of the invention provides a task processing method and device, an electronic device and a storage medium, which freeze all of the tasks involved in a data repair, so as to solve the problems of incorrect downstream task output and low data repair efficiency during the data repair process.
In a first aspect, an embodiment of the present invention provides a task processing method, where the method includes:
acquiring at least one initial source task;
generating corresponding dependency relationship information for each initial source task in the at least one initial source task;
merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
and freezing the at least one final source task and all corresponding downstream tasks to stop the operation of the at least one final source task and all corresponding downstream tasks.
In a possible implementation manner, the merging the at least one initial source task according to the dependency relationship information includes:
determining a plurality of initial source tasks with the upstream and downstream task relationships according to the dependency relationship information;
and taking the most upstream initial source task in the initial source tasks and the initial source task without the upstream and downstream task relationship as final source tasks.
The task processing method provided by the embodiment of the invention is used for merging the initial source tasks so as to avoid repeatedly freezing the same downstream task.
In a possible embodiment, before freezing all of the at least one ultimate source task, the method further includes:
and merging the common downstream tasks when different upstream tasks of the at least one final source correspond to the common downstream tasks according to the dependency relationship information.
According to the task processing method provided by the embodiment of the invention, a plurality of common downstream tasks are merged and processed into one downstream task, so that the repeated processing on the same downstream task is avoided.
In a possible implementation manner, generating corresponding dependency relationship information for each initial source task includes:
determining a DAG task instance graph corresponding to each initial source task by adopting a DFS/BFS algorithm;
and on the basis of the DAG task instance graph corresponding to each initial source task, adding downstream tasks of the initial source task, which do not reach the execution time, to obtain the dependency relationship information.
In the task processing method provided by the embodiment of the invention, the generated dependency relationship information comprises not only the downstream tasks that have generated task instances but also the downstream tasks that have not yet reached their execution time, which avoids incorrect downstream task output caused by a downstream task reaching its execution time before its upstream tasks have been repaired.
In one possible embodiment, the method further comprises:
and responding to a task unfreezing instruction, and unfreezing from the at least one final source task according to a mode of performing unfreezing processing on the downstream task from the upstream task to the downstream task when the upstream task is completely unfrozen, so as to recover the scheduling operation of the unfrozen task.
According to the task processing method provided by the embodiment of the invention, the execution of the downstream task depends on the total unfreezing of the upstream task, so that the unfreezing can be controlled when the upstream task completes repairing, the downstream task is executed again, and the problem of incorrect data output of the downstream task is solved.
In one possible embodiment, the task to be unfrozen is unfrozen in the following manner:
if the task to be unfrozen generates the task instance, re-running the task instance of the task to be unfrozen, and determining that the task to be unfrozen is subjected to unfreezing treatment after the re-running is successful; or
And if the task to be unfrozen does not generate a task instance, directly unfreezing the task to be unfrozen.
The task processing method provided by the embodiment of the invention reruns, during unfreezing, the tasks that have generated task instances, and completes the unfreezing only after the erroneous task has been rerun successfully, so as to solve the problem of incorrect downstream task output.
In a possible embodiment, after re-running the task instance of the task to be defrosted, the method further includes:
if the rerun fails, prompting that the unfreezing has failed;
and when a forced unfreezing instruction is received, the freezing processing of the task to be unfrozen indicated by the unfreezing instruction is released.
The task processing method provided by the embodiment of the invention supports repairing tasks whose rerun has failed: the data can be repaired according to the unfreezing-failure prompt, and after the repair the task is unfrozen through a forced unfreezing instruction.
In one possible embodiment, the method further comprises:
marking the state of the freezing pool used for freezing the tasks as generating, and adding the at least one final source task and all corresponding downstream tasks into the freezing pool;
and when all tasks in the freezing pool are frozen, marking the state of the freezing pool as frozen.
In one possible implementation, when a task in the freezing pool is being unfrozen, the state of the freezing pool is marked as unfreezing;
and when all tasks in the freezing pool are unfrozen, marking the state of the freezing pool as unfrozen.
The task processing method provided by the embodiment of the invention provides a scheme for adding the task into the freezing pool for management after freezing, and the task freezing processing progress can be determined according to the state of the freezing pool.
In one possible embodiment, the method further comprises:
in response to a thawing-pause instruction, determining the task to be unfrozen that is currently being unfrozen;
and pausing the unfreezing of the task to be unfrozen, and marking the state of the freezing pool as paused.
The task processing method provided by the embodiment of the invention provides a freezing pause function so as to meet the requirement of stopping data repair in the operation and maintenance process.
In one possible embodiment, the method further comprises:
in response to a thawing-resuming instruction, determining at least one most-upstream task in the suspended freezing pool that has not yet been unfrozen;
and starting from the at least one task to be unfrozen, unfreezing the task to be unfrozen in a mode that the downstream task is unfrozen when the upstream task is completely unfrozen from the upstream task to the downstream task.
The task processing method provided by the embodiment of the invention provides a recovery unfreezing function so as to meet the requirement of 'recovery' data repair in the operation and maintenance process.
In a possible embodiment, after the task to be defrosted is suspended from being defrosted, the method further includes:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
when the indicated source task and all corresponding downstream tasks are determined to be independent of other final source tasks except the source task, the indicated final source task and all corresponding downstream tasks are unfrozen and deleted;
otherwise, the tasks depending on other final source tasks are reserved, and other tasks except the reserved tasks in the indicated source tasks and all the downstream tasks are unfrozen and deleted.
The task processing method provided by the embodiment of the invention provides a function of deleting the tasks in the freezing pool, and realizes the updating of the freezing pool.
In a possible embodiment, after the task to be defrosted is suspended from being defrosted, the method further includes:
responding to a task adding instruction, and determining a source task indicated by the task adding instruction;
and determining that the same tasks do not exist in the indicated source task and all the downstream tasks thereof, and freezing the indicated source task and all the downstream tasks thereof to stop the operation of the indicated source task and all the downstream tasks thereof.
The task processing method provided by the embodiment of the invention provides a task adding mode that the task adding mode is added in the freezing pool and does not intersect with the task in the freezing pool, so that the freezing pool is updated.
In a possible implementation, when determining that the same task exists in the indicated source task and all of the downstream tasks thereof as the at least one final source task and all of the downstream tasks corresponding thereto, the method further includes:
if the same task as the indicated source task exists in the at least one final source task and all the corresponding downstream tasks, freezing the same task and all the corresponding downstream tasks to stop the operation of the same task and all the downstream tasks;
if the same task as the downstream task of the indicated source task exists in the at least one final source task and all the downstream tasks corresponding to the at least one final source task, freezing the same task and all the corresponding downstream tasks to stop the operation of the same task and all the downstream tasks, taking the indicated source task as a new final source task, and freezing the new final source task and all the corresponding downstream tasks to stop the operation of the new final source task and all the corresponding downstream tasks;
the task processing method provided by the embodiment of the invention provides a task adding mode of adding a task intersection with the task in the freezing pool, and realizes the updating of the freezing pool.
In a second aspect, an embodiment of the present invention provides a task processing apparatus, including:
the source task acquisition module is used for acquiring at least one initial source task;
the dependency relationship generating module is used for generating corresponding dependency relationship information for each initial source task in the at least one initial source task;
a source task merging module, configured to merge the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
and the freezing processing module is used for freezing the at least one final source task and all corresponding downstream tasks so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
In a possible implementation manner, the source task merging module performs merging processing on the at least one initial source task according to the dependency relationship information, including:
determining a plurality of initial source tasks with the upstream and downstream task relationships according to the dependency relationship information;
and taking the most upstream initial source task in the initial source tasks and the initial source task without the upstream and downstream task relationship as final source tasks.
In a possible implementation manner, before the freezing module freezes each of the at least one ultimate source task, the freezing module is further configured to:
and merging the common downstream tasks when different upstream tasks of the at least one final source correspond to the common downstream tasks according to the dependency relationship information.
In a possible implementation manner, the generating of the dependency relationship information for each initial source task by the dependency relationship generating module includes:
determining a DAG task instance graph corresponding to each initial source task by adopting a DFS/BFS algorithm;
and on the basis of the DAG task instance graph corresponding to each initial source task, adding downstream tasks of the initial source task, which do not reach the execution time, to obtain the dependency relationship information.
In one possible embodiment, the apparatus further comprises:
and the task unfreezing module is used for responding to a task unfreezing instruction, starting from the at least one final source task, and unfreezing the downstream task in a mode of executing the unfreezing processing of the downstream task from the upstream task to the downstream task when the unfreezing processing of all the upstream tasks is completed so as to recover the scheduling operation of the unfrozen task.
In one possible embodiment, the task unfreezing module unfreezes the task to be unfrozen, which performs the unfreezing process, in the following way:
if the task to be unfrozen generates the task instance, re-running the task instance of the task to be unfrozen, and determining that the task to be unfrozen is subjected to unfreezing treatment after the re-running is successful; or
And if the task to be unfrozen does not generate a task instance, directly unfreezing the task to be unfrozen.
In a possible implementation manner, after the task unfreezing module re-runs the task instance of the task to be unfrozen, the task unfreezing module further includes:
if the rerun fails, prompting that the unfreezing has failed;
and when a forced unfreezing instruction is received, the freezing processing of the task to be unfrozen indicated by the unfreezing instruction is released.
In one possible embodiment, the method further comprises:
the freezing pool processing module is used for marking the state of the freezing pool used for freezing the tasks as generating, and adding the at least one final source task and all corresponding downstream tasks into the freezing pool; and when all tasks in the freezing pool are frozen, marking the state of the freezing pool as frozen.
In one possible embodiment, the freeze tank processing module is further configured to:
when the task in the freezing pool is unfrozen, the state of the freezing pool is marked as unfreezing;
and when all tasks in the freezing pool are unfrozen, marking the state of the freezing pool as unfrozen.
In one possible embodiment, the method further comprises:
the unfreezing pause module is used for, in response to an unfreezing-pause instruction, determining the task to be unfrozen that is currently being unfrozen; and pausing the unfreezing of the task to be unfrozen, and marking the state of the freezing pool as paused.
In one possible embodiment, the method further comprises:
the unfreezing recovery module is used for, in response to an instruction to resume unfreezing, determining at least one most-upstream task in the suspended freezing pool that has not yet been unfrozen; and starting from the at least one task to be unfrozen, unfreezing the tasks from upstream to downstream, wherein a downstream task is unfrozen only after all of its upstream tasks have been unfrozen.
As a possible embodiment, after the pause thawing module pauses the thawing of the task to be thawed, the pause thawing module is further configured to:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
when the indicated source task and all corresponding downstream tasks are determined to be independent of other final source tasks except the source task, the indicated final source task and all corresponding downstream tasks are unfrozen and deleted;
otherwise, the tasks depending on other final source tasks are reserved, and other tasks except the reserved tasks in the indicated source tasks and all the downstream tasks are unfrozen and deleted.
As a possible embodiment, after the pause thawing module pauses the thawing of the task to be thawed, the pause thawing module is further configured to:
responding to a task adding instruction, and determining a source task indicated by the task adding instruction;
and determining that the same tasks do not exist in the indicated source task and all the downstream tasks thereof, and freezing the indicated source task and all the downstream tasks thereof to stop the operation of the indicated source task and all the downstream tasks thereof.
As a possible implementation, the pause unfreezing module is further configured to, when the same task exists in the indicated source task and all the downstream tasks thereof as in the at least one final source task and all the downstream tasks corresponding thereto:
if the same task as the indicated source task exists in the at least one final source task and all the corresponding downstream tasks, freezing the same task and all the corresponding downstream tasks to stop the operation of the same task and all the downstream tasks;
if the same task as the downstream task of the indicated source task exists in the at least one final source task and all the downstream tasks corresponding to the at least one final source task, freezing the same task and all the corresponding downstream tasks to stop the operation of the same task and all the downstream tasks, taking the indicated source task as a new final source task, and freezing the new final source task and all the corresponding downstream tasks to stop the operation of the new final source task and all the corresponding downstream tasks.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors, and a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the task processing methods provided by the first aspect.
In a fifth aspect, an embodiment of the present invention provides a storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the computer program implements any one of the task processing methods provided in the first aspect.
The task processing method, the task processing device, the electronic equipment and the storage medium provided by the embodiment of the invention have the following beneficial effects:
selecting a plurality of initial source tasks for processing, combining related tasks which may cause repeated generation of task instances in the initial source tasks to obtain a final source task, and freezing the final source task and all downstream tasks, so that the problem that a data repairing scheme can only select one source task is solved, data repairing can be performed quickly, and the data repairing efficiency is improved; the problem that task instances are repeatedly generated when a plurality of source tasks are independently subjected to data restoration is solved; the problem of incorrect data output of downstream tasks in the data repairing process can be avoided.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention and are not to be construed as limiting the invention.
FIG. 1 is a DAG diagram illustrating a multi-source task, according to an example embodiment;
FIG. 2 is a DAG diagram illustrating a complement operation for a single source task in accordance with an exemplary embodiment;
FIG. 3 is a DAG diagram illustrating the addition of an empty-run upstream task in accordance with an exemplary embodiment;
FIG. 4 is a DAG graph generated in the related art, shown in accordance with an exemplary embodiment;
FIG. 5 is a DAG diagram illustrating a data error caused by upstream data that has not yet been repaired correctly, in accordance with an illustrative embodiment;
FIG. 6 is a flowchart illustrating a method of task processing in accordance with an exemplary embodiment;
FIG. 7 is a state transition diagram of the freezing pool states, shown in accordance with an exemplary embodiment;
FIG. 8 is a DAG diagram illustrating a freeze pool unfreezing operation in accordance with an exemplary embodiment;
FIG. 9 illustrates a DAG diagram corresponding to a freeze pool suspend/resume operation in accordance with an exemplary embodiment;
FIG. 10 is a DAG diagram illustrating freezing pools before task deletion in accordance with an exemplary embodiment;
FIG. 11 is a diagram illustrating a DAG corresponding to a frozen pool after a task is deleted and a task is added in accordance with an illustrative embodiment;
FIG. 12 is a schematic diagram of a task processing device, shown in accordance with an exemplary embodiment;
FIG. 13 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 14 is a schematic diagram of a program product shown in accordance with an exemplary embodiment.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Hereinafter, some terms in the embodiments of the present invention are explained to facilitate understanding by those skilled in the art.
(1) ETL (Extract-Transform-Load) is the data warehouse technology used to describe the process of extracting (Extract), transforming (Transform) and loading (Load) data from a source to a destination.
(2) Directed Acyclic Graph (DAG): in graph theory, a directed graph is a directed acyclic graph if it is impossible to start at any vertex and return to that same vertex by following a sequence of directed edges.
(3) Task/task instance: the relationship of a task to a task instance is similar to the relationship of code to a process. The task is the definition, and a task instance is an actual run of the task. A task can be given a corresponding scheduling time on the scheduling system, and when the specified scheduling time is reached, the scheduling system generates and runs a corresponding task instance according to the content of the task.
(4) The tasks are dependent, and the dependency relationships exist between the tasks and are finally reflected on the task instances, namely, if the downstream task instances need to run, the precondition is that the upstream task instances are successfully run.
(5) Freezing/unfreezing, freezing refers to the action of a certain task from a normal state to a frozen state, and unfreezing refers to the action of a certain task from a frozen state to a normal state.
(6) Normal state/frozen state: for a task in the normal state, when the specified scheduling time is reached, the scheduling system generates a corresponding task instance. After a task is frozen, the task is in the frozen state, and the scheduling system will not generate a corresponding task instance for the frozen task.
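To make terms (3) to (6) concrete, the following is a minimal Python sketch of how a scheduler might represent tasks, task instances and the normal/frozen states; the class and function names are hypothetical and are not part of the patented system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class TaskState(Enum):
    NORMAL = "normal"   # the scheduler may generate instances at the scheduled time
    FROZEN = "frozen"   # the scheduler must not generate instances


@dataclass
class Task:
    """A task definition: what to run, when, and after which upstream tasks."""
    name: str
    schedule_time: str                      # e.g. "09:00" every day
    upstream: List["Task"] = field(default_factory=list)
    state: TaskState = TaskState.NORMAL


@dataclass
class TaskInstance:
    """A concrete run of a task at one scheduled execution time (term (3))."""
    task: Task
    schedule_exec_time: datetime
    status: str = "pending"                 # pending / running / success / failed


def maybe_generate_instance(task: Task, now: datetime) -> Optional[TaskInstance]:
    """Per term (6), only tasks in the normal state produce task instances."""
    if task.state is TaskState.FROZEN:
        return None                         # frozen: no instance is generated
    return TaskInstance(task=task, schedule_exec_time=now)
```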
Summary of The Invention
The inventor finds that the basic process of the data complementing (backfill) function is as follows: one or more final upstream tasks are found, a DAG is obtained from those tasks and the downstream tasks that have already run, and the tasks are rerun from upstream to downstream according to the DAG. The main scenario of this function is backtracking data for a newly launched task; it is not designed for recovering from data error faults. Therefore, when the data complementing function is used to recover from a fault, at least the following problems often occur.
1) Numerous source tasks
Because there is usually more than one erroneous final source task, if data complementing is performed once for each final source task, a large number of downstream task instances will be run repeatedly, and the correctness of the data they produce cannot be guaranteed.
As shown in fig. 1, there are two source tasks, which are task 1-1 and task 1-2, respectively, and both source tasks have a common downstream task, specifically, downstream task 2-2, task 3-1, and task 3-2. If the complementary data operations are performed on tasks 1-1 and 1-2, respectively, as shown in FIG. 2, two DAG execution graphs are generated. Task 2-2, task 3-1, and task 3-2 will generate two task instances, which will certainly cause a waste of computing resources.
Meanwhile, task 2-2 depends on both task 1-1 and task 1-2. If the complement operation is performed on only one of task 1-1 and task 1-2 and then directly on task 2-2, the correctness of the data output by task 2-2 obviously cannot be guaranteed. Similarly, the correctness of the data output by task 3-1 and task 3-2 is difficult to guarantee.
In order to solve the above problem in the related art, as shown in fig. 3, an empty-run upstream task 0 is added, tasks 1-1 and 1-2 are made to depend on it, and the data complement operation is performed directly on task 0. In this way, when a plurality of source tasks share the same downstream tasks, running those downstream tasks multiple times is avoided, and the correctness of the downstream dependencies is preserved: in the original scheme, task 2-2 depends on task 1-1 and task 1-2 in two separate DAGs, whereas in the new DAG it depends on both simultaneously. Therefore, when the data of task 2-2 is repaired through the new DAG, task 1-1 and task 1-2 have already completed their data repair, which ensures the correctness of the whole data repair.
Although the above problems can be solved by adding a new task and setting the task dependency relationships, the added task changes the DAG relationships of the ETL tasks and increases the operation and maintenance cost of subsequent tasks. Meanwhile, setting the dependencies is a cumbersome operation, and when the number of source tasks is large, a large amount of time is needed to set up the most-upstream empty-run task on which all of the source tasks depend.
2) Downstream tasks yield incorrect data
Before the complement operation, the data complementing function draws a DAG graph of the task instances to be executed according to the DAG relationships of the ETL tasks. This DAG graph only comprises task instances whose planned execution time is less than or equal to the current time; a task instance whose planned execution time is later than the current time is not added to the task-instance DAG graph.
Still taking the DAG shown in fig. 1 as an example, assuming that when the data is repaired by the data complementing function, the tasks 2-3 and 3-2 have not yet reached the specified scheduled execution time, the DAG execution diagram of the task instance drawn by the data complementing function is the portion shown by the solid line in fig. 4.
The data repair itself also takes a certain amount of time, and different tasks take different amounts of time. Therefore, when data repair is performed using the data complementing function, tasks 2-3 and 3-2 may reach their scheduled execution times before their upstream data has been repaired correctly.
Taking fig. 5 as an example, the unfilled solid-line nodes represent tasks whose data has been repaired correctly, and the filled solid-line nodes represent tasks whose data has not yet been repaired correctly. It can be seen that the data of task 1-2 has already been repaired correctly when task 2-3 runs, so the upstream data is correct while task 2-3 is running. When task 2-3 finishes, task 3-2 starts to run. At this time, however, the data of task 2-2 has not been repaired correctly, so part of the upstream data is still wrong while task 3-2 is running, and the data produced by task 3-2 is therefore still wrong.
Errors of this type are often difficult to find, and given that there are many other tasks downstream of task 3-2, the error will in turn cause a widespread data failure.
In view of this, the embodiment of the present invention provides a task processing method, which selects multiple initial source tasks, merges the initial source tasks whose relationships could cause task instances to be generated repeatedly to obtain the final source tasks, and freezes the final source tasks and all of their downstream tasks. This solves the problem that a data repair scheme can only select one source task or must add an empty-run task above the multiple source tasks, and improves the efficiency of data repair; it also avoids the problems of repeatedly generated task instances and incorrect downstream task output during the data repair process.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Exemplary method
FIG. 6 is a flowchart illustrating a method of task processing according to an exemplary embodiment, the method including the steps of:
in step S601, at least one initial source task is obtained;
in implementation, when a data error occurs, one or more of the most upstream erroneous tasks are acquired as initial source tasks.
In step S602, generating corresponding dependency relationship information for each of the at least one initial source task;
the dependency relationship information of each initial source task comprises all downstream tasks taking the initial source task as a final upstream task and the upstream and downstream dependency relationship among the tasks, wherein all the downstream tasks comprise the tasks reaching the specified planned running time and the tasks not reaching the specified planned running time.
In step S603, merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
the at least one initial source task obtained in the above way may have a part of initial source tasks in an upstream-downstream dependency relationship, and may also have no upstream-downstream dependency between the part of initial source tasks and any other initial source. One possible way to have upstream and downstream dependencies is where one original source task is a downstream task of another original source task.
As an optional implementation manner, in order to avoid that one task generates multiple task instances, the merging process is performed on the at least one initial source task according to the dependency relationship information, and includes:
determining a plurality of initial source tasks with the upstream and downstream task relationships according to the dependency relationship information;
and taking the most upstream initial source task in the initial source tasks and the initial source task without the upstream and downstream task relationship as final source tasks.
The final source tasks obtained after the merging process may still have common downstream tasks.
As an optional implementation manner, before freezing all the at least one source task, to avoid that one task generates multiple task instances, the method further includes:
and merging the common downstream tasks when different upstream tasks of the at least one final source correspond to the common downstream tasks according to the dependency relationship information.
In step S604, the at least one final source task and all corresponding downstream tasks are all frozen to stop the operation of the at least one final source task and all corresponding downstream tasks.
According to the task processing method provided by the embodiment of the invention, a plurality of initial source tasks are obtained at the same time, the final source tasks are obtained by merging the initial source tasks whose relationships could cause task instances to be generated repeatedly, and the final source tasks and all of their downstream tasks are frozen. On one hand, this avoids the repeated generation of task instances that occurs when data repair is performed on multiple source tasks independently, and overcomes the limitation that a data repair scheme can only select one source task, so that data repair can be carried out quickly and its efficiency is improved. On the other hand, because a downstream task that reaches its specified planned running time cannot run on erroneous data while its upstream tasks have not finished being repaired, the problem of incorrect downstream task output during the data repair process is avoided.
As an optional implementation manner, the embodiment of the present invention may generate corresponding dependency relationship information for each initial source task in the following manner:
determining a DAG task instance graph corresponding to each initial source task by adopting a DFS (Depth First Search)/BFS (Breadth First Search) algorithm;
and on the basis of the DAG task instance graph corresponding to each initial source task, adding downstream tasks of the initial source task, which do not reach the execution time, to obtain the dependency relationship information.
In implementation, the acquired initial source tasks are processed one by one through a DFS/BFS algorithm to compute the DAG task instance graph corresponding to each initial source task; each DAG task instance graph only comprises the task instances that have reached the specified planned running time and does not comprise task instances that have not. After all of the initial source tasks have been traversed, the downstream tasks of each initial source task that have not yet reached their execution time are added on the basis of the corresponding DAG task instance graph to obtain the dependency relationship information. The DAG task instance graphs of the initial source tasks are then merged and de-duplicated: if a downstream task in one DAG task instance graph is itself the initial source task of another DAG, the source task of the former graph is used as their common final source task; the de-duplication operation merges common downstream tasks, removing duplicated dependency relationships.
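The following is a small BFS sketch of the traversal described above, under the assumption that the scheduler exposes a downstream adjacency map and the planned execution time of each task (both hypothetical structures); it returns separately the downstream tasks that have already produced task instances and those that have not yet reached their execution time, so that both can be frozen.

```python
from collections import deque
from datetime import datetime
from typing import Dict, List, Set, Tuple


def dependency_info(dag: Dict[str, List[str]],
                    scheduled: Dict[str, datetime],
                    source: str,
                    now: datetime) -> Tuple[Set[str], Set[str]]:
    """BFS from one initial source task (step S602).

    Returns (instantiated, pending): tasks whose planned execution time has
    been reached (they already have task instances) and tasks that have not
    reached it yet.  Keeping both sets means the not-yet-due downstream tasks
    are also frozen and cannot run against unrepaired upstream data.
    """
    instantiated: Set[str] = set()
    pending: Set[str] = set()
    queue = deque([source])
    visited = {source}
    while queue:
        task = queue.popleft()
        (instantiated if scheduled[task] <= now else pending).add(task)
        for nxt in dag.get(task, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return instantiated, pending
```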
As an optional implementation manner, in the data restoration process, after the at least one ultimate source task and all corresponding downstream tasks are frozen, a thawing operation may be performed, specifically, the following manner is adopted for thawing:
in step S604, in response to the task unfreezing instruction, the task is unfrozen from the at least one final source task in a manner that the downstream task is unfrozen from the upstream task to the downstream task when the upstream task is completely unfrozen, and after the unfrozen task is unfrozen, the task is in an unfrozen state, i.e., a normal state, and can be scheduled by the scheduling system to run.
As an alternative implementation, in the embodiment of the present invention, the task is frozen by using a freezing pool, and possible implementations are given below.
1) Freezing function of freezing pool
As shown in fig. 7, the freezing pool is in the generating state when it is created. When task freezing starts, the state of the freezing pool used for freezing the tasks is marked as generating, and the at least one final source task and all corresponding downstream tasks are added to the freezing pool; specifically, each task can be frozen one by one in the DAG order given by the dependency relationships. When all tasks in the freezing pool are frozen, the state of the freezing pool is marked as frozen.
A task has a normal state and a frozen state: in the normal state the task starts to run when its specified scheduled running time is reached, and in the frozen state the task stops running or cannot run when its specified scheduled running time is reached. The embodiment of the invention may store a freezing table in the database to record the tasks added to the freezing pool, and the task state is transitioned by the scheduling system; if a task needs to be frozen at a certain time point, the task information and the corresponding time point are recorded in the freezing table. As shown in table 1, once a task is recorded in the freezing table, its state transitions from the normal state to the frozen state.
TABLE 1 Freezing table (reproduced as an image in the original publication)
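A minimal SQLite sketch of such a freezing table, assuming the task-identifying columns mentioned later in this description (project_id, flow_name, schedule_exec_time); the actual schema and storage engine of the patented system may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE freeze_table (
        project_id         INTEGER NOT NULL,
        flow_name          TEXT    NOT NULL,
        schedule_exec_time TEXT    NOT NULL,   -- planned execution time being frozen
        PRIMARY KEY (project_id, flow_name, schedule_exec_time)
    )
""")


def freeze(project_id: int, flow_name: str, schedule_exec_time: str) -> None:
    """Recording a row transitions the task to the frozen state for that time point."""
    conn.execute("INSERT OR IGNORE INTO freeze_table VALUES (?, ?, ?)",
                 (project_id, flow_name, schedule_exec_time))


def unfreeze(project_id: int, flow_name: str, schedule_exec_time: str) -> None:
    """Deleting the row transitions the task back to the normal state."""
    conn.execute(
        "DELETE FROM freeze_table "
        "WHERE project_id=? AND flow_name=? AND schedule_exec_time=?",
        (project_id, flow_name, schedule_exec_time))
```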
By adding at least one final source task and all corresponding downstream tasks into the freezing pool, the simultaneous selection of a plurality of final source tasks for data restoration can be supported. Taking fig. 1 as an example, task 1-1 and task 1-2 may be selected as source tasks in the freezing pool, and task 2-1, task 2-2, task 2-3, task 3-1 and task 3-2 will be added into the freezing pool as downstream tasks.
Tasks added into the freezing pool are all frozen, and the frozen tasks are in a frozen state. Once a task is in a frozen state, the task will not generate a corresponding task instance even if the task reaches the specified scheduled execution time.
When a task that has already generated a task instance is frozen, it is also checked whether that task instance is still running; if so, the task instance is terminated. Because the upstream data is wrong, the data would be wrong even if the instance ran successfully; by freezing these tasks, the embodiment of the invention prevents downstream tasks from continuing to produce incorrect data.
2) Thawing function of freezing pool
The embodiment of the invention provides an unfreezing operation for a freezing pool in the frozen state, which transitions the state of the freezing pool to unfreezing. A freezing pool in this state unfreezes the tasks one by one in the DAG order given by the dependency relationship information. Taking fig. 1 as an example, task 2-2 can be unfrozen only after both task 1-1 and task 1-2 are in the unfrozen state.
As shown in fig. 7, when a task in the freezing pool is being unfrozen, the state of the freezing pool is marked as unfreezing; when all tasks in the freezing pool have been unfrozen, the state of the freezing pool is marked as unfrozen, which indicates that the data has been repaired correctly. If an unexpected problem occurs during unfreezing, the state of the freezing pool transitions from unfreezing to failed.
The unfreezing operation actually deletes the corresponding frozen records from the database table shown in table 1. Once the record indicating that a task is frozen is deleted from table 1, the task state transitions from the frozen state to the normal state, and a task in the normal state generates a corresponding task instance the next time the scheduler checks it.
When the scheduling system is about to generate a task instance for a certain task, it first checks whether the task is in the freezing table (i.e. whether a record with the task information and the scheduled execution time exists); if so, the instance is not generated (its generation is delayed) and is generated only after the task is unfrozen.
For example, if the scheduled execution time of task 2-2 is 09:00 every day, the scheduling system determines, when 09:00 is reached, whether the 09:00 instance of task 2-2 is in the freezing table. The Azkaban system can uniquely locate each task by project_id and flow_name, and each task instance can be uniquely located by project_id, flow_name and schedule_exec_time. Determining whether the 09:00 task instance of task 2-2 is frozen therefore amounts to looking up the corresponding record in the freezing table by project_id, flow_name and schedule_exec_time. If the record is found, the 09:00 task instance of task 2-2 is frozen and will not be generated until the corresponding record in the freezing table is deleted.
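As an illustration of the check described above, a short sketch of what the scheduler might do before generating an instance, assuming a SQLite connection to a freezing table like the one sketched after table 1 (the function name and call site are hypothetical):

```python
import sqlite3


def is_frozen(conn: sqlite3.Connection, project_id: int,
              flow_name: str, schedule_exec_time: str) -> bool:
    """Return True if a matching record exists in the freezing table, in which
    case instance generation is deferred until the record is deleted."""
    row = conn.execute(
        "SELECT 1 FROM freeze_table "
        "WHERE project_id=? AND flow_name=? AND schedule_exec_time=?",
        (project_id, flow_name, schedule_exec_time)).fetchone()
    return row is not None


# Example: generate the 09:00 instance of task 2-2 only if it is not frozen.
# if not is_frozen(conn, 42, "task_2_2", "2021-01-25 09:00:00"):
#     generate_and_run_instance(...)   # hypothetical scheduler call
```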
As an alternative embodiment, the task to be defrosted, which performs the unfreezing process, is defrosted in the following manner:
judging whether the task to be unfrozen generates a task instance, if the task to be unfrozen generates the task instance, re-running the task instance of the task to be unfrozen, namely performing task instance re-running operation on the task, and determining that the task to be unfrozen is finished with unfreezing treatment after the re-running is successful; and if the task to be unfrozen does not generate the task instance, directly requesting the scheduling system to unfreeze the task to be unfrozen.
For the task of the failed rerun, data repairing operation is required, the freezing pool in the embodiment of the invention also provides forced unfreezing operation, and the function provides a chance for manual processing for data developers.
As an optional implementation manner, after re-running the task instance of the task to be thawed, the method further includes:
if the rerun fails, issuing an unfreezing-failure prompt to prompt a data developer to perform a data repair operation;
and when a forced unfreezing instruction is received, releasing the freezing of the task to be unfrozen indicated by the instruction; after the data has been repaired, generation of the forced unfreezing instruction can be triggered to release the freezing of that task.
Taking fig. 8 as an example, task 1-1, task 1-2, task 2-1, task 2-2 and task 3-1 are tasks that have generated task instances, while task 2-3 and task 3-2 have not. When unfreezing, task 1-1, task 1-2, task 2-1, task 2-2 and task 3-1 first rerun their corresponding task instances, and if a rerun succeeds, the state of the corresponding task is set to unfrozen. Once task 2-3 and task 3-2 are in the unfrozen state and their corresponding scheduled execution times are reached, the scheduling system generates their task instances.
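A hedged sketch of the per-task unfreezing decision described above; `rerun_instance` and `delete_freeze_record` stand in for the scheduling system's API and are assumptions of this example:

```python
from typing import Callable, Optional


def thaw_task(task: str,
              instance_id: Optional[str],
              rerun_instance: Callable[[str], bool],
              delete_freeze_record: Callable[[str], None]) -> bool:
    """Thaw one task: rerun its existing task instance first, or unfreeze it
    directly if no instance has been generated.

    Returns True if the task is now unfrozen, or False if the rerun failed and
    the task stays frozen pending data repair and a forced unfreeze.
    """
    if instance_id is not None:
        if not rerun_instance(instance_id):   # rerun the already generated instance
            return False                      # prompt: unfreezing failed
    delete_freeze_record(task)                # remove the freezing-table record
    return True
```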
As an alternative embodiment, as shown in table 2, a task table for the freezing pool may be established on the basis of table 1, and the status of each task is identified by a status field in the task table: the status is marked as frozen when freezing of the task has completed, as unfreezing while the task is being unfrozen, and as unfrozen when unfreezing of the task has completed.
TABLE 2 Task table in the freezing pool (reproduced as an image in the original publication)
Normally, the status of a task is modified by the freezing pool itself as it performs the unfreezing operation, thawing one task after another until the pool's state finally transitions. If the unfreezing of a task fails, for example when the rerun of the task fails and manual intervention is needed, a failure prompt is output on the page and an entry point is provided for manually modifying the status of the task; this modification may only change the status from unfreezing to unfrozen.
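A minimal sketch of the per-task status field and the transition constraints mentioned above; the enum and function names are illustrative assumptions:

```python
from enum import Enum


class PoolTaskStatus(Enum):
    """Per-task status recorded in the freezing-pool task table (table 2)."""
    FROZEN = "frozen"          # freezing of the task has completed
    UNFREEZING = "unfreezing"  # a thaw (instance rerun) is in progress
    UNFROZEN = "unfrozen"      # thawing completed; the task is back to normal


# Transitions performed by the pool itself, plus two special cases: the pause
# operation resets a thawing task to frozen, and the manual (forced) unfreeze
# may only move a task from UNFREEZING to UNFROZEN.
ALLOWED_TRANSITIONS = {
    (PoolTaskStatus.FROZEN, PoolTaskStatus.UNFREEZING),
    (PoolTaskStatus.UNFREEZING, PoolTaskStatus.UNFROZEN),
    (PoolTaskStatus.UNFREEZING, PoolTaskStatus.FROZEN),   # pause resets the task
}


def transition(current: PoolTaskStatus, target: PoolTaskStatus) -> PoolTaskStatus:
    """Reject any status change that the freezing pool does not allow."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal status change: {current.value} -> {target.value}")
    return target
```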
3) Pause and resume functions of freezing pool
In the related art, if the data repair takes a long time, the data complementing function has to be closed before the tasks of the next time period start to run, and all running task instances have to be terminated, in order to prevent the data produced in the subsequent time period from being affected by the repair. The data complementing function does not support resuming after a pause, so it can only be stopped completely. As a result, when the next repair window is reached, the downstream tasks that have not run must be rerun manually. If there are many such downstream tasks, they have to be attached to an empty-run task on which they all depend, and the repair of the faulty data has to be carried out again according to the previous data repair scheme.
The embodiment of the invention provides a pause function for a frozen task, which specifically comprises the following steps:
responding to a thawing pause instruction, and determining a task to be thawed for currently executing thawing;
and pausing the unfreezing of the task to be unfrozen, and marking the state of the freezing pool as paused.
When the freezing pool is used to freeze tasks, a freezing pool that is unfreezing also provides a pause function; as shown in fig. 7, the pause operation transitions the state of the freezing pool from unfreezing to paused. Given the per-task status proposed in the above embodiment of the invention, the unfreezing progress of the freezing pool can still be determined while the pool is in the paused state.
The embodiment of the invention provides a function for recovering a frozen task, which specifically comprises the following steps:
in response to a thawing resuming instruction, determining at least one task to be thawed which is unfrozen and is at the most upstream in the suspended freezing pool; and starting from the at least one task to be unfrozen, unfreezing the task to be unfrozen in a mode that the downstream task is unfrozen when the upstream task is completely unfrozen from the upstream task to the downstream task.
When the freezing pool is used to freeze tasks, as shown in fig. 7, a resume operation can be performed on a freezing pool in the paused state, which transitions the state of the freezing pool from paused back to unfreezing. Specifically, according to the statuses of the tasks in the freezing pool, the tasks whose status is still frozen are determined, and the at least one most-upstream task among them is selected to continue the unfreezing operation.
After the pause operation is triggered in the embodiment of the invention, the freezing pool no longer continues to unfreeze downstream tasks. A task that is being unfrozen is reset to the frozen state, and if a task instance was rerun during unfreezing and is still running, that instance is terminated.
Taking FIG. 9 as an example, task 1-1, task 1-2, task 2-1, and task 2-3 are already in a thawed state before the pause operation is triggered. Task 2-2 is in defrost. After the pause operation is triggered, task 2-2 will be reset to frozen, the task instance of task 2-2 rerun will be terminated, and tasks 3-1, 3-2 will not continue to unfreeze.
After the resume operation of the freezing pool is triggered, task 2-2 continues to be unfrozen and its task instance is rerun. After task 2-2 has been unfrozen, task 3-1 and task 3-2 are unfrozen in turn. When the status of all tasks is unfrozen, the state of the freezing pool transitions from unfreezing to unfrozen.
The pause and resume operations of the freezing pool provided by the embodiment of the invention allow the repair to be interrupted temporarily as required, without having to re-select the tasks when it is later resumed, which improves data repair efficiency and leaves time for manual data repair. For example, when rerunning a task instance fails, unfreezing of the task to be unfrozen is paused and the state of the freezing pool is marked as paused; after the data has been repaired correctly and the corresponding task has been forcibly unfrozen, the unfreezing operation of the freezing pool is resumed.
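A rough Python sketch of the pause and resume behaviour described in this subsection, using the pool states of fig. 7; the data structures and the `terminate_instance` callable are assumptions made for the example:

```python
from enum import Enum
from typing import Callable, Dict, List, Set


class PoolState(Enum):
    GENERATING = "generating"
    FROZEN = "frozen"
    UNFREEZING = "unfreezing"
    PAUSED = "paused"
    UNFROZEN = "unfrozen"
    FAILED = "failed"


def pause(thawing: Set[str], statuses: Dict[str, str],
          terminate_instance: Callable[[str], None]) -> PoolState:
    """Pause unfreezing: tasks currently being thawed are reset to frozen and
    any task instance that is still rerunning is terminated (cf. task 2-2 in
    fig. 9)."""
    for task in thawing:
        terminate_instance(task)      # hypothetical scheduler call
        statuses[task] = "frozen"
    return PoolState.PAUSED


def resume_candidates(dag: Dict[str, List[str]],
                      statuses: Dict[str, str]) -> List[str]:
    """Resume unfreezing: the most upstream tasks that are still frozen and
    whose in-pool upstream tasks have all already been unfrozen."""
    upstream: Dict[str, List[str]] = {t: [] for t in statuses}
    for t, downs in dag.items():
        for d in downs:
            if d in upstream:
                upstream[d].append(t)
    return [t for t, s in statuses.items() if s == "frozen"
            and all(statuses[u] == "unfrozen"
                    for u in upstream[t] if u in statuses)]
```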
4) Update function for freezing pool
The embodiment of the invention provides a function of updating the frozen tasks, which includes a function of deleting a source task: when a source task is deleted, the source task and its downstream tasks need to be unfrozen and removed from the original DAG. A downstream task may be unfrozen and removed only if it does not depend on any other source task. As an optional implementation manner, after the unfreezing of the task to be unfrozen is suspended, the method further includes:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
when the indicated source task and all corresponding downstream tasks are determined to be independent of other final source tasks except the source task, the indicated final source task and all corresponding downstream tasks are unfrozen and deleted;
otherwise, retaining the tasks that depend on other final source tasks, and unfreezing and deleting the remaining tasks among the indicated source task and all its downstream tasks.
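As an illustration of the deletion rule above, the following Python sketch keeps every downstream task that is still reachable from some other final source task and unfreezes and removes the rest; the dict-of-sets DAG representation and the unfreeze_and_remove callable are assumptions made only for this example.

    def delete_source_task(dag, source, unfreeze_and_remove):
        # dag: dict mapping every task id (including leaves) to the set of its direct downstream task ids
        # source: the final source task indicated by the deletion instruction
        # unfreeze_and_remove(task): unfreezes the task and removes it from the freezing pool
        def reachable(start):
            seen, stack = set(), [start]
            while stack:
                t = stack.pop()
                if t not in seen:
                    seen.add(t)
                    stack.extend(dag.get(t, ()))
            return seen

        sources = [t for t in dag if not any(t in ds for ds in dag.values())]
        kept = set()
        for other in sources:
            if other != source:
                kept |= reachable(other)              # tasks still needed by other final source tasks

        for task in reachable(source):
            if task not in kept:                      # independent of all other final source tasks
                unfreeze_and_remove(task)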
The embodiment of the invention also provides a function for newly adding tasks to be frozen. After the unfreezing of the task to be unfrozen is suspended, the method further comprises the following steps:
responding to a task adding instruction, and determining a source task indicated by the task adding instruction;
and when it is determined that no task among the indicated source task and all its downstream tasks is the same as any of the at least one final source task and all their corresponding downstream tasks, freezing the indicated source task and all its downstream tasks to stop the operation of the indicated source task and all its downstream tasks.
When the freezing pool is used to freeze tasks, the source task to be added to the freezing pool is determined. If the newly added source task and its downstream tasks do not overlap with the DAG of the original freezing pool, the original tasks are not affected, and the indicated source task and its corresponding downstream tasks can be directly frozen and added to the freezing pool.
As an optional implementation manner, when it is determined that a task among the indicated source task and all its downstream tasks is the same as one of the at least one final source task or one of their corresponding downstream tasks, the following processing manner may be adopted:

if the task that is the same is the indicated source task, freezing that task and all its corresponding downstream tasks to stop their operation;

if the task that is the same is a downstream task of the indicated source task, freezing that task and all its corresponding downstream tasks to stop their operation, taking the indicated source task as a new final source task, and freezing the new final source task and all its corresponding downstream tasks to stop their operation.
When the freezing pool is used to freeze tasks, the source task to be added to the freezing pool is determined. If the newly added source task and its downstream tasks share a task with the DAG of the original freezing pool, and the overlapping task is the indicated source task, then the overlapping task is taken as a starting point, the starting-point task and all its downstream tasks are refrozen, the indicated source task and its corresponding downstream tasks are frozen and added to the freezing pool, and the overlapping tasks are merged. When the overlapping task is a downstream task of the indicated source task, the overlapping task is taken as a starting point, the starting-point task and all its downstream tasks are refrozen, the indicated source task is taken as a new final source task, the indicated source task and its corresponding downstream tasks are frozen and added to the freezing pool, and the overlapping tasks are merged. In other words, if a newly added task is upstream of an existing task, the existing task must be rerun regardless of whether it has previously been unfrozen.
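A simplified sketch of this merge rule for newly added tasks is given below in Python; the freeze and refreeze callables and the dict-of-sets DAG representation are illustrative assumptions, not the actual implementation.

    def add_source_task(pool_dag, new_dag, new_source, freeze, refreeze):
        # pool_dag / new_dag: dicts mapping a task id (including leaves) to the set of its
        # direct downstream task ids
        # freeze(task): freezes a task that is not yet in the freezing pool
        # refreeze(task): resets an already-unfrozen task in the pool back to the frozen state
        def downstream(dag, start):
            seen, stack = set(), [start]
            while stack:
                t = stack.pop()
                if t not in seen:
                    seen.add(t)
                    stack.extend(dag.get(t, ()))
            return seen

        new_tasks = downstream(new_dag, new_source)
        overlap = new_tasks & set(pool_dag)           # tasks shared with the existing pool DAG

        for task in new_tasks - set(pool_dag):
            freeze(task)                              # brand-new tasks are simply frozen and added
        for task in overlap:
            # an overlapping task gains a new upstream, so it and everything downstream of it
            # must be refrozen and rerun even if it was already unfrozen
            for t in downstream(pool_dag, task):
                refreeze(t)
        for task in new_tasks:                        # merge the new edges into the pool DAG
            pool_dag.setdefault(task, set()).update(new_dag.get(task, ()))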
When the freezing pool is used to freeze tasks, a freezing pool that is unfreezing also provides a pause function, and the pause operation transitions the state of the freezing pool to paused. A freezing pool in the suspended state can also be updated; after the update operation is finished, the state of the freezing pool transitions from suspended back to unfreezing.
FIG. 10 is a DAG execution diagram of an example freezing pool, in which solid lines indicate tasks that have been unfrozen and dashed lines indicate tasks in the frozen state. In response to a task deletion instruction, task 1-1 is to be deleted. Task 2-2, task 3-1 and task 3-2 downstream of task 1-1 also depend on another source task 1-2, so they are retained, while task 1-1 and task 2-1, which do not depend on any other source task, are deleted. Task 1-3 is then added together with its downstream tasks 2-3, 2-4 and 3-3. The overlapping task is task 2-3, which had already been unfrozen; however, because the new task 1-3 is an upstream task of task 2-3, task 2-3 is refrozen when task 1-3 and its downstream tasks 2-4 and 3-3 are added to the DAG. The DAG of the freezing pool thus changes from FIG. 10 to FIG. 11.
5) State transitions of the freezing pool
As described above, when the freezing pool is used to perform task freezing processing, the state of the freezing pool may be in production, frozen, unfreezing, suspended, unfrozen, or failed, depending on the operations performed. A freezing pool in the suspended state may also be discarded; a discarded freezing pool unfreezes all of its tasks, after which the state of the freezing pool transitions to "discarded". The state of the freezing pool may be maintained using a freezing pool table structure, as shown in Table 2:
TABLE 2 freezing pool table structure
Field         Type     Description
id            int      Primary key, auto-increment
name          varchar  Freezing pool name
status        varchar  State of the freezing pool
creator       varchar  Creator
create_time   bigint   Creation time
modifier      varchar  Modifier
modify_time   bigint   Modification time
service_time  bigint   Service time
version       int      Freezing pool version
The state transition of the freezing pool is realized by modifying the status field of the freezing pool table structure. The status field can take the values: in production, frozen, unfreezing, unfrozen, failed, discarded, and paused. The state transitions of the freezing pool have been described above and are not repeated here.
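For illustration, the freezing pool record from Table 2 and its allowed status values could be modelled as follows. This is a Python sketch only; the field names follow Table 2, while the class names and the version bump in transition are assumptions made for the example.

    from dataclasses import dataclass
    from enum import Enum

    class PoolStatus(Enum):
        IN_PRODUCTION = "in_production"
        FROZEN = "frozen"
        UNFREEZING = "unfreezing"
        UNFROZEN = "unfrozen"
        FAILED = "failed"
        DISCARDED = "discarded"
        PAUSED = "paused"

    @dataclass
    class FreezePoolRecord:
        id: int                 # primary key, auto-increment
        name: str               # freezing pool name
        status: PoolStatus      # current state of the freezing pool
        creator: str
        create_time: int        # creation time (bigint timestamp)
        modifier: str
        modify_time: int        # modification time (bigint timestamp)
        service_time: int
        version: int            # freezing pool version

    def transition(record: FreezePoolRecord, new_status: PoolStatus) -> None:
        # a state transition is simply an update of the status field
        record.status = new_status
        record.version += 1     # illustrative assumption: bump the version on every change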
The freezing pool functions provided by the embodiment of the invention can assist in quickly recovering from data failures, greatly reduce the time required to repair data, and at the same time efficiently ensure the correctness of the repaired data.
Exemplary device
Having described the embodiments of the exemplary task processing method of the present invention, a task processing device according to an exemplary embodiment of the present invention will be described next with reference to fig. 12.
As shown in fig. 12, based on the same inventive concept, an embodiment of the present invention further provides a task processing apparatus, including:
a source task obtaining module 1201, configured to obtain at least one initial source task;
a dependency relationship generating module 1202, configured to generate corresponding dependency relationship information for each initial source task of the at least one initial source task;
a source task merging module 1203, configured to merge the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
a freezing processing module 1204, configured to freeze the at least one final source task and all corresponding downstream tasks, so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
In a possible implementation manner, the source task merging module 1203 performs merging processing on the at least one initial source task according to the dependency relationship information, including:

determining a plurality of initial source tasks having upstream-downstream task relationships according to the dependency relationship information;

and taking the most upstream initial source task among the plurality of initial source tasks, together with any initial source task having no upstream-downstream task relationship, as the final source tasks.
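A compact sketch of this merging step is given below (Python; the is_upstream_of helper, which answers whether one task lies upstream of another according to the dependency relationship information, is an assumption for illustration):

    def merge_initial_sources(initial_sources, is_upstream_of):
        # keep an initial source task only if no other initial source task lies upstream of it
        final_sources = []
        for task in initial_sources:
            dominated = any(is_upstream_of(other, task)
                            for other in initial_sources if other != task)
            if not dominated:
                final_sources.append(task)   # most upstream, or unrelated to the other initial sources
        return final_sources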
In a possible implementation manner, before the freezing processing module 1204 freezes each of the at least one final source task, the freezing processing module is further configured to:

merge common downstream tasks when, according to the dependency relationship information, different upstream tasks among the at least one final source task correspond to a common downstream task.
In a possible implementation manner, the dependency relationship generating module 1202 generates corresponding dependency relationship information for each initial source task, including:
determining a DAG task instance graph corresponding to each initial source task by adopting a DFS/BFS algorithm;
and on the basis of the DAG task instance graph corresponding to each initial source task, adding downstream tasks of the initial source task that have not yet reached their execution time, so as to obtain the dependency relationship information.
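A sketch of building the dependency relationship information for one initial source task is shown below (BFS variant in Python; the instance_downstream and pending_downstream callables are assumptions used only for illustration):

    from collections import deque

    def build_dependency_info(initial_source, instance_downstream, pending_downstream):
        # instance_downstream(task): direct downstream tasks taken from the DAG of task instances
        # pending_downstream(task): direct downstream tasks that have not yet reached their
        #                           execution time and therefore have no task instance yet
        dependency = {}
        queue = deque([initial_source])
        while queue:
            task = queue.popleft()
            if task in dependency:
                continue
            children = set(instance_downstream(task)) | set(pending_downstream(task))
            dependency[task] = children
            queue.extend(children)
        return dependency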
In one possible embodiment, the apparatus further comprises:
and a task unfreezing module 1205, configured to, in response to a task unfreezing instruction, start from the at least one final source task and unfreeze tasks from upstream to downstream in such a manner that a downstream task is unfrozen only after all of its upstream tasks have been unfrozen, so as to resume the scheduled operation of the unfrozen tasks.
In one possible implementation, the task unfreezing module 1205 performs the unfreezing processing on a task to be unfrozen as follows:
if the task to be unfrozen has generated a task instance, rerunning the task instance of the task to be unfrozen, and determining that the unfreezing processing of the task to be unfrozen is completed after the rerun succeeds; or

if the task to be unfrozen has not generated a task instance, directly unfreezing the task to be unfrozen.
In a possible implementation manner, after the task unfreezing module 1205 reruns the task instance of the task to be unfrozen, the module is further configured to:

when it is determined that the rerun has failed, prompt that the unfreezing has failed;

and when a forced unfreezing instruction is received, release the freezing processing of the task to be unfrozen indicated by the forced unfreezing instruction.
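The per-task unfreezing behaviour described for module 1205 could be sketched as follows (Python; has_instance, rerun_instance and notify_failure are hypothetical callables introduced only for this example):

    def unfreeze_task(task, has_instance, rerun_instance, notify_failure, forced=False):
        # returns True when the task is considered unfrozen
        if forced or not has_instance(task):
            return True                      # no task instance yet, or forced unfreeze: release directly
        if rerun_instance(task):             # rerun the existing task instance
            return True                      # rerun succeeded, so the task is unfrozen
        notify_failure(task)                 # rerun failed: prompt that the unfreezing has failed
        return False                         # the caller may pause the pool or force-unfreeze later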
In one possible embodiment, the apparatus further comprises:
a freezing pool processing module, configured to mark the state of the freezing pool used for freezing the tasks as in production, and add the at least one final source task and all corresponding downstream tasks into the freezing pool; and when all tasks in the freezing pool are frozen, mark the state of the freezing pool as frozen.
In one possible embodiment, the freezing pool processing module is further configured to:
when a task in the freezing pool is being unfrozen, mark the state of the freezing pool as unfreezing;

and when all tasks in the freezing pool have been unfrozen, mark the state of the freezing pool as unfrozen.
In one possible embodiment, the apparatus further comprises:
an unfreezing pause module, configured to, in response to an unfreezing pause instruction, determine the task to be unfrozen for which unfreezing is currently being executed, pause the unfreezing of the task to be unfrozen, and mark the state of the freezing pool as paused.
In one possible embodiment, the apparatus further comprises:
an unfreezing resume module, configured to, in response to an unfreezing resume instruction, determine at least one task to be unfrozen which has not yet been unfrozen and is located at the most upstream in the suspended freezing pool; and, starting from the at least one task to be unfrozen, unfreeze the tasks from upstream to downstream in such a manner that a downstream task is unfrozen only after all of its upstream tasks have been unfrozen.
As a possible embodiment, after the unfreezing pause module pauses the unfreezing of the task to be unfrozen, the unfreezing pause module is further configured to:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
when it is determined that the indicated source task and all corresponding downstream tasks are independent of final source tasks other than the indicated source task, unfreeze and delete the indicated source task and all corresponding downstream tasks;

otherwise, retain the tasks that depend on other final source tasks, and unfreeze and delete the remaining tasks among the indicated source task and all its downstream tasks.
As a possible embodiment, after the unfreezing pause module pauses the unfreezing of the task to be unfrozen, the unfreezing pause module is further configured to:
responding to a task adding instruction, and determining a source task indicated by the task adding instruction;
and when it is determined that no task among the indicated source task and all its downstream tasks is the same as any of the at least one final source task and all their corresponding downstream tasks, freeze the indicated source task and all its downstream tasks to stop the operation of the indicated source task and all its downstream tasks.
As a possible implementation, the unfreezing pause module is further configured to, when a task among the indicated source task and all its downstream tasks is the same as one of the at least one final source task or one of their corresponding downstream tasks:

if the task that is the same is the indicated source task, freeze that task and all its corresponding downstream tasks to stop their operation;

if the task that is the same is a downstream task of the indicated source task, freeze that task and all its corresponding downstream tasks to stop their operation, take the indicated source task as a new final source task, and freeze the new final source task and all its corresponding downstream tasks to stop their operation.
The electronic device 130 according to this embodiment of the present invention is described below with reference to fig. 13. The electronic device shown in fig. 13 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 13, the electronic device 130 may be embodied in the form of a general purpose computing device, which may be a terminal device, for example. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131). The processor 131 is configured to execute instructions stored in the memory 132 to implement the task processing method provided by the above-described embodiment of the present invention.
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable electronic device 130 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 136. As shown, network adapter 136 communicates with the other modules of electronic device 130 over bus 133. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 130, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Exemplary program product
In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps of the method of task processing according to various exemplary embodiments of the invention described in the above-mentioned "exemplary methods" section of this description, when the program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 14, a program product 140 according to an embodiment of the present invention is depicted, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several modules or sub-modules of the system are mentioned in the above detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the modules described above may be embodied in one module according to embodiments of the invention. Conversely, the features and functions of one module described above may be further divided so as to be embodied by a plurality of modules.
Moreover, although the operations of the modules of the system of the present invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain operations may be omitted, operations combined into one operation execution, and/or operations broken down into multiple operation executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects, which is made for convenience of description only, imply that features in these aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A task processing method, comprising:
acquiring at least one initial source task;
generating corresponding dependency relationship information for each initial source task in the at least one initial source task;
merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
and freezing the at least one final source task and all corresponding downstream tasks to stop the operation of the at least one final source task and all corresponding downstream tasks.
2. The method of claim 1, wherein merging the at least one initial source task according to the dependency information comprises:
determining a plurality of initial source tasks having upstream-downstream task relationships according to the dependency relationship information;

and taking the most upstream initial source task among the plurality of initial source tasks, together with any initial source task having no upstream-downstream task relationship, as the final source tasks.
3. The method of claim 1, wherein prior to freezing each of the at least one final source task, further comprising:
and merging common downstream tasks when, according to the dependency relationship information, different upstream tasks among the at least one final source task correspond to a common downstream task.
4. The method of claim 1, wherein generating corresponding dependency information for each of the initial source tasks comprises:
determining a DAG task instance graph corresponding to each initial source task by adopting a DFS/BFS algorithm;
and on the basis of the DAG task instance graph corresponding to each initial source task, adding downstream tasks of the initial source task that have not yet reached their execution time, so as to obtain the dependency relationship information.
5. The method of claim 1, further comprising:
and in response to a task unfreezing instruction, unfreezing tasks starting from the at least one final source task, from upstream to downstream, in such a manner that a downstream task is unfrozen only after all of its upstream tasks have been unfrozen, so as to resume the scheduled operation of the unfrozen tasks.
6. The method of any one of claims 1 to 5, further comprising:
marking the state of a freezing pool for freezing the tasks as in production, and adding the at least one final source task and all corresponding downstream tasks into the freezing pool;
and when all tasks in the freezing pool are frozen, marking the state of the freezing pool as frozen.
7. The method of claim 6,
when a task in the freezing pool is being unfrozen, marking the state of the freezing pool as unfreezing;
and when all tasks in the freezing pool are unfrozen, marking the state of the freezing pool as unfrozen.
8. A task processing apparatus, characterized in that the apparatus comprises:
the source task acquisition module is used for acquiring at least one initial source task;
the dependency relationship generating module is used for generating corresponding dependency relationship information for each initial source task in the at least one initial source task;
a source task merging module, configured to merge the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
and the freezing processing module is used for freezing the at least one final source task and all corresponding downstream tasks so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
9. An electronic device comprising one or more processors and a memory for storing instructions executable by the processors;
wherein the processor is configured to execute the instructions to implement the task processing method according to any one of claims 1 to 7.
10. A storage medium having stored therein a computer program which, when executed by a processor, implements a task processing method according to any one of claims 1 to 7.
CN202110101553.6A 2021-01-26 2021-01-26 Task processing method and device, electronic equipment and storage medium Active CN112764907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110101553.6A CN112764907B (en) 2021-01-26 2021-01-26 Task processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112764907A true CN112764907A (en) 2021-05-07
CN112764907B CN112764907B (en) 2024-05-10

Family

ID=75707400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110101553.6A Active CN112764907B (en) 2021-01-26 2021-01-26 Task processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112764907B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8578389B1 (en) * 2004-05-04 2013-11-05 Oracle America, Inc. Method and system for merging directed acyclic graphs representing data flow codes
CN103761111A (en) * 2014-02-19 2014-04-30 中国科学院软件研究所 Method and system for constructing data-intensive workflow engine based on BPEL language
CN108388474A (en) * 2018-02-06 2018-08-10 北京易沃特科技有限公司 Intelligent distributed management of computing system and method based on DAG
CN110019144A (en) * 2018-06-19 2019-07-16 杭州数澜科技有限公司 A kind of method and system of big data platform data O&M
CN108984284A (en) * 2018-06-26 2018-12-11 杭州比智科技有限公司 DAG method for scheduling task and device based on off-line calculation platform
CN110737542A (en) * 2018-07-19 2020-01-31 慧与发展有限责任合伙企业 Freezing and unfreezing upstream and downstream rolls
CN112052077A (en) * 2019-06-06 2020-12-08 北京字节跳动网络技术有限公司 Method, device, equipment and medium for software task management
US20200401444A1 (en) * 2019-06-24 2020-12-24 Nvidia Corporation Efficiently executing workloads specified via task graphs
CN110347708A (en) * 2019-06-28 2019-10-18 深圳市元征科技股份有限公司 A kind of data processing method and relevant device
CN112100019A (en) * 2019-09-12 2020-12-18 无锡江南计算技术研究所 Multi-source fault collaborative analysis positioning method for large-scale system
CN111061551A (en) * 2019-12-06 2020-04-24 深圳前海微众银行股份有限公司 Node merging and scheduling method, device, equipment and storage medium
CN112148455A (en) * 2020-09-29 2020-12-29 星环信息科技(上海)有限公司 Task processing method, device and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275A (en) * 2021-10-28 2022-03-11 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium
CN117076095A (en) * 2023-10-16 2023-11-17 华芯巨数(杭州)微电子有限公司 Task scheduling method, system, electronic equipment and storage medium based on DAG
CN117076095B (en) * 2023-10-16 2024-02-09 华芯巨数(杭州)微电子有限公司 Task scheduling method, system, electronic equipment and storage medium based on DAG

Also Published As

Publication number Publication date
CN112764907B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
US8015430B1 (en) Using asset dependencies to identify the recovery set and optionally automate and/or optimize the recovery
EP2979205B1 (en) Recovery processing using torn write detection
JP5970617B2 (en) Development support system
US7634687B2 (en) Checkpoint restart system and method
US8140907B2 (en) Accelerated virtual environments deployment troubleshooting based on two level file system signature
US9063894B2 (en) Cascade ordering
US9569204B2 (en) End-to-end continuous integration and verification of software
CN112764907B (en) Task processing method and device, electronic equipment and storage medium
US20110078499A1 (en) Business process error handling through process instance backup and recovery
US8032618B2 (en) Asynchronous update of virtualized applications
CN112416379B (en) Application program installation method and device, computing equipment and readable storage medium
US20150294250A1 (en) Building confidence of system administrator in productivity tools and incremental expansion of adoption
US20200125964A1 (en) Predicting successful completion of a database utility process within a time frame having concurrent user access to the database
US7949688B2 (en) Method of recording and backtracking business information model changes
CN108664255A (en) A kind of method for upgrading software and device
Montezanti et al. Soft errors detection and automatic recovery based on replication combined with different levels of checkpointing
CA2299850C (en) System and method for the management of computer software maintenance
US11334347B2 (en) Cognitive build recovery from inter-code commit issues
JP6327028B2 (en) Object storage system, control method thereof, and control program thereof
WO2015072078A1 (en) Service resumption sequence generating device, service resumption sequence generating method, and service resumption sequence generating program
US11327723B1 (en) Development environment integrated with failure detection system
CN113672277B (en) Code synchronization method, system, computer device and storage medium
CN117130987B (en) Flight control management method for large-scale unmanned aerial vehicle cluster
US11620208B2 (en) Deployment of variants built from code
JP7024804B2 (en) System update device and system update method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant