CN116521404A - Dependency diagnosis method, dependency diagnosis device, electronic equipment and storage medium - Google Patents

Publication number
CN116521404A
Authority
CN
China
Prior art keywords
batch
task
dependency relationship
tasks
diagnosed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210068326.2A
Other languages
Chinese (zh)
Inventor
罗建林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boc Financial Technology Co ltd
Original Assignee
Boc Financial Technology Co ltd
Application filed by Boc Financial Technology Co ltd
Priority to CN202210068326.2A
Publication of CN116521404A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a dependency diagnosis method and device, an electronic device, and a storage medium. The method comprises: obtaining persistence information of each batch task; determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks; and diagnosing a preconfigured second dependency relationship of the batch task to be diagnosed based on the first dependency relationship to obtain a diagnosis result. The method, device, electronic device and storage medium provided by the invention diagnose batch task dependencies automatically, ensure the accuracy of the diagnosis, reduce unreasonable predecessor and successor dependencies and non-optimal dependencies of batch tasks, improve the efficiency of dependency diagnosis, remove tedious manual work, shorten the routine development cycle, and help keep business services timely after batch scheduling tasks go into production.

Description

Dependency diagnosis method, dependency diagnosis device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for diagnosing a dependency relationship, an electronic device, and a storage medium.
Background
At present, the driving dependencies of a batch scheduling system are configured by technicians according to business logic. As the business grows, however, a batch scheduling system that started out simple becomes larger and larger: more and more batch tasks need to be scheduled, the dependencies among them become increasingly complex, manual configuration becomes increasingly difficult, and unreasonable configurations easily occur.
To diagnose dependencies, existing schemes generally rely on manual judgment, analyzing the dependencies of batch tasks one by one. Because a batch scheduling system often contains a very large number of batch tasks, manual judgment consumes a great deal of labor, is inefficient, and affects the timeliness of actual business services.
Disclosure of Invention
The invention provides a dependency diagnosis method, a device, an electronic device and a storage medium, which address the low efficiency of dependency diagnosis in the prior art and improve the efficiency of diagnosing batch task dependencies.
The invention provides a method for diagnosing a dependency relationship, which comprises the following steps:
obtaining persistence information of each batch task;
determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks;
and diagnosing a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
According to the method for diagnosing the dependency relationship provided by the invention, the first dependency relationship of the batch task to be diagnosed is determined based on the persistence information of each batch task, and the method comprises the following steps:
and determining a first dependency relationship of the batch task to be diagnosed based on operation information for a data table and/or a file included in the persistence information of each batch task.
According to the method for diagnosing the dependency relationship provided by the invention, the determining the first dependency relationship of the batch task to be diagnosed based on the operation information for the data table and/or the file included in the persistence information of each batch task comprises the following steps:
determining an associated batch task associated with the batch task to be diagnosed based on operation information for a data table and/or a file included in the persistence information of each batch task;
and determining a first dependency relationship of the batch task to be diagnosed based on operation information for the same data table and/or the same file, which is included in the persistence information of the batch task to be diagnosed and the associated batch task.
According to the method for diagnosing the dependency relationship provided by the invention, obtaining the persistence information of each batch task comprises:
determining a persistence information acquisition program corresponding to each batch task based on the programming language of each batch task;
and obtaining the persistence information of each batch task based on the persistence information acquisition program corresponding to that batch task.
According to the method for diagnosing the dependency relationship provided by the invention, obtaining the persistence information of each batch task based on the corresponding persistence information acquisition program comprises:
for any batch task, obtaining the persistence information of that batch task based on its corresponding persistence information acquisition program;
and if the persistence information of that batch task is not obtained successfully, obtaining manually entered persistence information for that batch task.
According to the method for diagnosing the dependency relationship provided by the invention, the second dependency relationship is determined based on the directed acyclic graph of the batch scheduling system.
The invention also provides a device for diagnosing the dependency relationship, which comprises:
an acquisition unit, configured to obtain persistence information of each batch task;
a determining unit, configured to determine a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks;
a diagnosis unit, configured to diagnose a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for diagnosing a dependency as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method of diagnosing a dependency as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a method of diagnosing a dependency as described in any one of the above.
The dependency diagnosis method, device, electronic device and storage medium provided by the invention obtain the persistence information of each batch task, determine from it the first dependency relationship of the batch task to be diagnosed, and diagnose the second dependency relationship of that batch task based on the first dependency relationship. Batch task dependencies are thus diagnosed automatically, which ensures the accuracy of the diagnosis, reduces unreasonable predecessor and successor dependencies and non-optimal dependencies of batch tasks, improves the efficiency of dependency diagnosis, removes tedious manual work, shortens the routine development cycle, and helps keep business services timely after batch scheduling tasks go into production.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the dependency diagnosis method provided by the present invention;
FIG. 2 is a schematic flow chart of the persistence information acquisition method provided by the present invention;
FIG. 3 is a second schematic flow chart of the dependency diagnosis method provided by the present invention;
FIG. 4 is a schematic structural diagram of the dependency diagnosis device provided by the present invention;
FIG. 5 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A batch scheduling system schedules batch tasks according to configured driving dependencies so that tasks are scheduled automatically, and it provides high concurrency and horizontal scalability for the batch system. Batch tasks are applications (stored procedures, C programs, shell programs, etc.) that can be executed on a server and scheduled by the batch scheduling system.
At present, the driving dependencies of a batch scheduling system are configured by technicians according to business logic. As the business grows, however, even a simple batch scheduling system becomes larger and larger: more and more batch tasks need to be scheduled, the dependencies among them become increasingly complex, and manual configuration becomes increasingly difficult, so the following problems easily occur:
1) The configured dependencies contain unnecessary predecessor or successor dependencies. For example, suppose three batch tasks job-a, job-b and job-c have no dependencies among them, and a task job-d is then added. Normally job-d can be executed once job-a has executed successfully, but for some reason the predecessors of job-d are configured as job-a, job-b and job-c, so job-d can be executed only after all three have executed successfully. If job-b and job-c finish before job-a does, job-d is not affected; but whenever job-b or job-c finishes later than job-a, job-d is delayed;
2) The configured dependency is not optimal. For example, suppose three batch tasks job-a, job-b and job-c are chained so that job-b executes after job-a finishes and job-c executes after job-b finishes, and a task job-d is then added. job-d can be executed once job-a has executed successfully, but for some reason its predecessor may be configured as job-b or job-c, so that job-d is executed only after job-b or job-c has executed successfully. If job-b and job-c take several hours to execute successfully, job-d waits those several hours for nothing, which increases the total batch duration and affects the timeliness of actual business services.
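The cost of both misconfigurations can be made concrete with a small calculation. The sketch below is illustrative only: the finish times and the helper `earliest_start` are hypothetical and do not come from the patent, and it assumes a task may start as soon as all of its configured predecessors have finished.

```python
def earliest_start(predecessors, finish_time):
    """Return the earliest moment a task may start: when its last predecessor finishes."""
    return max((finish_time[p] for p in predecessors), default=0)


# Hypothetical finish times, in minutes after the batch window opens.
finish_time = {"job-a": 10, "job-b": 45, "job-c": 180}

# job-d truly depends only on job-a ...
needed = earliest_start({"job-a"}, finish_time)
# ... but is configured to wait for job-a, job-b and job-c (problem 1).
configured = earliest_start({"job-a", "job-b", "job-c"}, finish_time)

# Time job-d spends waiting for tasks it does not actually need.
extra_wait = configured - needed
```

With these assumed numbers, job-d starts 170 minutes later than its real dependency requires.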
In view of the above, the present invention provides a method for diagnosing a dependency relationship. FIG. 1 is a schematic flow chart of the dependency diagnosis method provided by the present invention. As shown in FIG. 1, the method is applied to a batch scheduling system and includes:
step 110, obtaining persistence information of each batch task;
step 120, determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks;
step 130, diagnosing a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed, so as to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
Specifically, existing approaches process batch tasks manually, which is costly in labor. A batch scheduling system often contains a great many tasks, in some systems tens of thousands or even hundreds of thousands, so it is impractical to check manually, one by one, whether the dependency configuration of each batch task is reasonable and optimal; manual judgment is inefficient and affects the timeliness of actual business services.
To this end, the embodiment of the present invention first obtains the persistence information of each batch task in the batch scheduling system. Persistence information is information about a batch task that can be stored persistently; it may include the operation information of the batch task, the task name, and other information. After the persistence information of each batch task is obtained, it can be recorded in a task persistence information table so that subsequent dependency diagnosis can look it up easily. The persistence information reveals the operations each batch task performs and how the batch tasks are related through those operations. On this basis, the most reasonable actual dependency relationship of the batch task to be diagnosed, namely the first dependency relationship, can be determined from the persistence information of each batch task.
The batch task to be diagnosed, that is, the batch task whose dependencies need to be diagnosed, is one of the batch tasks in the batch scheduling system. It may be selected by a user, selected automatically by a program, or be a batch task newly added to or modified in the batch scheduling system; the embodiment of the present invention does not specifically limit this. The dependency relationship may include the predecessor and successor relationships of the batch task to be diagnosed. The first dependency relationship may be determined from the persistence information of each batch task alone, or from that persistence information combined with the originally configured dependency relationship of the batch task to be diagnosed; the embodiment of the present invention does not specifically limit this either.
The dependency relationship preconfigured for the batch task to be diagnosed in the batch scheduling system, namely the second dependency relationship, is then diagnosed against the determined first dependency relationship to obtain a diagnosis result. The diagnosis result may indicate whether the second dependency relationship is reasonable and, if it is not, the specific type of problem, for example an unnecessary dependency, a non-optimal dependency, or a missing dependency. Further, when the diagnosis result indicates that the second dependency relationship is unreasonable, an optimized configuration suggestion may be given based on the first dependency relationship, or the second dependency relationship may be replaced directly with the first dependency relationship.
It should be noted that the embodiment of the present invention diagnoses the batch tasks as a whole and, based on the persistence information of each batch task, automatically judges whether the predecessor and successor dependencies of a batch task are reasonable and optimal. Problems in the dependency configuration are thus handled automatically even when there are very many batch tasks, which greatly improves the efficiency of dependency diagnosis and helps keep the dependency configuration reasonable. In addition, if batch tasks are added or modified on the big data platform, the persistence information of each batch task needs to be obtained again, the first dependency relationship of the batch task to be diagnosed determined again, and its second dependency relationship diagnosed again, so that the dependency configuration of the batch task to be diagnosed remains reasonable.
The method provided by the embodiment of the invention obtains the persistence information of each batch task, determines from it the first dependency relationship of the batch task to be diagnosed, and diagnoses the second dependency relationship of that batch task based on the first dependency relationship. Batch task dependencies are thus diagnosed automatically, which ensures the accuracy of the diagnosis, reduces unreasonable predecessor and successor dependencies and non-optimal dependencies of batch tasks, improves the efficiency of dependency diagnosis, removes tedious manual work, shortens the routine development cycle, and helps keep business services timely after batch scheduling tasks go into production.
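A minimal sketch of such a comparison is given here, assuming both dependency relationships are reduced to sets of predecessor task names. The `Diagnosis` class and `diagnose` function are hypothetical names introduced for illustration; the patent does not prescribe a data model. Distinguishing a non-optimal dependency from an unnecessary one would additionally require the scheduling graph, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical container for a diagnosis result."""
    reasonable: bool
    unnecessary: set = field(default_factory=set)  # configured but not actually needed
    missing: set = field(default_factory=set)      # needed but not configured
    suggestion: set = field(default_factory=set)   # proposed replacement configuration


def diagnose(first, second):
    """Compare inferred predecessors (first) with configured predecessors (second)."""
    unnecessary = second - first
    missing = first - second
    return Diagnosis(
        reasonable=not unnecessary and not missing,
        unnecessary=unnecessary,
        missing=missing,
        suggestion=set(first),
    )


# job-d needs only job-a but is configured to wait for job-a, job-b and job-c.
result = diagnose({"job-a"}, {"job-a", "job-b", "job-c"})
```

Here `result.suggestion` corresponds to the optimized configuration suggestion: it can be shown to an operator or written back in place of the configured predecessors.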
Based on the above embodiment, step 120 includes:
and determining a first dependency relationship of the batch task to be diagnosed based on the operation information for the data table and/or the file, which is included in the persistence information of each batch task.
Specifically, a batch task may operate on data tables and may also operate on files. When obtaining the persistence information of each batch task, the embodiment of the present invention can therefore obtain the operation information of each batch task for data tables and for files. The operation information may include the specific type of operation, such as query, insert, update, delete or initialize, and may further include the object operated on (for example, which data table or file a query reads), the time of the operation, and so on.
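One way to picture this operation information is as a list of small records. The sketch below is an assumption made for illustration: the field names, the `Action` enumeration and the `index_by_target` helper are hypothetical, and the time of the operation is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    QUERY = "query"
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"
    INITIALIZE = "initialize"


@dataclass(frozen=True)
class Operation:
    task: str         # name of the batch task performing the operation
    action: Action    # type of operation
    target: str       # name of the data table or file operated on
    target_kind: str  # "table" or "file"


def index_by_target(operations):
    """Group operations by the object they touch, so tasks sharing an object are easy to find."""
    by_target = defaultdict(list)
    for op in operations:
        by_target[(op.target_kind, op.target)].append(op)
    return dict(by_target)


# Hypothetical rows of a task persistence information table.
sample = [
    Operation("job-a", Action.INSERT, "A", "table"),
    Operation("job-c", Action.QUERY, "A", "table"),
    Operation("job-b", Action.INSERT, "out.csv", "file"),
]
index = index_by_target(sample)
```

Grouping by target puts every task that touches the same table or file in one bucket, which is where candidate dependencies would be looked for.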
Based on the above, the relevance of each batch task on the operation can be analyzed according to the operation information aiming at the data table and/or the file and included in the persistence information of each batch task, so that the actual most reasonable dependency relationship of the batch task to be diagnosed, namely the first dependency relationship, is judged and used for the subsequent dependency relationship diagnosis.
The first dependency relationship of the batch task to be diagnosed may be determined in either of two ways. In the first, the data tables and/or files involved in the batch task to be diagnosed are identified, together with the batch tasks that operate on those data tables and/or files before the batch task to be diagnosed; from these, the batch tasks that can actually affect the batch task to be diagnosed are determined, which yields the first dependency relationship. In the second, the relevance between each other batch task and the batch task to be diagnosed is calculated from the operation information of each batch task for data tables and/or files, and the first dependency relationship is determined from the batch task with the strongest relevance to the batch task to be diagnosed.
Based on any of the above embodiments, determining a first dependency relationship of a batch task to be diagnosed based on operation information for a data table and/or a file included in persistence information of each batch task includes:
determining an associated batch task associated with the batch task to be diagnosed based on operation information for the data table and/or the file included in the persistence information of each batch task;
and determining a first dependency relationship of the batch task to be diagnosed based on the operation information for the same data table and/or the same file, which is included in the persistence information of the batch task to be diagnosed and the associated batch task.
Specifically, the data tables and/or files involved in the batch task to be diagnosed can be determined from the operation information for data tables and/or files included in its persistence information. Then, from the corresponding operation information in the persistence information of the other batch tasks, the batch tasks that operate on the same data table and/or the same file before the batch task to be diagnosed can be found; these are the associated batch tasks. Finally, by comparing the operation information of the batch task to be diagnosed and of each associated batch task for the same data table and/or the same file, the batch tasks that can actually affect the batch task to be diagnosed are screened out of the associated batch tasks, which yields the most reasonable actual dependency relationship of the batch task to be diagnosed, namely the first dependency relationship.
For example, the batch task to be diagnosed is job-c, which involves data table A. Searching the persistence information shows that the associated batch tasks with operation information on data table A are job-a and job-b. The operation information shows that job-c queries data table A, job-a inserts into data table A, and job-b queries data table A. Because a query does not change data table A, job-b cannot affect job-c, so the most reasonable predecessor of job-c is job-a; that is, job-c can be executed once job-a has executed;
for another example, the batch task to be diagnosed is job-d, which involves data table B. Searching the persistence information shows that the associated batch tasks with operation information on data table B are job-b and job-c. The operation information shows that job-d inserts into data table B, job-b initializes data table B, and job-c inserts into data table B. Because the insert by job-c does not affect the insert by job-d, the most reasonable predecessor of job-d is job-b; that is, job-d can be executed once job-b has executed.
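The two examples can be reproduced with a small rule table. This is only a sketch under stated assumptions: the `AFFECTS` table is a hypothetical rule set chosen to match the two examples (a query affects nothing, an insert affects later queries but not other inserts, an initialization affects everything), and every listed operation is assumed to belong to a task that runs before the task being diagnosed. The patent does not spell out the exact rules.

```python
# Hypothetical rules: which later operations an earlier operation can affect.
AFFECTS = {
    "query": set(),                            # reading changes nothing
    "insert": {"query"},                       # new rows matter to readers only
    "update": {"query"},                       # assumption, not from the examples
    "delete": {"query", "update"},             # assumption, not from the examples
    "initialize": {"query", "insert", "update", "delete"},
}


def infer_predecessors(target_task, operations):
    """Return tasks whose operations on a shared object can affect target_task.

    operations is a list of (task, action, object_name) tuples.
    """
    own = [(action, obj) for task, action, obj in operations if task == target_task]
    predecessors = set()
    for task, action, obj in operations:
        if task == target_task:
            continue
        for own_action, own_obj in own:
            if obj == own_obj and own_action in AFFECTS[action]:
                predecessors.add(task)
    return predecessors


# First example: data table A.
table_a = [("job-a", "insert", "A"), ("job-b", "query", "A"), ("job-c", "query", "A")]
# Second example: data table B.
table_b = [("job-b", "initialize", "B"), ("job-c", "insert", "B"), ("job-d", "insert", "B")]

pred_c = infer_predecessors("job-c", table_a)
pred_d = infer_predecessors("job-d", table_b)
```

Keeping the rules in a table rather than in branching code makes it easy to adjust them if a given system treats, say, concurrent inserts as conflicting.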
Based on any of the above embodiments, step 110 includes:
determining a persistence information acquisition program corresponding to each batch task based on the program language of each batch task;
and acquiring the persistence information of each batch of tasks based on the persistence information acquisition program corresponding to each batch of tasks.
Specifically, batch tasks may be written in different programming languages, such as shell, Java or SQL. The embodiment of the present invention develops a separate persistence information acquisition program for each language, so that persistence information is acquired automatically and far more efficiently. On this basis, the persistence information of each batch task can be acquired with the persistence information acquisition program corresponding to that batch task.
The method provided by the embodiment of the invention takes into account that batch tasks are written in different programming languages and develops a corresponding persistence information acquisition program for each, which improves the adaptability of the programs and the efficiency and effectiveness of persistence information acquisition, and in turn the efficiency and accuracy of dependency diagnosis.
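Selecting an acquisition program by language can be sketched as a registry lookup. Everything named here is hypothetical: `EXTRACTORS`, `extract_sql` and `acquire` are illustrative, and the regular expressions are a deliberately naive stand-in for a real SQL parser (they only recognize `INSERT INTO <table>` and `FROM <table>`, and would, for instance, misread `DELETE FROM`).

```python
import re

# Naive patterns for illustration only; a real extractor would parse the SQL.
SQL_PATTERNS = [
    ("insert", re.compile(r"\binsert\s+into\s+(\w+)", re.IGNORECASE)),
    ("query", re.compile(r"\bfrom\s+(\w+)", re.IGNORECASE)),
]


def extract_sql(source):
    """Return (action, table) pairs found in a SQL script."""
    return [
        (action, match.group(1))
        for action, pattern in SQL_PATTERNS
        for match in pattern.finditer(source)
    ]


# One extractor per language; shell or Java extractors would be registered the same way.
EXTRACTORS = {"sql": extract_sql}


def acquire(language, source):
    """Pick the extractor for the task's language and run it on the task's source."""
    try:
        extractor = EXTRACTORS[language]
    except KeyError:
        raise LookupError(f"no extractor registered for {language!r}") from None
    return extractor(source)


ops = acquire("sql", "INSERT INTO t_report SELECT * FROM t_orders")
```

A language with no registered extractor raises `LookupError`, which a caller could treat as an unsuccessful acquisition.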
Based on any of the above embodiments, based on the persistence information acquiring program corresponding to each batch task, acquiring persistence information of each batch task includes:
for any batch task, obtaining the persistence information of that batch task based on its corresponding persistence information acquisition program;
and if the persistence information of that batch task is not obtained successfully, obtaining manually entered persistence information for that batch task.
Specifically, considering that the automatic acquisition of the persistence information of a special batch task may not succeed, the embodiment of the invention can acquire the persistence information of any batch task by the following method:
first, the persistence information acquisition program corresponding to the batch task is applied to acquire the persistence information of the batch task, and whether the acquisition succeeded is judged. If it did not succeed, a prompt can be issued requesting that the persistence information of the batch task be collected and entered manually, and the manually entered persistence information is then obtained. In this way the persistence information of all batch tasks is eventually obtained, providing data support for the subsequent dependency diagnosis.
Further, FIG. 2 is a schematic flow chart of the persistence information acquisition method provided by the present invention. As shown in FIG. 2, to ensure that the persistence information of all batch tasks in the batch scheduling system is recorded in the task persistence information table, it is first judged whether the persistence information of all batch tasks has been acquired. If the persistence information of some batch task has not yet been acquired, it is acquired with the persistence information acquisition program, and whether the acquisition succeeded is judged. If it succeeded, the acquired persistence information is recorded directly in the task persistence information table. If it did not succeed, manual collection and entry of the persistence information of the batch task is requested, and the manually entered persistence information is then obtained and recorded in the task persistence information table.
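The loop in FIG. 2 can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that an unsuccessful automatic acquisition shows up either as an exception or as a `None` return value; the patent does not say how failure is signalled.

```python
def collect_persistence(tasks, auto_extract, ask_operator):
    """Build the task persistence information table, falling back to manual input.

    auto_extract(task) is the automatic acquisition program; ask_operator(task)
    stands for prompting a person to enter the information by hand.
    """
    table = {}
    for task in tasks:
        try:
            info = auto_extract(task)
        except Exception:  # broad on purpose: any extractor failure triggers the fallback
            info = None
        if info is None:
            info = ask_operator(task)
        table[task] = info
    return table


# Hypothetical stand-ins for the two acquisition paths.
def demo_auto(task):
    if task == "job-x":
        raise ValueError("unparseable script")
    return [("query", "A")]


manual_requests = []


def demo_manual(task):
    manual_requests.append(task)
    return [("insert", "B")]


demo_table = collect_persistence(["job-a", "job-x"], demo_auto, demo_manual)
```

Only `job-x` reaches the manual path, so a person is prompted only for the tasks the automatic program could not handle.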
Based on any embodiment, taking account of the change of the file on the batch scheduling system, in order to ensure that the persistence information of each batch task in the task persistence information table is latest and complete and avoid repeated acquisition of the persistence information, the persistence information acquisition program in the embodiment of the invention can start the persistence information acquisition program to acquire persistence information of the corresponding batch task and update the acquired persistence information into the task persistence information table when detecting that the newly added batch task joins the batch scheduling system or when modifying the batch task which has joined the batch scheduling system based on the task persistence information table obtained by the persistence information of the original batch task.
Similarly, a dependency diagnosis program can be applied to diagnose the dependency relationships of batch tasks automatically. When a newly added batch task is detected joining the batch scheduling system, or when a batch task that has already joined the batch scheduling system is modified, the dependency diagnosis program should start diagnosing the dependency relationship of that batch task, so as to ensure that the predecessor and successor dependencies of each batch task in the batch scheduling system are configured reasonably.
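One possible way to detect newly added or modified batch tasks, so that acquisition and diagnosis run only for those tasks, is sketched below. The embodiment does not say how changes are detected; comparing a SHA-256 digest of each task's source text is an assumption of this example, as are the names `changed_tasks`, `current_sources`, and `known_hashes`.

```python
import hashlib
from typing import Dict, List


def changed_tasks(
    current_sources: Dict[str, str],
    known_hashes: Dict[str, str],
) -> List[str]:
    """Return tasks that are new or modified, and remember their digests.

    Assumption: a task counts as modified when the digest of its source
    text differs from the digest recorded at the previous check.
    """
    changed: List[str] = []
    for task, source in current_sources.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if known_hashes.get(task) != digest:
            changed.append(task)          # new task or modified task
            known_hashes[task] = digest   # avoid re-acquiring next time
    return changed
```

Only the tasks returned by such a check would need their persistence information re-acquired and their dependency relationships re-diagnosed.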
Based on any of the above embodiments, the second dependency relationship is determined based on a directed acyclic graph of the batch scheduling system.
Specifically, DAG (Directed Acyclic Graph) technology can express the driving dependency relationships between events and manage the scheduling relationships between tasks. The batch scheduling system in the embodiment of the invention can therefore use the characteristics of a directed acyclic graph to manage the driving dependency relationships between batch tasks. On this basis, by reading the directed acyclic graph of the batch scheduling system, the preset dependency relationship of the batch task to be diagnosed, namely the second dependency relationship, can be determined from the edges connected to the node corresponding to that batch task. In the directed acyclic graph, each batch task is a node, and each preset dependency relationship between batch tasks is an edge.
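Reading the second dependency relationship from the graph can be illustrated as follows. The edge-list representation and the function name `second_dependency` are assumptions of this sketch; the acyclicity check uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, Set, Tuple


def second_dependency(edges: Iterable[Tuple[str, str]], task: str) -> Set[str]:
    """Return the tasks that `task` is configured to run after.

    Each edge (upstream, downstream) means that the downstream batch task
    is configured to depend on the upstream batch task.
    """
    predecessors: Dict[str, Set[str]] = {}
    for upstream, downstream in edges:
        predecessors.setdefault(downstream, set()).add(upstream)
        predecessors.setdefault(upstream, set())
    # prepare() raises graphlib.CycleError if the configuration has a cycle.
    TopologicalSorter(predecessors).prepare()
    return predecessors.get(task, set())
```

For example, with the edges `("load", "clean")`, `("clean", "report")`, and `("load", "report")`, the configured predecessors of `report` are `clean` and `load`.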
Further, Fig. 3 is a second schematic flow chart of the dependency diagnosis method provided by the present invention. As shown in Fig. 3, the specific flow of the method is as follows: first, a batch task to be diagnosed is selected from all batch tasks; next, the directed acyclic graph of the batch scheduling system is read to obtain the second dependency relationship of the batch task to be diagnosed, and the persistence information of all batch tasks recorded in the task persistence information table is read; on this basis, the second dependency relationship of the batch task to be diagnosed is diagnosed by combining it with the persistence information of all batch tasks, thereby obtaining a diagnosis result.
If the second dependency relationship of the batch task to be diagnosed is consistent with the first dependency relationship determined from the persistence information of each batch task, the diagnosis result is "reasonable"; if the two are inconsistent, the diagnosis result is "unreasonable".
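The consistency check can be sketched by comparing two sets of predecessor tasks. Treating "consistent" as set equality, and reporting the differences as suggestions, are assumptions of this example; the names `diagnose`, `missing`, and `redundant` are hypothetical.

```python
from typing import Dict, Set, Union


def diagnose(first: Set[str], second: Set[str]) -> Dict[str, Union[bool, list]]:
    """Compare the derived (first) and preconfigured (second) dependencies."""
    missing = first - second    # implied by the data but not configured
    redundant = second - first  # configured but not implied by the data
    return {
        "reasonable": not missing and not redundant,
        "missing": sorted(missing),
        "redundant": sorted(redundant),
    }
```

A result with a non-empty `missing` or `redundant` list corresponds to an "unreasonable" diagnosis, and the two lists indicate which configured dependencies might be added or removed.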
Based on any of the above embodiments, the present invention provides a dependency diagnosis system, comprising:
1) Task persistence information acquisition module: a persistence information acquisition program is developed for each batch task according to the program language of that batch task; the persistence information of each batch task is acquired automatically by its corresponding persistence information acquisition program and recorded in a task persistence information table; for special batch tasks, the persistence information is acquired manually;
2) Dependency diagnosis module: for a given batch task, the second dependency relationship preconfigured in the batch scheduling system is combined with the records in the task persistence information table to diagnose automatically whether that second dependency relationship is reasonable, and to give suggestions for a reasonable configuration.
According to the system provided by the embodiment of the invention, the first dependency relationship of the batch task to be diagnosed is determined by acquiring the persistence information of each batch task, and the second dependency relationship of the batch task to be diagnosed is diagnosed based on that first dependency relationship. The dependency relationships of batch tasks can thus be diagnosed automatically while ensuring the accuracy of the diagnosis. This reduces unreasonable predecessor and successor dependencies and non-optimal dependencies among batch tasks, improves the efficiency of diagnosing batch task dependency relationships, saves tedious manual work, shortens the routine development cycle, and helps ensure the timeliness of services after the batch scheduling tasks go into production.
The dependency diagnosis apparatus provided by the present invention is described below. The dependency diagnosis apparatus described below and the dependency diagnosis method described above may be read in cross-reference with each other.
Based on any one of the above embodiments, the present invention provides a dependency diagnosis apparatus. Fig. 4 is a schematic structural diagram of the dependency diagnosis apparatus provided by the present invention. As shown in Fig. 4, the apparatus includes:
an obtaining unit 410, configured to obtain persistence information of each batch task;
a determining unit 420, configured to determine, based on the persistence information of each batch task, a first dependency relationship of a batch task to be diagnosed, where the batch task to be diagnosed is one of the batch tasks;
a diagnosing unit 430, configured to diagnose a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed, so as to obtain a diagnosis result, where the second dependency relationship is preconfigured.
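The three units can be pictured as three pluggable callables wired together in order. The class below is only a structural sketch: the class name, the callable signatures, and the Boolean diagnosis result are assumptions of the example rather than the structure of the apparatus in Fig. 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical table layout: task name -> {"reads": {...}, "writes": {...}}
Table = Dict[str, Dict[str, Set[str]]]


@dataclass
class DependencyDiagnosisApparatus:
    obtain: Callable[[], Table]                     # role of obtaining unit 410
    determine: Callable[[Table, str], Set[str]]     # role of determining unit 420
    diagnose: Callable[[Set[str], Set[str]], bool]  # role of diagnosing unit 430

    def run(self, task: str, second_dependency: Set[str]) -> bool:
        """Obtain, determine, then diagnose for one batch task."""
        table = self.obtain()
        first_dependency = self.determine(table, task)
        return self.diagnose(first_dependency, second_dependency)
```

Each callable can be replaced independently, for example to swap in a different rule for deriving the first dependency relationship.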
According to the apparatus provided by the embodiment of the invention, the first dependency relationship of the batch task to be diagnosed is determined by acquiring the persistence information of each batch task, and the second dependency relationship of the batch task to be diagnosed is diagnosed based on that first dependency relationship. The dependency relationships of batch tasks can thus be diagnosed automatically while ensuring the accuracy of the diagnosis. This reduces unreasonable predecessor and successor dependencies and non-optimal dependencies among batch tasks, improves the efficiency of diagnosing batch task dependency relationships, saves tedious manual work, shortens the routine development cycle, and helps ensure the timeliness of services after the batch scheduling tasks go into production.
Based on any of the above embodiments, the determining unit 420 is configured to:
determine the first dependency relationship of the batch task to be diagnosed based on the operation information for data tables and/or files included in the persistence information of each batch task.
Based on any of the above embodiments, determining the first dependency relationship of the batch task to be diagnosed based on the operation information for data tables and/or files included in the persistence information of each batch task includes:
determining an associated batch task that is associated with the batch task to be diagnosed, based on the operation information for data tables and/or files included in the persistence information of each batch task;
determining the first dependency relationship of the batch task to be diagnosed based on the operation information for the same data table and/or the same file included in the persistence information of the batch task to be diagnosed and of the associated batch task.
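The two steps above can be sketched as follows. The rule used here, that a task depends on any associated task that writes a data table or file the task reads, is an assumption made for illustration; the function names and the `reads`/`writes` layout of the persistence information are likewise hypothetical.

```python
from typing import Dict, Set

# Hypothetical table layout: task name -> {"reads": {...}, "writes": {...}}
Table = Dict[str, Dict[str, Set[str]]]


def associated_tasks(table: Table, task: str) -> Set[str]:
    """Tasks that operate on at least one table or file that `task` uses."""
    touched = table[task]["reads"] | table[task]["writes"]
    return {
        other
        for other, info in table.items()
        if other != task and touched & (info["reads"] | info["writes"])
    }


def first_dependency(table: Table, task: str) -> Set[str]:
    """Associated tasks that write something `task` reads (assumed rule)."""
    needed = table[task]["reads"]
    return {
        other
        for other in associated_tasks(table, task)
        if table[other]["writes"] & needed
    }
```

For instance, if `load_trades` writes the table `trades` and `daily_sum` reads it, then `load_trades` appears in the first dependency relationship of `daily_sum` under this rule.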
Based on any of the above embodiments, the obtaining unit 410 is configured to:
determine the persistence information acquisition program corresponding to each batch task based on the program language of that batch task;
acquire the persistence information of each batch task based on the persistence information acquisition program corresponding to that batch task.
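Selecting an acquisition program by program language can be sketched with a lookup table. The registry `ACQUIRERS`, the function names, and the regular-expression extractor for SQL are all assumptions of this example; the extractor is a toy and would not handle real SQL such as joins, subqueries, or quoted identifiers.

```python
import re
from typing import Callable, Dict, List, Optional

Info = Dict[str, List[str]]


def acquire_from_sql(source: str) -> Info:
    """Toy extractor: names after FROM are reads, after INSERT INTO are writes."""
    reads = re.findall(r"\bFROM\s+(\w+)", source, flags=re.IGNORECASE)
    writes = re.findall(r"\bINSERT\s+INTO\s+(\w+)", source, flags=re.IGNORECASE)
    return {"reads": reads, "writes": writes}


# Hypothetical registry: program language -> acquisition program.
ACQUIRERS: Dict[str, Callable[[str], Info]] = {"sql": acquire_from_sql}


def acquire(language: str, source: str) -> Optional[Info]:
    """Run the acquisition program for `language`, or return None if none exists."""
    acquirer = ACQUIRERS.get(language.lower())
    return acquirer(source) if acquirer else None
```

A `None` result corresponds to the case in which automatic acquisition is unsuccessful and manually entered persistence information is used instead.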
Based on any of the above embodiments, acquiring the persistence information of each batch task based on the persistence information acquisition program corresponding to that batch task includes:
acquiring the persistence information of any batch task based on the persistence information acquisition program corresponding to that batch task;
if the acquisition of the persistence information of that batch task is unsuccessful, acquiring manually entered persistence information of that batch task.
Based on any of the above embodiments, the second dependency relationship is determined based on a directed acyclic graph of the batch scheduling system.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include: a processor 510, a communication interface (Communications Interface) 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a dependency diagnosis method, the method comprising: obtaining persistence information of each batch task; determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks; and diagnosing a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When executed by a processor, the computer program performs the dependency diagnosis method provided by the methods described above, the method comprising: obtaining persistence information of each batch task; determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks; and diagnosing a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the dependency diagnosis method provided by the methods described above, the method comprising: obtaining persistence information of each batch task; determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks; and diagnosing a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A dependency diagnosis method, comprising:
obtaining persistence information of each batch task;
determining a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks;
and diagnosing a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
2. The dependency diagnosis method according to claim 1, wherein determining the first dependency relationship of the batch task to be diagnosed based on the persistence information of each batch task comprises:
determining the first dependency relationship of the batch task to be diagnosed based on operation information for data tables and/or files included in the persistence information of each batch task.
3. The dependency diagnosis method according to claim 2, wherein determining the first dependency relationship of the batch task to be diagnosed based on the operation information for data tables and/or files included in the persistence information of each batch task comprises:
determining an associated batch task associated with the batch task to be diagnosed based on the operation information for data tables and/or files included in the persistence information of each batch task;
and determining the first dependency relationship of the batch task to be diagnosed based on the operation information for the same data table and/or the same file included in the persistence information of the batch task to be diagnosed and of the associated batch task.
4. The dependency diagnosis method according to claim 1, wherein obtaining the persistence information of each batch task comprises:
determining a persistence information acquisition program corresponding to each batch task based on the program language of that batch task;
and acquiring the persistence information of each batch task based on the persistence information acquisition program corresponding to that batch task.
5. The dependency diagnosis method according to claim 4, wherein acquiring the persistence information of each batch task based on the persistence information acquisition program corresponding to that batch task comprises:
acquiring the persistence information of any batch task based on the persistence information acquisition program corresponding to that batch task;
and if the persistence information of that batch task is not successfully acquired, acquiring manually entered persistence information of that batch task.
6. The dependency diagnosis method according to any one of claims 1 to 5, wherein the second dependency relationship is determined based on a directed acyclic graph of a batch scheduling system.
7. A dependency diagnosis apparatus, comprising:
an obtaining unit, configured to obtain persistence information of each batch task;
a determining unit, configured to determine a first dependency relationship of a batch task to be diagnosed based on the persistence information of each batch task, wherein the batch task to be diagnosed is one of the batch tasks;
a diagnosing unit, configured to diagnose a second dependency relationship of the batch task to be diagnosed based on the first dependency relationship of the batch task to be diagnosed to obtain a diagnosis result, wherein the second dependency relationship is preconfigured.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the dependency diagnosis method according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the dependency diagnosis method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the dependency diagnosis method according to any one of claims 1 to 6.
CN202210068326.2A 2022-01-20 2022-01-20 Dependency diagnosis method, dependency diagnosis device, electronic equipment and storage medium Pending CN116521404A (en)

Publications (1)

Publication Number Publication Date
CN116521404A true CN116521404A (en) 2023-08-01

Family

ID=87394515

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination