CN113986596A - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN113986596A
Application number: CN202111271183.7A
Authority: CN
Inventors: 钱佳; 张蕤
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-01-28

Abstract

The disclosure relates to a data processing method, a data processing device, an electronic device and a storage medium, and relates to the technical field of computers. The method comprises the following steps: acquiring recovery information of an initial complement event and a fixed task instance set; determining at least two target task instances from a plurality of task instances based on the recovery information and the fixed task instance set; determining target recovery information based on the recovery information and at least one task instance to be pruned; and executing the target complement event to enable each target task instance of the at least two target task instances to generate data according to the target recovery information. According to the method and the device, the electronic device can instruct a smaller number of task instances to generate data so as to obtain the data required by a user, and timely output of downstream data can be guaranteed. Meanwhile, since the data generation operation does not need to be executed on the at least one task instance to be pruned, resource waste can be reduced and the production period of the data required by the user is shortened.

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

Background

At present, when data generated by a certain task instance has quality problems, a complement device can perform a blocking operation on all downstream task instances of the task instance; and when the generated data returns to normal, a recovery operation can be performed on all the downstream task instances.

However, in some cases, the data required by the user may be generated based on only a portion of the downstream task instances. Thus, performing the blocking operation and/or the recovery operation on every downstream task instance may waste a large amount of resources and prolong the production period of the data required by the user.

Disclosure of Invention

The present disclosure provides a data processing method, an apparatus, an electronic device and a storage medium, which solve the technical problem that a complement device performs the blocking operation and/or the recovery operation on all downstream task instances of a certain task instance, which wastes a large amount of resources and prolongs the production period of the data required by the user.

The technical scheme of the embodiment of the disclosure is as follows:

According to a first aspect of embodiments of the present disclosure, a data processing method is provided. The method may comprise the following steps: acquiring recovery information of an initial complement event and a fixed task instance set, wherein the recovery information comprises identifications of a plurality of task instances and a dependency relationship among the plurality of task instances, and the fixed task instance set comprises at least two fixed task instances; determining at least two target task instances from the plurality of task instances based on the recovery information and the fixed task instance set, the at least two target task instances comprising the at least two fixed task instances; determining target recovery information based on the recovery information and at least one to-be-pruned task instance, wherein the target recovery information comprises identifications of the at least two target task instances and a dependency relationship between the at least two target task instances, the target recovery information is recovery information of a target complement event, and the at least one to-be-pruned task instance is a task instance other than the at least two target task instances among the plurality of task instances; and executing the target complement event to enable each target task instance of the at least two target task instances to generate data according to the target recovery information.

Optionally, the determining at least two target task instances from the plurality of task instances specifically includes: when the first task instance belongs to the fixed task instance set, determining that the first task instance is a target task instance, and the first task instance is one of the plurality of task instances.

Optionally, the determining at least two target task instances from the plurality of task instances specifically includes: when a first task instance does not belong to the fixed task instance set, determining whether the first task instance has a downstream task instance, wherein the first task instance is one of the plurality of task instances; and when the first task instance is not a root task instance among the plurality of task instances and a fixed task instance exists among the downstream task instances corresponding to the first task instance, determining the first task instance as a target task instance.

Optionally, the determining the target recovery information based on the recovery information and the at least one to-be-pruned task instance specifically includes: generating an initial directed acyclic graph based on the recovery information of the initial complement event, wherein the initial directed acyclic graph is a directed acyclic graph corresponding to the initial complement event and comprises task instance nodes and edges, the task instance nodes are used for representing task instances, and the edges are used for connecting the task instance nodes that have dependency relationships; and performing a pruning operation on at least one to-be-pruned task instance node included in the initial directed acyclic graph to obtain a target directed acyclic graph, wherein the at least one to-be-pruned task instance node is a node representing the at least one to-be-pruned task instance.

Optionally, a current task instance node is a root node in the target directed acyclic graph or a non-root node in the target directed acyclic graph, and the enabling each of the at least two target task instances to generate data according to the target recovery information specifically includes: when the current task instance node is the non-root node, determining the number of direct upstream successful nodes of the current task instance node, wherein the direct upstream successful nodes are the nodes whose operation state is successful among the direct upstream nodes of the current task instance node, and the direct upstream nodes are the upstream task instance nodes that have an edge to the current task instance node; and when the number of direct upstream successful nodes equals the number of direct upstream nodes, determining that a current task instance starts to generate data, wherein the current task instance is the task instance represented by the current task instance node.

Optionally, the target recovery information further includes priorities of the at least two target task instances, and the data processing method further includes: when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes and the direct upstream nodes are the same as the direct upstream nodes of a task instance node to be identified, determining whether the priority of a task instance to be identified is higher than the priority of the current task instance, wherein the task instance node to be identified is a task instance node, other than the current task instance node, among the at least two task instance nodes included in the target directed acyclic graph, and the task instance to be identified is the task instance represented by the task instance node to be identified; and when the priority of the task instance to be identified is higher than that of the current task instance, determining that the task instance to be identified preferentially starts to generate data.

Optionally, the data processing method further includes: when the first task instance does not belong to the fixed task instance set and no fixed task instance exists in the downstream task instance corresponding to the first task instance, adding a preset identifier to the first task instance and the downstream task instance corresponding to the first task instance, wherein the preset identifier is used for representing that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances, and the first task instance is one of the plurality of task instances.

According to a second aspect of the embodiments of the present disclosure, there is provided a data processing apparatus. The apparatus may include an obtaining module, a determining module and a processing module; the obtaining module is configured to obtain recovery information of an initial complement event and a fixed task instance set, wherein the recovery information includes identifications of a plurality of task instances and a dependency relationship among the plurality of task instances, and the fixed task instance set includes at least two fixed task instances; the determining module is configured to determine at least two target task instances from the plurality of task instances based on the recovery information and the fixed task instance set, wherein the at least two target task instances comprise the at least two fixed task instances; the determining module is further configured to determine target recovery information based on the recovery information and at least one to-be-pruned task instance, where the target recovery information includes identifications of the at least two target task instances and a dependency relationship between the at least two target task instances, the target recovery information is recovery information of a target complement event, and the at least one to-be-pruned task instance is a task instance other than the at least two target task instances among the plurality of task instances; the processing module is configured to execute the target complement event to enable each of the at least two target task instances to generate data according to the target recovery information.

Optionally, the determining module is specifically configured to determine that the first task instance is a target task instance and the first task instance is one of the plurality of task instances when the first task instance belongs to the fixed task instance set.

Optionally, the determining module is specifically configured to determine whether a first task instance has a downstream task instance when the first task instance does not belong to the fixed task instance set, where the first task instance is one of the plurality of task instances; the determining module is specifically configured to determine the first task instance as a target task instance when the first task instance is not a root task instance among the plurality of task instances and a fixed task instance exists among the downstream task instances corresponding to the first task instance.

Optionally, the processing module is specifically configured to generate an initial directed acyclic graph based on the recovery information of the initial complement event, where the initial directed acyclic graph is a directed acyclic graph corresponding to the initial complement event, the initial directed acyclic graph includes task instance nodes and edges, the task instance nodes are used to represent task instances, and the edges are used to connect task instance nodes having a dependency relationship; the processing module is specifically configured to perform pruning operation on at least one to-be-pruned task instance node included in the initial directed acyclic graph to obtain a target directed acyclic graph, where the at least one to-be-pruned task instance node is a node represented by the at least one to-be-pruned task instance; the determining module is specifically configured to determine the target recovery information based on the target directed acyclic graph.

Optionally, the current task instance node is a root node in the target directed acyclic graph or a non-root node in the target directed acyclic graph; the determining module is further configured to determine, when the current task instance node is the non-root node, a number of direct upstream successful nodes of the current task instance node, where the direct upstream successful nodes are nodes whose operation states are successful in the direct upstream nodes of the current task instance node, and the direct upstream nodes are upstream task instance nodes having edges with the current task instance node; the determining module is further specifically configured to determine that a current task instance starts generating data when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, and the current task instance is a task instance characterized by the current task instance node.

Optionally, the target recovery information further includes priorities of the at least two target task instances; the determining module is further configured to determine whether the priority of the task instance to be identified is higher than the priority of the current task instance when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes and the direct upstream nodes are the same as the direct upstream nodes of the task instance node to be identified, wherein the task instance node to be identified is a task instance node, except the current task instance node, of at least two task instance nodes included in the target directed acyclic graph, and the task instance to be identified is a task instance characterized by the task instance node to be identified; the determining module is further configured to determine that the task instance to be identified starts generating data preferentially when the priority of the task instance to be identified is higher than the priority of the current task instance.

Optionally, the processing module is further configured to, when the first task instance does not belong to the fixed task instance set and no fixed task instance exists in the downstream task instance corresponding to the first task instance, add a preset identifier to the first task instance and the downstream task instance corresponding to the first task instance, where the preset identifier is used to characterize that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances, and the first task instance is one of the plurality of task instances.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, which may include: a processor and a memory configured to store processor-executable instructions; wherein the processor is configured to execute the instructions to implement any of the above-described optional data processing methods of the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having instructions stored thereon, which, when executed by an electronic device, enable the electronic device to perform any one of the above-mentioned optional data processing methods of the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform any of the above-described optional data processing methods of the first aspect.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

Based on any one of the above aspects, in the present disclosure, the electronic device may obtain recovery information of an initial complement event and a fixed task instance set, and determine, based on the recovery information and the fixed task instance set, at least two target task instances from a plurality of task instances corresponding to the initial complement event. Since the at least two target task instances belong to the plurality of task instances and include the at least two fixed task instances of the fixed task instance set, the electronic device can determine a subset of the plurality of task instances (i.e., the at least two target task instances) that is smaller than the plurality of task instances. The electronic device may then determine target recovery information based on the recovery information and at least one to-be-pruned task instance (i.e., a task instance of the plurality of task instances other than the at least two target task instances). Furthermore, the electronic device executes the target complement event to enable each target task instance of the at least two target task instances to generate data according to the target recovery information; that is, the electronic device instructs a smaller number of task instances to generate data so as to obtain the data required by a user, and timely output of downstream data can be guaranteed. Meanwhile, the electronic device does not need to perform the data generation operation on the at least one to-be-pruned task instance, so that resource waste can be reduced and the production period of the data required by the user can be shortened.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a schematic flow chart illustrating a data processing method provided by an embodiment of the present disclosure;

FIG. 2 is a flow chart illustrating a further data processing method provided by an embodiment of the present disclosure;

FIG. 3 is a flow chart illustrating a further data processing method provided by an embodiment of the present disclosure;

FIG. 4 is a flow chart illustrating a further data processing method provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating a pruning operation performed according to an embodiment of the present disclosure;

FIG. 6 is a flow chart illustrating a further data processing method provided by an embodiment of the present disclosure;

FIG. 7 is a flow chart illustrating a further data processing method provided by an embodiment of the present disclosure;

FIG. 8 is a flow chart illustrating a further data processing method provided by an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a data processing apparatus provided in an embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of another data processing apparatus provided in an embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.

The data to which the present disclosure relates may be data that is authorized by the user or sufficiently authorized by all parties.

As described in the background, since the complement device performs the blocking operation and/or the recovery operation on all the downstream task instances of a certain task instance, a large amount of resources may be wasted, and the production period of the data required by the user may be prolonged. Based on this, the embodiments of the present disclosure provide a data processing method in which the electronic device instructs a smaller number of task instances to generate data so as to obtain the data required by a user, which can ensure timely output of downstream data. Meanwhile, since the data generation operation does not need to be performed on the at least one task instance to be pruned, resource waste can be reduced and the production period of the data required by the user is shortened.

The data processing method, the data processing device, the electronic equipment and the storage medium provided by the embodiment of the disclosure are applied to a scene of generating data (or recovering data). When the electronic device acquires the recovery information of the initial complement event and the fixed task instance set, each of the at least two target task instances can generate data according to the method provided by the embodiment of the disclosure.

The data processing method provided by the embodiment of the disclosure is exemplarily described below with reference to the accompanying drawings:

It is understood that the electronic device executing the data processing method provided by the embodiment of the present disclosure may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR) / Virtual Reality (VR) device, or another device that can install and use a content community application, and the present disclosure does not impose any particular limitation on the specific form of the electronic device. The electronic device can perform human-computer interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, a handwriting device, and the like.

As shown in fig. 1, a data processing method provided by the embodiment of the present disclosure may include S101-S104.

S101, the electronic equipment obtains recovery information of the initial complement event and a fixed task instance set.

The recovery information comprises identifications of a plurality of task instances and dependency relations among the plurality of task instances, and the fixed task instance set comprises at least two fixed task instances.

It should be understood that the plurality of task instances are task instances corresponding to the initial complement event. This initial complement event may be applied in scenarios where historical data is generated (or historical data is restored). Specifically, the electronic device executes the initial complement event, and may perform a data generation operation on the plurality of task instances, that is, the plurality of task instances may generate corresponding data.

In the embodiment of the present disclosure, the above scenario of generating the historical data may be divided into the following cases:

In one case, the current task instance may be a task instance generated in the current year (e.g., 2021), based on which data for the current year may be generated, but the electronic device may need to acquire data generated in a historical year (e.g., 2019). Therefore, the electronic device needs to create and trigger a complement event whose task instances correspond to the task instances generated in the historical year, and the electronic device can obtain the data of the historical year based on those task instances, that is, generate historical data.

In another case, the relevant data was originally generated in past years, but may have been deleted during subsequent data processing. In this case, the electronic device may process (or run) the task instance corresponding to the complement event (i.e., the task instance corresponding to the historical year) to regenerate the relevant data, i.e., generate the historical data.

In yet another case, the electronic device may determine that data generated by the current task instance has a quality problem, but the current task instance is not the source task instance that generated the data having the quality problem (hereinafter referred to as problem data). Therefore, the electronic device can create a complement event so as to generate historical data based on the task instance corresponding to the complement event, and can trace back to the source task instance that generated the problem data.

It will be appreciated that the relationship between tasks and task instances is similar to the relationship between programs and processes, with a task instance being generated each time the task is executed. Different task instances may produce different data partitions, which may be understood as partitions that store data generated (or resulting from operations) by the task instance.

In the embodiment of the present disclosure, the dependency relationship among the plurality of task instances is used to characterize the upstream-downstream relationship among the plurality of task instances. For example, assuming that the plurality of task instances includes a first task instance and a second task instance, if there is a dependency relationship between the first task instance and the second task instance, this indicates that the first task instance is either a downstream task instance or an upstream task instance of the second task instance.

It should be noted that the task instances corresponding to the initial complement event (i.e., the plurality of task instances) may be understood as complement instances. That is, when the electronic device executes the initial complement event, the task instances corresponding to the initial complement event (i.e., the complement instances) are complemented. Further, executing a task instance corresponding to the initial complement event specifically means generating data for that task instance, that is, supplementing the data of that task instance.

Optionally, the identifier of one task instance may be a primary key (key) of the task instance, and the primary key may be a combination of a name of a task corresponding to the task instance and a generation time corresponding to the task instance.
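As an illustrative, non-limiting sketch, the recovery information described above (identifications of the task instances plus their dependency relationships) could be held in a small in-memory structure. The names `TaskInstanceKey`, `RecoveryInfo` and `add_dependency`, the example task names, and the date-based key format are assumptions made for illustration and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Set


@dataclass(frozen=True)
class TaskInstanceKey:
    """Hypothetical primary key: task name combined with generation time."""
    task_name: str
    generated_at: datetime

    def __str__(self) -> str:
        return f"{self.task_name}@{self.generated_at:%Y-%m-%d}"


@dataclass
class RecoveryInfo:
    """Identifications of task instances and their dependency relationships."""
    instances: Set[TaskInstanceKey] = field(default_factory=set)
    # Maps each instance to its direct downstream instances.
    downstream: Dict[TaskInstanceKey, Set[TaskInstanceKey]] = field(default_factory=dict)

    def add_dependency(self, upstream: TaskInstanceKey, downstream: TaskInstanceKey) -> None:
        """Record that `downstream` depends on `upstream`."""
        self.instances.update((upstream, downstream))
        self.downstream.setdefault(upstream, set()).add(downstream)
        self.downstream.setdefault(downstream, set())


day = datetime(2019, 1, 1)
extract = TaskInstanceKey("extract", day)      # assumed example task
aggregate = TaskInstanceKey("aggregate", day)  # assumed example task
info = RecoveryInfo()
info.add_dependency(extract, aggregate)
print(extract, "->", aggregate)
```

Because the key is a frozen dataclass, two keys built from the same task name and generation time compare equal and can be used as dictionary keys.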

It should be appreciated that the at least two fixed task instances included in the fixed task instance set described above are task instances entered by a user. The at least two fixed task instances may include a source task instance that generated the problem data, where the source task instance may be a root task instance (i.e., the most upstream task instance) of the plurality of task instances, or may be a non-root task instance of the plurality of task instances.

In an implementation manner of the embodiment of the present disclosure, the at least two fixed task instances may also include an end point task instance, and the end point task instance may be understood as a task instance concerned by the user. Specifically, the data generated by the endpoint task instance is data that the user cares about, and the data that the user cares about is data that the user needs.

In another implementation manner of the embodiment of the present disclosure, the at least two fixed task instances may further include at least one intermediate task instance, and one intermediate task instance may be understood as one task instance existing on a path from the source task instance to the end point task instance. The data generated by the source task instance can reach the end point task instance after being processed by the intermediate task instance, and the end point task instance then generates the data required by the user.

It is to be understood that the at least two fixed task instances are task instances included in the plurality of task instances.

S102, the electronic equipment determines at least two target task instances from the plurality of task instances based on the recovery information of the initial complement event and the fixed task instance set.

Wherein, the at least two target task instances comprise the at least two fixed task instances.

It should be understood that the at least two target task instances are all of the task instances required by the user, that is, the electronic device may obtain the data required by the user based on these task instances. Specifically, the electronic device may generate the data required by the user based on the source task instance, all intermediate task instances between the source task instance and the end point task instance, and the end point task instance.

It is understood that all of the task instances required by the user may be some of the plurality of task instances described above. That is, the electronic device can obtain the data required by the user based on part of the task instances in the plurality of task instances.
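As an illustrative, non-limiting sketch, the selection of target task instances can be expressed with the rule stated in the optional aspects of this disclosure: a task instance is a target if it belongs to the fixed task instance set, or if it is not a root task instance and a fixed task instance exists among its downstream task instances. The function name `select_targets`, the string identifiers, and the choice to treat "downstream" as all transitive downstream instances are assumptions made for illustration.

```python
from typing import Dict, Set


def select_targets(downstream: Dict[str, Set[str]], fixed: Set[str]) -> Set[str]:
    """Return target task instances.

    `downstream` maps every task instance to its direct downstream instances.
    """
    # Root instances never appear as the downstream of another instance.
    has_upstream = {child for children in downstream.values() for child in children}
    roots = set(downstream) - has_upstream

    memo: Dict[str, bool] = {}

    def reaches_fixed(node: str) -> bool:
        """True if any transitive downstream instance of `node` is fixed."""
        if node not in memo:
            memo[node] = any(
                child in fixed or reaches_fixed(child)
                for child in downstream.get(node, ())
            )
        return memo[node]

    targets = set()
    for node in downstream:
        if node in fixed:
            targets.add(node)  # fixed instances are always targets
        elif node not in roots and reaches_fixed(node):
            targets.add(node)  # non-root instance leading to a fixed instance
        # Otherwise the instance is left out, i.e. it is to be pruned.
    return targets


# Assumed example graph: A is the source task instance, F the end point.
graph = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"E"},
    "D": {"F"},
    "E": set(),
    "F": set(),
}
targets = select_targets(graph, fixed={"A", "F"})
print(sorted(targets))  # ['A', 'B', 'D', 'F']
```

In this example, C and E lead to no fixed task instance, so they are the task instances to be pruned.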

S103, the electronic equipment determines target recovery information based on the recovery information of the initial complement event and at least one to-be-pruned task instance.

The target recovery information includes identifiers of the at least two target task instances and a dependency relationship between the at least two target task instances, the target recovery information is recovery information of a target complement event, and the at least one task instance to be pruned is a task instance other than the at least two target task instances in the plurality of task instances.

In connection with the above description of the embodiments, it should be understood that the dependency relationship between the at least two target task instances is used to characterize the upstream and downstream relationship between the at least two target task instances.

It is to be understood that, after the electronic device determines the at least two target task instances, the at least one to-be-pruned task instance may be determined from the plurality of task instances. The electronic device may then obtain the target recovery information by combining the recovery information of the initial complement event, specifically, the identifier of the at least one to-be-pruned task instance included in the recovery information and the dependency relationships between the at least one to-be-pruned task instance and the other task instances.
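As an illustrative, non-limiting sketch, the target recovery information can be derived by removing the to-be-pruned task instances, together with every dependency that touches them, from the dependency mapping of the initial complement event. The function name `prune` and the example graph are assumptions made for illustration.

```python
from typing import Dict, Set


def prune(downstream: Dict[str, Set[str]], targets: Set[str]) -> Dict[str, Set[str]]:
    """Keep only target instances and the dependencies between them."""
    return {
        node: {child for child in children if child in targets}
        for node, children in downstream.items()
        if node in targets
    }


# Assumed initial dependency graph and target set.
initial = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"E"},
    "D": {"F"},
    "E": set(),
    "F": set(),
}
targets = {"A", "B", "D", "F"}

target_info = prune(initial, targets)
pruned = set(initial) - targets
print(target_info)
print(sorted(pruned))  # ['C', 'E']
```

The input mapping is left unchanged, so the recovery information of the initial complement event remains available after the target recovery information is built.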

S104, the electronic device executes the target complement event so that each target task instance of the at least two target task instances generates data according to the target recovery information.

It should be understood that, by executing the target complement event, the electronic device instructs each task instance corresponding to the target complement event (i.e., each target task instance) to perform a data generation operation, so that each target task instance can generate data.

It can be understood that the case where the electronic device instructs a target task instance to generate data can be divided into the following two cases:

In one case, the target task instance has not generated data before, so the electronic device instructing the target task instance to generate data means instructing the target task instance to generate data for the first time.

In another case, the target task instance has previously generated data (although the generated data may have been deleted), so the electronic device instructing the target task instance to generate data means instructing the target task instance to regenerate data, which may also be understood as instructing the target task instance to perform a rerun recovery.

Optionally, the electronic device may further perform a blocking operation on the target complement event, so that each of the at least two target task instances suspends generating data.

Optionally, the electronic device may terminate the initial complement event.

It should be understood that the initial complement event may be in the course of execution, that is, each of the plurality of task instances may be generating data. By terminating the initial complement event, the electronic device stops each of the plurality of task instances from generating data, so that the electronic device may then instruct each of the at least two target task instances to generate data.

The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S101 to S104, the electronic device may obtain the recovery information of an initial complement event and a fixed task instance set, and determine, based on the recovery information and the fixed task instance set, at least two target task instances from the plurality of task instances corresponding to the initial complement event. Since the at least two target task instances belong to the plurality of task instances and include the at least two fixed task instances included in the fixed task instance set, the electronic device may determine a part of the plurality of task instances (i.e., the at least two target task instances), the number of which is smaller than the number of the plurality of task instances. The electronic device may then determine target recovery information based on the recovery information and the at least one to-be-pruned task instance (i.e., the task instances of the plurality of task instances other than the at least two target task instances). Furthermore, the electronic device executes the target complement event so that each target task instance of the at least two target task instances generates data according to the target recovery information; that is, the electronic device instructs a smaller number of task instances to generate data so as to obtain the data required by the user, and timely output of downstream data can be guaranteed. Meanwhile, since the electronic device does not need to perform the data generation operation on the at least one to-be-pruned task instance, resource waste can be reduced and the production period of the data required by the user can be shortened.

With reference to fig. 1, as shown in fig. 2, in an implementation manner of the embodiment of the present disclosure, the determining at least two target task instances from among the plurality of task instances may specifically include S1021.

S1021, when the first task instance belongs to the fixed task instance set, the electronic device determines that the first task instance is a target task instance.

Wherein the first task instance is one of the plurality of task instances.

It should be understood that when a certain task instance (e.g., a first task instance) in the plurality of task instances belongs to the fixed task instance set, this indicates that the first task instance is one of the at least two fixed task instances. In conjunction with the description of the above embodiments, it should be understood that the at least two fixed task instances are user-input task instances, and the electronic device may determine a user-input task instance as a target task instance, that is, determine the first task instance as a task instance required by the user.

The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1021, when a certain task instance (e.g., a first task instance) in the plurality of task instances belongs to the fixed task instance set, this indicates that the first task instance is one of the at least two fixed task instances included in the fixed task instance set. Because the at least two fixed task instances are task instances input by the user, the electronic device can determine the task instances input by the user as target task instances, and specifically can determine the first task instance as a target task instance. The target task instances can thus be determined conveniently and quickly, which improves the efficiency of determining the target task instances.

With reference to fig. 1, as shown in fig. 3, in an implementation manner of the embodiment of the present disclosure, the determining at least two target task instances from among the plurality of task instances may specifically further include S1022 to S1023.

S1022, when the first task instance does not belong to the fixed task instance set, the electronic device determines that the first task instance has a downstream task instance.

Wherein the first task instance is one of the plurality of task instances.

It should be understood that when a certain task instance (e.g., a first task instance) in the plurality of task instances does not belong to the fixed task instance set, this indicates that the first task instance is not a user-input task instance, that is, the first task instance is a task instance other than the at least two fixed task instances in the plurality of task instances. When the first task instance has a downstream task instance, this indicates that the first task instance is not the most downstream task instance in the plurality of task instances. In other words, the electronic device determining that a task instance has a downstream task instance is equivalent to determining that the task instance is not the most downstream task instance in the plurality of task instances.

Optionally, when the first task instance does not have a downstream task instance, the electronic device may further determine that the first task instance is a most downstream task instance in the plurality of task instances.

S1023, when the first task instance is not a root task instance in the plurality of task instances and a fixed task instance exists among the downstream task instances corresponding to the first task instance, the electronic device determines the first task instance as a target task instance.

It should be understood that when the first task instance is not a root task instance of the plurality of task instances (i.e., not the most upstream task instance of the plurality of task instances), this indicates that the first task instance has both an upstream task instance and a downstream task instance, and the first task instance may correspond to at least one downstream task instance. When a fixed task instance exists among the downstream task instances corresponding to the first task instance, this indicates that the first task instance is an intermediate task instance. That is, the data generated by the fixed task instance (e.g., the most downstream task instance) existing among the downstream task instances corresponding to the first task instance is data that needs to be generated based on (or via) the intermediate task instance. The electronic device can therefore determine the first task instance as a target task instance, namely, as a task instance required by the user.

In an implementation manner of the embodiment of the present disclosure, when a certain task instance (e.g., a first task instance) in the plurality of task instances does not belong to the fixed task instance set, and a fixed task instance does not exist in a downstream task instance corresponding to the first task instance, the electronic device may determine that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances.

In another implementation of the embodiments of the present disclosure, when a task instance (e.g., a first task instance) in the plurality of task instances does not have a downstream task instance and the first task instance does not belong to the set of fixed task instances, the electronic device may determine that the first task instance is not a target task instance.

In another implementation of the embodiments of the present disclosure, when a task instance in the plurality of task instances is a root task instance of the plurality of task instances and the task instance does not belong to the set of fixed task instances, the electronic device determines that the task instance is not a target task instance.

In the embodiments of the present disclosure, the electronic device may determine whether a certain task instance (e.g., the first task instance) of the plurality of task instances is a target task instance based on a depth-first search (or a depth traversal algorithm). Specifically, for the plurality of task instances, the electronic device may first determine whether the root task instance (i.e., the most upstream task instance) in the plurality of task instances is a target task instance, then determine whether a directly downstream task instance of the root task instance is a target task instance, and then determine whether a downstream task instance of that directly downstream task instance is a target task instance. After the electronic device has finished one path of task instances, it proceeds to the next path (e.g., the path on which another task instance immediately downstream of the root task instance is located).

It should be noted that, when recursively traversing each of the plurality of task instances, the electronic device proceeds in top-down order (i.e., from the most upstream task instance to the most downstream task instance), whereas the process of determining the result (i.e., determining whether each task instance is a target task instance) is bottom-up. That is, when the most downstream task instance is determined to be a target task instance (or not a target task instance), the determination result may be returned to the immediately upstream task instance of the most downstream task instance, and so on until the result is returned to the most upstream task instance, so that the electronic device can determine whether each of the plurality of task instances is a target task instance.
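Illustratively, the judgment described in S1021 to S1023 and the depth-first traversal above can be sketched as follows. This is only a minimal sketch under stated assumptions: the function name find_target_instances, the adjacency-map representation, and the small example graph are hypothetical and are not taken from the disclosure; the sketch also assumes that "a fixed task instance exists among the downstream task instances" covers indirect downstream task instances.

```python
from typing import Dict, List, Set


def find_target_instances(downstream: Dict[str, List[str]], fixed: Set[str]) -> Set[str]:
    """Return the task instances treated as target task instances.

    `downstream` maps an instance id to its direct downstream instance ids
    (hypothetical representation); `fixed` is the fixed task instance set.
    """
    # An instance that appears as a downstream of another instance has an
    # upstream instance, so it is not a root task instance.
    has_upstream = {child for children in downstream.values() for child in children}
    all_nodes = set(downstream) | has_upstream
    fixed_below: Dict[str, bool] = {}  # memo: is there a fixed instance downstream?

    def visit(node: str) -> bool:
        # The descent is top-down; the boolean result is returned bottom-up.
        if node in fixed_below:
            return fixed_below[node]
        found = False
        for child in downstream.get(node, []):
            child_found = visit(child)
            if child in fixed or child_found:
                found = True
        fixed_below[node] = found
        return found

    targets: Set[str] = set()
    for node in sorted(all_nodes):
        found = visit(node)
        is_root = node not in has_upstream
        # S1021: fixed instances are targets.
        # S1023: a non-root instance with a fixed instance downstream is a target.
        if node in fixed or (found and not is_root):
            targets.add(node)
    return targets


# Hypothetical graph (not the one in fig. 5): A -> B -> D -> K and A -> F -> J.
example_downstream = {"A": ["B", "F"], "B": ["D"], "D": ["K"], "F": ["J"]}
example_targets = find_target_instances(example_downstream, fixed={"B", "K"})
```

In this hypothetical graph, B and K are fixed and D lies between them, so the three are kept; the non-fixed root A and the branch F, J with no fixed task instance downstream are left as to-be-pruned task instances.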

The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1022 to S1023, when a task instance (e.g., a first task instance) in the plurality of task instances does not belong to the fixed task instance set, this indicates that the first task instance is a task instance other than the at least two fixed task instances included in the fixed task instance set, and the electronic device may determine that the first task instance has a downstream task instance, that is, determine that the first task instance is not the most downstream task instance. When the first task instance is not a root task instance in the plurality of task instances, this indicates that the first task instance has both an upstream task instance and a downstream task instance. When a fixed task instance exists among the downstream task instances corresponding to the first task instance, this indicates that the electronic device needs to obtain the data corresponding to that fixed task instance based on the data generated by the first task instance. In this way, the electronic device may determine the first task instance as a target task instance, i.e., determine that the first task instance is a task instance required by the user. The method can accurately and effectively determine whether each task instance in the plurality of task instances is a target task instance, which improves the efficiency of determining the target task instances.

With reference to fig. 1, as shown in fig. 4, in an implementation manner of the embodiment of the present disclosure, the determining target recovery information based on the recovery information of the initial complement event and the at least one to-be-pruned task instance specifically includes S1031 to S1033.

S1031, the electronic device generates an initial directed acyclic graph based on the recovery information of the initial complement event.

The initial directed acyclic graph is a directed acyclic graph corresponding to the initial complement event, the initial directed acyclic graph comprises task instance nodes and edges, the task instance nodes are used for representing task instances, and the edges are used for connecting the task instance nodes with dependency relationships.

In conjunction with the above description of the embodiments, it should be understood that the dependency relationships between the task instances described above are used to characterize the upstream and downstream relationships between the task instances. For example, assuming that the initial directed acyclic graph includes a first task instance node and a second task instance node, when the first task instance node (the first task instance node is an end point of an edge or an end point in an arrow direction on the edge) depends on the second task instance node (the second task instance node is a start point of the edge or a start point in an arrow direction on the edge), it is indicated that the first task instance node is a (directly) downstream task instance node of the second task instance node.
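Illustratively, the construction of the initial directed acyclic graph and the direction of its edges can be sketched as follows. This is a minimal sketch only: the function name build_dag and the representation of the recovery information as a list of identifiers plus (dependent, dependency) pairs are assumptions for illustration, and the sketch does not verify that the resulting graph is acyclic.

```python
from typing import Dict, Iterable, List, Tuple


def build_dag(
    instance_ids: Iterable[str],
    dependencies: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Build adjacency maps from recovery information (hypothetical layout).

    Each pair in `dependencies` is (dependent, dependency): the dependent
    instance depends on the dependency instance, so the edge starts at the
    dependency (upstream) and ends at the dependent (downstream).
    """
    ids = list(instance_ids)
    downstream: Dict[str, List[str]] = {i: [] for i in ids}
    upstream: Dict[str, List[str]] = {i: [] for i in ids}
    for dependent, dependency in dependencies:
        # Edge start point: dependency; edge end point: dependent.
        downstream.setdefault(dependency, []).append(dependent)
        upstream.setdefault(dependent, []).append(dependency)
        # Make sure both endpoints appear in both maps.
        downstream.setdefault(dependent, [])
        upstream.setdefault(dependency, [])
    return downstream, upstream


# The first task instance node depends on the second task instance node.
down, up = build_dag(["first", "second"], [("first", "second")])
```

Here down["second"] contains "first", which matches the statement that the first task instance node is a directly downstream node of the second task instance node.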

S1032, the electronic device executes a pruning operation on at least one to-be-pruned task instance node included in the initial directed acyclic graph to obtain the target directed acyclic graph.

The at least one to-be-pruned task instance node is the node representing the at least one to-be-pruned task instance.

In conjunction with the description of the above embodiment, it should be understood that the at least one task instance node to be pruned is a task instance node other than at least two target task instance nodes (i.e., task instance nodes characterized by the at least two target task instances) in the plurality of task instance nodes included in the initial directed acyclic graph.

It can be understood that the target directed acyclic graph includes target task instance nodes and target edges, the target task instance nodes are nodes represented by target task instances, and the target edges are used for connecting the target task instance nodes with dependency relationships.

In the embodiment of the present disclosure, the electronic device performs a pruning operation on a certain task instance node to be pruned, that is, deletes the task instance node to be pruned and an edge corresponding to the task instance node to be pruned in the initial directed acyclic graph (where the corresponding edge is specifically an edge between the task instance node to be pruned and each target task instance node). In this way, the electronic device may obtain the target directed acyclic graph, and further only need to perform data generation operation on the task instance (i.e., the target task instance) represented by the target task instance node included in the target directed acyclic graph.
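Illustratively, the pruning operation can be sketched as follows. This is a minimal sketch under assumptions: the function name prune_dag, the representation of the graph as a node set plus a set of (upstream, downstream) edges, and the example graph are hypothetical. The sketch removes every edge that touches a to-be-pruned node, which also covers edges between two to-be-pruned nodes.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (upstream node, downstream node)


def prune_dag(nodes: Set[str], edges: Set[Edge], to_prune: Set[str]) -> Tuple[Set[str], Set[Edge]]:
    """Delete the to-be-pruned nodes and every edge attached to them."""
    kept_nodes = nodes - to_prune
    # An edge survives only if both of its endpoints survive.
    kept_edges = {(u, d) for (u, d) in edges if u in kept_nodes and d in kept_nodes}
    return kept_nodes, kept_edges


# Hypothetical initial graph (not the one in fig. 5).
initial_nodes = {"A", "B", "D", "F", "J", "K"}
initial_edges = {("A", "B"), ("A", "F"), ("B", "D"), ("D", "K"), ("F", "J")}
target_nodes, target_edges = prune_dag(initial_nodes, initial_edges, to_prune={"A", "F", "J"})
```

After pruning, only the nodes B, D and K and the edges between them remain, so the data generation operation only needs to be performed for the task instances those nodes represent.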

Exemplarily, fig. 5 is an example, according to an embodiment of the present disclosure, in which the electronic device obtains a target directed acyclic graph (i.e., directed acyclic graph 102) by performing a pruning operation on the to-be-pruned task instance nodes included in an initial directed acyclic graph (i.e., directed acyclic graph 101). It should be understood that each letter in fig. 5 denotes one task instance node included in the directed acyclic graph, e.g., the letter A denotes task instance node A.

As shown in fig. 5, the directed acyclic graph 101 includes 11 task instance nodes and 13 edges. Specifically, the 11 task instance nodes include 3 fixed task instance nodes, namely a task instance node B, a task instance node C, and a task instance node K; in the data processing method provided by the embodiment of the present disclosure, the electronic device determines that the 11 task instance nodes also include 3 task instance nodes to be pruned, that is, a task instance node A, a task instance node F, and a task instance node J; the 11 task instance nodes also include 5 intermediate task instance nodes (i.e., nodes other than the fixed task instance nodes and the task instance nodes to be pruned in the plurality of task instance nodes), that is, a task instance node D, a task instance node E, a task instance node G, a task instance node H, and a task instance node I.

Continuing with fig. 5, the electronic device may determine that task instance node B, task instance node C, task instance node D, task instance node E, task instance node G, task instance node H, task instance node I, and task instance node K are target task instance nodes, and determine the 3 task instance nodes to be pruned. The electronic device performs pruning operation on the 3 to-be-pruned task instance nodes (specifically, deletes the 3 to-be-pruned task instance nodes and the corresponding 4 edges) to obtain the directed acyclic graph 102, where 8 task instance nodes included in the directed acyclic graph 102 are all target task instance nodes.

It should be noted that the 8 target task instance nodes shown in fig. 5 include 3 fixed task instance nodes and 5 intermediate task instance nodes.

S1033, the electronic device determines target recovery information based on the target directed acyclic graph.

In connection with the above description of the embodiments, it should be understood that the target recovery information includes the identification of at least two target task instances, and the dependency relationship between the at least two target task instances. The target directed acyclic graph comprises target task instance nodes and target edges, the target task instance nodes are nodes represented by target task instances, and the target edges are used for connecting the target task instance nodes with dependency relationships.

To this end, the electronic device may determine, based on each target task instance node included in the target directed acyclic graph, the identifier of each target task instance of the at least two target task instances, and may determine, based on each target edge included in the target directed acyclic graph, the dependency relationship between each target task instance and the other target task instances. The electronic device thereby determines the target recovery information.
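Illustratively, reading the target recovery information off the target directed acyclic graph can be sketched as follows. This is a minimal sketch: the function name to_recovery_info and the dictionary layout with the keys instance_ids and depends_on are hypothetical and only illustrate that identifiers come from the nodes and dependency relationships come from the edges.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream node, downstream node)


def to_recovery_info(nodes: Set[str], edges: Set[Edge]) -> Dict[str, object]:
    """Derive identifiers from nodes and dependency relationships from edges."""
    depends_on: Dict[str, List[str]] = {node: [] for node in sorted(nodes)}
    for upstream_node, downstream_node in sorted(edges):
        # The downstream instance depends on the upstream instance.
        depends_on[downstream_node].append(upstream_node)
    return {"instance_ids": sorted(nodes), "depends_on": depends_on}


# Hypothetical target graph: B -> D -> K.
info = to_recovery_info({"B", "D", "K"}, {("B", "D"), ("D", "K")})
```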

In an implementation manner of the embodiment of the present disclosure, the electronic device may further store the target recovery information in a database.

It should be understood that the electronic device stores the target recovery information in the database, that is, the identifier of each of the at least two target task instances and the dependency relationship between each target task instance and other target task instances are stored in the database. In this way, when the data required by the user is obtained next time, the electronic device can directly obtain the target recovery information from the database.
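Illustratively, storing the target recovery information and reading it back can be sketched as follows. The disclosure does not specify the database or its schema; this sketch assumes an SQLite database accessed through the Python standard library, and the table names, column names, function names, and the event identifier "event-1" are all hypothetical.

```python
import sqlite3
from typing import Set, Tuple

Edge = Tuple[str, str]  # (upstream id, downstream id)


def save_recovery_info(conn: sqlite3.Connection, event_id: str, ids: Set[str], deps: Set[Edge]) -> None:
    """Persist identifiers and dependency relationships (hypothetical schema)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS target_instance (event_id TEXT, instance_id TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS target_dependency "
            "(event_id TEXT, upstream_id TEXT, downstream_id TEXT)"
        )
        conn.executemany(
            "INSERT INTO target_instance VALUES (?, ?)",
            [(event_id, i) for i in sorted(ids)],
        )
        conn.executemany(
            "INSERT INTO target_dependency VALUES (?, ?, ?)",
            [(event_id, u, d) for (u, d) in sorted(deps)],
        )


def load_recovery_info(conn: sqlite3.Connection, event_id: str) -> Tuple[Set[str], Set[Edge]]:
    """Read the stored target recovery information back for later reuse."""
    ids = {
        row[0]
        for row in conn.execute(
            "SELECT instance_id FROM target_instance WHERE event_id = ?", (event_id,)
        )
    }
    deps = {
        (row[0], row[1])
        for row in conn.execute(
            "SELECT upstream_id, downstream_id FROM target_dependency WHERE event_id = ?",
            (event_id,),
        )
    }
    return ids, deps


connection = sqlite3.connect(":memory:")
save_recovery_info(connection, "event-1", {"B", "D", "K"}, {("B", "D"), ("D", "K")})
loaded_ids, loaded_deps = load_recovery_info(connection, "event-1")
```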

The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1031 to S1033, the electronic device may generate an initial directed acyclic graph based on the recovery information of the initial complement event, and execute a pruning operation on the at least one to-be-pruned task instance node included in the initial directed acyclic graph to obtain a target directed acyclic graph. Because the at least one to-be-pruned task instance represented by the at least one to-be-pruned task instance node is a task instance other than the at least two target task instances in the plurality of task instances, the electronic device can obtain a target directed acyclic graph with fewer nodes and fewer edges, and only needs to perform the data generation operation for each target task instance node included in the target directed acyclic graph, so that resource waste is reduced and the resource utilization rate is improved. The electronic device can determine the target recovery information based on the target directed acyclic graph, so that the efficiency of determining the target recovery information can be improved, and the efficiency of data generation is further improved.

With reference to fig. 4, as shown in fig. 6, in an implementation manner of the embodiment of the present disclosure, a current task instance node is either a root node or a non-root node in the target directed acyclic graph, and causing each target task instance of the at least two target task instances to generate data according to the target recovery information specifically includes S1041 to S1042.

S1041, when the current task instance node is a non-root node, the electronic device determines the number of direct upstream successful nodes of the current task instance node.

The direct upstream successful node is a node with a successful operation state in the direct upstream node of the current task instance node, and the direct upstream node is an upstream task instance node with an edge between the direct upstream node and the current task instance node.

It should be understood that when the current task instance node is a non-root node, this indicates that other task instance nodes exist upstream of the current task instance node and have direct (or indirect) dependency relationships with the current task instance node. An upstream node having a direct dependency relationship with the current task instance node is a direct upstream node, that is, an upstream node having an edge with the current task instance node.

It can be understood that the electronic device may obtain an operating state of each of the at least two target task instances, where the operating state of each target task instance is an operating state of a node (i.e., each target task instance node) that is characterized by each target task instance. And when the running state of one target task instance is running successfully, the target task instance is described to have successfully generated data. Namely, the direct upstream successful node is a target task instance node which has successfully generated data in the direct upstream node of the current task instance node.

In the embodiment of the present disclosure, the running state of one task instance (including a target task instance) may include waiting to run, running, running successfully, running failure, and the like. Specifically, waiting to run indicates that the task instance has not yet started to generate data; running indicates that the task instance is generating data; running successfully indicates that the task instance has successfully generated the data; running failure indicates that the task instance failed to generate data.

S1042, when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, the electronic device determines that the current task instance starts to generate data.

And the current task instance is the task instance represented by the current task instance node.

It should be understood that when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, it indicates that all the nodes in the direct upstream nodes of the current task instance node have successfully generated data, i.e. the running states of all the nodes are running successfully. In this manner, the electronic device can determine that the current task instance begins generating data.

Otherwise, that is, when the number of the direct upstream successful nodes is not equal to the number of the direct upstream nodes, this indicates that the running state of some of the direct upstream nodes may be waiting to run, running, or running failure. At this time, the electronic device may instruct those nodes to continue to generate (or regenerate) data until their running state is updated to running successfully.

Optionally, when the current task instance starts to generate data, the electronic device may update the running state of the current task instance to be running.

It will be appreciated that the running state of the current task instance before starting to generate data may be pending, such that when the current task instance starts to generate data, the electronic device updates the running state to running, i.e., indicating that the current task instance is generating data.

Optionally, when the current task instance has successfully generated the data, the electronic device may update the running state of the current task instance to be running successfully.

Illustratively, in conjunction with the example in fig. 5 above, assume that the current task instance node is task instance node K in directed acyclic graph 102. When the number of direct upstream successful nodes of the task instance node K is equal to 3 (the direct upstream nodes of the task instance node K include a task instance node G, a task instance node H, and a task instance node I), the electronic device determines that the task instance represented by the task instance node K starts generating data.
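The check in S1041 to S1042, applied to the task instance node K example, can be sketched as follows. This is a minimal sketch intended for non-root nodes only: the enumeration RunState, the function name can_start, and the map layouts are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Dict, List


class RunState(Enum):
    WAITING = "waiting to run"
    RUNNING = "running"
    SUCCEEDED = "running successfully"
    FAILED = "running failure"


def can_start(node: str, upstream: Dict[str, List[str]], states: Dict[str, RunState]) -> bool:
    """S1041/S1042: start only when every direct upstream node has succeeded."""
    direct_upstream = upstream[node]
    # S1041: count the direct upstream successful nodes.
    succeeded = sum(1 for u in direct_upstream if states[u] is RunState.SUCCEEDED)
    # S1042: compare with the number of direct upstream nodes.
    return succeeded == len(direct_upstream)


# Direct upstream nodes of K as stated for directed acyclic graph 102.
upstream_of = {"K": ["G", "H", "I"]}
run_states = {"G": RunState.SUCCEEDED, "H": RunState.SUCCEEDED, "I": RunState.RUNNING}
ready_before = can_start("K", upstream_of, run_states)

run_states["I"] = RunState.SUCCEEDED
ready_after = can_start("K", upstream_of, run_states)
```

While node I is still running, only 2 of the 3 direct upstream nodes have succeeded, so K does not start; once I succeeds, the two counts are equal and K may start generating data.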

In an implementation manner of the embodiment of the present disclosure, when the current task instance node is the root node, the electronic device may determine the running state of the current task instance node. When the running state of the current task instance node is running successfully and a direct downstream node of the current task instance node has no other direct upstream node, the electronic device determines that the direct downstream node starts to generate data.

The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1041 to S1042, when the current task instance node is a non-root node, this indicates that the current task instance node has an upstream task instance node, and the electronic device determines the number of direct upstream successful nodes of the current task instance node; when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes of the current task instance node, this indicates that all of the direct upstream nodes of the current task instance node have successfully generated data, and thus the electronic device may determine that the current task instance starts to generate data. In the embodiment of the disclosure, the electronic device may determine the running states of all of the direct upstream nodes of the current task instance node, and further determine whether the current task instance starts to generate data. The electronic device can thus conveniently and effectively determine whether each target task instance of the at least two target task instances meets the precondition for data generation, thereby improving the accuracy of data generation.

With reference to fig. 6, as shown in fig. 7, in an implementation manner, the target recovery information further includes priorities of the at least two target task instances, and the data processing method provided in the embodiment of the present disclosure further includes S105 to S106.

S105, when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, and the direct upstream nodes of the current task instance node are the same as the direct upstream nodes of a task instance node to be identified, the electronic device determines whether the priority of the task instance to be identified is higher than that of the current task instance.

The task instance node to be identified is a task instance node other than the current task instance node among the at least two target task instance nodes included in the target directed acyclic graph, and the task instance to be identified is the task instance represented by the task instance node to be identified.

In connection with the above description, it should be understood that a direct upstream successful node is a node whose running state is running successfully among the direct upstream nodes of the current task instance node, and a direct upstream node is an upstream task instance node having an edge with the current task instance node. When the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, this indicates that all of the direct upstream nodes of the current task instance node have successfully generated data, that is, the running states of all of these nodes are running successfully.

It is to be understood that when the direct upstream nodes of the current task instance node are the same as the direct upstream nodes of the task instance node to be identified, this indicates that the current task instance node and the task instance node to be identified have the same upstream task instance nodes. Specifically, the upstream task instance nodes having a target edge (or a dependency relationship) with the current task instance node are the same as the upstream task instance nodes having a target edge with the task instance node to be identified.

S106, when the priority of the task instance to be identified is higher than that of the current task instance, the electronic device determines that the task instance to be identified preferentially starts to generate data.

It should be understood that different tasks may correspond to different priorities, the priority of one task may be understood as attribute information of the task, the priorities of different task instances generated by the same task are the same, and the different priorities of different tasks may be configured in advance.

It is to be appreciated that, in the process of generating data for each of the at least two target task instances, the resources that the electronic device can allocate for the at least two target task instances may be limited. At this time, the electronic device may determine a task instance with a higher priority and preferentially process the task instance with the higher priority, that is, determine that the task instance with the higher priority preferentially starts to generate data.

Optionally, when the priority of the task instance to be identified is lower than the priority of the current task instance, the electronic device may determine that the current task instance preferentially starts to generate data, that is, determine that the current task instance is preferentially processed.

Illustratively, in conjunction with the example in fig. 5 above, assume that the current task instance is the task instance represented by the task instance node H included in the directed acyclic graph 102, and the task instance to be identified is the task instance represented by the task instance node I included in the directed acyclic graph 102. Assuming further that the priority of the task instance represented by the task instance node I is higher than that of the task instance represented by the task instance node H, the electronic device determines that the task instance represented by the task instance node I preferentially starts to generate data.
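The comparison in S105 to S106 for the task instance nodes H and I can be sketched as follows. This is a minimal sketch under assumptions: the function name pick_first is hypothetical, a larger number is assumed to mean a higher priority, equal priorities (not addressed above) are resolved in favor of the current task instance, and the caller is assumed to have already confirmed that all direct upstream nodes have run successfully. The shared upstream node E is hypothetical wiring, since the edges of fig. 5 are not listed in the text.

```python
from typing import Dict, List, Optional


def pick_first(
    current: str,
    candidate: str,
    upstream: Dict[str, List[str]],
    priority: Dict[str, int],
) -> Optional[str]:
    """Return the instance that preferentially starts, or None if S105 does not apply."""
    # S105 only applies when both nodes have the same direct upstream nodes.
    if set(upstream[current]) != set(upstream[candidate]):
        return None
    # S106: the task instance to be identified goes first if its priority is higher.
    if priority[candidate] > priority[current]:
        return candidate
    # Otherwise the current task instance is processed first (ties included: assumption).
    return current


shared_upstream = {"H": ["E"], "I": ["E"]}  # hypothetical wiring
priorities = {"H": 1, "I": 2}  # larger number = higher priority (assumption)
first = pick_first("H", "I", shared_upstream, priorities)
```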

The technical solution provided by this embodiment can bring at least the following beneficial effects. From S105 to S106, when the number of direct upstream successful nodes of the current task instance node is equal to the number of direct upstream nodes of the current task instance node, this indicates that all of the direct upstream nodes of the current task instance node have successfully generated data, that is, the running states of all of these nodes are successful. When the direct upstream nodes of the current task instance node are the same as the direct upstream nodes of the task instance node to be identified, the two nodes share the same upstream task instance nodes. At this time, the electronic device may determine whether the priority of the task instance to be identified is higher than that of the current task instance and, if so, determine that the task instance to be identified preferentially starts to generate data. In the embodiment of the present disclosure, when two target task instances have the same upstream task instances and all of those upstream task instances have successfully generated data, the electronic device may compare the priorities of the two target task instances and preferentially process the one with the higher priority, that is, determine that the target task instance with the higher priority preferentially starts to generate data. The processing order of the target task instances can thus be determined reasonably, which improves the effectiveness of data processing.

With reference to fig. 1, as shown in fig. 8, the data processing method provided by the embodiment of the present disclosure further includes S107.

S107, when the first task instance does not belong to the fixed task instance set and no fixed task instance exists among the downstream task instances corresponding to the first task instance, the electronic device adds a preset identifier to the first task instance and to the downstream task instances corresponding to the first task instance.

The preset identifier is used for representing that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances, and the first task instance is one of the plurality of task instances.

In conjunction with the above description of the embodiments, it should be understood that when a certain task instance (e.g., a first task instance) of the plurality of task instances has a downstream task instance, this indicates that the first task instance is not the most downstream task instance. When the first task instance does not belong to the fixed task instance set and no fixed task instance exists among the downstream task instances corresponding to the first task instance, this indicates that the first task instance is an intermediate task instance. In this way, the electronic device may determine that neither the first task instance nor the downstream task instances corresponding to the first task instance are target task instances, that is, determine that the first task instance and the downstream task instances corresponding to the first task instance are all to-be-pruned task instances. Examples are the task instance represented by task instance node F and the task instance represented by task instance node J shown in fig. 5 above.

It will be appreciated that the first task instance and/or each downstream task instance to which the first task instance corresponds may exist in more than one column (or more than one path). After determining that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances, the electronic device may add a preset identifier to the first task instance and the downstream task instance corresponding to the first task instance, so that when the electronic device traverses to a next column (or a next path) (where the next column includes the first task instance and/or the downstream task instance corresponding to the first task instance), the electronic device may directly determine that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances based on the preset identifier without repeating the traversal.

In an implementation manner of the embodiment of the present disclosure, when a certain task instance (for example, a first task instance) in the plurality of task instances does not belong to the fixed task instance set, and the first task instance does not have a downstream task instance, the electronic device may also add the preset identifier to the first task instance.

The technical solution provided by this embodiment can bring at least the following beneficial effects. In S107, when a certain task instance (e.g., a first task instance) of the plurality of task instances does not belong to the fixed task instance set, the first task instance is either an intermediate task instance or a task instance to be pruned. Because no fixed task instance exists among the downstream task instances corresponding to the first task instance, this indicates that the first task instance and its downstream task instances are not intermediate task instances that a fixed task instance needs to pass through. In this way, the electronic device may add a preset identifier to the first task instance and to the downstream task instances corresponding to the first task instance, so that when the electronic device next traverses to the first task instance and/or its downstream task instances, it does not need to repeat the traversal and may directly determine, based on the preset identifier, that the first task instance and its downstream task instances are not target task instances. This improves the efficiency of determining the target task instances and further shortens the data generation period.
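The marking of S107 can be sketched as a traversal that records a preset identifier and reuses it on later visits. This is only an illustration under stated assumptions: the function name `mark_non_targets`, the dictionary-based graph representation, and the example topology are hypothetical and are not taken from fig. 5.

```python
def mark_non_targets(downstream, fixed):
    """Return the task instances that carry the preset identifier.

    downstream: dict mapping a task instance to its direct downstream instances
    fixed: the fixed task instance set
    """
    marked = set()  # instances carrying the preset identifier (not targets)

    def reaches_fixed(node):
        # True if `node` is fixed or a fixed instance exists downstream of it.
        if node in marked:
            return False          # preset identifier found: skip the traversal
        if node in fixed:
            return True
        found = False
        for child in downstream.get(node, ()):
            if reaches_fixed(child):
                found = True
        if not found:
            marked.add(node)      # neither fixed nor upstream of a fixed instance
        return found

    all_nodes = set(downstream) | {c for cs in downstream.values() for c in cs}
    for node in sorted(all_nodes):
        reaches_fixed(node)
    return marked


# Hypothetical topology: A -> B -> {C, D}, A -> F -> J, with C and D fixed.
graph = {"A": ["B", "F"], "B": ["C", "D"], "F": ["J"]}
marked = mark_non_targets(graph, fixed={"C", "D"})
print(sorted(marked))  # prints ['F', 'J']
```

Once F and J carry the identifier, any later path that reaches them returns immediately instead of walking their downstream instances again.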

It is understood that, in practical implementation, the electronic device according to the embodiments of the present disclosure may include one or more hardware structures and/or software modules for implementing the corresponding data processing methods, and these hardware structures and/or software modules may constitute an electronic device. Those of skill in the art will readily appreciate that the present disclosure can be implemented in hardware or a combination of hardware and computer software for implementing the exemplary algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

Based on such understanding, the embodiment of the present disclosure also provides a data processing apparatus, and fig. 9 shows a schematic structural diagram of the data processing apparatus provided by the embodiment of the present disclosure. As shown in fig. 9, the data processing apparatus 20 may include: an obtaining module 201, a determining module 202 and a processing module 203.

An obtaining module 201 configured to obtain recovery information of an initial complement event and a fixed task instance set, where the recovery information includes identifications of a plurality of task instances and a dependency relationship between the plurality of task instances, and the fixed task instance set includes at least two fixed task instances.

A determining module 202 configured to determine at least two target task instances from the plurality of task instances based on the recovery information and the set of fixed task instances, the at least two target task instances including the at least two fixed task instances.

The determining module 202 is further configured to determine target recovery information based on the recovery information and at least one to-be-pruned task instance, where the target recovery information includes identifications of the at least two target task instances and a dependency relationship between the at least two target task instances, the target recovery information is recovery information of a target complement event, and the at least one to-be-pruned task instance is a task instance other than the at least two target task instances in the plurality of task instances.

The processing module 203 is configured to execute the target complement event to enable each of the at least two target task instances to generate data according to the target recovery information.

Optionally, the determining module 202 is specifically configured to determine that the first task instance is a target task instance and the first task instance is one of the plurality of task instances when the first task instance belongs to the fixed task instance set.

Optionally, the determining module 202 is specifically configured to determine, when the first task instance does not belong to the fixed task instance set, whether the first task instance has a downstream task instance, where the first task instance is one of the plurality of task instances.

The determining module 202 is further specifically configured to determine the first task instance as a target task instance when the first task instance is not a root task instance in the plurality of task instances and a fixed task instance exists in a downstream task instance corresponding to the first task instance.
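One way to compute the target task instances described above is to start from the fixed task instances and walk upstream, since every instance reached this way has a fixed task instance downstream of it. The following is a sketch under assumptions: `find_targets`, the `(upstream, downstream)` pair format for dependencies, and the example graph are hypothetical, and the disclosure itself describes the check per task instance rather than this reverse walk. In this sketch a root instance is included whenever a fixed instance lies downstream of it.

```python
from collections import deque


def find_targets(dependencies, fixed):
    """Return fixed instances plus every instance with a fixed instance downstream.

    dependencies: iterable of (upstream, downstream) pairs
    fixed: the fixed task instance set
    """
    upstream_of = {}
    for up, down in dependencies:
        upstream_of.setdefault(down, set()).add(up)

    targets = set(fixed)          # every fixed task instance is a target
    queue = deque(fixed)
    while queue:
        node = queue.popleft()
        for up in upstream_of.get(node, ()):
            if up not in targets:
                targets.add(up)   # has a fixed task instance downstream
                queue.append(up)
    return targets


# Hypothetical dependencies: A -> B -> {C, D}, A -> F -> J, with C and D fixed.
deps = [("A", "B"), ("A", "F"), ("B", "C"), ("B", "D"), ("F", "J")]
targets = find_targets(deps, fixed={"C", "D"})
print(sorted(targets))  # prints ['A', 'B', 'C', 'D']
```

The instances left out of the result (F and J here) are the to-be-pruned task instances.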

Optionally, the processing module 203 is specifically configured to generate an initial directed acyclic graph based on the recovery information of the initial complement event, where the initial directed acyclic graph is a directed acyclic graph corresponding to the initial complement event, the initial directed acyclic graph includes task instance nodes and edges, the task instance nodes are used to represent task instances, and the edges are used to connect task instance nodes with a dependency relationship.

The processing module 203 is further specifically configured to perform pruning on at least one to-be-pruned task instance node included in the initial directed acyclic graph to obtain a target directed acyclic graph, where the at least one to-be-pruned task instance node is a node represented by at least one to-be-pruned task instance.

The determining module 202 is specifically configured to determine the target recovery information based on the target directed acyclic graph.
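The pruning step can be sketched as removing the to-be-pruned nodes together with every edge that touches them, and then reading the target recovery information off the remaining graph. The function name `prune_dag`, the dictionary layout of the result, and the example graph are assumptions made for illustration only.

```python
def prune_dag(nodes, edges, to_prune):
    """Drop to-be-pruned nodes and their edges from an initial DAG.

    nodes: iterable of task instance identifications
    edges: iterable of (upstream, downstream) dependency pairs
    to_prune: set of to-be-pruned task instance identifications
    """
    kept_nodes = [n for n in nodes if n not in to_prune]
    kept_edges = [
        (up, down)
        for up, down in edges
        if up not in to_prune and down not in to_prune
    ]
    # Target recovery information: identifications plus dependency relationships.
    return {"instances": kept_nodes, "dependencies": kept_edges}


# Hypothetical initial DAG: A -> B -> {C, D}, A -> F -> J; F and J are pruned.
initial_nodes = ["A", "B", "C", "D", "F", "J"]
initial_edges = [("A", "B"), ("A", "F"), ("B", "C"), ("B", "D"), ("F", "J")]
target_info = prune_dag(initial_nodes, initial_edges, to_prune={"F", "J"})
print(target_info["instances"])     # prints ['A', 'B', 'C', 'D']
print(target_info["dependencies"])  # prints [('A', 'B'), ('B', 'C'), ('B', 'D')]
```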

Optionally, the current task instance node is a root node in the target directed acyclic graph or a non-root node in the target directed acyclic graph.

The determining module 202 is further specifically configured to, when the current task instance node is the non-root node, determine the number of direct upstream successful nodes of the current task instance node, where the direct upstream successful nodes are nodes whose operation states are successful in the direct upstream nodes of the current task instance node, and the direct upstream nodes are upstream task instance nodes having edges with the current task instance node.

The determining module 202 is further specifically configured to determine that a current task instance starts generating data when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, where the current task instance is a task instance characterized by the current task instance node.
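The start condition above compares two counts for the current task instance node. A minimal sketch follows, assuming a hypothetical `can_start` helper, an edge list of `(upstream, downstream)` pairs, and running states stored as strings with "success" marking a successful run. For a root node both counts are zero, so this sketch treats it as ready to start; that treatment is an assumption of the example.

```python
def can_start(node, edges, run_state):
    """Return True when every direct upstream node of `node` ran successfully."""
    # Direct upstream nodes: upstream nodes that share an edge with `node`.
    direct_upstream = [up for up, down in edges if down == node]
    # Direct upstream successful nodes: those whose running state is successful.
    successful = [up for up in direct_upstream if run_state.get(up) == "success"]
    return len(successful) == len(direct_upstream)


# Hypothetical graph: A -> C and B -> C.
dag_edges = [("A", "C"), ("B", "C")]
print(can_start("C", dag_edges, {"A": "success", "B": "running"}))  # prints False
print(can_start("C", dag_edges, {"A": "success", "B": "success"}))  # prints True
```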

Optionally, the target recovery information further includes priorities of the at least two target task instances.

The determining module 202 is further configured to determine whether the priority of the task instance to be identified is higher than the priority of the current task instance when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes and the direct upstream nodes are the same as those of the task instance node to be identified, where the task instance node to be identified is a task instance node, other than the current task instance node, of at least two task instance nodes included in the target directed acyclic graph, and the task instance to be identified is a task instance characterized by the task instance node to be identified.

The determining module 202 is further configured to determine that the task instance to be identified preferentially starts to generate data when the priority of the task instance to be identified is higher than the priority of the current task instance.

Optionally, the processing module 203 is further configured to, when the first task instance does not belong to the fixed task instance set and no fixed task instance exists in the downstream task instance corresponding to the first task instance, add a preset identifier to the first task instance and the downstream task instance corresponding to the first task instance, where the preset identifier is used to characterize that the first task instance and the downstream task instance corresponding to the first task instance are not target task instances, and the first task instance is one of the plurality of task instances.

As described above, the embodiments of the present disclosure may perform division of functional modules on a data processing apparatus according to the above method example. The integrated module can be realized in a hardware form, and can also be realized in a software functional module form. In addition, it should be further noted that the division of the modules in the embodiments of the present disclosure is schematic, and is only a logic function division, and there may be another division manner in actual implementation. For example, the functional blocks may be divided for the respective functions, or two or more functions may be integrated into one processing block.

With regard to the data processing apparatus in the foregoing embodiments, the specific manner in which each module performs operations and the beneficial effects thereof have been described in detail in the foregoing method embodiments, and are not described herein again.

Fig. 10 is a schematic structural diagram of another data processing apparatus provided by the present disclosure. As shown in fig. 10, the data processing apparatus 30 may include at least one processor 301 and a memory 303 for storing processor-executable instructions. Wherein the processor 301 is configured to execute instructions in the memory 303 to implement the data processing method in the above-described embodiments.

In addition, the data processing device 30 may also include a communication bus 302 and at least one communication interface 304.

The processor 301 may be a Central Processing Unit (CPU), a micro-processing unit, an ASIC, or one or more integrated circuits for controlling the execution of programs according to the present disclosure.

The communication bus 302 may include a path that conveys information between the aforementioned components.

The communication interface 304 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.

The memory 303 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and connected to the processing unit by a bus. The memory may also be integrated with the processing unit.

The memory 303 is used for storing instructions for executing the disclosed solution, and the execution of these instructions is controlled by the processor 301. The processor 301 is configured to execute the instructions stored in the memory 303 to implement the functions of the disclosed method.

In a particular implementation, as one embodiment, the processor 301 may include one or more CPUs, such as CPU0 and CPU1 in fig. 10.

In a particular implementation, as one embodiment, the data processing apparatus 30 may include multiple processors, such as the processor 301 and the processor 307 in fig. 10. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In one implementation, the data processing apparatus 30 may further include an output device 305 and an input device 306. The output device 305 is in communication with the processor 301 and may display information in a variety of ways. For example, the output device 305 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector, or the like. The input device 306 is in communication with the processor 301 and can accept user input in a variety of ways. For example, the input device 306 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.

Those skilled in the art will appreciate that the configuration shown in fig. 10 does not constitute a limitation of the data processing apparatus 30, and that the data processing apparatus 30 may include more or fewer components than those shown, combine certain components, or employ a different arrangement of components.

In addition, the present disclosure also provides a computer-readable storage medium including instructions which, when executed by a processor, cause the processor to perform the data processing method as provided in the above embodiments.

In addition, the present disclosure also provides a computer program product comprising instructions which, when executed by a processor, cause the processor to perform the data processing method as provided in the above embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A data processing method, comprising:

acquiring recovery information and a fixed task instance set of an initial complement event, wherein the recovery information comprises identifications of a plurality of task instances and a dependency relationship among the plurality of task instances, and the fixed task instance set comprises at least two fixed task instances;

determining at least two target task instances from the plurality of task instances based on the recovery information and the set of fixed task instances, wherein the at least two target task instances comprise the at least two fixed task instances;

determining target recovery information based on the recovery information and at least one to-be-pruned task instance, wherein the target recovery information includes identifiers of the at least two target task instances and a dependency relationship between the at least two target task instances, the target recovery information is recovery information of a target complement event, and the at least one to-be-pruned task instance is a task instance other than the at least two target task instances in the plurality of task instances;

and executing the target complement event to enable each target task instance of the at least two target task instances to generate data according to the target recovery information.

2. The data processing method of claim 1, wherein the determining at least two target task instances from the plurality of task instances comprises:

when a first task instance belongs to the fixed task instance set, determining that the first task instance is a target task instance, and the first task instance is one of the plurality of task instances.

3. The data processing method of claim 1, wherein the determining at least two target task instances from the plurality of task instances comprises:

when a first task instance does not belong to the fixed task instance set, determining whether the first task instance has a downstream task instance, wherein the first task instance is one of the plurality of task instances;

and when the first task instance is not a root task instance in the plurality of task instances and a fixed task instance exists in a downstream task instance corresponding to the first task instance, determining the first task instance as a target task instance.

4. The data processing method according to claim 1, wherein said determining target recovery information based on the recovery information and at least one to-be-pruned task instance comprises:

generating an initial directed acyclic graph based on recovery information of the initial complement event, wherein the initial directed acyclic graph is a directed acyclic graph corresponding to the initial complement event, the initial directed acyclic graph comprises task instance nodes and edges, the task instance nodes are used for representing task instances, and the edges are used for connecting the task instance nodes with dependency relationship;

executing pruning operation on at least one to-be-pruned task instance node included in the initial directed acyclic graph to obtain a target directed acyclic graph, wherein the at least one to-be-pruned task instance node is a node represented by the at least one to-be-pruned task instance;

and determining the target recovery information based on the target directed acyclic graph.

5. The data processing method according to claim 4, wherein the current task instance node is a root node in the target directed acyclic graph or a non-root node in the target directed acyclic graph, and the causing each of the at least two target task instances to generate data according to the target recovery information comprises:

when the current task instance node is the non-root node, determining the number of direct upstream successful nodes of the current task instance node, wherein the direct upstream successful nodes are the nodes of which the running states are successful in the direct upstream nodes of the current task instance node, and the direct upstream nodes are the upstream task instance nodes with edges between the direct upstream nodes and the current task instance node;

and when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes, determining that a current task instance starts to generate data, wherein the current task instance is a task instance represented by the current task instance node.

6. The data processing method of claim 5, wherein the target recovery information further includes priorities of the at least two target task instances, the method further comprising:

when the number of the direct upstream successful nodes is equal to the number of the direct upstream nodes and the direct upstream nodes are the same as those of task instance nodes to be identified, determining whether the priority of a task instance to be identified is higher than that of the current task instance, wherein the task instance nodes to be identified are task instance nodes except the current task instance node in at least two task instance nodes included in the target directed acyclic graph, and the task instance to be identified is a task instance represented by the task instance nodes to be identified;

and when the priority of the task instance to be identified is higher than that of the current task instance, determining that the task instance to be identified preferentially starts to generate data.

7. A data processing apparatus, comprising: an obtaining module, a determining module and a processing module;

the obtaining module is configured to obtain recovery information of an initial complement event and a fixed task instance set, where the recovery information includes identifiers of a plurality of task instances and a dependency relationship between the plurality of task instances, and the fixed task instance set includes at least two fixed task instances;

the determining module is configured to determine at least two target task instances from the plurality of task instances based on the recovery information and the set of fixed task instances, where the at least two target task instances include the at least two fixed task instances;

the determining module is further configured to determine target recovery information based on the recovery information and at least one to-be-pruned task instance, where the target recovery information includes identifiers of the at least two target task instances and a dependency relationship between the at least two target task instances, the target recovery information is recovery information of a target complement event, and the at least one to-be-pruned task instance is a task instance other than the at least two target task instances in the plurality of task instances;

the processing module is configured to execute the target complement event, so that each of the at least two target task instances generates data according to the target recovery information.

8. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory configured to store the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the data processing method of any one of claims 1-6.

9. A computer-readable storage medium having instructions stored thereon, wherein the instructions in the computer-readable storage medium, when executed by an electronic device, enable the electronic device to perform the data processing method of any one of claims 1-6.

10. A computer program product, characterized in that it comprises computer instructions which, when run on an electronic device, cause the electronic device to carry out the data processing method according to any one of claims 1 to 6.