Detailed Description
Referring to fig. 1, fig. 1 is a diagram illustrating an architecture of a Jstorm-based real-time computing system in the related art.
In the architecture of the real-time computing system shown in fig. 1, a real-time data transmission platform, a computing system for streaming computing (also referred to as a streaming processing application) based on the jstorm architecture, and an hbase database for storing real-time computing results are included.
The real-time data transmission platform can collect, in real time, massive amounts of service-related data to be computed (such as service-related log data). The computing system can subscribe to the data in the real-time data transmission platform, extract that data through its connection with the platform, perform real-time computation, and store the results of the real-time computation in the hbase database.
However, the architecture of the real-time computing system shown in fig. 1 is a single-point model across the whole link, i.e. only one computing-system node participates in the real-time computation of data, and it therefore has the following disadvantages:
On one hand, once the computing node where the computing system is located fails, data computation is interrupted and the whole service becomes unavailable.
On the other hand, since only one computing-system node participates in the real-time computation of data, when the amount of data to be computed in the real-time data transmission platform is large, the computing system may be unable to complete the computation in real time, which results in service delay.
In order to solve the above problems, the architecture of the conventional Jstorm-based real-time computing system shown in fig. 1 is improved in the present application, and a new architecture of the Jstorm-based real-time computing system is proposed, in which the new architecture includes an arbitration subsystem and several computing subsystems based on the Jstorm architecture; the plurality of computing subsystems are deployed in a distributed mode; the arbitration subsystem can detect a computing subsystem with a normal state in the plurality of computing subsystems based on heartbeat messages sent by the computing subsystems at regular time, and dynamically allocate computing tasks to the detected computing subsystem with the normal state, and the computing subsystems with the normal state can extract a data subset corresponding to the computing tasks from a data set to be computed based on the computing tasks allocated by the arbitration subsystem to perform data computation;
on one hand, the computing subsystems are deployed in a distributed manner, and each computing subsystem in a normal state is responsible for computing only part of the data in the data set to be computed. Therefore, if the computing node where any such computing subsystem is located fails, only the computation of that part of the data is affected while the arbitration subsystem redistributes the computing data carried by the failed computing subsystem to the other computing subsystems in a normal state. The service as a whole does not become unavailable, and the stability of the real-time computing system is improved.
On the other hand, each computing subsystem deployed in a distributed manner participates in real-time computing in parallel, so that the computing performance of the real-time computing system can be remarkably improved, and when the data volume to be computed is too large, the service delay can be effectively reduced.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to fig. 2, fig. 2 is a diagram illustrating a Jstorm-based real-time computing method applied to a real-time computing system according to an embodiment of the present application, where the real-time computing system includes an arbitration subsystem and a plurality of computing subsystems based on a Jstorm architecture; the plurality of computing subsystems are deployed in a distributed mode; wherein the arbitration subsystem interacts with the plurality of compute subsystems to perform the steps of:
step 201, each computing subsystem sends heartbeat messages to an arbitration subsystem at regular time;
step 202, the arbitration subsystem detects the computing subsystems in a normal state among the plurality of computing subsystems based on the heartbeat messages sent periodically by each computing subsystem;
step 203, the arbitration subsystem allocates computing tasks to each detected computing subsystem in a normal state based on a preset task allocation policy; when any computing subsystem in a normal state is detected to have become abnormal, the arbitration subsystem reallocates the computing tasks to the computing subsystems currently in a normal state based on the preset task allocation policy;
step 204, each computing subsystem acquires the computing tasks distributed by the arbitration subsystem, and extracts data subsets corresponding to the computing tasks from a data set to be computed based on a preset data extraction strategy to perform data computation;
step 205, each computing subsystem determines whether the computing task allocated by the arbitration subsystem is updated, and when the computing task allocated by the arbitration subsystem is updated, a data subset corresponding to the updated computing task is re-extracted from a preset data set to be computed based on the preset data extraction policy to perform data computation.
Jstorm is a streaming data processing framework that is widely applied in real-time computing systems.
The real-time computing system generally refers to a computing platform that is based on the Jstorm architecture and is capable of processing and computing massive data in real time.
The plurality of computing subsystems based on the Jstorm architecture are computing subsystems that adopt the Jstorm framework as their underlying engine.
The data to be calculated may be data related to a service, which needs to be processed and calculated by the real-time computing system in real time. In practical application, the data to be calculated may be mass log data related to the service collected by a real-time data transmission platform deployed in the real-time computing system, and the real-time computing system may calculate the mass log data related to the service to implement a specific service function.
For example, in an illustrated "friend browsing article recommendation" service scenario, after a user A views an article through client software, the behavior data may flow back to the real-time computing system in the form of log data. The real-time computing system may perform real-time computation on the returned log data, query the friend information of user A, generate a corresponding article recommendation policy, and then pass the generated policy to the recommendation system. Based on the article recommendation policy, the recommendation system may push a message such as "your friend A has read article xx" to the friends of user A.
In this example, the architecture of the conventional real-time computing system shown in fig. 1 may be improved: by introducing an arbitration subsystem into that architecture and deploying the original computing system in a distributed manner so that it is divided into a plurality of computing subsystems, the problems of insufficient system stability and service delay in the conventional real-time computing system shown in fig. 1 may be solved.
Referring to fig. 3, fig. 3 is a diagram illustrating an architecture of an improved Jstorm-based real-time computing system according to the present embodiment.
The architecture of the real-time computing system shown in fig. 3 includes a real-time data transmission platform, an arbitration subsystem, and several computing subsystems for streaming computing based on the jstorm architecture; the plurality of computing subsystems are deployed in a distributed manner.
For example, in one illustrated embodiment, the plurality of computing subsystems may be deployed in physically distinct data centers; for example, they may be deployed in data centers located in different cities, which enables disaster recovery backup of data at the cross-city level.
Wherein:
the real-time transmission platform is used for collecting massive data to be calculated related to services in real time, generating fixed-size data entries from the collected data, and storing the data in a local data set (such as a database) to be calculated.
For example, the real-time transmission platform may be the Time Tunnel platform developed by Alibaba Group. The Time Tunnel platform is a real-time data transmission platform built on the thrift communication framework. In the Time Tunnel platform, a queue is usually used as the minimum unit of data to be processed; the platform can generate fixed-size queues from the collected mass data and store them in its local database. In this case, each queue may be regarded as one data entry to be computed.
The arbitration subsystem is used for detecting the computing subsystems in normal state in the plurality of computing subsystems based on the heartbeat messages sent by the computing subsystems at fixed time; and dynamically and respectively allocating the computing tasks to the detected computing subsystems in normal states based on a preset task allocation strategy.
The plurality of computing subsystems can be functionally identical computing subsystems (for example, the computing subsystems can process data in the same format together) and are used for sending heartbeat messages to the arbitration subsystem at regular time; and acquiring the calculation tasks distributed by the arbitration subsystem, and extracting a data subset corresponding to the calculation tasks dynamically distributed by the arbitration subsystem from a database of the real-time data transmission platform based on a preset data extraction strategy to perform data calculation.
In this embodiment, the hardware structure for carrying the real-time transmission platform, the arbitration subsystem, and the computing subsystems is not particularly limited; in practical applications, the hardware structure may be a server, a server cluster, or a cloud platform constructed based on a server cluster.
The technical solution of the present application is described in detail below with reference to the architecture of the real-time computing system shown in fig. 3.
Referring to fig. 3, in an embodiment shown, each computing subsystem shown in fig. 3 may be a distributed system including a plurality of computing nodes, and each computing subsystem may send heartbeat messages to the arbitration subsystem periodically through each of its computing nodes.
The computing nodes refer to processing resources in each computing subsystem that can be used for independent data computation; for example, a computing node may be a process in the computing subsystem that can be used for independent data computation. In this case, the computing subsystem may be understood as a multi-process distributed system.
A distributed system here means that the processing resources in the computing subsystem that can be used for independent data computation may be distributed across different physical devices; for example, when the hardware architecture of the computing subsystem is a server cluster, the processes in the computing subsystem that can be used for data computation may run on different physical servers and participate in data computation in parallel.
In addition, it should be noted that the period at which the computing subsystems send heartbeat messages to the arbitration subsystem is not particularly limited in this example, and may be customized based on actual requirements;
for example, to ensure that the arbitration subsystem can detect a computing subsystem in an abnormal state in a timely manner, the sending period of the heartbeat message may be set to a small value; for example, a heartbeat message is sent every 1 minute.
In this example, after the arbitration subsystem receives the heartbeat messages sent out by the computing nodes in the computing subsystems at regular time, the computing subsystems in a normal state in the computing subsystems can be detected based on the received heartbeat messages.
In an embodiment shown, after receiving the heartbeat messages sent periodically by each computing node in each computing subsystem, the arbitration subsystem may count, for each computing subsystem, the number of computing nodes from which heartbeat messages were successfully received, and then detect the state of each computing subsystem based on the counted number.
On one hand, for any computing subsystem, if the heartbeat messages sent by all the computing nodes in the computing subsystem are successfully received within the above-mentioned timed sending period (for example, within 1 minute), it may be determined that the computing subsystem is in a normal state.
On the other hand, for any computing subsystem, if heartbeat messages are not successfully received from all of its computing nodes within the timed sending period of the heartbeat message, the number of computing nodes from which no heartbeat message was received can be further counted, and the ratio of that number to the total number of computing nodes of the computing subsystem can be calculated. If the ratio reaches a preset threshold, it can be determined that the computing subsystem is in an abnormal state.
The preset threshold may be set by a user in a self-defined manner based on actual user requirements, and is not particularly limited in this example.
Similarly, for any computing subsystem in an abnormal state, if the heartbeat messages sent by all of its computing nodes are successfully received within the timed sending period of the next heartbeat message (for example, within 2 minutes), it can be determined that the computing subsystem has recovered from the abnormal state.
For example, assuming that each computing subsystem is respectively distributed and deployed in data centers of different cities, the computing node is a process that can be used for performing data computation in each computing subsystem. In this case, the data centers in different cities may be subjected to the timing anomaly detection in the above manner. For any data center, if the heartbeat messages sent by all processes in the data center are successfully received within the timing sending period of the heartbeat messages, it can be determined that the data center is currently in a normal state.
On the contrary, if heartbeat messages are not successfully received from all processes in the data center within the timed sending period, and the number of processes from which no heartbeat message was received, as a percentage of the total number of processes, reaches a preset threshold (for example, 50%), then most of the processes in the data center are abnormal and the data center may have suffered a machine-room-level fault; it can therefore be determined that the data center is currently in an abnormal state. In addition, for any computing subsystem in an abnormal state, if the heartbeat messages sent by all processes in the computing subsystem are successfully received within the timed sending period of the next heartbeat message, it can be determined that the computing subsystem has recovered from the abnormal state.
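The detection rule described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names `State` and `detect_state` are hypothetical, the 50% threshold is the example value from the text, and keeping the previous state when the missing ratio is below the threshold is an assumption, since the text does not prescribe that case.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def detect_state(previous, expected_nodes, received_nodes, threshold=0.5):
    """Classify one computing subsystem for one heartbeat period.

    previous       -- State determined in the previous period
    expected_nodes -- set of node ids that belong to the subsystem
    received_nodes -- set of node ids whose heartbeat arrived this period
    threshold      -- example value (50%) for the ratio of missing nodes
    """
    missing = expected_nodes - received_nodes
    if not missing:
        # Every node reported in: normal, or recovered if it was abnormal.
        return State.NORMAL
    ratio = len(missing) / len(expected_nodes)
    if ratio >= threshold:
        return State.ABNORMAL
    # Assumption: below the threshold the previous state is kept.
    return previous


processes = {"p1", "p2", "p3", "p4"}
state = detect_state(State.NORMAL, processes, {"p1", "p2"})  # half are missing
```

With four processes and two missing heartbeats the ratio is exactly 50%, so the subsystem is classified as abnormal under the example threshold.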
In this example, after the arbitration subsystem detects the computing subsystems in a normal state among the computing subsystems, it can count the number of computing subsystems currently in a normal state and allocate computing tasks to them based on a preset task allocation policy.
In an illustrated embodiment, the preset allocation policy may specifically include acquiring identification information of each computing subsystem, sorting the identification information of each computing subsystem, allocating corresponding computing tasks to each computing subsystem based on the sorted order of the identification information of each computing subsystem, and generating corresponding task numbers.
The identification information of each computing subsystem may refer to information such as a number of a data center in which each computing subsystem is located.
For example, referring to fig. 3, it is assumed that each computing subsystem is respectively distributed and deployed in data centers in different cities, and the identification information of each computing subsystem at this time may be a machine room number of the data center where the computing subsystem is located. Assuming that the real-time computing system comprises 4 computing subsystems in total, and the machine room numbers of the data center where the computing subsystems are located are zue, ztg, gtj and su18, when the arbitration subsystem allocates computing tasks to the computing subsystems, the arbitration subsystem can firstly sort the machine room numbers corresponding to the computing subsystems; for example, the characters can be sorted according to the first letter, and when the first letter is the same, the characters are sorted according to the next letter; in this case, the ordered sequence is gtj-su18-ztg-zue, and in this case, the arbitration subsystem may allocate the computation tasks to the computation subsystems in order according to the ordered sequence; for example, at this time, compute task 0 may be allocated to machine room gtj, compute task 1 may be allocated to machine room su18, compute task 2 may be allocated to machine room ztg, and compute task 3 may be allocated to machine room zue.
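The allocation in this example can be reproduced with a short sketch. The function name `allocate_tasks` is hypothetical, and Python's default string ordering is assumed to stand in for the letter-by-letter sort described above.

```python
def allocate_tasks(subsystem_ids):
    """Map each normal-state subsystem id to a task number.

    Ids are sorted character by character (Python's default string order),
    and task numbers are assigned in the sorted order starting from 0.
    """
    return {sid: task_no for task_no, sid in enumerate(sorted(subsystem_ids))}


rooms = ["zue", "ztg", "gtj", "su18"]
print(allocate_tasks(rooms))
# {'gtj': 0, 'su18': 1, 'ztg': 2, 'zue': 3}
```

Because the result depends only on the set of identifiers, the arbitration subsystem would obtain the same assignment regardless of the order in which heartbeats arrive.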
Of course, in practical applications, besides the above-described allocation policy of the computation tasks, the allocation of the computation tasks may also be implemented by other policies, which are not listed in this example.
In this example, after the arbitration subsystem allocates the corresponding computation tasks to each computation subsystem, each computation subsystem may also periodically query the arbitration subsystem for the task number of the computation task allocated to itself based on the timing transmission cycle of the heartbeat message.
Of course, in practical applications, the task numbers allocated by the arbitration subsystem to the computing subsystems may instead be pushed periodically to the computing subsystems by the arbitration subsystem based on the timed sending period of the heartbeat message, which is not particularly limited in this example.
In this example, after querying the task number allocated to it by the arbitration subsystem, each computing subsystem may extract, based on a preset data extraction policy, the data subset corresponding to that task number from the to-be-computed data set of the real-time data transmission platform.
The preset data extraction strategy may be a strategy for performing average distribution on the data in the data set to be calculated based on the actual number of the calculation subsystems in the normal current state.
For example, in one embodiment shown, the preset data extraction policy may include: creating a data subset corresponding to the task number of the computing task (initialized as an empty data subset); performing a remainder operation (% operation) on the data number of each to-be-computed data entry in the to-be-computed data set, using the number of computing subsystems in a normal state detected by the arbitration subsystem as the divisor; and then matching the result of the remainder operation against the task number corresponding to the computing task. When the remainder for any to-be-computed data entry matches the task number corresponding to the computing task, that data entry may be extracted from the to-be-computed data set through the data connection with the real-time data transmission platform and stored in the data subset corresponding to the computing task.
For example, assume that the real-time data transmission platform is the Time Tunnel (TT) platform, that the to-be-computed data set includes 128 queues numbered 0 to 127, that 4 computing subsystems currently in a normal state participate in the real-time computation on the data set, and that the task numbers allocated by the arbitration subsystem to the 4 computing subsystems are 0, 1, 2, and 3, respectively. Based on the preset data extraction policy, the queue numbers contained in the data subset corresponding to each computing task satisfy the following formula:
queue_no % M = task_no
where queue_no is the queue number, M is the number of computing subsystems in a normal state, and task_no is the corresponding task number. Applying the formula yields a task number for each queue number; the corresponding queue can then be extracted from the to-be-computed data set based on its queue number and stored in the data subset corresponding to the computed task number. For example, the data subsets finally obtained for the task numbers of the respective computing subsystems are [0,4, …,124], [1,5, …,125], [2,6, …,126], [3,7, …,127], where each number in a data subset is a queue number.
In this way, the data entries in the to-be-computed data set can be evenly distributed to the computing subsystems for real-time computation.
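The formula can be checked with a minimal sketch. The function name `extract_subset` is hypothetical, and plain integers stand in for the queues that would actually be pulled from the real-time data transmission platform.

```python
def extract_subset(queue_numbers, normal_count, task_no):
    """Return the queue numbers that satisfy queue_no % M == task_no."""
    return [q for q in queue_numbers if q % normal_count == task_no]


queues = range(128)  # queues numbered 0 to 127, as in the example
subsets = {task_no: extract_subset(queues, 4, task_no) for task_no in range(4)}

for task_no, subset in subsets.items():
    # Show the first two and the last queue number of each subset.
    print(task_no, subset[:2], subset[-1], len(subset))
```

Each of the four subsets holds 32 queue numbers, and together they cover all 128 queues exactly once.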
Of course, in practical applications, in addition to the data extraction strategy described above, the data items may be extracted by other strategies to generate the data subsets corresponding to the respective computation tasks, which are not listed in this example.
In this example, after each computing subsystem extracts, based on the data extraction policy shown above, the data subset corresponding to the task number allocated to it by the arbitration subsystem from the to-be-computed data set of the real-time data transmission platform, it may perform real-time data computation on the data in the data subset and, after the computation is completed, store the computation result so that relevant service systems can conveniently retrieve it.
For example, referring to fig. 3, for a Jstorm-based computing subsystem, a distributed database, such as hbase, may be deployed in its system for storing results of real-time computations.
The above describes in detail the process by which the arbitration subsystem, after detecting the computing subsystems currently in a normal state, allocates computing tasks to those computing subsystems.
Since the arbitration subsystem detects the abnormality of each computing subsystem as timing detection, the state of the computing subsystem, which is already determined to be in a normal state at present, may still change during the timing transmission period of the next heartbeat message.
Therefore, in order to cope with such changes, when the arbitration subsystem allocates the computation tasks to the computation subsystems in normal states, a dynamic allocation mode may be adopted, that is, the allocation result of the computation tasks may be dynamically adjusted based on the actual number of the computation subsystems in normal states detected in each cycle.
In this example, for any computing subsystem that has been determined to be in a normal state, if it is determined that the computing subsystem has an abnormal state based on the same abnormality detection policy shown above in the timed transmission period of the next heartbeat message, since the computing subsystem has already been assigned a computing task at that time, in this case, the arbitration subsystem may re-assign the computing task based on the preset task assignment policy shown above based on the actual number of computing subsystems in a normal state.
Similarly, for any computing subsystem that has been determined to be in an abnormal state, if it is determined that the computing subsystem recovers from the abnormal state based on the same abnormality detection policy shown above in the timed transmission cycle of the next heartbeat message, the arbitration subsystem may also re-allocate computing tasks based on the preset task allocation policy shown above based on the actual number of computing subsystems in which the current state is normal.
For a real-time process of re-performing the distribution of the calculation task based on the preset task distribution policy, please refer to the previous description of this embodiment, which is not described again.
Therefore, in this way, when the arbitration subsystem detects, in any timed sending period of the heartbeat message, that a computing subsystem previously in a normal state has become abnormal, it can switch the computing data carried by that computing subsystem to the computing subsystems in a normal state by reallocating the computing tasks. When the timed sending period of the heartbeat message is set small enough, the stability of the real-time computing system can be significantly enhanced; for example, when the timed sending period is 1 minute, the real-time computing system can switch the computing data carried by the abnormal computing subsystem to the computing subsystems in a normal state within minutes.
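The effect of such a reallocation can be illustrated by combining the sorted-identifier allocation and the remainder-based extraction described earlier. This is a sketch under stated assumptions: the function name `plan` is hypothetical, the machine room numbers are those of the earlier example, and the failure of machine room ztg is an invented scenario.

```python
def plan(normal_ids, queue_numbers):
    """Assign task numbers by sorted id, then split queues by remainder."""
    ordered = sorted(normal_ids)
    m = len(ordered)  # number of subsystems currently in a normal state
    return {
        sid: [q for q in queue_numbers if q % m == task_no]
        for task_no, sid in enumerate(ordered)
    }


queues = range(128)
before = plan({"gtj", "su18", "ztg", "zue"}, queues)
# Hypothetical scenario: ztg is detected as abnormal in the next period.
after = plan({"gtj", "su18", "zue"}, queues)

print({sid: len(qs) for sid, qs in before.items()})
print({sid: len(qs) for sid, qs in after.items()})
```

After the reallocation the three remaining subsystems carry 43, 43, and 42 queues, so all 128 queues are still covered.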
Of course, in practical applications, the preset data extraction policy may be the equal-distribution policy described above, or it may be a different policy.
For example, when any computing subsystem that was determined to be in a normal state becomes abnormal, the arbitration subsystem may choose not to reallocate the computing tasks, and instead only switch the computing data carried by the abnormal computing subsystem to a designated computing subsystem (for example, the one with the lowest current computing pressure) among the computing subsystems in a normal state, and then switch that computing data back after the abnormal computing subsystem recovers, which is not described in detail in this example.
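One possible reading of this alternative is sketched below. The text leaves the details open, so everything here is an assumption: the function names `switch_to_designated` and `switch_back`, the `pressure` mapping used to represent computing pressure, and the small queue lists are all hypothetical.

```python
def switch_to_designated(assignments, pressure, failed_id):
    """Move the failed subsystem's queues to the least-loaded normal one."""
    normal_ids = [sid for sid in assignments if sid != failed_id]
    target = min(normal_ids, key=lambda sid: pressure[sid])
    moved = list(assignments[failed_id])
    updated = {sid: list(qs) for sid, qs in assignments.items() if sid != failed_id}
    updated[target].extend(moved)
    return updated, target, moved


def switch_back(assignments, target, moved, recovered_id):
    """Return the moved queues to the subsystem once it has recovered."""
    updated = {sid: list(qs) for sid, qs in assignments.items()}
    updated[target] = [q for q in updated[target] if q not in moved]
    updated[recovered_id] = list(moved)
    return updated


original = {"gtj": [0, 4], "su18": [1, 5], "ztg": [2, 6], "zue": [3, 7]}
pressure = {"gtj": 0.7, "su18": 0.2, "ztg": 0.5, "zue": 0.9}  # assumed metric

during, target, moved = switch_to_designated(original, pressure, "ztg")
restored = switch_back(during, target, moved, "ztg")
```

Compared with full reallocation, only one healthy subsystem changes its data subset, at the cost of a temporarily uneven load.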
Correspondingly, each computing subsystem acquires the task number allocated to it by the arbitration subsystem and extracts the data subset corresponding to that task number from the to-be-computed data set based on the preset data extraction policy. Because the arbitration subsystem performs abnormality detection periodically, and a change in the number of computing subsystems in a normal state triggers it to reallocate the computing tasks, each computing subsystem may also periodically determine whether the computing task allocated to it by the arbitration subsystem has been updated, in order to cope with such changes.
Each computing subsystem may make this determination based on the task number corresponding to the computing task allocated to it and on whether the number of computing subsystems currently in a normal state has changed.
In an embodiment shown, when determining whether the computing task allocated to it by the arbitration subsystem has been updated, each computing subsystem may periodically perform an exclusive-or logical operation (XOR operation) on the task number corresponding to its computing task and the number of computing subsystems in a normal state detected by the arbitration subsystem, and then judge whether the result of the XOR operation has changed; if the result has changed, it is determined that the computing task allocated by the arbitration subsystem has changed.
In this example, if a computing subsystem determines, through the determination logic shown above, that the computing task allocated to it by the arbitration subsystem has been updated, it may re-extract the data subset corresponding to the updated computing task from the to-be-computed data set based on the preset data extraction policy, so as to perform real-time data computation.
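The XOR-based check can be sketched as follows. The class name `TaskWatcher` and its method are hypothetical; the sketch assumes each computing subsystem remembers the XOR result from the previous period and compares it with the current one.

```python
class TaskWatcher:
    """Tracks task_no XOR normal_count between heartbeat periods."""

    def __init__(self, task_no, normal_count):
        self._signature = task_no ^ normal_count

    def updated(self, task_no, normal_count):
        """Return True if the XOR result differs from the previous period."""
        signature = task_no ^ normal_count
        changed = signature != self._signature
        self._signature = signature
        return changed


watcher = TaskWatcher(task_no=2, normal_count=4)  # 2 ^ 4 == 6
print(watcher.updated(2, 4))  # same task and count: False
print(watcher.updated(2, 3))  # one subsystem became abnormal: True
```

Note that distinct pairs can share an XOR value (for example, 1 ^ 3 and 0 ^ 2 both equal 2), so a stricter variant of this sketch could compare the pair (task_no, normal_count) directly.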
For an implementation process of re-extracting the data subset corresponding to the updated computation task based on the preset data extraction policy, please refer to the previous description of this embodiment, and will not be described again.
According to the embodiments, the architecture of the conventional Jstorm-based real-time computing system is improved, and a new architecture of the Jstorm-based real-time computing system is provided, wherein the new architecture comprises an arbitration subsystem and a plurality of computing subsystems based on the Jstorm architecture; the plurality of computing subsystems are deployed in a distributed mode; the arbitration subsystem can detect a computing subsystem with a normal state in the plurality of computing subsystems based on heartbeat messages sent by the computing subsystems at regular time, and dynamically allocate computing tasks to the detected computing subsystem with the normal state, and the computing subsystems with the normal state can extract a data subset corresponding to the computing tasks from a data set to be computed based on the computing tasks allocated by the arbitration subsystem to perform data computation;
on one hand, the computing subsystems are deployed in a distributed manner, and each computing subsystem in a normal state is responsible for computing only part of the data in the data set to be computed. Therefore, if the computing node where any such computing subsystem is located fails, only the computation of that part of the data is affected while the arbitration subsystem redistributes the computing data carried by the failed computing subsystem to the other computing subsystems in a normal state. The service as a whole does not become unavailable, and the stability of the real-time computing system is improved.
On the other hand, each computing subsystem deployed in a distributed manner participates in real-time computing in parallel, so that the computing performance of the real-time computing system can be remarkably improved, and when the data volume to be computed is too large, the service delay can be effectively reduced.
Corresponding to the method embodiment, the application also provides an embodiment of the device.
Referring to fig. 4, the present application provides a Jstorm-based real-time computing apparatus 40, which is applied to an arbitration subsystem in a real-time computing system, where the real-time computing system further includes several computing subsystems based on a Jstorm architecture; the plurality of computing subsystems are deployed in a distributed mode;
referring to fig. 5, the hardware architecture involved in the arbitration subsystem of the Jstorm-based real-time computing device 40 generally includes a CPU, a memory, a non-volatile memory, a network interface, an internal bus, etc.; in the case of a software implementation, the real-time computing device 40 can be generally understood as a computer program loaded in a memory, and a logic device formed by combining software and hardware after being executed by a CPU, where the device 40 includes:
a detection module 401, configured to detect the computing subsystems in a normal state among the plurality of computing subsystems, based on heartbeat messages sent periodically by each computing subsystem;
an allocation module 402, configured to allocate, based on a preset task allocation policy, computing tasks to the detected computing subsystems in a normal state, respectively, so that each of those computing subsystems extracts the data subset corresponding to its computing task from a data set to be computed and performs data computation;
the allocation module 402 being further configured to, upon detecting that any computing subsystem in a normal state has become abnormal, reallocate the computing tasks to the computing subsystems currently in a normal state based on the preset task allocation policy.
In this example, the computing subsystem is a distributed system comprising a plurality of computing nodes;
the detection module 401 is specifically configured to:
receive heartbeat messages sent periodically by each computing node in each computing subsystem;
count, for each computing subsystem, the number of computing nodes from which heartbeat messages were successfully received;
and when the heartbeat messages sent by all the computing nodes in any computing subsystem are successfully received within the periodic sending period of the heartbeat messages, determine that the state of that computing subsystem is normal.
In this example, the detection module 401 is further configured to:
when the heartbeat messages of all the computing nodes in any computing subsystem are not successfully received within the periodic sending period of the heartbeat messages, and the ratio of the number of computing nodes from which heartbeat messages were not successfully received to the total number of computing nodes of that computing subsystem reaches a preset threshold, determine that the state of that computing subsystem is abnormal;
and when the heartbeat messages sent by all the computing nodes in a computing subsystem in an abnormal state are successfully received within the next periodic sending period of the heartbeat messages, determine that the computing subsystem has recovered from the abnormal state.
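The detection rules above can be sketched as follows. This is only an illustrative sketch, not the implementation of the detection module 401: the function name `classify_subsystem`, the string states, and the threshold value are hypothetical, and the case in which some heartbeat messages are missing but the ratio stays below the preset threshold is not specified in the text, so the sketch simply assumes that the previous state is kept.

```python
from typing import Set


def classify_subsystem(all_nodes: Set[str], heard_from: Set[str],
                       previous_state: str, threshold: float) -> str:
    """Classify one computing subsystem after one heartbeat sending period.

    all_nodes      -- identifiers of every computing node in the subsystem
    heard_from     -- nodes whose heartbeat arrived within the period
    previous_state -- "normal" or "abnormal" from the preceding period
    threshold      -- preset ratio of silent nodes that marks the subsystem abnormal
    """
    missing = all_nodes - heard_from
    if not missing:
        # Heartbeats from all nodes were received: the subsystem is normal.
        # This also covers recovery of a previously abnormal subsystem.
        return "normal"
    if len(missing) / len(all_nodes) >= threshold:
        # The ratio of silent nodes reaches the preset threshold.
        return "abnormal"
    # ASSUMPTION (not stated in the text): below the threshold, keep the old state.
    return previous_state


nodes = {"n1", "n2", "n3", "n4"}
print(classify_subsystem(nodes, {"n1"}, "normal", 0.5))   # 3 of 4 nodes silent
print(classify_subsystem(nodes, nodes, "abnormal", 0.5))  # all nodes heard again
```

Under these assumptions the first call reports the subsystem as abnormal and the second reports it as recovered to normal.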
In this example, the allocation module 402 is further configured to:
upon detecting that any computing subsystem in an abnormal state has recovered from the abnormal state, reallocate the computing tasks to the computing subsystems currently in a normal state based on the preset task allocation policy.
In this example, the preset task allocation policy includes:
acquiring identification information of each computing subsystem;
sorting the identification information of the computing subsystems;
and allocating corresponding computing tasks to the computing subsystems, respectively, in the sorted order of their identification information, and generating corresponding task numbers.
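As a minimal sketch of the allocation policy above, the following assumes that the identification information is a string, that sorting is lexicographic, and that task numbers start at 0 and follow the sorted order. None of these details is fixed by the text, and the name `allocate_tasks` is hypothetical.

```python
from typing import Dict, Iterable


def allocate_tasks(normal_ids: Iterable[str]) -> Dict[str, int]:
    """Map each normal-state subsystem identifier to a task number.

    The identifiers are sorted, and the position of each identifier in the
    sorted order is used as its task number (ASSUMPTION: numbering from 0).
    """
    return {sid: number for number, sid in enumerate(sorted(normal_ids))}


# Three subsystems in a normal state, then a reallocation after "dc-b" fails.
print(allocate_tasks(["dc-b", "dc-a", "dc-c"]))  # {'dc-a': 0, 'dc-b': 1, 'dc-c': 2}
print(allocate_tasks(["dc-a", "dc-c"]))          # {'dc-a': 0, 'dc-c': 1}
```

Because the numbering depends only on the sorted identifiers, rerunning the same function on the current set of normal-state subsystems yields the reallocation described for the allocation module 402.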
In this example, the computing subsystems are deployed in a distributed manner in physically separate data centers.
Referring to fig. 6, the present application provides another Jstorm-based real-time computing apparatus 60, which is applied to any computing subsystem in a real-time computing system, where the real-time computing system includes an arbitration subsystem and a plurality of computing subsystems based on the Jstorm architecture, and the plurality of computing subsystems are deployed in a distributed manner.
Referring to fig. 7, the hardware architecture of the computing subsystem carrying the real-time computing apparatus 60 generally includes a CPU, a memory, a non-volatile memory, a network interface, an internal bus, and the like. Taking a software implementation as an example, the real-time computing apparatus 60 can generally be understood as a computer program loaded in the memory which, when executed by the CPU, forms a logical apparatus combining software and hardware, where the apparatus 60 includes:
a sending module 601, configured to send heartbeat messages to the arbitration subsystem periodically, so that the arbitration subsystem detects the computing subsystems in a normal state among the plurality of computing subsystems based on the heartbeat messages and allocates computing tasks to the computing subsystems in a normal state;
an extracting module 602, configured to obtain the computing task allocated by the arbitration subsystem, and extract, based on a preset data extraction policy, the data subset corresponding to the computing task from a data set to be computed to perform data computation;
a judging module 603, configured to judge whether the computing task allocated by the arbitration subsystem has been updated;
the extracting module 602 being further configured to, when the computing task allocated by the arbitration subsystem has been updated, re-extract, based on the preset data extraction policy, the data subset corresponding to the updated computing task from the data set to be computed to perform data computation.
In this example, the computing subsystem is a distributed system comprising a plurality of computing nodes;
the sending module 601 is specifically configured to:
send heartbeat messages to the arbitration subsystem periodically through each computing node in the computing subsystem, respectively.
In this example, the preset data extraction policy includes:
creating a data subset corresponding to the task number of the computing task;
performing a remainder operation on the data number of each data entry to be computed in the data set to be computed, with respect to the number of computing subsystems in a normal state among the plurality of computing subsystems as detected by the arbitration subsystem;
matching the result of the remainder operation against the task number corresponding to the computing task;
and when the result of the remainder operation for any data entry to be computed matches the task number corresponding to the computing task, extracting that data entry and storing it in the data subset corresponding to the computing task.
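The extraction policy above can be illustrated with the following sketch. It assumes that data numbers are non-negative integers, that the number of normal-state computing subsystems is the divisor of the remainder operation, and that task numbers run from 0 to that number minus one. The function name `extract_subset` is hypothetical.

```python
from typing import Iterable, List


def extract_subset(data_numbers: Iterable[int], task_number: int,
                   normal_count: int) -> List[int]:
    """Select the data entries whose number modulo normal_count equals task_number."""
    if normal_count <= 0:
        raise ValueError("at least one computing subsystem must be in a normal state")
    subset: List[int] = []  # the data subset created for this task number
    for number in data_numbers:
        if number % normal_count == task_number:
            subset.append(number)
    return subset


# With three normal-state subsystems, task number 1 takes entries 1, 4 and 7.
print(extract_subset(range(10), 1, 3))  # [1, 4, 7]
```

Under these assumptions every data entry matches exactly one task number, so the data subsets of the normal-state computing subsystems are disjoint and together cover the whole data set to be computed.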
In this example, the judging module 603 is specifically configured to:
perform an exclusive-OR operation on the task number corresponding to the computing task and the number of computing subsystems in a normal state among the plurality of computing subsystems as detected by the arbitration subsystem;
judge whether the result of the exclusive-OR operation has changed;
and if the result of the exclusive-OR operation has changed, determine that the computing task allocated by the arbitration subsystem has changed.
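A minimal sketch of the judgment above is given below. It assumes that the computing subsystem keeps the exclusive-OR result from the previous check and compares it with a freshly computed one. The names `task_fingerprint` and `task_changed` are hypothetical.

```python
def task_fingerprint(task_number: int, normal_count: int) -> int:
    """Exclusive-OR of the task number and the number of normal-state subsystems."""
    return task_number ^ normal_count


def task_changed(previous_fingerprint: int, task_number: int,
                 normal_count: int) -> bool:
    """Report whether the exclusive-OR result differs from the stored one."""
    return task_fingerprint(task_number, normal_count) != previous_fingerprint


# Task number 2 out of 3 normal-state subsystems, later task number 1 out of 2.
stored = task_fingerprint(2, 3)
print(task_changed(stored, 2, 3))  # False: nothing has changed
print(task_changed(stored, 1, 2))  # True: the allocation has been updated
```

Note that distinct pairs can yield the same exclusive-OR result (for instance, 1 XOR 3 and 0 XOR 2 are both 2), so an implementation might additionally compare the task number and the subsystem count directly; that extra comparison is not part of what is described here.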
In this example, the computing subsystems are deployed in a distributed manner in physically separate data centers.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application; any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of protection of the present application.