CN111274067B - Method and device for executing computing task

Method and device for executing computing task

Info

Publication number
CN111274067B
Authority
CN
China
Prior art keywords
node
calculation
master node
instruction
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811473744.XA
Other languages
Chinese (zh)
Other versions
CN111274067A (en)
Inventor
姚思雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201811473744.XA
Publication of CN111274067A
Application granted
Publication of CN111274067B

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/203Failover techniques using migration

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a method and a device for executing a computing task, and relates to the field of computer technology. One embodiment of the method comprises the following steps: sending a calculation instruction carrying a computing task to a master node, and receiving raw data, returned by the master node, that is generated by executing the computing task; if it is monitored that the master node cannot execute the computing task, selecting a new master node from the slave nodes according to a preset rule; and sending a new calculation instruction carrying the raw data to the new master node, and receiving raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data. This embodiment ensures both the high-performance advantage and the high availability of the master node.

Description

Method and device for executing computing task
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for executing a computing task.
Background
Spark is a computing engine. Spark clusters are mainly deployed in the Spark on Yarn mode, which is further divided into two modes: Cluster mode and Client mode.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problems: the Spark on Yarn Client mode has a single point of failure and therefore lacks high availability, whereas in the Spark on Yarn Cluster mode the computing task may be executed by an arbitrary node, so the high-performance advantage of the client node cannot be fully exploited.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for executing a computing task, which can ensure both the high-performance advantage and the high availability of the master node.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of performing a computing task.
The method for executing a computing task according to an embodiment of the present invention comprises the following steps: sending a calculation instruction carrying a computing task to a master node, and receiving raw data, returned by the master node, that is generated by executing the computing task; if it is monitored that the master node cannot execute the computing task, selecting a new master node from the slave nodes according to a preset rule; and sending a new calculation instruction carrying the raw data to the new master node, and receiving raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
In one embodiment, after sending the calculation instruction to the master node, the method further comprises: sending a record-data instruction to the master node at intervals. Receiving the raw data generated by executing the computing task and returned by the master node then comprises: receiving the raw data, generated by executing the computing task, that the master node sends in response to the record-data instruction.
In one embodiment, the method further comprises ordering the slave nodes in advance from highest to lowest performance; selecting a new master node from the slave nodes according to the preset rule then comprises: taking the slave node ranked first in the ordering as the new master node.
In one embodiment, monitoring that the master node cannot execute the computing task comprises: sending a monitoring instruction to the master node, and confirming that the master node cannot execute the computing task if no response message returned by the master node for the monitoring instruction is received within a preset time.
In one embodiment, after sending the new calculation instruction to the new master node, the method further comprises: receiving an end instruction returned by the new master node after the computing task has been executed, and clearing the raw data according to the end instruction.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an apparatus for performing a computing task.
An apparatus for executing a computing task according to an embodiment of the present invention includes: a first transceiver unit, configured to send a calculation instruction carrying a computing task to a master node and to receive raw data, returned by the master node, that is generated by executing the computing task; a processing unit, configured to select a new master node from the slave nodes according to a preset rule if it is monitored that the master node cannot execute the computing task; and a second transceiver unit, configured to send a new calculation instruction carrying the raw data to the new master node and to receive raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
In one embodiment, the apparatus further comprises: a preprocessing unit, configured to send a record-data instruction to the master node at intervals after the calculation instruction has been sent to the master node; the first transceiver unit is specifically configured to receive the raw data, generated by executing the computing task, that the master node sends in response to the record-data instruction.
In one embodiment, the preprocessing unit is further configured to order the slave nodes in advance from highest to lowest performance; the processing unit is specifically configured to take the slave node ranked first in the ordering as the new master node.
In one embodiment, the processing unit is further configured to send a monitoring instruction to the master node, and to confirm that the master node cannot execute the computing task if no response message returned by the master node for the monitoring instruction is received within a preset time.
In one embodiment, the processing unit is further configured to, after the new calculation instruction has been sent to the new master node, clear the raw data according to an end instruction if the end instruction returned by the new master node after the computing task has been executed is received.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus.
An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for executing a computing task provided by the embodiments of the present invention.
To achieve the above object, according to still another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer-readable medium according to an embodiment of the present invention stores a computer program which, when executed by a processor, implements the method for executing a computing task provided by the embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits: by sending a calculation instruction carrying the computing task to the master node, the master node is controlled to execute the computing task first, so that the high-performance advantages of the master node, such as strong scheduling capability and fast execution of the computing task, are fully utilized; if the master node cannot execute the computing task, a new master node is designated to continue executing the computing task according to the raw data returned by the master node, so that high availability is ensured. The high-performance advantage and the high availability of the master node are therefore both guaranteed. When the computing task has been executed, all raw data are cleared, so that the execution of one computing task does not affect the execution of the next, which ensures the accuracy of computing-task execution.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
FIG. 1 is an exemplary framework diagram of a prior art method of performing computing tasks;
FIG. 2 is a schematic diagram of the main flow of a method of performing a computing task according to a first embodiment of the present invention;
FIG. 3 is an exemplary framework diagram of a method of performing a computing task in accordance with a first embodiment of the present invention;
FIG. 4 is a signaling interaction diagram of a method of performing a computing task according to a second embodiment of the present invention;
FIG. 5 is an exemplary framework diagram of a method of performing computing tasks according to a second embodiment of the invention;
FIG. 6 is a schematic diagram of the main units of an apparatus for performing computing tasks according to an embodiment of the invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
fig. 8 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It is noted that embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Spark is a fast, general-purpose computing engine designed for large-scale data processing and has grown into a widely developed and widely used ecosystem. It can be used for a variety of operations, including SQL queries, text processing, and machine learning, whereas before Spark appeared, a separate engine usually had to be learned for each of these needs. Spark has become the most popular technology in the field of big-data distributed computing. For the deployment of Spark clusters, Spark on Yarn (Yarn, Yet Another Resource Negotiator, is a resource coordinator) is a relatively common deployment mode; it uses the resource management function provided by the Yarn component of Hadoop (a distributed system infrastructure) to coordinate and allocate the resources that the Spark cluster needs to execute computing tasks. Spark on Yarn itself has two modes: Cluster mode and Client mode.
As shown in FIG. 1, in the Spark on Yarn Client mode the driver is deployed only on the client node, whose performance is far better than that of each slave node in the Spark cluster. The client node submits a computing task to the Spark cluster, and the Spark cluster schedules and coordinates the slave nodes through the driver deployed on the client node to complete execution of the computing task. Because the performance of the client node is higher than that of each slave node, deploying the driver on the client node improves the scheduling capability of the driver and the execution speed of the computing task compared with deploying the driver on a slave node. Moreover, all logs of the Spark cluster are recorded on the client node, so when the Spark cluster is large, real-time problem location can be completed quickly through the logs of the client node. However, in this mode, as soon as the client node fails, the computing task cannot be executed until the client node returns to normal, so the Spark on Yarn Client mode has a single point of failure, that is, it does not have high availability.
As shown in FIG. 1, in the Spark on Yarn Cluster mode the driver is deployed on a plurality of slave nodes. The client node submits a computing task to the Spark cluster, and the Spark cluster schedules and coordinates the slave nodes to complete execution of the computing task through the driver on any one of those slave nodes. When the slave node executing the computing task fails, another slave node on which a driver is deployed continues to execute the computing task, so execution does not stop because of the failure and high availability is ensured. However, since the slave node executing the computing task is arbitrary, the high-performance client node may not participate in the execution of the computing task at all, so the Spark on Yarn Cluster mode cannot fully utilize the high-performance advantage of the client node.
To solve the problems of the prior art, a first embodiment of the present invention provides a method for executing a computing task, as shown in FIG. 2 and FIG. 3, applied to a distributed coordination server. The method comprises:
Step S201, sending a calculation instruction carrying a computing task to the master node, and receiving raw data, returned by the master node, that is generated by executing the computing task.
In this step, as shown in FIG. 3, a high-performance client node on which a driver is deployed is used as the master node, and the driver deployed on the master node schedules and coordinates each slave node in the Spark cluster to complete execution of the computing task. Having the high-performance master node execute the computing task first therefore secures the high-performance advantages of the master node, such as strong scheduling capability, fast execution of the computing task, and fast real-time problem location. In addition, the distributed coordination server may send the calculation instruction to the master node according to the address of the master node, and the master node may send the raw data to the distributed coordination server according to the address of the distributed coordination server. The raw data may be operation breakpoint information, which may specifically include: scheduler (DAGScheduler) information for the dependent jobs in the computing task and scheduler (TaskScheduler) information between the dependent jobs in the computing task.
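By way of non-limiting illustration, the raw data (operation breakpoint information) exchanged in this step could be modeled on the coordinator side as in the minimal Python sketch below; the class and field names are assumptions made only for this sketch and are not defined by the embodiment.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Hypothetical container for the raw data (operation breakpoint information)
    that the master node returns while executing a computing task."""
    task_id: str                        # identifier of the computing task
    dag_scheduler_state: bytes = b""    # serialized DAGScheduler info for the dependent jobs
    task_scheduler_state: bytes = b""   # serialized TaskScheduler info between the dependent jobs

    def merge(self, newer: "Checkpoint") -> "Checkpoint":
        # A later checkpoint for the same task supersedes an earlier one.
        assert newer.task_id == self.task_id
        return newer
```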
Step S202, if it is monitored that the master node cannot execute the computing task, selecting a new master node from the slave nodes according to a preset rule.
In this step, if the high-performance nodes on which a driver is deployed include the client node and several slave nodes, and the client node serving as the master node cannot execute the computing task, a new master node may be selected from the high-performance slave nodes on which a driver is deployed, so the scheduling capability of the driver is not reduced and the execution speed of the computing task remains fast. If the only high-performance node with a deployed driver is the client node, and the client node serving as the master node cannot execute the computing task, a new master node may be selected from the slave nodes on which a driver is deployed (for example, slave node 1 or slave node 2); the scheduling capability of the driver and the execution speed of the computing task are then reduced, but the computing task is still executed and the high-performance advantage of the master node has already been exploited while it was available. Execution of the computing task does not stop because the client node serving as the master node cannot execute it, so both the high-performance advantage and the high availability of the master node are ensured. In a specific implementation, the new master node may be selected from the slave nodes according to the preset rule in the manner of the second embodiment, or at random from the slave nodes on which a driver is deployed; it should be understood that a person skilled in the art may set the preset rule flexibly without affecting the embodiments of the present invention.
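By way of non-limiting illustration, one possible preset rule, selecting the highest-performance slave node on which a driver is deployed, can be sketched as follows; the dictionary structure of the node descriptions is an assumption made only for this sketch.

```python
def select_new_master(slaves):
    """Sketch of one possible preset rule: only slave nodes with a deployed driver
    are eligible, and the highest-performance eligible node becomes the new master.
    `slaves` is assumed to be a list of dicts with 'name', 'performance' and
    'has_driver' keys (illustrative structure, not defined by the embodiment)."""
    eligible = [s for s in slaves if s["has_driver"]]
    if not eligible:
        raise RuntimeError("no slave node with a deployed driver is available")
    return max(eligible, key=lambda s: s["performance"])

# Example: slave node 1 outranks slave node 2; slave node 3 has no driver deployed.
nodes = [
    {"name": "slave1", "performance": 90, "has_driver": True},
    {"name": "slave2", "performance": 70, "has_driver": True},
    {"name": "slave3", "performance": 95, "has_driver": False},
]
assert select_new_master(nodes)["name"] == "slave1"
```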
Step S203, sending a new calculation instruction carrying the raw data to the new master node, and receiving raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
In this step, the distributed coordination server may send the new calculation instruction to the new master node according to the address of the new master node. It should be understood that the new master node continues to execute the computing task according to the raw data, generates new raw data while doing so, and returns that raw data to the distributed coordination server. For example, if slave node 1 is selected as the new master node and continues to execute the computing task, but the distributed coordination server then monitors that slave node 1 cannot execute the computing task before it has finished, a further master node is selected from the slave nodes according to the preset rule, for example slave node 2. A new calculation instruction carrying the raw data generated by slave node 1 is sent to slave node 2, and slave node 2, as the new master node, continues to execute the computing task until it is completed.
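By way of non-limiting illustration, the overall control loop of the distributed coordination server described in steps S201 to S203 can be sketched as follows; the node interface (send_compute, poll_checkpoint, finished, is_alive) is an assumption made only for this sketch and is not part of the embodiment.

```python
import time

def run_task(nodes, task, heartbeat_timeout=30.0):
    """Sketch of the failover loop on the distributed coordination server.
    `nodes` is an ordered list: the client node first, then the slave nodes
    ranked by performance. Each node object is assumed to expose
    send_compute(task, checkpoint), poll_checkpoint(), finished() and
    is_alive(timeout)."""
    checkpoint = None
    for node in nodes:                       # client node first, then ranked slave nodes
        node.send_compute(task, checkpoint)  # (new) calculation instruction, carrying raw data if any
        while True:
            if not node.is_alive(heartbeat_timeout):
                break                        # failover: next node continues from the checkpoint
            result = node.poll_checkpoint()  # raw data returned in response to record-data instructions
            if result is not None:
                checkpoint = result
            if node.finished():
                return checkpoint            # end instruction received; caller clears the raw data
            time.sleep(heartbeat_timeout / 3)
    raise RuntimeError("all candidate master nodes failed")
```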
To solve the problems of the prior art, a second embodiment of the present invention provides a method for executing a computing task, applied to a distributed coordination server. In the second embodiment the specific process is described with a concrete example, with reference to FIG. 4 and FIG. 5, as follows:
First step: sending a calculation instruction carrying a computing task to the master node, and ordering the slave nodes from highest to lowest performance.
In this step, the distributed coordination server sends the calculation instruction to the master node; specifically, the addresses of the client node, slave node 1, and slave node 2 are stored in the distributed coordination server in advance, and the distributed coordination server sends the calculation instruction to the master node according to the address of the client node serving as the master node. The distributed coordination server is also called a Zookeeper (Zookeeper is a distributed, open-source coordination service for distributed applications; it provides consistency services such as configuration maintenance, name service, distributed synchronization, and group services), and the Zookeeper is used as an example below. It should be noted that the performance of slave node 1 and slave node 2 is stored in the Zookeeper, and the Zookeeper orders slave node 1 and slave node 2 from highest to lowest performance, the result being: slave node 1, slave node 2.
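By way of non-limiting illustration, storing the node addresses and performance figures in the Zookeeper could look like the sketch below; the kazoo client library, the znode paths, and the addresses are assumptions made only for this sketch, as the embodiment only requires that this information be stored in the distributed coordination server.

```python
import json
from kazoo.client import KazooClient

# A minimal sketch, assuming the kazoo Python client and illustrative znode paths.
zk = KazooClient(hosts="zookeeper.example.com:2181")
zk.start()

nodes = {
    "client": {"addr": "10.0.0.1:7077", "performance": 100, "has_driver": True},
    "slave1": {"addr": "10.0.0.2:7077", "performance": 90,  "has_driver": True},
    "slave2": {"addr": "10.0.0.3:7077", "performance": 70,  "has_driver": True},
}
for name, info in nodes.items():
    path = f"/spark-ha/nodes/{name}"
    zk.ensure_path(path)                          # create the znode if it does not exist
    zk.set(path, json.dumps(info).encode("utf-8"))  # record address and performance

zk.stop()
```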
In addition, in the embodiment of the present invention, the client node in the Spark cluster submits a computing task entered by the user to the Spark cluster; the Zookeeper obtains the computing task from the Spark cluster and generates a calculation instruction carrying it; the Zookeeper then sends the calculation instruction to the client node serving as the master node, and the client node executes the computing task first.
Second step: after the calculation instruction has been sent to the master node, sending a record-data instruction to the master node at intervals.
In this step, the Zookeeper may start timing when it sends the calculation instruction to the master node and send a record-data instruction to the master node every 5 minutes. The length of each interval is determined by the Zookeeper according to the amount of computation in the computing task: for example, when the amount of computation is small, the Zookeeper sends a record-data instruction every 5 minutes; when it is large, every 15 minutes. The record-data instruction is an instruction provided by the Zookeeper itself and is used to request, from the node executing the computing task, the raw data generated by executing it.
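By way of non-limiting illustration, the interval selection and the periodic sending of the record-data instruction can be sketched as follows; the workload threshold and the `request_checkpoint()` interface on the master-node handle are assumptions made only for this sketch.

```python
import threading

def record_data_interval(estimated_workload):
    """Pick the record-data interval from the size of the computing task.
    The 5- and 15-minute values follow the example in the text; the workload
    threshold itself is an illustrative assumption."""
    return 5 * 60 if estimated_workload < 1_000_000 else 15 * 60

def send_record_data_periodically(master, interval_seconds, stop_event):
    """Sketch: ask the current master node for its raw data (breakpoint
    information) once per interval until told to stop. `master.request_checkpoint()`
    is an assumed interface, not something defined by the embodiment."""
    while not stop_event.wait(interval_seconds):
        master.request_checkpoint()

stop = threading.Event()
# threading.Thread(target=send_record_data_periodically,
#                  args=(master, record_data_interval(workload), stop)).start()
```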
Third step: receiving the raw data, generated by executing the computing task, that the master node sends in response to the record-data instruction.
In this step, the address of the Zookeeper, that is, the IP address and port of the distributed coordination server, is stored in advance in the client node serving as the master node. The master node executes the computing task according to the received calculation instruction; at this point the computing task is executed by the driver deployed on the high-performance client node, so the advantages of the high-performance client node, such as strong scheduling capability, fast execution of the computing task, and fast real-time problem location, are fully utilized. If the master node receives a record-data instruction while executing the computing task, it sends the raw data generated by executing the computing task to the Zookeeper according to the Zookeeper's address. Because the Zookeeper sends a record-data instruction once per interval, the master node likewise sends the raw data once per interval, and the Zookeeper receives the raw data generated by executing the computing task that the master node sends in response to the record-data instruction.
Fourth step: sending a monitoring instruction to the master node, and confirming that the master node cannot execute the computing task if no response message returned by the master node for the monitoring instruction is received within a preset time.
In this step, the heartbeat detection mechanism of the Zookeeper can be used to confirm that the master node cannot execute the computing task. Specifically, the Zookeeper sends a monitoring instruction to the master node, and if no response message returned by the master node for the monitoring instruction is received within the preset time, the Zookeeper confirms that the master node cannot execute the computing task. It should be appreciated that a person skilled in the art may choose the method of confirming that the master node cannot execute the computing task flexibly without affecting the embodiments of the present invention. It should also be noted that if a response message returned by the master node for the monitoring instruction is received within the preset time, the Zookeeper confirms that the master node can execute the computing task, and the master node continues to execute it.
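By way of non-limiting illustration, the request/response monitoring described above can be sketched as a simple timeout check; the one-line wire protocol ("MONITOR"/"OK") is an assumption made only for this sketch, as the embodiment does not prescribe a message format.

```python
import socket

def master_can_execute(master_addr, preset_timeout=10.0):
    """Sketch of the fourth step: send a monitoring instruction to the master
    node and wait up to `preset_timeout` seconds for a response message."""
    host, port = master_addr
    try:
        with socket.create_connection((host, port), timeout=preset_timeout) as sock:
            sock.settimeout(preset_timeout)
            sock.sendall(b"MONITOR\n")   # monitoring instruction
            reply = sock.recv(64)        # response message, if any
            return reply.startswith(b"OK")
    except OSError:
        return False                     # no response within the preset time
```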
In addition, the fourth step is performed at any time after the calculation instruction is sent in the first step; the labels "first step", "second step", "fourth step", and so on are used only for convenience of description and do not reflect the actual execution order of the steps.
Fifth step: after confirming that the master node cannot execute the computing task, taking the slave node ranked first in the ordering as the new master node.
In this step, as can be seen from FIG. 5, the driver is deployed on the client node, slave node 1, and slave node 2, so these three nodes can execute the computing task, while slave node 3 has no driver deployed and therefore cannot. The slave node ranked first in the ordering is slave node 1, so the Zookeeper takes slave node 1 as the new master node. When the high-performance client node cannot execute the computing task, the Zookeeper selects a new master node that continues executing it, so the high-performance advantage of the client node is fully utilized and high availability is ensured.
Sixth step: sending a new calculation instruction carrying the raw data to the new master node, and receiving raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
In this step, the Zookeeper uses the pre-stored address of slave node 1 to send the new calculation instruction to the new master node; the new calculation instruction carries the raw data generated by the original master node while executing the computing task.
After sending the new calculation instruction to the new master node, the Zookeeper again sends a record-data instruction to the new master node at intervals. The specific way of sending the record-data instruction to the new master node is the same as in the second step and is not repeated here.
Slave node 1, as the new master node, continues to execute the computing task according to the raw data in the new calculation instruction, and if it receives a record-data instruction from the Zookeeper while doing so, it sends the raw data generated by continuing to execute the computing task to the Zookeeper according to the pre-stored address of the Zookeeper. Because the Zookeeper sends a record-data instruction once per interval, the new master node likewise sends this raw data once per interval, and the Zookeeper receives the raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
Seventh step: receiving an end instruction returned by the new master node after the computing task has been executed, and clearing the raw data according to the end instruction.
In this step, slave node 1, as the new master node, may finish executing the computing task; in that case it sends an end instruction to the Zookeeper once execution is complete, and the Zookeeper clears all the raw data according to the end instruction, where all the raw data includes the raw data generated by the client node executing the computing task and the raw data generated by slave node 1 continuing to execute it. Alternatively, slave node 1 may fail while continuing to execute the computing task, so that it cannot carry on; in that case slave node 2 becomes the new master node and continues executing the computing task in the manner described in the sixth and seventh steps until it is completed. All the raw data may carry the identifier of the computing task, so that it can be cleared quickly according to that identifier. After all the raw data has been cleared, the Zookeeper can process the next computing task; the computing task processed this time does not affect the processing of the next one, which improves the accuracy of computing-task execution.
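By way of non-limiting illustration, clearing the raw data by task identifier can be sketched as follows; the in-memory dictionary keyed by (task identifier, node name) is an assumption made only for this sketch.

```python
def clear_raw_data(checkpoint_store, task_id):
    """Sketch of the seventh step: on receiving the end instruction, drop every
    piece of raw data tagged with the finished task's identifier so that it
    cannot affect the next computing task. `checkpoint_store` is assumed to be
    a dict mapping (task_id, node_name) -> raw data."""
    stale = [key for key in checkpoint_store if key[0] == task_id]
    for key in stale:
        del checkpoint_store[key]
    return len(stale)

store = {("task-42", "client"): b"...", ("task-42", "slave1"): b"...",
         ("task-43", "client"): b"..."}
assert clear_raw_data(store, "task-42") == 2 and ("task-43", "client") in store
```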
In the embodiment of the present invention, by sending a calculation instruction carrying the computing task to the master node, the master node is controlled to execute the computing task first, so that the high-performance advantages of the master node, such as strong scheduling capability and fast execution of the computing task, are fully utilized; if the master node cannot execute the computing task, a new master node is designated to continue executing the computing task according to the raw data returned by the master node, so that high availability is ensured. The high-performance advantage and the high availability of the master node are therefore both guaranteed. When the computing task has been executed, all raw data are cleared, so that the execution of one computing task does not affect the execution of the next, which ensures the accuracy of computing-task execution.
The method of executing a computing task has been described above with reference to FIGS. 2 to 5; the apparatus for executing a computing task is described below with reference to FIG. 6.
To solve the problems of the prior art, a third embodiment of the present invention provides an apparatus for executing a computing task, as shown in FIG. 6, comprising:
a first transceiver unit 601, configured to send a calculation instruction carrying a computing task to the master node and to receive raw data, returned by the master node, that is generated by executing the computing task;
a processing unit 602, configured to select a new master node from the slave nodes according to a preset rule if it is monitored that the master node cannot execute the computing task; and
a second transceiver unit 603, configured to send a new calculation instruction carrying the raw data to the new master node and to receive raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
It should be understood that the third embodiment is implemented in the same manner as the first embodiment and is therefore not described in detail again here.
To solve the problems of the prior art, a fourth embodiment of the present invention provides an apparatus for executing a computing task, the apparatus comprising:
a preprocessing unit, configured to send a record-data instruction to the master node at intervals after the calculation instruction has been sent to the master node, and to order the slave nodes in advance from highest to lowest performance;
a first transceiver unit, configured to send a calculation instruction carrying a computing task to the master node and to receive the raw data, generated by executing the computing task, that the master node sends in response to the record-data instruction;
a processing unit, configured to take the slave node ranked first in the ordering as the new master node if it is monitored that the master node cannot execute the computing task.
In this unit, the processing unit is specifically configured to send a monitoring instruction to the master node and to confirm that the master node cannot execute the computing task if no response message returned by the master node for the monitoring instruction is received within a preset time. The processing unit is further configured to, after the new calculation instruction has been sent to the new master node, clear the raw data according to an end instruction if the end instruction returned by the new master node after the computing task has been executed is received.
The apparatus further comprises a second transceiver unit, configured to send a new calculation instruction carrying the raw data to the new master node and to receive raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
It should be understood that the fourth embodiment is implemented in the same manner as the second embodiment and is therefore not described again here.
In the embodiment of the present invention, by sending a calculation instruction carrying the computing task to the master node, the master node is controlled to execute the computing task first, so that the high-performance advantages of the master node, such as strong scheduling capability and fast execution of the computing task, are fully utilized; if the master node cannot execute the computing task, a new master node is designated to continue executing the computing task according to the raw data returned by the master node, so that high availability is ensured. The high-performance advantage and the high availability of the master node are therefore both guaranteed. When the computing task has been executed, all raw data are cleared, so that the execution of one computing task does not affect the execution of the next, which ensures the accuracy of computing-task execution.
FIG. 7 illustrates an exemplary system architecture 700 to which the method or the apparatus for executing a computing task of the embodiments of the present invention can be applied.
As shown in fig. 7, a system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 705 via the network 704 using the terminal devices 701, 702, 703 to receive or send messages or the like. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 701, 702, 703.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 705 may be a server providing various services, for example a background management server (by way of example only) providing support for shopping websites browsed by users with the terminal devices 701, 702, 703. The background management server may analyze and otherwise process received data such as a product information query request and feed the processing result (e.g., the target push information or product information, by way of example only) back to the terminal device.
It should be noted that, the method for executing the computing task provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the device for executing the computing task is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for implementing an embodiment of the present invention. The computer system shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 801.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as comprising a first transceiver unit, a processing unit, and a second transceiver unit. The names of these units do not in some cases limit the units themselves; for example, the processing unit may also be described as "a unit that selects a new master node from the slave nodes according to a preset rule if it is monitored that the master node cannot execute the computing task".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: send a calculation instruction carrying a computing task to a master node, and receive raw data, returned by the master node, that is generated by executing the computing task; if it is monitored that the master node cannot execute the computing task, select a new master node from the slave nodes according to a preset rule; and send a new calculation instruction carrying the raw data to the new master node, and receive raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data.
According to the technical solution of the embodiments of the present invention, by sending a calculation instruction carrying the computing task to the master node, the master node is controlled to execute the computing task first, so that the high-performance advantages of the master node, such as strong scheduling capability and fast execution of the computing task, are fully utilized; if the master node cannot execute the computing task, a new master node is designated to continue executing the computing task according to the raw data returned by the master node, so that high availability is ensured. The high-performance advantage and the high availability of the master node are therefore both guaranteed. When the computing task has been executed, all raw data are cleared, so that the execution of one computing task does not affect the execution of the next, which ensures the accuracy of computing-task execution.
The above specific embodiments do not limit the scope of protection of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions can be made depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (10)

1. A method of executing a computing task, applied to a distributed coordination server, comprising:
sending a calculation instruction carrying a computing task to a master node, and receiving raw data, returned by the master node, that is generated by executing the computing task, wherein the master node is a client node on which a driver is deployed;
if it is monitored that the master node cannot execute the computing task, selecting a new master node from the slave nodes according to a preset rule, wherein a driver is deployed on the slave nodes; and
sending a new calculation instruction carrying the raw data to the new master node, and receiving raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data;
wherein, after sending the calculation instruction to the master node, the method further comprises: sending a record-data instruction to the master node at intervals;
and receiving the raw data generated by executing the computing task and returned by the master node comprises:
receiving the raw data, generated by executing the computing task, that the master node sends in response to the record-data instruction.
2. The method according to claim 1, further comprising: ordering the slave nodes in advance from highest to lowest performance;
wherein selecting a new master node from the slave nodes according to a preset rule comprises:
taking the slave node ranked first in the ordering as the new master node.
3. The method of claim 1, wherein monitoring that the master node cannot execute the computing task comprises:
sending a monitoring instruction to the master node, and confirming that the master node cannot execute the computing task if no response message returned by the master node for the monitoring instruction is received within a preset time.
4. The method of claim 1, wherein, after sending the new calculation instruction to the new master node, the method further comprises:
receiving an end instruction returned by the new master node after the computing task has been executed, and clearing the raw data according to the end instruction.
5. An apparatus for executing a computing task, the apparatus being disposed on a distributed coordination server and comprising:
a first transceiver unit, configured to send a calculation instruction carrying a computing task to a master node and to receive raw data, returned by the master node, that is generated by executing the computing task, wherein the master node is a client node on which a driver is deployed;
a processing unit, configured to select a new master node from the slave nodes according to a preset rule if it is monitored that the master node cannot execute the computing task, wherein a driver is deployed on the slave nodes; and
a second transceiver unit, configured to send a new calculation instruction carrying the raw data to the new master node and to receive raw data, returned by the new master node, that is generated by continuing to execute the computing task according to the raw data;
wherein the apparatus further comprises:
a preprocessing unit, configured to send a record-data instruction to the master node at intervals after the calculation instruction has been sent to the master node;
and the first transceiver unit is specifically configured to:
receive the raw data, generated by executing the computing task, that the master node sends in response to the record-data instruction.
6. The apparatus according to claim 5, wherein the preprocessing unit is further configured to:
order the slave nodes in advance from highest to lowest performance;
and the processing unit is specifically configured to:
take the slave node ranked first in the ordering as the new master node.
7. The apparatus according to claim 5, wherein the processing unit is further configured to:
send a monitoring instruction to the master node, and confirm that the master node cannot execute the computing task if no response message returned by the master node for the monitoring instruction is received within a preset time.
8. The apparatus according to claim 5, wherein the processing unit is further configured to:
after a new calculation instruction has been sent to the new master node, clear the raw data according to an end instruction if the end instruction returned by the new master node after the computing task has been executed is received.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 4.
10. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 4.
CN201811473744.XA 2018-12-04 2018-12-04 Method and device for executing computing task Active CN111274067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811473744.XA CN111274067B (en) 2018-12-04 2018-12-04 Method and device for executing computing task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811473744.XA CN111274067B (en) 2018-12-04 2018-12-04 Method and device for executing computing task

Publications (2)

Publication Number Publication Date
CN111274067A CN111274067A (en) 2020-06-12
CN111274067B true CN111274067B (en) 2024-06-14

Family

ID=70998575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811473744.XA Active CN111274067B (en) 2018-12-04 2018-12-04 Method and device for executing computing task

Country Status (1)

Country Link
CN (1) CN111274067B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025205A (en) * 2016-01-30 2017-08-08 华为技术有限公司 A kind of method and apparatus of training pattern in distributed system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176097B2 (en) * 2014-12-16 2019-01-08 Samsung Electronics Co., Ltd. Adaptable data caching mechanism for in-memory cluster computing
CN104793990B (en) * 2015-04-21 2018-08-17 中国海洋大学 A kind of multiple timings method for scheduling task and system
CN104915250B (en) * 2015-06-03 2018-04-06 电子科技大学 It is a kind of to realize the method for making MapReduce data localization in the industry
KR101792189B1 (en) * 2016-03-04 2017-10-31 연세대학교 산학협력단 Apparatus and method for processing big data
CN105808334B (en) * 2016-03-04 2016-12-28 山东大学 A kind of short optimization of job system and method for MapReduce based on resource reuse
CN106844055B (en) * 2017-01-25 2020-02-28 北京百分点信息科技有限公司 Task execution method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025205A (en) * 2016-01-30 2017-08-08 华为技术有限公司 A kind of method and apparatus of training pattern in distributed system

Also Published As

Publication number Publication date
CN111274067A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN108737270B (en) Resource management method and device for server cluster
US20190228303A1 (en) Method and apparatus for scheduling resource for deep learning framework
CN105071976B (en) Data transmission method and device
JP2019102064A (en) Method and apparatus for processing task in smart device
US8903981B2 (en) Method and system for achieving better efficiency in a client grid using node resource usage and tracking
CN109245908B (en) Method and device for switching master cluster and slave cluster
CN113742031A (en) Node state information acquisition method and device, electronic equipment and readable storage medium
CN110321252B (en) Skill service resource scheduling method and device
CN110673959A (en) System, method and apparatus for processing tasks
CN113760488B (en) Method, apparatus, device and computer readable medium for scheduling tasks
CN112306851A (en) Automatic testing method and device
CN111770176B (en) Traffic scheduling method and device
CN109428926B (en) Method and device for scheduling task nodes
CN112398669B (en) Hadoop deployment method and device
CN110798495B (en) Method and server for end-to-end message push in cluster architecture mode
CN107294911B (en) Data packet monitoring method and device, remote procedure call system and equipment
CN111831503A (en) Monitoring method based on monitoring agent and monitoring agent device
CN114296953A (en) Multi-cloud heterogeneous system and task processing method
CN112825525B (en) Method and apparatus for processing transactions
WO2018188607A1 (en) Stream processing method and device
CN107045452B (en) Virtual machine scheduling method and device
CN113760638A (en) Log service method and device based on kubernets cluster
CN111274067B (en) Method and device for executing computing task
CN111382953A (en) Dynamic process generation method and device
CN113472638B (en) Edge gateway control method, system, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant