CN107678752B - Task processing method and device for heterogeneous cluster - Google Patents


Info

Publication number
CN107678752B
CN107678752B (Application CN201710772904.XA)
Authority
CN
China
Prior art keywords
heterogeneous
task
scheduled
execution environment
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710772904.XA
Other languages
Chinese (zh)
Other versions
CN107678752A (en)
Inventor
温圣召
周汉清
刘传秀
张家军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710772904.XA (CN107678752B)
Publication of CN107678752A
Priority to US16/116,624 (US10977076B2)
Application granted
Publication of CN107678752B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G (Physics) › G06 (Computing; calculating or counting) › G06F (Electric digital data processing)
    • G06F 8/60 Software deployment (under G06F 8/00 Arrangements for software engineering)
    • G06F 8/41 Compilation (under G06F 8/40 Transformation of program code)
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues (under G06F 9/48 Program initiating; program switching)
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals (under G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU])
    • G06F 9/5072 Grid computing (under G06F 9/5061 Partitioning or combining of resources)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a task processing method and device for a heterogeneous cluster, wherein the method comprises the following steps: receiving a task request and a basic execution environment; scheduling a heterogeneous device according to the task request; compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device, and deploying it to the scheduled heterogeneous device; and causing the scheduled heterogeneous device to execute the task request. The user need only provide a basic execution environment for a task and does not need to write a separate execution environment for each type of hardware platform, which enables rapid development of heterogeneous device code and reduces development and maintenance costs.

Description

Task processing method and device for heterogeneous cluster
[ technical field ]
The invention relates to a computer application technology, in particular to a task processing method and device for a heterogeneous cluster.
[ background of the invention ]
With the development of big data and deep learning technology, training and learning from massive data by deep learning methods to obtain an accurate deep model has received increasing attention from technology companies. More complex and powerful deep models can reveal the complex, rich information carried in massive data and make more accurate predictions about future or unknown events. Deep learning applications include speech recognition, image recognition, natural language processing, search and advertising CTR prediction, and the like. Since deep learning applications typically rely on enormous amounts of computation and communication, training and serving on heterogeneous clusters of CPUs plus heterogeneous accelerators (GPU, FPGA, etc.) has become the mainstream way to support deep learning applications.
Developing for heterogeneous clusters requires the user to have a certain command of the hardware architecture and of specific programming languages (e.g., CUDA, OpenCL, MPI). How to efficiently develop and deploy deep learning applications on heterogeneous clusters has therefore become an important challenge in promoting and accelerating the use of deep learning in various fields.
The current mainstream solution is to develop a specific version for each hardware type (GPU, FPGA) in the heterogeneous cluster and then run each version on its matching hardware. This solution has two problems:
1. Huge development cost: to write programs for a hardware device, project developers must learn a certain amount of hardware knowledge, which hinders rapid software iteration.
2. Huge maintenance cost: since each hardware platform has its own version of the code, multiple copies of functionally identical code must be maintained.
[ summary of the invention ]
The application provides a task processing method and device for a heterogeneous cluster, and development and maintenance costs can be reduced.
One aspect of the present application provides a task processing method for a heterogeneous cluster, including:
receiving a task request and a basic execution environment;
scheduling heterogeneous equipment according to the task request;
compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device, deploying it to the scheduled heterogeneous device, and building a deep learning network;
and causing the scheduled heterogeneous device to execute the task request and train the deep learning network.
The above-described aspects and any possible implementation further provide an implementation in which the basic execution environment is built based on a heterogeneous high-performance library of preset basic hardware device types.
The above-described aspect and any possible implementation further provide an implementation, where the task request includes: the identification of the basic execution environment, the configuration information of the task and the data information of the task.
The foregoing aspects and any possible implementations further provide an implementation where the scheduling a heterogeneous device according to the task request includes:
and scheduling heterogeneous equipment for the task request according to the data information of the task in the task request and the available resource limit of the heterogeneous cluster.
The foregoing aspect and any possible implementation manner further provide an implementation manner, where compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device includes:
and switching the heterogeneous high-performance library of the basic hardware device type into a heterogeneous high-performance library corresponding to the scheduled heterogeneous device type according to the type of the heterogeneous device scheduled for the task request, and generating an execution environment corresponding to the scheduled heterogeneous device.
The above aspect and any possible implementation further provide an implementation in which the causing the scheduled heterogeneous device to execute the task request includes:
and sending a task instruction comprising the data information of the task to the scheduled heterogeneous equipment so that the scheduled heterogeneous equipment performs distributed computation according to the task instruction.
In another aspect of the present application, a task processing device facing a heterogeneous cluster is provided, including:
the receiving module is used for receiving the task request and the basic execution environment;
the scheduling module is used for scheduling the heterogeneous equipment according to the task request;
the deployment module is used for compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device, deploying it to the scheduled heterogeneous device, and building a deep learning network;
and the execution module is used for triggering the scheduled heterogeneous equipment to execute the task request and training the deep learning network.
The above-described aspects and any possible implementation further provide an implementation in which the basic execution environment is built based on a heterogeneous high-performance library of preset basic hardware device types.
The above-described aspect and any possible implementation further provide an implementation, where the task request includes: the identification of the basic execution environment, the configuration information of the task and the data information of the task.
The foregoing aspect and any possible implementation further provide an implementation, where the scheduling module is specifically configured to:
and scheduling heterogeneous equipment for the task request according to the data information of the task in the task request and the available resource limit of the heterogeneous cluster.
The above-described aspects and any possible implementation further provide an implementation, where the deployment module is specifically configured to:
and switching the heterogeneous high-performance library of the basic hardware device type into a heterogeneous high-performance library corresponding to the scheduled heterogeneous device type according to the type of the heterogeneous device scheduled for the task request, and generating an execution environment corresponding to the scheduled heterogeneous device.
The above-described aspect and any possible implementation further provide an implementation, where the execution module is specifically configured to:
and sending a task instruction comprising the data information of the task to the scheduled heterogeneous equipment so that the scheduled heterogeneous equipment performs distributed computation according to the task instruction.
In another aspect of the present invention, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
In another aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method as set forth above.
As can be seen from the above description, with the scheme of the present invention the basic execution environment is compiled into the execution environment corresponding to the scheduled heterogeneous device. The user need only provide a basic execution environment for a task and does not need to write a separate execution environment for each type of hardware platform, which enables rapid development of heterogeneous device code and reduces development and maintenance costs.
[ description of the drawings ]
FIG. 1 is a schematic diagram of a system architecture according to the present invention;
FIG. 2 is a flowchart of a task processing method for a heterogeneous cluster according to the present invention;
FIG. 3 is a structural diagram of a task processing device facing heterogeneous clusters according to the present invention;
fig. 4 illustrates a block diagram of an exemplary computer system/server 012 suitable for use in implementing embodiments of the invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To facilitate understanding of the present invention, the system architecture involved is described first. As shown in fig. 1, the system provided by the present invention mainly includes: user-side equipment, a heterogeneous cluster scheduling server, and a heterogeneous cluster. In embodiments of the present invention, a user such as a developer uses the user-side equipment to develop a task, and the task is ultimately executed by the compute nodes in the heterogeneous cluster. The heterogeneous cluster scheduling server mainly implements two functions in the embodiments of the invention: first, scheduling the heterogeneous cluster based on a task request from the user-side equipment; second, converting a task developed by the user against the basic execution environment into a task based on the execution environment of the heterogeneous devices.
The compute nodes in the heterogeneous cluster are responsible for executing the distributed tasks; a compute node can be a heterogeneous device such as a CPU, GPU, FPGA, or ARM device. The compute nodes and the heterogeneous cluster scheduling server may communicate via RDMA (remote direct memory access) or TCP. A user can submit a task request to the heterogeneous cluster scheduling server through the Web front end. In the embodiments of the present invention, the task may be any task that can be implemented on a heterogeneous cluster; a deep learning network training task is used as the running example, e.g. training networks for speech recognition, image recognition, natural language processing, search advertising CTR prediction, and the like.
Fig. 2 is a flowchart of a task processing method for a heterogeneous cluster according to the present invention, and as shown in fig. 2, the method includes:
step S201, receiving a task request and a basic execution environment;
step S202, scheduling the heterogeneous cluster according to the task request;
step S203, compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous equipment, and deploying the basic execution environment to the scheduled heterogeneous equipment;
and step S204, enabling the scheduled heterogeneous equipment to execute the task request.
The execution subject of the method described in fig. 2 is a heterogeneous cluster scheduling server.
The basic execution environment is an execution environment constructed by developers on top of a heterogeneous high-performance library for a basic hardware device type. In this embodiment the CPU is used as the basic hardware device type, i.e. the environment is built on the CPU version of the heterogeneous high-performance library; the basic execution environment could equally be built on a GPU, FPGA, or ARM version of the library.
The heterogeneous high-performance library supports computation on a variety of hardware devices (e.g., CPU, GPU, FPGA, ARM) and several communication mechanisms (e.g., RDMA remote direct memory access, TCP). The library is composed of highly optimized algorithmic building blocks and covers all data analysis stages (preprocessing, transformation, analysis, modeling, validation, and decision making). It is tailored to common data platforms, including Hadoop and Spark, and can improve the efficiency of data access.
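As an illustrative sketch only (not the patent's actual library), such a heterogeneous high-performance library can be pictured as a single operator API dispatched to per-device backends, so task code written once against the facade can run on any supported hardware type:

```python
# Hypothetical sketch of a heterogeneous high-performance library facade.
# One operator API is backed by per-device backends; GPU/FPGA/ARM library
# versions would register their own optimized backends here.

class Backend:
    """Base class for a per-device compute backend."""
    name = "base"

    def matmul(self, a, b):
        raise NotImplementedError

class CPUBackend(Backend):
    name = "cpu"

    def matmul(self, a, b):
        # Plain Python fallback; a real library would call an optimized kernel.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

# Registry of available backends (only the CPU backend in this sketch).
BACKENDS = {"cpu": CPUBackend()}

def matmul(a, b, device="cpu"):
    """Dispatch the operation to the backend for the requested device type."""
    return BACKENDS[device].matmul(a, b)
```

Task code then calls `matmul(a, b, device=...)` without knowing which hardware executes it; swapping the registered backend retargets the whole environment.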
In a preferred implementation of step S201,
the task request includes: the identifier of the execution environment, the configuration information of the task, the data information of the task, and the like.
The task can be a parallel computing task or a deep learning network training task.
Taking a deep learning network training task as an example, the task request includes: the identifier of the execution environment, the network configuration of the task, the training information of the task, and the like. Preferably, the training information of the task includes: the storage path of the training data in shared storage and the training parameters for deep learning.
For the storage path, the training data may be stored on shared storage, such as a distributed file system (HDFS); the user provides the address of the training data on HDFS and configures a file name list of the training data.
The training parameters for deep learning refer to the configuration requirements for the compute nodes running the deep learning framework, and may include at least one of: the number of threads per node, the update interval, whether to warm start, and/or whether to automatically tune parameters.
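A minimal Python sketch of such a task request, using the fields listed above; all names and the concrete shape are hypothetical, not the patent's actual wire format:

```python
# Hypothetical shape of a task request for a deep learning training task.
from dataclasses import dataclass, field

@dataclass
class TrainingInfo:
    hdfs_path: str                # address of the training data on HDFS
    file_names: list              # file name list of the training data
    threads_per_node: int = 4     # number of threads per compute node
    update_interval: int = 100    # parameter update interval
    warm_start: bool = False      # whether to warm start
    auto_tune: bool = False       # whether to automatically tune parameters

@dataclass
class TaskRequest:
    env_id: str                   # identifier of the basic execution environment
    network_config: dict          # network configuration of the task
    training: TrainingInfo = None # training information of the task

req = TaskRequest(
    env_id="base-env-cpu-v1",
    network_config={"layers": 8, "loss": "softmax"},
    training=TrainingInfo(hdfs_path="hdfs://cluster/data/train",
                          file_names=["part-00000", "part-00001"]),
)
```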
The basic execution environment is packaged and submitted to the heterogeneous cluster scheduling server together with the task request; note that the user must ensure that the CPU-library version of the execution environment works properly.
In a preferred implementation of this embodiment, a basic execution environment may be constructed in advance based on the CPU version of the heterogeneous high-performance library, given an identifier, and stored in shared storage. When the heterogeneous cluster scheduling server receives a task request, it downloads the corresponding execution environment from shared storage according to the execution environment identifier in the task request.
Preferably, the user can submit the task request via the command line or via a visual Web front-end interface.
In a preferred implementation of step S202,
the heterogeneous cluster scheduling server schedules heterogeneous device resources for the task request according to the data information in the task request and the available resource quota of the heterogeneous cluster;
Taking a deep learning network training task as an example, the heterogeneous cluster scheduling server schedules heterogeneous device resources for the task request according to the training information in the task request and the available resource quota of the heterogeneous cluster. Preferably, the scheduling server may adopt a three-level scheduling mode, determining the number of required heterogeneous devices from the size of the training data of the deep learning task and the computing capacity of the heterogeneous devices. For example: when the data is smaller than 10 GB, the task is scheduled to a single heterogeneous device; when the data is larger than 10 GB and smaller than 1 TB, it is scheduled to 4 heterogeneous devices; and when the data is larger than 1 TB, it is scheduled to the whole heterogeneous cluster.
If the available resource quota of the heterogeneous cluster cannot satisfy the task request, the task request enters a waiting state.
If the available resource quota of the heterogeneous cluster can satisfy the task request, heterogeneous devices are scheduled for it and serve as its compute nodes.
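The three-level scheduling rule above can be sketched as follows; the 10 GB and 1 TB thresholds come from the text, while the quota check and the waiting behavior are illustrative assumptions:

```python
# Sketch of the three-level scheduling rule: the number of heterogeneous
# devices is chosen from the size of the training data.

GB = 1024 ** 3
TB = 1024 ** 4

def schedule(data_size_bytes, available_devices, cluster_size):
    """Return the device count to allocate, or None to leave the task waiting."""
    if data_size_bytes < 10 * GB:
        needed = 1                # small data: a single heterogeneous device
    elif data_size_bytes < 1 * TB:
        needed = 4                # medium data: four heterogeneous devices
    else:
        needed = cluster_size     # large data: the whole heterogeneous cluster
    if needed > available_devices:
        return None               # quota cannot satisfy the request: wait
    return needed
```

For example, `schedule(5 * GB, 8, 16)` yields 1 device, `schedule(100 * GB, 8, 16)` yields 4, and a 2 TB job waits until the whole 16-node cluster is available.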
In a preferred implementation of step S203,
compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device comprises:
and the heterogeneous cluster scheduling server switches the heterogeneous high-performance library of the CPU version into the heterogeneous high-performance library of the version corresponding to the scheduled heterogeneous equipment according to the type of the heterogeneous equipment scheduled for the task request, constructs an execution environment corresponding to the type of the scheduled heterogeneous equipment, and packs and deploys the execution environment onto the scheduled heterogeneous equipment.
For example, when the heterogeneous device to be scheduled by the heterogeneous cluster scheduling server for the task request is one or more of a GPU, an FPGA, and an ARM, the heterogeneous high-performance library of the CPU version needs to be switched to the heterogeneous high-performance library of the GPU, the FPGA, or the ARM version corresponding to the scheduled heterogeneous device; when the heterogeneous equipment scheduled by the heterogeneous cluster scheduling server for the task request is a GPU, the heterogeneous high-performance library of the CPU version does not need to be switched.
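A minimal sketch of this library-switching step, assuming a simple mapping from device type to library version; the library names and environment representation are hypothetical:

```python
# Sketch of the library-switching step: the CPU-version heterogeneous
# high-performance library in the basic execution environment is swapped
# for the library matching the scheduled device type.

DEVICE_LIBS = {
    "cpu": "hplib-cpu",
    "gpu": "hplib-gpu",
    "fpga": "hplib-fpga",
    "arm": "hplib-arm",
}

def compile_env(base_env, scheduled_device_type):
    """Return an execution environment retargeted at the scheduled device."""
    env = dict(base_env)          # never mutate the stored base environment
    if scheduled_device_type == "cpu":
        return env                # base env already uses the CPU library
    env["hp_lib"] = DEVICE_LIBS[scheduled_device_type]
    return env

base = {"hp_lib": "hplib-cpu", "framework": "dl-framework"}
```

Note that the CPU case is a no-op: the basic environment is already built on the CPU library, so only non-CPU targets trigger a switch.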
Taking a deep learning network training task as an example, according to the network configuration of the task in the task request, the execution environment is used to generate a deep learning network including a training network and a testing network.
The training network is a deep learning network structure used in the training task execution process, and the testing network is a deep learning network structure used in the prediction process. The specific structure and related parameters of the training network and the testing network can be modified and defined according to requirements.
In a preferred implementation of step S204,
and the heterogeneous cluster scheduling server generates a task instruction for indicating the scheduled heterogeneous equipment to perform distributed computation according to the task request.
Taking a deep learning network training task as an example, the heterogeneous cluster scheduling server generates a task instruction for instructing the scheduled heterogeneous devices to perform distributed training on the deep learning network according to the training information of the task request.
Specifically, a task instruction is sent to each scheduled heterogeneous device; the instruction triggers the device to acquire the training data from shared storage according to the training information of the task carried in the instruction, and to train the deep learning network.
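A sketch of how the scheduling server might build and send per-node task instructions for distributed training; the message shape, the rank/world-size fields, and the transport callback are assumptions, not the patent's actual protocol:

```python
# Sketch of step S204: the scheduler builds a task instruction from the
# request's training information and sends one to each scheduled device,
# which then pulls training data from shared storage and trains.

def make_task_instruction(training_info, node_rank, world_size):
    """Build the per-node instruction for distributed training."""
    return {
        "hdfs_path": training_info["hdfs_path"],    # where to fetch data
        "file_names": training_info["file_names"],  # which files to read
        "rank": node_rank,         # this node's index in the job
        "world_size": world_size,  # total number of scheduled devices
    }

def dispatch(training_info, scheduled_nodes, send):
    """Send a task instruction to every scheduled heterogeneous device."""
    for rank, node in enumerate(scheduled_nodes):
        send(node, make_task_instruction(training_info, rank,
                                         len(scheduled_nodes)))
```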
In a preferred implementation of this embodiment,
in the task execution process, a user can check the execution progress of a task request in real time through a front-end page provided by a heterogeneous cluster scheduling server;
preferably, the heterogeneous cluster scheduling server periodically sends a task state query request to the scheduled heterogeneous equipment; and acquiring the execution progress of the task request inquired by the scheduled heterogeneous equipment according to the task state inquiry request. And the heterogeneous cluster scheduling server provides the execution progress of the scheduled heterogeneous equipment to the task request to the user through a Web front-end page.
Preferably, the scheduled heterogeneous device sends the execution progress of the task request to the heterogeneous cluster scheduling server at regular time, and the heterogeneous cluster scheduling server provides the execution progress of the task request by the scheduled heterogeneous device to a user through a Web front-end page.
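The periodic progress query can be sketched as follows, assuming each device reports a completion fraction; the query interface is hypothetical:

```python
# Sketch of progress reporting: the scheduling server queries each scheduled
# device for task state and aggregates the result for the Web front end.

def poll_progress(nodes, query):
    """Collect per-node progress (0.0-1.0) and return the overall fraction."""
    reports = {node: query(node) for node in nodes}
    overall = sum(reports.values()) / len(reports) if reports else 0.0
    return overall, reports
```

A front-end page would call this on a timer and render both the overall fraction and the per-node breakdown.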
It should be noted that the foregoing method embodiments are described as a series of acts or combinations for simplicity in explanation, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Fig. 3 is a structural diagram of a task processing device facing a heterogeneous cluster according to the present invention, where the device may be disposed in a heterogeneous cluster scheduling server to complete operations in the method embodiment illustrated in fig. 2. As shown in fig. 3, includes:
a receiving module 301, configured to receive a task request and a basic execution environment;
a scheduling module 302, configured to schedule the heterogeneous cluster according to the task request;
a deployment module 303, configured to compile the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device, and deploy the execution environment to the scheduled heterogeneous device;
an execution module 304, configured to trigger the scheduled heterogeneous device to execute the task request.
The basic execution environment is an execution environment constructed by developers on top of a heterogeneous high-performance library for a basic hardware device type. In this embodiment the CPU is used as the basic hardware device type, i.e. the environment is built on the CPU version of the heterogeneous high-performance library; the basic execution environment could equally be built on a GPU, FPGA, or ARM version of the library.
The heterogeneous high-performance library supports computation on a variety of hardware devices (e.g., CPU, GPU, FPGA, ARM) and several communication mechanisms (e.g., RDMA remote direct memory access, TCP). The library is composed of highly optimized algorithmic building blocks and covers all data analysis stages (preprocessing, transformation, analysis, modeling, validation, and decision making). It is tailored to common data platforms, including Hadoop and Spark, and can improve the efficiency of data access.
In a preferred implementation of the receiving module 301,
the task request includes: the identifier of the execution environment, the configuration information of the task, the data information of the task, and the like.
Taking a deep learning network training task as an example, the task request includes: the identifier of the execution environment, the network configuration of the task, the training information of the task, and the like.
Preferably, the training information of the task includes: storage path of training data in shared memory, training parameters for deep learning.
For the storage path, since the training data are stored on shared storage, such as a distributed file system (HDFS), the user can provide the address of the training data on HDFS and configure a file name list of the training data.
The training parameters for deep learning refer to the configuration requirements for the compute nodes running the deep learning framework, and may include at least one of: the number of threads per node, the update interval, whether to warm start, and/or whether to automatically tune parameters.
The basic execution environment is packaged and submitted to the heterogeneous cluster scheduling server together with the task request; note that the user must ensure that the CPU-library version of the execution environment works properly.
In a preferred implementation of this embodiment, an execution environment may be constructed in advance based on the CPU version of the heterogeneous high-performance library, given an identifier, and stored in shared storage. When the heterogeneous cluster scheduling server receives a task request, it downloads the corresponding execution environment from shared storage according to the execution environment identifier in the task request.
Preferably, the user can submit the task request via the command line or via a visual Web front-end interface.
In a preferred implementation of the scheduling module 302,
the scheduling module 302 schedules heterogeneous device resources for the task request according to the data information in the task request and the available resource quota of the heterogeneous cluster;
Taking a deep learning network training task as an example, the scheduling module 302 schedules heterogeneous device resources for the task request according to the training information in the task request and the available resource quota of the heterogeneous cluster. Preferably, the scheduling module 302 may adopt a three-level scheduling mode, determining the number of required heterogeneous devices from the size of the training data of the deep learning task and the computing capacity of the heterogeneous devices. For example, when the data is smaller than 10 GB, the task is scheduled to a single heterogeneous device; when the data is larger than 10 GB and smaller than 1 TB, it is scheduled to 4 heterogeneous devices; and when the data is larger than 1 TB, it is scheduled to the whole heterogeneous cluster.
If the available resource quota of the heterogeneous cluster cannot satisfy the task request, the task request enters a waiting state.
If the available resource quota of the heterogeneous cluster can satisfy the task request, heterogeneous devices are scheduled for it and serve as its compute nodes.
In a preferred implementation of the deployment module 303,
compiling the basic execution environment into the execution environment corresponding to the scheduled heterogeneous devices comprises: the deployment module 303 switches the CPU-version heterogeneous high-performance library to the heterogeneous high-performance library corresponding to the scheduled heterogeneous devices according to the type of heterogeneous device scheduled for the task request, constructs an execution environment matching that device type, and deploys the packaged execution environment onto the scheduled heterogeneous devices.
For example, when the heterogeneous device scheduled by the scheduling module 302 is one or more of a GPU, an FPGA, and an ARM device, the CPU-version heterogeneous high-performance library is switched to the GPU-, FPGA-, or ARM-version heterogeneous high-performance library corresponding to the scheduled device; when the device scheduled by the scheduling module 302 is a CPU, the CPU-version heterogeneous high-performance library is left unchanged.
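The library-switching step can be sketched as a simple lookup: keep the base library when the scheduled device is a CPU, otherwise substitute the matching device-specific library. The library names and the dictionary-based environment representation are hypothetical, chosen only to make the rule concrete.

```python
# Minimal sketch of switching the high-performance library by device type.
DEVICE_LIBRARIES = {
    "CPU": "hpc-lib-cpu",     # base library the execution environment ships with
    "GPU": "hpc-lib-gpu",
    "FPGA": "hpc-lib-fpga",
    "ARM": "hpc-lib-arm",
}

def build_environment(base_env, device_type):
    """Return an execution environment matching the scheduled device type."""
    env = dict(base_env)                      # do not mutate the base environment
    if device_type != "CPU":                  # CPU: keep the base library unchanged
        env["library"] = DEVICE_LIBRARIES[device_type]
    return env

base = {"library": "hpc-lib-cpu", "framework": "deep-learning-runtime"}
gpu_env = build_environment(base, "GPU")      # library switched to the GPU version
cpu_env = build_environment(base, "CPU")      # library left as the CPU version
```

In the patent's flow, the resulting environment would then be packaged and deployed onto the scheduled devices.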
Taking a deep learning network training task as an example, the execution environment generates, from the network configuration of the task in the task request, a deep learning network comprising a training network and a testing network.
The training network is the deep learning network structure used during training, and the testing network is the deep learning network structure used during prediction. The specific structure and related parameters of both networks can be modified and defined as required.
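One way to read the two-network generation step is that a single network configuration yields both structures, with the testing network dropping training-only parts such as the loss. The configuration schema below is entirely invented for illustration; the patent does not specify one.

```python
# Hedged sketch: deriving a training network and a testing network from one
# network configuration carried in the task request.
def build_networks(network_config):
    """Produce train/test network descriptions sharing one layer structure."""
    layers = network_config["layers"]
    train_net = {
        "layers": layers,
        "mode": "train",
        "loss": network_config.get("loss", "softmax_cross_entropy"),
    }
    # The testing network reuses the structure but omits training-only parts.
    test_net = {"layers": layers, "mode": "test"}
    return train_net, test_net

config = {"layers": ["conv1", "pool1", "fc1"], "loss": "softmax_cross_entropy"}
train_net, test_net = build_networks(config)
```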
In a preferred implementation of execution module 304,
the execution module 304 generates a task instruction instructing the scheduled heterogeneous device to perform distributed computation according to the task request.
Taking a deep learning network training task as an example, a task instruction is sent to the scheduled heterogeneous devices, triggering them to acquire training data from the shared memory according to the training information of the task in the task instruction and to train the deep learning network.
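The instruction-then-fetch flow can be sketched as follows. All names here (the instruction fields, the shared-memory dictionary, the device function) are assumptions for illustration, not APIs from the patent.

```python
# Illustrative sketch: the server builds a task instruction carrying the
# training information; a scheduled device pulls its training data from
# shared memory and starts training.
def make_task_instruction(task_request, device_ids):
    """Server side: package the training information for the scheduled devices."""
    return {
        "devices": device_ids,
        "env_id": task_request["env_id"],
        "training_info": task_request["training_info"],
    }

def execute_on_device(instruction, shared_memory):
    """Device side: fetch training data from shared memory, then train."""
    data_key = instruction["training_info"]["data_key"]
    training_data = shared_memory[data_key]       # acquire data from shared memory
    return f"trained on {len(training_data)} samples"

shared_memory = {"dataset-a": [0, 1, 2, 3]}
request = {"env_id": "env-cpu-v1",
           "training_info": {"data_key": "dataset-a", "epochs": 10}}
instruction = make_task_instruction(request, device_ids=["dev-0", "dev-1"])
result = execute_on_device(instruction, shared_memory)
```

In the distributed case each scheduled device would receive the same instruction and pull its own shard of the data; the sketch shows a single device for brevity.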
In a preferred implementation manner of this embodiment, the apparatus further includes a monitoring module, configured to provide an execution progress of the task request in real time.
Preferably, the monitoring module periodically sends a task state query to the scheduled heterogeneous devices and obtains, in response, each device's execution progress on the task request. The monitoring module then presents this progress to the user through a Web front-end page.
Alternatively, the scheduled heterogeneous devices report their execution progress on the task request to the monitoring module at regular intervals, and the monitoring module presents it to the user through a Web front-end page.
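The polling variant of the monitoring module can be sketched as below. The device interface (`query_progress`) and class names are hypothetical; a real system would issue the query over the network and render the result on the Web front end.

```python
# Minimal sketch of the polling monitoring module: query each scheduled
# device for task progress and aggregate the answers for display.
class Device:
    def __init__(self, name, progress):
        self.name = name
        self._progress = progress

    def query_progress(self):
        """Answer a task-state query with the current completion fraction."""
        return self._progress

class Monitor:
    def __init__(self, devices):
        self.devices = devices

    def poll(self):
        """One polling round: collect per-device progress for the front end."""
        return {d.name: d.query_progress() for d in self.devices}

monitor = Monitor([Device("dev-0", 0.5), Device("dev-1", 0.75)])
progress = monitor.poll()   # e.g. rendered on a Web front-end page
```

The push variant inverts control: devices call the monitor at regular intervals instead of being polled.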
In the embodiment of the invention, the basic execution environment is compiled into the execution environment corresponding to the scheduled heterogeneous devices, so the user only needs to provide a basic execution environment for the task and does not need to compile a separate execution environment for each type of hardware platform. This enables rapid development of heterogeneous device code and reduces development and maintenance costs.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Fig. 4 illustrates a block diagram of an exemplary computer system/server 012 suitable for use in implementing embodiments of the invention. The computer system/server 012 shown in fig. 4 is only an example, and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in fig. 4, the computer system/server 012 is embodied as a general purpose computing device. The components of computer system/server 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 that couples various system components including the system memory 028 and the processing unit 016.
Bus 018 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer system/server 012 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 012 and includes both volatile and nonvolatile media, removable and non-removable media.
System memory 028 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 034 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 018 via one or more data media interfaces. Memory 028 can include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the present invention.
Program/utility 040 having a set (at least one) of program modules 042 can be stored, for example, in memory 028, such program modules 042 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof might include an implementation of a network environment. Program modules 042 generally perform the functions and/or methodologies of embodiments of the present invention as described herein.
The computer system/server 012 may also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.). In the present invention, the computer system/server 012 communicates with an external radar device, and may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., network card, modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 022. Also, the computer system/server 012 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 020. As shown in fig. 4, the network adapter 020 communicates with the other modules of the computer system/server 012 via bus 018. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 016 executes the programs stored in the system memory 028, thereby performing the functions and/or methods of the described embodiments of the present invention.
The computer program described above may be provided in a computer storage medium encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above-described embodiments of the invention.
As time and technology advance, the meaning of "medium" has broadened: the propagation path of a computer program is no longer limited to tangible media, and the program may, for example, be downloaded directly from a network. Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A task processing method facing heterogeneous clusters is characterized by comprising the following steps:
receiving a task request and a basic execution environment; the basic execution environment is constructed based on a preset heterogeneous high-performance library of a basic hardware device type, wherein an identifier set for the basic execution environment is stored in a shared memory;
scheduling heterogeneous equipment according to the task request;
compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous equipment, and deploying the basic execution environment to the scheduled heterogeneous equipment;
triggering the scheduled heterogeneous equipment to execute the task request; wherein,
compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous device comprises:
and switching a preset heterogeneous high-performance library of the basic hardware device type into a heterogeneous high-performance library corresponding to the scheduled heterogeneous device type according to the type of the heterogeneous device scheduled for the task request, and generating an execution environment corresponding to the scheduled heterogeneous device.
2. The method of claim 1,
the task request includes: the identification of the basic execution environment, the configuration information of the task and the data information of the task.
3. The method of claim 1, wherein scheduling a heterogeneous device according to the task request comprises:
and scheduling heterogeneous equipment for the task request according to the data information of the task in the task request and the available resource limit of the heterogeneous cluster.
4. The method of claim 2, wherein the triggering the scheduled heterogeneous device to execute the task request comprises:
and sending a task instruction comprising the data information of the task to the scheduled heterogeneous equipment so that the scheduled heterogeneous equipment performs distributed computation according to the task instruction.
5. A task processing apparatus oriented to a heterogeneous cluster, comprising:
the receiving module is used for receiving the task request and the basic execution environment; the basic execution environment is constructed based on a preset heterogeneous high-performance library of a basic hardware device type, wherein an identifier set for the basic execution environment is stored in a shared memory;
the scheduling module is used for scheduling the heterogeneous equipment according to the task request;
the deployment module is used for compiling the basic execution environment into an execution environment corresponding to the scheduled heterogeneous equipment and deploying the basic execution environment to the scheduled heterogeneous equipment;
the execution module is used for triggering the scheduled heterogeneous equipment to execute the task request; wherein,
the deployment module is specifically configured to:
and switching the heterogeneous high-performance library of the specific hardware equipment type into a heterogeneous high-performance library corresponding to the scheduled heterogeneous equipment type according to the type of the heterogeneous equipment scheduled for the task request, and generating an execution environment corresponding to the scheduled heterogeneous equipment.
6. The apparatus of claim 5,
the task request includes: the identification of the basic execution environment, the configuration information of the task and the data information of the task.
7. The apparatus of claim 5, wherein the scheduling module is specifically configured to:
and scheduling heterogeneous equipment for the task request according to the data information of the task in the task request and the available resource limit of the heterogeneous cluster.
8. The apparatus of claim 6, wherein the execution module is specifically configured to:
and sending a task instruction comprising the data information of the task to the scheduled heterogeneous equipment so that the scheduled heterogeneous equipment performs distributed computation according to the task instruction.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the method of any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
Publications (2)

Publication Number Publication Date
CN107678752A (en) 2018-02-09
CN107678752B (en) 2021-09-21

Also Published As

Publication number Publication date
US20190065251A1 (en) 2019-02-28
US10977076B2 (en) 2021-04-13
CN107678752A (en) 2018-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant