CN113835852A - Task data scheduling method and device - Google Patents


Info

Publication number
CN113835852A
Authority
CN
China
Prior art keywords
task data
execution unit
target
equipment
bus
Prior art date
Legal status
Granted
Application number
CN202110991440.8A
Other languages
Chinese (zh)
Other versions
CN113835852B (en)
Inventor
刘长坤
郑�硕
Current Assignee
Neusoft Medical Systems Co Ltd
Original Assignee
Neusoft Medical Systems Co Ltd
Priority date
Filing date
Publication date
Application filed by Neusoft Medical Systems Co Ltd filed Critical Neusoft Medical Systems Co Ltd
Priority to CN202110991440.8A
Publication of CN113835852A
Application granted
Publication of CN113835852B
Legal status: Active

Classifications

    • G — PHYSICS › G06 — COMPUTING; CALCULATING OR COUNTING › G06F — ELECTRIC DIGITAL DATA PROCESSING › G06F9/00 — Arrangements for program control › G06F9/46 — Multiprogramming arrangements
    • G06F9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 — Allocation of resources to a machine, considering the load
    • G06F2209/484 — Precedence (indexing scheme relating to G06F9/48)

Abstract

The invention discloses a task data scheduling method and device, relating to the technical field of data processing and mainly aiming to solve the low scheduling efficiency and poor accuracy of existing task data scheduling. The method comprises the following steps: acquiring task data to be scheduled, and determining a target operation device based on the processing complexity of the task data; analyzing the device function type of the target operation device, and calling a device operation load marking sequence matched with the target operation device based on the device function type; and if, based on the device operation load marking sequence, the target execution unit matched with the task data is found not to be marked with an occupation identifier, executing the task data on the target execution unit and marking the occupation identifier for the target execution unit.

Description

Task data scheduling method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for scheduling task data.
Background
When processing computationally intensive tasks over large volumes of data — for example, scientific computing tasks such as computational fluid dynamics, signal and image processing, and deep learning — the computing power and data transmission capacity of the computing devices become the main processing bottlenecks. As the Central Processing Unit (CPU), the traditional core processing device, reached its technical limits, large-scale computing tasks began to be offloaded to graphics processing units (GPUs), so that several different computing devices such as GPUs are added alongside a CPU device and made to work cooperatively through a suitable scheduling method — a computing arrangement known as heterogeneous computing.
At present, existing heterogeneous computing platforms generally execute the scheduling stages of reading data, performing computation, and writing data in a linear fashion, and provide an instruction-queue-based scheduling method on top of the GPU computing interface. However, because the GPU driver controls and uses multiple command queues, manual task adjustment is difficult, and when the computing load is unbalanced, computing resources are wasted. Such a scheme cannot meet the demand for efficient scheduling across different devices and different computing tasks, which greatly reduces the scheduling efficiency and accuracy for different task data.
Disclosure of Invention
In view of this, the present invention provides a task data scheduling method and apparatus, mainly aiming to solve the low scheduling efficiency and poor accuracy of existing task data scheduling.
According to an aspect of the present invention, there is provided a method for scheduling task data, including:
acquiring task data to be scheduled, and determining target operation equipment based on the processing complexity of the task data;
analyzing the device function type of the target operation device, and calling a device operation load marking sequence matched with the target operation device based on the device function type, wherein the device operation load marking sequence is used for representing the occupied conditions of different execution units, and the occupied execution units are marked by corresponding occupied marks;
and if the target execution unit matched with the task data is found to be not marked with the occupation identification based on the equipment operation load marking sequence, executing the task data based on the target execution unit and marking the occupation identification for the target execution unit.
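The three claimed steps can be sketched as follows; the dict-based structures, the complexity-matching rule, and all names here are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch of the claimed method: pick a target device by
# complexity, fetch its load-mark sequence, find an unmarked unit,
# then execute and mark the occupation identifier.

def schedule(task, devices, load_marks):
    """devices: {name: {"type": ..., "complexity": ...}}
       load_marks: {device: {unit_name: occupied_flag}}"""
    # Step 1: pick the target device whose complexity best fits the task.
    target = min(devices,
                 key=lambda d: abs(devices[d]["complexity"] - task["complexity"]))
    # Step 2: call the load-mark sequence matched with the target device.
    marks = load_marks[target]
    # Step 3: find an execution unit not yet marked with an occupation identifier.
    for unit, occupied in marks.items():
        if not occupied:
            marks[unit] = True   # mark the occupation identifier
            return target, unit  # execute the task data on this unit
    return None                  # no idle unit: caller must fall back
```

A caller would clear the flag again once the task finishes, so the unit becomes schedulable for other task data.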
Further, before the task data to be scheduled is acquired and the target running device is determined based on the processing complexity of the task data, the method further includes:
determining the device function type of each running device, and establishing an abstract model of the running device based on the device function type, wherein the abstract model comprises at least two execution units;
defining equipment operation idle state and bus operation idle state based on each execution unit in the abstract model;
and generating a device running load marking sequence based on the device running idle state and the bus running idle state, wherein the device running load marking sequence comprises the content of whether the device execution unit is marked with the occupation identifier or not and the content of whether the bus execution unit is marked with the occupation identifier or not.
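A minimal sketch of generating such a marking sequence from per-device execution units, with 0 denoting the idle state as in the tables later in the description; the data layout is an assumption.

```python
def build_load_mark_sequence(abstract_models):
    """abstract_models: {device_name: [execution unit names]}.
    Every device execution unit and the bus start in the idle state (0)."""
    seq = {}
    for device, units in abstract_models.items():
        seq[device] = {u: 0 for u in units}   # device operation idle state
        seq[device]["Bus"] = 0                # bus operation idle state
    return seq
```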
Further, the device function types include a first device type and a second device type, and the establishing an abstract model of the running device based on the device function types includes:
determining the abstract number of hardware equipment corresponding to each running equipment belonging to the first equipment type and the second equipment type;
dividing the execution area of the operation device to obtain a device execution unit providing storage and computation functions, and a bus execution unit;
establishing a first device abstraction model and a second device abstraction model which are matched with the abstract number of the hardware devices on the basis of the device execution unit and the bus execution unit;
the first device type comprises a CPU, and the second device type comprises a GPU, an FPGA or a DSP.
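The abstraction step might be modelled as below; the `DeviceModel` class and its field names are hypothetical, while the unit-naming scheme (`b1-Cmpt1`, `b1-Input1`, …) follows the examples given later in the description.

```python
from dataclasses import dataclass, field

@dataclass
class DeviceModel:
    """Hypothetical abstraction of one operation device."""
    name: str
    dev_type: str                 # "first" (CPU) or "second" (GPU/FPGA/DSP)
    compute_units: list
    input_buffers: list = field(default_factory=list)
    output_buffers: list = field(default_factory=list)

def build_models(hardware):
    """hardware: list of (name, dev_type, n_compute, n_inputs, n_outputs),
    where the counts give the abstract number for each hardware device."""
    models = []
    for name, dev_type, n_c, n_i, n_o in hardware:
        models.append(DeviceModel(
            name, dev_type,
            [f"{name}-Cmpt{i}" for i in range(1, n_c + 1)],
            [f"{name}-Input{i}" for i in range(1, n_i + 1)],
            [f"{name}-Output{i}" for i in range(1, n_o + 1)]))
    return models
```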
Further, before executing the task data based on the target execution unit and marking the occupation identifier for the target execution unit when the target execution unit matched with the task data is found, based on the device operation load marking sequence, not to be marked with the occupation identifier, the method further includes:
if the device operation load marking sequence called for the target operation device is one matched with the first device type, judging whether all computation execution units in the device operation load marking sequence are in an occupied state, and when all of them are occupied, calling a device operation load marking sequence matched with the second device type to search for a computation execution unit; or,
if the device operation load marking sequence called for the target operation device is one matched with the second device type, judging whether all bus execution units and/or all storage execution units in the device operation load marking sequence are in an occupied state, and when all of them are occupied, calling a device operation load marking sequence matched with the second device type to search for a bus execution unit and/or a storage execution unit.
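The fall-back search described in the two branches above can be sketched generically; the function name, arguments, and unit-prefix convention are illustrative assumptions.

```python
def find_idle_unit(seq, primary, fallbacks, prefix="Cmpt"):
    """Search the primary device's marking sequence first; if every
    matching unit is occupied (non-zero flag), fall back to the
    marking sequences of other candidate devices."""
    for device in [primary] + fallbacks:
        for unit, flag in seq[device].items():
            if unit.startswith(prefix) and flag == 0:
                return device, unit
    return None  # every candidate unit is occupied
```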
Further, after generating the device operation load flag sequence based on the device operation idle state and the bus operation idle state, the method further includes:
when detecting that the device execution unit and/or the bus execution unit in the device operation load marking sequence execute task data, configuring an occupation identifier of the device execution unit and/or the bus execution unit in the device operation load marking sequence based on the task data.
Further, the determining a target running device based on the processing complexity of the task data comprises:
determining processing complexity based on data processing characteristics of the task data, wherein the data processing characteristics are used for representing at least one of time consumption, occupied space and processing mode of data processing of the task data;
and selecting a scheduling strategy according to the processing complexity of the task data, and determining target operation equipment according to the selected scheduling strategy, wherein the scheduling strategy comprises a static scheduling strategy and a dynamic scheduling strategy, the static scheduling strategy is a scheduling mode for determining the target operation equipment based on the task data by performing pre-analysis, and the dynamic scheduling strategy is a scheduling mode for determining the target operation equipment based on an occupied identifier in an equipment operation load mark sequence.
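A hedged sketch of the strategy selection: the numeric threshold and the "most idle units" heuristic for the dynamic case are assumptions, since the patent does not fix either.

```python
COMPLEXITY_THRESHOLD = 3  # illustrative cut-off, not from the patent

def pick_strategy(complexity):
    # Constant (predictable) complexity -> static pre-analysis;
    # variable complexity -> dynamic choice from occupation identifiers.
    return "static" if complexity <= COMPLEXITY_THRESHOLD else "dynamic"

def choose_device(task, static_map, seq):
    """static_map: pre-analysed task kind -> device mapping.
       seq: device operation load marking sequence ({device: {unit: flag}})."""
    if pick_strategy(task["complexity"]) == "static":
        return static_map[task["kind"]]          # decided by pre-analysis
    # Dynamic: pick the device with the most idle (flag == 0) units now.
    return max(seq, key=lambda d: sum(1 for v in seq[d].values() if v == 0))
```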
Further, the acquiring task data to be scheduled includes:
responding to a task data processing request, performing analysis-and-splitting processing on the task data, and determining at least one item of the task data obtained after the analysis-and-splitting processing as the task data to be scheduled, wherein the analysis-and-splitting processing refers to analyzing and dividing the computation process of the task data.
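The analysis-and-splitting step might look like the following; chunking by a fixed size is only one possible way to divide the computation and is an assumption here.

```python
def split_task(task, max_chunk):
    """Analyse-and-split: divide one task's data into independently
    schedulable pieces, each at most max_chunk items long."""
    data = task["data"]
    return [{"id": f"{task['id']}-{k}", "data": data[i:i + max_chunk]}
            for k, i in enumerate(range(0, len(data), max_chunk))]
```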
According to another aspect of the present invention, there is provided a task data scheduling apparatus, including:
an acquisition module, configured to acquire task data to be scheduled and determine a target operation device based on the processing complexity of the task data;
the calling module is used for analyzing the device function type of the target operation device and calling a device operation load marking sequence matched with the target operation device based on the device function type, wherein the device operation load marking sequence is used for representing the occupied conditions of different execution units, and the occupied execution units are marked by corresponding occupied identifiers;
and the execution module is used for executing the task data based on the target execution unit and marking the occupation identification for the target execution unit if the unmarked occupation identification of the target execution unit matched with the task data is found based on the equipment operation load marking sequence.
Further, the apparatus further comprises:
an establishing module, configured to determine the device function type of each operation device and establish an abstract model of the operation device based on the device function type, wherein the abstract model comprises at least two execution units;
the definition module is used for defining an equipment operation idle state and a bus operation idle state based on each execution unit in the abstract model;
and the generating module is used for generating a device running load marking sequence based on the device running idle state and the bus running idle state, wherein the device running load marking sequence comprises the content of whether the device execution unit is marked to occupy the identifier and the content of whether the bus execution unit is marked to occupy the identifier.
Further, the device function types include a first device type and a second device type, and the establishing module includes:
a determining unit, configured to determine the abstract number of hardware devices corresponding to each operation device belonging to the first device type and the second device type;
the dividing unit is used for dividing an execution area of the running equipment to obtain an equipment execution unit comprising a storage function and a calculation function and a bus execution unit;
and the establishing unit is used for establishing a first equipment abstraction model and a second equipment abstraction model which are matched with the abstract number of the hardware equipment based on the equipment executing unit and the bus executing unit.
Further, the generating module is specifically configured to, when it is detected that the device execution unit and/or the bus execution unit in the device operation load flag sequence execute task data, configure, based on the task data, an occupation identifier of the device execution unit and/or the bus execution unit in the device operation load flag sequence.
Further, the calling module comprises:
a first calling unit, configured to, if the device operation load marking sequence called for the target operation device is one matched with the first device type, judge whether all computation execution units in the device operation load marking sequence are in an occupied state, and when all of them are occupied, call a device operation load marking sequence matched with the second device type to search for a computation execution unit; or,
a second calling unit, configured to, if the device operation load marking sequence called for the target operation device is one matched with the second device type, judge whether all bus execution units and/or all storage execution units in the device operation load marking sequence are in an occupied state, and when all of them are occupied, call a device operation load marking sequence matched with the second device type to search for a bus execution unit and/or a storage execution unit.
Further, the obtaining module comprises:
the computing unit is used for determining processing complexity based on data processing characteristics of the task data, wherein the data processing characteristics are used for representing at least one of time consumption, occupied space and processing mode of the task data for data processing;
and the determining unit is used for selecting a scheduling strategy according to the processing complexity of the task data and determining the target operation equipment according to the selected scheduling strategy, wherein the scheduling strategy comprises a static scheduling strategy and a dynamic scheduling strategy, the static scheduling strategy is a scheduling mode for determining the target operation equipment based on the task data through pre-analysis, and the dynamic scheduling strategy is a scheduling mode for determining the target operation equipment based on an occupied identifier in an equipment operation load mark sequence.
Further, the obtaining module further comprises:
a splitting unit, configured to respond to a task data processing request, perform analysis-and-splitting processing on the task data, and determine at least one item of the task data obtained after the analysis-and-splitting processing as the task data to be scheduled, wherein the analysis-and-splitting processing refers to analyzing and dividing the computation process of the task data.
According to another aspect of the present invention, there is provided a storage medium having at least one executable instruction stored therein, where the executable instruction causes a processor to perform an operation corresponding to the scheduling method of the task data.
According to still another aspect of the present invention, there is provided a terminal including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the scheduling method of the task data.
The technical solution provided by the embodiments of the invention has at least the following advantages:
Compared with the prior art, the invention acquires task data to be scheduled and determines a target operation device based on the processing complexity of the task data; analyzes the device function type of the target operation device and calls a device operation load marking sequence matched with the target operation device based on the device function type, wherein the device operation load marking sequence represents the occupation state of different execution units and an occupied execution unit is marked with a corresponding occupation identifier; and if, based on the device operation load marking sequence, the target execution unit matched with the task data is found not to be marked with an occupation identifier, executes the task data on the target execution unit and marks the occupation identifier for it. In other words, task data is allocated in a non-queued manner: as soon as an idle unit is detected, a task can be allocated without waiting, which keeps the devices continuously executing tasks and satisfies the need for accurate task scheduling when different operation devices are under load. Compared with the instruction-queue scheduling method of the prior art, this avoids wasting task scheduling resources when the computing load is uneven, thereby improving both the scheduling accuracy and the scheduling efficiency of the task data.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a task data scheduling method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating another task data scheduling method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for scheduling task data according to another embodiment of the present invention;
FIG. 4 is a block diagram illustrating an apparatus for scheduling task data according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a method for scheduling task data, as shown in fig. 1, the method includes:
101. task data to be scheduled are obtained, and target operation equipment is determined based on the processing complexity of the task data.
In the embodiment of the invention, after the task data requiring processing is determined, the heterogeneous computing platform acting as the current execution body performs the task data scheduling method of this application. Heterogeneous computing is a computing mode in which multiple functional units with different instruction sets and architectures form one system to process task data. The heterogeneous computing platform acting as the current execution body may include, but is not limited to, cloud devices, servers, and clients; when performing heterogeneous computing, task data therefore needs to be scheduled accurately to achieve efficient data processing. The task data is the task content to be processed by heterogeneous computing; it may be divisible or indivisible, and the embodiment of the invention imposes no particular limitation. The processing complexity is the complexity of processing the task data; in the embodiment of the invention it may be determined from conditions such as the data size of the task data, its processing mode, and its transmission time. In general, processing complexity can be classified as constant task complexity or variable task complexity, and the target operation device is determined accordingly.
It should be noted that, in the embodiment of the invention, to achieve accurate scheduling of different task data, the heterogeneous computing platform acting as the current execution end is divided into multiple operation devices, including but not limited to CPUs, GPUs, Field Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs). For example, a heterogeneous computing platform composed of CPUs and GPUs may be partitioned into multiple operation devices CPU1, CPU2, CPU3, …, GPU1, GPU2, GPU3, …, so that one target operation device is determined from among them based on processing complexity.
102. Analyzing the device function type of the target operation device, and calling a device operation load marking sequence matched with the target operation device based on the device function type.
In the embodiment of the present invention, since the data processing of the task data is performed by the target operation device, after the target operation device is determined its device function type needs to be parsed. The device function type indicates the kind of data processing function the device performs and includes a first device type (for example, a CPU) and a second device type (for example, a GPU, FPGA, or DSP), and a device operation load marking sequence matching the device function type is called accordingly. The device operation load marking sequence represents the occupation state of the different execution units and may be generated and stored in tabular form; the position of an occupied execution unit in the sequence is marked with a corresponding occupation identifier. An execution unit is a functional unit with which operation devices of different function types process task data; there are at least two kinds, such as bus execution units and device execution units, where device execution units include but are not limited to computation execution units and storage execution units (input data storage execution units and output data storage execution units). Because the device operation load marking sequence records whether each execution unit carries an occupation identifier generated by executing task data, the execution unit of the target operation device that can execute the task data can be determined.
It should be noted that, because every target operation device relies on hardware and a bus for data transmission and data processing, target operation devices of all function types include a bus, which cooperates with the CPU, GPU, and so on to complete data transmission and processing. The device operation load marking sequence therefore also records an occupation identifier indicating whether the bus execution unit is occupied by task data.
103. And if the target execution unit matched with the task data is found to be not marked with the occupation identification based on the equipment operation load marking sequence, executing the task data based on the target execution unit and marking the occupation identification for the target execution unit.
In the embodiment of the present invention, the device operation load marking sequence records occupation identifiers indicating whether the various execution units are occupied. A target execution unit matched with the task data is therefore determined first — an execution unit within the target operation device, such as a computation unit or a storage unit — and it is then judged whether this target execution unit carries an occupation identifier: if it does, the unit is occupied; if not, it is idle. Specifically, when the target execution unit carries no occupation identifier, the task data can be executed on it, completing the scheduling. That is, as long as the execution conditions are satisfied, a task can be assigned to an execution unit immediately rather than mechanically waiting in sequence, implementing a non-queue task allocation mechanism in which tasks are allocated wherever they can execute, greatly improving device utilization and task processing efficiency. Meanwhile, to ensure that no other task data is processed on the same unit, the occupation identifier is marked for the target execution unit when the task data is executed, so that when other task data is scheduled the unit is shown as occupied.
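The check-and-mark behaviour described above must be atomic if several schedulers run concurrently; the lock-based design below is an assumption, not something stated in the patent.

```python
import threading

class LoadMarkSequence:
    """Non-queue allocation: a task runs as soon as a matching unit is
    idle, instead of waiting in a command queue (illustrative sketch)."""

    def __init__(self, units):
        self._lock = threading.Lock()
        self._occupied = {u: False for u in units}

    def try_acquire(self, unit):
        with self._lock:                      # check-and-mark must be atomic
            if not self._occupied[unit]:
                self._occupied[unit] = True   # mark the occupation identifier
                return True
            return False                      # unit already occupied

    def release(self, unit):
        with self._lock:
            self._occupied[unit] = False      # clear when the task finishes
```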
In an embodiment of the present invention, for further limitation and description, as shown in fig. 2, before step 101 acquires task data to be scheduled, and determines a target running device based on a processing complexity of the task data, the method further includes: 201. determining the equipment function type of each running equipment, and establishing an abstract model of the running equipment based on the equipment function type; 202. defining equipment operation idle state and bus operation idle state based on each execution unit in the abstract model; 203. generating a device operational load flag sequence based on the device operational idle state and the bus operational idle state.
In the embodiment of the present invention, after the task data is determined, the target operation device is determined according to the processing complexity; the target operation device contains one of the execution units among the operation devices of different function types. To make the execution units independent, so that an execution object capable of processing the task data can be found accurately, an abstract model of the operation device is established based on the device function type. For example, abstract models a1, a2, … are established for the first device type (CPU); typically, several multi-core CPUs can be treated as one device, so only a single abstract model a1 is built. Abstract models b1, b2, … are established for the second device type. The abstract model is further divided into execution units: it contains at least two kinds, such as at least a bus execution unit and a device execution unit, where device execution units include but are not limited to computation execution units and storage execution units (input data storage execution units and output data storage execution units). For operation devices not of the first device type, input storage units, output storage units, and the like may additionally be divided; the embodiment of the present invention imposes no particular limitation. For example, for the device GPU b1, an abstract model is built whose 1st, 2nd, …, nth computation units are denoted b1-Cmpt1, b1-Cmpt2, …, b1-Cmptn, and whose 1st, 2nd, …, mth input data storage areas are denoted b1-Input1, b1-Input2, …, b1-Inputm.
The 1st, 2nd, …, pth output data storage areas are denoted b1-Output1, b1-Output2, …, b1-Outputp, wherein inputting and outputting data occupies the bus, while the computation process occupies a computation execution unit of the device.
In addition, since the abstract model is an abstract structure that divides each running device into execution units, after the abstract model is constructed, a device operation idle state and a bus operation idle state are defined for each execution unit, such as the device operation idle state shown in Table 1 and the bus operation idle state shown in Table 2, where 0 denotes the idle state. A device operation load flag sequence is then generated based on the device operation idle state and the bus operation idle state, that is, a mapping relationship indicating which execution units are not occupied by task data and are therefore idle. The device operation load flag sequence thus records both whether each device execution unit is marked with an occupation identifier and whether each bus execution unit is marked with an occupation identifier.
TABLE 1
(Table 1 is reproduced as an image in the original publication; it lists the device operation idle state of each execution unit, with 0 denoting idle.)
TABLE 2
         Bus input/output Direction   Buffer cache area
GPU:b1   0                            -
GPU:b2   0                            -
In Table 2, the device operation load flag sequence is a list constructed in the form of an unsigned integer matrix, where each row corresponds to one GPU running device. If the element in the bus input/output Direction column of the bus execution unit is 0, the bus of that device is in an idle state; if the element is 1, the bus is currently inputting data; if the element is 2, the bus is currently outputting data. The element in the Buffer column records the sequence number of the execution unit that is currently inputting or outputting data: for example, the record is 1 for the execution unit Input1 and 2 for the execution unit Input2, matching the numbering of the divided execution units. The Buffer column is meaningless while the bus is idle.

When the device operation load flag sequence indicates that a computation execution unit, data input execution unit, or data output execution unit of a running device is occupied, that execution unit or area cannot be used again until released. When an execution device occupies the bus, it cannot request the bus again. When the accumulated number of devices occupying the bus is greater than or equal to the bus capacity, no device may apply for the bus. When one item of task data needs to be scheduled, the scheduling system of the current execution body determines the processing complexity according to the properties of the task data and the like, and selects an available execution device; usually there is a preferred execution device, and other device types are used when the preferred device is unavailable.
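The matrix representation described above can be sketched in a few lines. The following is an illustrative reconstruction, not the patent's implementation: column names mirror the tables later in this document, 0 marks idle, and nonzero cells carry the occupying task number.

```python
# Illustrative sketch of the device operation load flag sequence as a
# per-device row of unsigned integers. 0 = idle; Direction 0/1/2 means
# idle / inputting / outputting; Buffer holds the sequence number of the
# storage execution unit currently transferring. All names are assumed.

COLUMNS = ["Input1", "Input2", "Cmpt1", "Output1", "Output2", "Direction", "Buffer"]

def new_flag_sequence(device_names):
    """Build the initial all-idle load flag sequence."""
    return {name: {col: 0 for col in COLUMNS} for name in device_names}

def bus_idle(flags, device):
    """A bus execution unit is idle when its Direction column is 0."""
    return flags[device]["Direction"] == 0

def find_idle_unit(flags, device, prefix):
    """Return the first unoccupied unit whose column name starts with prefix."""
    for col in COLUMNS:
        if col.startswith(prefix) and flags[device][col] == 0:
            return col
    return None

flags = new_flag_sequence(["GPU:b1", "GPU:b2"])
# Task 1 starts reading into GPU:b1's first input area: mark unit and bus.
flags["GPU:b1"]["Input1"] = 1      # placeholder = task number
flags["GPU:b1"]["Direction"] = 1   # bus is inputting data
flags["GPU:b1"]["Buffer"] = 1      # area number being written
```

A scheduler can then probe the matrix (for example `find_idle_unit(flags, "GPU:b1", "Input")`) before committing a task to a unit.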
It should be noted that, while task data is being read, the execution unit into which the task data is read should be assigned the task number. While the task data is being processed, the corresponding computation execution unit and the execution units holding the read and written data are assigned the task number. After the task data is processed, the execution unit holding the written data keeps the task number until the data is written back to main memory. When a resource in an execution unit is no longer in use, the corresponding execution unit should be assigned the value 0. When a data transmission process is executed, the corresponding bus execution unit columns are assigned according to the transmission direction and the area number, and are reset to 0 after the transmission finishes.
In an embodiment of the present invention, for further definition and explanation, the device function types include a first device type and a second device type, and establishing the abstract model of the running device based on the device function type in step 201 specifically includes: determining the abstract number of hardware devices corresponding to each running device belonging to the first device type and the second device type; dividing the execution area of each running device to obtain a device execution unit, which includes storage and computation functions, and a bus execution unit; and establishing a first device abstract model and a second device abstract model matched with the abstract number of hardware devices based on the device execution unit and the bus execution unit.
In order to improve the accuracy, flexibility, and efficiency of task data scheduling, when the abstract models are established, the number of hardware devices of the first device type and the second device type is abstracted so that a matching number of abstract models is established. The first device type is the CPU device type, and the second device type includes the GPU device type, the FPGA device type, the DSP device type, and other devices. Meanwhile, since the heterogeneous computing process accesses the input storage areas and output storage areas, in order to avoid interfering with other processes that read and write data simultaneously, each running device should divide at least two input storage areas and at least two output storage areas so that reading and writing can alternate. The execution areas in each running device are further divided to obtain a device execution unit including storage and computation functions: the device execution unit includes a computation execution unit and a storage execution unit (an input data storage execution unit and an output data storage execution unit), and the computation execution unit may be divided into a plurality of units or kept as one unit. The first device abstract model is the CPU device abstract model, and the second device abstract model covers the other device types, including the GPU, FPGA, and DSP device types.
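The division described above can be sketched as a small constructor. This is a hedged illustration under assumed names (`build_abstract_model`, the unit-naming scheme), not the patent's code; it encodes the two rules just stated: non-first-type devices get at least two input and two output storage areas, and CPUs access main memory directly.

```python
# Illustrative sketch of building abstract models per device function type.
# Each running device is divided into device execution units (compute plus
# input/output storage) and a bus execution unit; names follow the
# b1-Cmpt1 / b1-Input1 / b1-Output1 convention used in the text.

FIRST_DEVICE_TYPE = "CPU"                      # first device type
SECOND_DEVICE_TYPES = {"GPU", "FPGA", "DSP"}   # second device types

def build_abstract_model(dev_type, name, n_compute=1, n_input=2, n_output=2):
    """Divide one running device into named execution units."""
    if dev_type == FIRST_DEVICE_TYPE:
        # CPUs can access main memory directly at any time, so no separate
        # storage areas or bus execution unit are modeled for them here.
        return {"type": dev_type, "name": name,
                "compute": [f"{name}-Cmpt{i}" for i in range(1, n_compute + 1)],
                "input": [], "output": [], "bus": None}
    # Non-first-type devices: at least two input and two output areas so
    # reads and writes can alternate without interfering.
    return {"type": dev_type, "name": name,
            "compute": [f"{name}-Cmpt{i}" for i in range(1, n_compute + 1)],
            "input": [f"{name}-Input{i}" for i in range(1, n_input + 1)],
            "output": [f"{name}-Output{i}" for i in range(1, n_output + 1)],
            "bus": f"{name}-Bus"}

model = build_abstract_model("GPU", "b1", n_compute=1)
```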
For further definition and illustration, in an embodiment of the present invention, after step 203 generates the device operation load flag sequence based on the device operation idle state and the bus operation idle state, the method further includes: when it is detected that a device execution unit and/or bus execution unit in the device operation load flag sequence executes task data, configuring an occupation identifier for that device execution unit and/or bus execution unit in the device operation load flag sequence based on the task data.
In the embodiment of the present invention, in order to schedule task data intelligently and accurately and to avoid multiple tasks occupying the same execution unit, whether task data is being executed by a device execution unit and/or bus execution unit is detected in real time. If it is detected that a device execution unit or bus execution unit is executing task data, an occupation identifier is configured for that unit in the device operation load flag sequence based on the task data, where the occupation identifier may take any form (for example, the task number).
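The occupy/release discipline above can be sketched as a pair of hypothetical helpers (names are mine, not the patent's): a cell carries the task number as its occupation identifier while the unit is in use, is never double-booked, and is reset to 0 on release.

```python
# Hedged sketch of configuring and clearing occupation identifiers in the
# device operation load flag sequence. A nonzero cell means "occupied by
# that task number"; double occupation is rejected.

def configure_occupation(flags, device, column, task_no):
    """Mark an execution unit as occupied by task_no; refuse double-booking."""
    if flags[device][column] != 0:
        raise RuntimeError(f"{device}.{column} is already occupied")
    flags[device][column] = task_no

def clear_occupation(flags, device, column):
    """Release the execution unit back to the idle state (0)."""
    flags[device][column] = 0

flags = {"GPU:b1": {"Input1": 0, "Cmpt1": 0, "Output1": 0}}
configure_occupation(flags, "GPU:b1", "Input1", 3)   # task 3 reads into Input1
```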
It should be noted that, when the device operation load flag sequence is initially generated, every device and every bus is defined as idle; that is, when no task data is being computed, the corresponding computation execution unit is defined as idle. When an input or output storage execution unit is not serving as the input or output of a computation execution unit, is not waiting for data input from main memory, and is not waiting to output data to main memory, the corresponding storage area of the running device is in an idle state. A CPU device can directly access main memory at any time; therefore, the input and output storage execution units of the CPU device can be configured as permanently idle. In addition, since the bus bandwidth is limited, the computation process of any device may be idle or busy at any time, but only a limited number of GPU devices can perform read or write processes simultaneously; this number is referred to as the bus capacity.
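The bus-capacity rule can be written out directly. A minimal sketch, assuming a capacity of 2 and the Direction column convention from Table 2 (both assumptions are illustrative):

```python
# Sketch of the bus-capacity rule: once the number of devices currently
# occupying the bus (Direction != 0) reaches the bus capacity, no further
# device may apply for the bus. Capacity and state values are made up.

BUS_CAPACITY = 2  # assumed: at most 2 simultaneous transfers

def bus_available(flags, capacity=BUS_CAPACITY):
    """True if another device may still apply for the bus."""
    in_use = sum(1 for row in flags.values() if row["Direction"] != 0)
    return in_use < capacity

flags = {
    "GPU:b1": {"Direction": 1},  # inputting
    "GPU:b2": {"Direction": 0},  # idle
    "GPU:b3": {"Direction": 2},  # outputting
}
```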
In an embodiment of the present invention, for further limitation and description, if it is found, based on the device operation load flag sequence, that the target execution unit matching the task data is not marked with an occupation identifier, then before the task data is executed on the target execution unit and the occupation identifier is marked for it, the method further includes: if the invoked device operation load flag sequence of the target running device is the one matched with the first device type, judging whether all computation execution units in the sequence are in an occupied state, and, when they all are, invoking the device operation load flag sequence matched with the second device type to search for a computation execution unit; or, if the invoked device operation load flag sequence of the target running device is of the second device type, judging whether all bus execution units and/or all storage execution units in the sequence are in an occupied state, so that, when they all are, the device operation load flag sequence matched with the first device type is invoked to continue the search.
In the embodiment of the present invention, it is further defined that the running devices include a first device type and a second device type, and therefore, after the device running load flag sequence matching the first device type is called based on the device function type, or the device running load flag sequence matching the second device type is called, scheduling is performed according to different device types.
Specifically, if the invoked device operation load flag sequence of the target running device is the one matched with the first device type (e.g., the CPU device type), the CPU device is scheduled and judged preferentially. It is then determined whether any column corresponding to a computation execution unit in the device operation load flag sequence table is unoccupied, that is, whether a 0 indicating the idle state exists. If so, an idle computation execution unit can be selected at random and occupied for computation. Otherwise, if all columns corresponding to computation execution units are non-zero, the computation execution units of the CPU device are all occupied by tasks, and a GPU device should be used instead; that is, the device operation load flag sequence matched with the second device type (e.g., the GPU device type) is invoked to search for a computation execution unit.
Specifically, if the invoked device operation load flag sequence of the target running device is of the second device type (e.g., the GPU device type), the GPU device is scheduled preferentially. It is then determined whether all bus execution units and/or all storage execution units are occupied: if the capacity recorded in the bus execution unit columns is not full and an input data execution unit exists, the bus is available and an input data execution unit is idle; one execution unit is then selected at random, preferentially on a device whose computation execution units and output data areas are not fully occupied, so that the data reading process is executed on the idle input data execution unit. Otherwise, if the bus is fully occupied, or no input data execution unit is available, the CPU device is used instead; that is, the device operation load flag sequence matched with the first device type (e.g., the CPU device type) is invoked to continue the search.
In the embodiment of the present invention, let preferentially scheduling the CPU device be flow a and preferentially scheduling the GPU device be flow b. In the scheduling judgment of flow b, if the input data execution units of the GPU devices are all occupied at the same time, the GPU devices are judged to be blocked, and the task can wait for the corresponding resources to become free. Because a CPU device can usually obtain the task data directly, its data reading process, data writing process, and bus occupation need not be considered; when task data in heterogeneous computation is scheduled to it, the computation execution unit in the corresponding device operation load flag sequence table should be assigned the task number, and after the task data is processed, the task number serving as a placeholder is reset to 0 to identify the idle state. Correspondingly, in the scheduling judgment of flow a, if the computation execution units of the CPU devices are all occupied at the same time, the CPU devices are judged to be blocked, and the task can wait for the corresponding resources to become free. After data reading is completed, if an idle computation execution unit and an idle output data execution unit exist on the device, the computation execution unit may be scheduled to perform the computation and store the result in the output data execution unit. After the task data is processed, the current execution end can decide whether to write the data back to main memory according to the bus occupation. Whenever any execution cannot proceed because its resources are occupied, it is judged to be blocked and waits until the resources are free.
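The a-flow/b-flow fallback above condenses to: try the preferred device type, fall back to the other, and report blocked when both are saturated. A hedged sketch with illustrative names and a compute-unit-only view of the tables:

```python
# Sketch of the preferred-type-first fallback with blocking. Tables map
# device -> {column: task_no}; 0 means idle. Names and the reduced column
# set are assumptions for illustration, not the patent's structures.

def pick_compute_unit(table):
    """Return (device, column) of an idle compute unit, or None."""
    for device, row in table.items():
        for col, val in row.items():
            if col.startswith("Cmpt") and val == 0:
                return device, col
    return None

def schedule(cpu_table, gpu_table, prefer_cpu=True):
    """Try the preferred type first, then the other; 'blocked' means wait."""
    first, second = (cpu_table, gpu_table) if prefer_cpu else (gpu_table, cpu_table)
    return pick_compute_unit(first) or pick_compute_unit(second) or "blocked"

cpu = {"CPU:a1": {"Cmpt1": 9, "Cmpt2": 9}}                 # all CPU units busy
gpu = {"GPU:b1": {"Cmpt1": 0}, "GPU:b2": {"Cmpt1": 5}}     # b1 is free
```

With these tables, a CPU-first request falls through to `("GPU:b1", "Cmpt1")`; if b1 were also busy, the result would be `"blocked"` and the task would wait for a resource to free up.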
In an embodiment of the present invention, for further definition and explanation, step 102 of parsing the device function type of the target running device and invoking the device operation load flag sequence matched with the target running device based on the device function type includes: if the computation execution unit found through the device operation load flag sequence matched with the second device type is in an occupied state, instructing the task data to enter a waiting operation; or, if the bus execution unit and/or storage execution unit found through the device operation load flag sequence matched with the second device type is in an occupied state, instructing the task data to enter a waiting operation.
In the embodiment of the present invention, in order to enable each execution unit to process task data better, during the allocation of the computation execution unit, bus execution unit, and storage execution unit, if the computation execution unit found through the device operation load flag sequence matched with the second device type is occupied, no computation execution unit is available for execution, and the current execution body may therefore instruct the task data to enter a waiting operation. Correspondingly, if the bus execution unit and/or storage execution unit found through that sequence is occupied, no bus execution unit and/or storage execution unit is available, and the current execution body may likewise instruct the task data to enter a waiting operation.
For further definition and explanation, in an embodiment of the present invention, as shown in fig. 3, determining the target running device based on the processing complexity of the task data in step 101 includes: 1011. determining the processing complexity based on data processing characteristics of the task data; 1012. selecting a scheduling policy according to the processing complexity of the task data, and determining the target running device according to the selected scheduling policy.
In order to schedule the task data accurately to the data processing of the corresponding execution unit (the task data may be content that can be split), in the embodiment of the present invention the processing complexity is determined based on the data processing characteristics of the task data, and the scheduling policy is then selected based on the processing complexity. The data processing characteristics represent at least one of the time consumption, occupied space, and processing mode of the task data, so as to determine how the task complexity changes. For example, for the complexity of data transmission, the transmission time can be estimated by dividing the amount of data transmitted by the bandwidth of the data bus and used as the processing complexity; for the complexity of data computation, the computation time can be estimated from the computation workload, such as by dividing the number of arithmetic operations by the FLOPS rating of the device, and used as the processing complexity. Meanwhile, because the processing complexity can be classified as constant task complexity or varying task complexity, when determining the target running device, a scheduling policy is first determined based on the processing complexity, where the scheduling policies include a static scheduling policy and a dynamic scheduling policy, so that running devices of different device function types are determined based on the different scheduling policies.
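The two estimates just mentioned are simple ratios. Written out (the bandwidth and FLOPS figures below are made-up illustrations, not measurements from the patent):

```python
# The two complexity estimates from the text: transfer time = data volume /
# bus bandwidth, computation time = operation count / device FLOPS.

def transfer_complexity(bytes_to_move, bus_bandwidth_bytes_per_s):
    """Estimated transmission time in seconds."""
    return bytes_to_move / bus_bandwidth_bytes_per_s

def compute_complexity(num_operations, device_flops):
    """Estimated computation time in seconds."""
    return num_operations / device_flops

t_xfer = transfer_complexity(64 * 2**20, 16 * 2**30)   # 64 MiB over a 16 GiB/s bus
t_cmpt = compute_complexity(2e9, 1e12)                 # 2 GFLOP on a 1 TFLOPS device
```

Either estimate (or their sum) can then serve as the processing-complexity value that drives the policy choice.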
The static scheduling policy is a scheduling mode that determines the target running device through pre-analysis of the task data; the dynamic scheduling policy is a scheduling mode that determines the target running device based on the occupation identifiers in the device operation load flag sequence. Combined with different processing complexities, the different scheduling modes can be matched with running devices of different device function types; specifically, there are four combinations:
1. If the task complexity does not change, the static scheduling policy is selected to determine the target running device. The static policy analyzes and determines the target running device in advance, so the target running device can be determined and selected beforehand by combining the properties of the task data with the properties, running state, and running examples of each running device. For example, if the running time of such task data on the device GPU:b1 is much less than on other devices, such task data may be pinned to GPU:b1 in advance; the embodiment of the present invention is not particularly limited.
2. If the task complexity does not change, the dynamic scheduling policy is selected to determine the target running device. The dynamic scheduling policy determines the target running device based on the occupation identifiers in the device operation load flag sequence; before the scheduling process of the embodiment of the present invention is executed, the task data may be randomly scheduled to different running devices and the running time counted.
3. If the task complexity changes, the static scheduling policy is selected to determine the target running device. Since static scheduling analyzes and determines the target running device in advance, historical processing information for task data of the same type can be collected beforehand, a machine learning model can be trained on that historical information to predict the processing complexity (or processing result) of the current task data, and during actual operation the prediction is used to judge and select a suitable running device.
4. If the task complexity changes, the dynamic scheduling policy is selected to determine the target running device. The dynamic scheduling policy determines the target running device based on the occupation identifiers in the device operation load flag sequence; meanwhile, since the processing complexity of most task data does not change too drastically across several consecutive items of task data, the current execution end may, at each stage, schedule task data to different running devices to estimate their operating efficiency and provide a basis for selecting running devices for subsequent task data.
It should be noted that, in the embodiment of the present invention, either the static or the dynamic scheduling policy may be selected according to the processing complexity of the task data, or the two may be combined to select the running device on which the task data runs best as the target running device, so as to avoid scheduling the task data onto a non-optimal device. For example, after the dynamic scheduling policy is determined according to the processing complexity, target execution devices 1, 3, and 5 may be selected, and these may then be screened again using the static analysis so that only one device, or the most optimal ones, remain for scheduling; the embodiment of the present invention is not limited in this respect.
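The four combinations and the re-screening example above can be sketched as a small selection table. This is an illustrative reading of the text, with policy labels and function names invented for the sketch:

```python
# Sketch of the four policy combinations: whether complexity varies and
# whether occupancy-driven (dynamic) selection is preferred pick the
# scheduling mode; rescreen() shows the combined dynamic+static filtering.

def choose_policy(complexity_varies, prefer_dynamic):
    if not complexity_varies:
        # Cases 1 and 2: constant complexity.
        return "dynamic" if prefer_dynamic else "static-pinned"
    # Cases 3 and 4: varying complexity.
    return "dynamic-staged" if prefer_dynamic else "static-predicted"

def rescreen(dynamic_candidates, static_candidates):
    """Filter dynamically chosen devices with the static analysis; keep the
    dynamic picks if the intersection is empty."""
    kept = [d for d in dynamic_candidates if d in static_candidates]
    return kept or dynamic_candidates
```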
In an embodiment of the present invention, for further limitation and description, acquiring the task data to be scheduled in step 101 specifically includes: responding to a task data processing request, parsing and splitting the task data, and determining at least one item of the parsed and split task data as the task data to be scheduled.
In order to process task data accurately and improve scheduling accuracy and effectiveness, when a task data processing request is received, the task data is parsed and split. The parsing and splitting represent analyzing and splitting the computation process of the task data; that is, different types of task data can be parsed and split differently. For example, image task data can be split into image data blocks, video task data can be split into individual frames, and fluid-grid task data can be split into regional data; the embodiment of the present invention is not particularly limited.
It should be noted that, after the task data is split, each split sub-task may be scheduled as an independent item of task data, so that whenever a running device is in an idle state, task data is dynamically scheduled to it. Meanwhile, the task data should preferably be provided as compilable code suitable for each system executing the scheduling, so that the splitting can be performed efficiently.
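The splitting examples above can be sketched as a dispatcher. Chunk size, payload shapes, and the function name are assumptions for illustration only:

```python
# Illustrative splitter for the examples in the text: image task data into
# fixed-size blocks, video task data into per-frame items. Each resulting
# sub-task is then scheduled as an independent item of task data.

def split_task(task_kind, payload, block=4):
    if task_kind == "image":
        # split a flat pixel sequence into fixed-size blocks
        return [payload[i:i + block] for i in range(0, len(payload), block)]
    if task_kind == "video":
        # one sub-task per frame
        return list(payload)
    raise ValueError(f"no splitter defined for {task_kind!r}")

subtasks = split_task("image", list(range(10)), block=4)
```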
In one application scenario of the embodiment of the present invention, a device operation idle state and a bus operation idle state are defined based on each execution unit in the abstract models of GPU:b1 and GPU:b2, and a device operation load flag sequence table for GPU:b1 and GPU:b2 is generated from them, such as the initial state table shown in Table 3. The table includes, for each of GPU:b1 and GPU:b2, the input storage execution units 1 and 2, the output storage execution units 1 and 2, and the computation execution unit, as well as the bus execution unit columns Direction and Buffer.
TABLE 3
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   0       0       0      0        0        0          -
GPU:b2   0       0       0      0        0        0          -
When task data 1 and task data 2 are transmitted from main memory by CPU a1 to the Input1 execution units of GPU:b1 and GPU:b2 respectively, the elements in the Input1 cells of GPU:b1 and GPU:b2 are configured as placeholders identifying task number 1 and task number 2 respectively, as shown in Table 4, while the bus execution units are occupied, each configured as 1.
TABLE 4
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   1       0       0      0        0        1          1
GPU:b2   2       0       0      0        0        1          1
The data transmission of task data 1 is completed; it enters the computation execution unit Cmpt1, and the result is stored in the output storage execution unit Output1. Meanwhile, task data 3 is transmitted from main memory by CPU a1 to GPU:b1, the configured placeholder is stored in the input storage execution unit Input2 as shown in Table 5, and the element in the bus execution unit's Buffer column records the sequence number 2 of the execution unit currently inputting or outputting data.
TABLE 5
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   1       3       1      1        0        1          2
GPU:b2   2       0       0      0        0        1          1
The data transmission of task data 2 is completed; it enters the computation execution unit Cmpt1, and the result is stored in the output storage execution unit Output1. Meanwhile, task data 4 is transmitted from main memory by CPU a1 to GPU:b2, the configured placeholder is stored in the input storage execution unit Input2 as shown in Table 6, and the element in the Buffer column records the sequence number 2 of the execution unit currently inputting or outputting data.
TABLE 6
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   1       3       1      1        0        1          2
GPU:b2   2       4       2      2        0        1          2
The computation of task data 2 is complete. After the data transmission of task data 4 is completed, it enters the computation execution unit, and the result is stored in the output storage execution unit Output2. Task data 5 is transmitted from main memory by CPU a1 to GPU:b2, with the configured placeholder stored in an input storage execution unit as shown in Table 7.
TABLE 7
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   1       3       1      1        0        1          2
GPU:b2   5       4       4      2        4        1          1
The data transfer of task data 5 is completed. The calculation result of task data 2 begins to be output. The computation of task data 1 is completed. Task data 3 enters the computation execution unit and stores its result in the Output2 area. Task data 6 is transmitted from main memory by CPU a1 to GPU:b1, with the configured placeholder stored in the input storage execution unit Input1 as shown in Table 8.
TABLE 8
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   6       3       6      1        6        1          6
GPU:b2   5       4       4      2        4        6          1
The computations of task data 3 and 4 are complete. The transfer of task data 2 is complete. Task data 5 enters the computation execution unit and stores its result in the data output execution unit Output1. Task data 7 is transmitted from main memory by CPU a1 to GPU:b2, with the configured placeholder stored in the input storage execution unit Input2 as shown in Table 9.
TABLE 9
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   6       0       0      1        3        1          1
GPU:b2   5       7       5      5        4        1          2
The transmission of task data 6 is completed. The calculation result of task data 1 begins to be output, with the configured placeholders as shown in Table 10.
TABLE 10
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   6       0       0      1        3        2          1
GPU:b2   5       7       5      5        4        1          2
The transmissions of task data 1 and 7 are completed. Task data 6 enters the computation execution unit and stores its result in the data storage execution unit Output1. The calculation results of task data 3 and 4 begin to be output, with the configured placeholders as shown in Table 11.
TABLE 11
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   6       0       0      6        3        2          2
GPU:b2   5       7       5      5        4        2          2
The computations of task data 5 and 6 are complete, with the configured placeholders as shown in Table 12.
TABLE 12
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   0       0       0      6        3        2          2
GPU:b2   0       7       0      5        4        2          2
The transmissions of task data 3 and 4 are completed, and the calculation results of task data 5 and 6 are output. Task data 7 enters the computation execution unit and stores its calculation result in the output unit Output2, with the configured placeholders as shown in Table 13.
TABLE 13
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   0       0       0      6        0        2          1
GPU:b2   0       7       7      5        7        2          1
The transmissions of task data 5 and 6 are completed, the computation of task data 7 is completed, and its calculation result begins to be output, with the configured placeholders as shown in Table 14.
TABLE 14
(Table 14 is reproduced as images in the original publication.)
The transmission of task data 7 is complete and the computation is finished, with the configured placeholders as shown in Table 15. When new task data arrives, whether an execution unit in the idle state exists is determined according to the occupation identifiers recorded in the device operation load flag sequence table so that it can be occupied, and the occupied execution unit is configured with the corresponding occupation identifier.
TABLE 15
         Input1  Input2  Cmpt1  Output1  Output2  Direction  Buffer
GPU:b1   0       0       0      6        0        0          -
GPU:b2   0       0       0      0        0        0          -
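The first transition of the walkthrough above, from the all-idle initial state to the state after tasks 1 and 2 begin their input transfers, can be replayed mechanically. A minimal, illustrative sketch (helper names are mine):

```python
# Replay of the Table 3 -> Table 4 transition: tasks 1 and 2 begin
# transferring from main memory into Input1 of GPU:b1 and GPU:b2, so each
# Input1 cell carries the task number as placeholder, Direction is set to
# 1 (inputting), and Buffer records area number 1.

COLS = ["Input1", "Input2", "Cmpt1", "Output1", "Output2", "Direction", "Buffer"]

def initial_state(devices):
    """All-idle load flag sequence table, as in Table 3 (0 everywhere)."""
    return {d: dict.fromkeys(COLS, 0) for d in devices}

def begin_input(state, device, task_no, area_no):
    """Start an input transfer: occupy the storage unit and the bus."""
    state[device][f"Input{area_no}"] = task_no
    state[device]["Direction"] = 1
    state[device]["Buffer"] = area_no

state = initial_state(["GPU:b1", "GPU:b2"])      # Table 3
begin_input(state, "GPU:b1", task_no=1, area_no=1)
begin_input(state, "GPU:b2", task_no=2, area_no=1)  # now matches Table 4
```

Each later table in the scenario is just a further sequence of such occupy/release updates to the same matrix.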
Compared with the prior art, the embodiment of the present invention provides a task data scheduling method: task data to be scheduled is acquired, and the target running device is determined based on the processing complexity of the task data; the device function type of the target running device is parsed, and the device operation load flag sequence matched with the target running device is invoked based on the device function type, where the sequence represents the occupation situation of the different execution units and occupied execution units are marked with corresponding occupation identifiers; if it is found, based on the device operation load flag sequence, that the target execution unit matching the task data is not marked with an occupation identifier, the task data is executed on the target execution unit and the occupation identifier is marked for it. This greatly satisfies the requirement of accurate task scheduling when different running devices are under load, avoids wasting task scheduling resources under uneven computation load, and improves the accuracy and efficiency of task data scheduling.
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides a task data scheduling apparatus, as shown in fig. 4, including:
an obtaining module 31, configured to obtain task data to be scheduled, and determine a target running device based on processing complexity of the task data;
the invoking module 32 is configured to analyze the device function type of the target running device, and invoke a device running load flag sequence matched with the target running device based on the device function type, where the device running load flag sequence is used to represent occupied situations of different execution units, where an occupied execution unit is marked by a corresponding occupied identifier;
and the executing module 33 is configured to, if it is found based on the device operation load flag sequence that the target execution unit matched with the task data is not marked with an occupation identifier, execute the task data on the target execution unit and mark the occupation identifier for the target execution unit.
Further, the apparatus further comprises:
an establishing module, configured to determine the device function type of each running device and establish an abstract model of the running device based on the device function type, where the abstract model includes at least two execution units;
a definition module, configured to define a device running idle state and a bus running idle state based on each execution unit in the abstract model;
and a generating module, configured to generate a device running load flag sequence based on the device running idle state and the bus running idle state, where the device running load flag sequence records whether each device execution unit is marked with the occupation identifier and whether each bus execution unit is marked with the occupation identifier.
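As an illustration of what these modules might produce, here is a minimal Python sketch of establishing an abstract model with at least two execution units and generating its device running load flag sequence. All function names, field names, and unit counts are invented for illustration and are not taken from the patent:

```python
# Hypothetical sketch: establish an abstract model of a running device,
# then generate its device running load flag sequence with every unit idle.

def build_abstract_model(device_type, n_compute, n_storage, n_bus):
    """Establish an abstract model of a running device from its function type."""
    units = (
        [f"compute{i}" for i in range(n_compute)] +   # device execution units
        [f"storage{i}" for i in range(n_storage)] +   # (computing + storage functions)
        [f"bus{i}" for i in range(n_bus)]             # bus execution units
    )
    return {"type": device_type, "units": units}

def generate_load_flag_sequence(model):
    """All units start in the idle state (False = no occupation identifier)."""
    return {unit: False for unit in model["units"]}

model = build_abstract_model("gpu", n_compute=2, n_storage=1, n_bus=1)
flags = generate_load_flag_sequence(model)
print(flags)  # every execution unit initially idle
```

Representing the sequence as a mapping from unit name to a boolean keeps the idle/occupied state check O(1) per unit, which matches the lookup-and-mark behavior the modules describe.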
Further, the device function types include a first device type and a second device type, and the establishing module includes:
a determining unit, configured to determine the number of abstracted hardware devices corresponding to each running device belonging to the first device type and the second device type;
a dividing unit, configured to divide an execution area of the running device to obtain device execution units, which include a storage function and a computing function, and a bus execution unit;
and an establishing unit, configured to establish, based on the device execution units and the bus execution unit, a first device abstraction model and a second device abstraction model matching the number of abstracted hardware devices.
Further, the generating module is specifically configured to, when detecting that a device execution unit and/or a bus execution unit in the device running load flag sequence is executing task data, configure the occupation identifier of that device execution unit and/or bus execution unit in the device running load flag sequence based on the task data.
Further, the invoking module comprises:
a first invoking unit, configured to, if the device running load flag sequence invoked for the target running device is the one matched with the first device type, judge whether all computing execution units in that flag sequence are in an occupied state, and when all of them are occupied, invoke the device running load flag sequence matched with the second device type to search for an idle computing execution unit; or,
a second invoking unit, configured to, if the device running load flag sequence invoked for the target running device is the one matched with the second device type, judge whether all bus execution units and/or all storage execution units in that flag sequence are in an occupied state, and when all of them are occupied, invoke the device running load flag sequence matched with the first device type to search for an idle bus execution unit and/or storage execution unit.
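The cross-type fallback described above can be sketched as follows. This is a hypothetical illustration: the helper names, the unit-name prefix convention, and the two flag tables are assumptions, not structures defined by the patent.

```python
# Sketch of the cross-type fallback: if every matching unit of the target
# device type is occupied, search the flag sequence of the other type.

def find_idle_unit(flags, kind):
    """Return the first idle execution unit whose name starts with `kind`."""
    for unit, occupied in flags.items():
        if unit.startswith(kind) and not occupied:
            return unit
    return None

def find_with_fallback(primary_flags, fallback_flags, kind):
    unit = find_idle_unit(primary_flags, kind)
    if unit is not None:
        return "primary", unit
    # All matching units of the primary type are occupied: fall back.
    unit = find_idle_unit(fallback_flags, kind)
    return ("fallback", unit) if unit is not None else (None, None)

cpu_flags = {"compute0": True, "compute1": True}   # first device type, all busy
gpu_flags = {"compute0": False, "bus0": False}     # second device type
print(find_with_fallback(cpu_flags, gpu_flags, "compute"))  # -> ('fallback', 'compute0')
```

Because both device types expose the same flag-sequence shape, the fallback is a second dictionary scan rather than a separate code path per hardware type.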
Further, the obtaining module comprises:
a computing unit, configured to determine the processing complexity based on data processing characteristics of the task data, where the data processing characteristics represent at least one of the time consumption, the occupied space, and the processing mode of the task data's data processing;
and a determining unit, configured to select a scheduling strategy according to the processing complexity of the task data and determine the target running device according to the selected scheduling strategy, where the scheduling strategy includes a static scheduling strategy and a dynamic scheduling strategy: the static scheduling strategy determines the target running device by pre-analyzing the task data, and the dynamic scheduling strategy determines the target running device based on the occupation identifiers in the device running load flag sequence.
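The complexity scoring and strategy selection could be sketched as below. The weighting of the characteristics and the threshold are invented for illustration; the patent does not specify a formula.

```python
# Hypothetical sketch of deriving processing complexity from the data
# processing characteristics and choosing between the two strategies.

def processing_complexity(task):
    """Combine time consumption, occupied space, and processing mode
    into a single score (weights are illustrative assumptions)."""
    mode_weight = {"simple": 1, "vector": 2, "matrix": 3}[task["mode"]]
    return task["time_cost"] * mode_weight + task["space"] / 1024

def choose_strategy(task, threshold=10.0):
    # Assumption: low complexity is pre-analyzed (static scheduling); high
    # complexity is dispatched at run time from the occupation identifiers
    # in the load flag sequence (dynamic scheduling).
    return "static" if processing_complexity(task) < threshold else "dynamic"

task = {"time_cost": 2, "space": 2048, "mode": "matrix"}
print(processing_complexity(task), choose_strategy(task))  # 8.0 static
```

Keeping the score and the strategy choice as separate functions mirrors the split between the computing unit and the determining unit in the apparatus above.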
Further, the obtaining module further comprises:
a splitting unit, configured to, in response to a task data processing request, perform analyze-and-split processing on the task data and determine at least one item of the resulting task data as the task data to be scheduled, where the analyze-and-split processing divides the computation process of the task data into parts.
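A minimal sketch of the analyze-and-split step, assuming the computation process arrives as a list of independent stages; the request format, field names, and stage operations are invented for illustration:

```python
# Hypothetical sketch: split a task data processing request into
# individually schedulable items of task data.

def analyze_and_split(request):
    """Divide the computation process of the task data into sub-tasks,
    each of which becomes an item of task data to be scheduled."""
    subtasks = []
    for i, stage in enumerate(request["stages"]):
        subtasks.append({
            "task_id": f'{request["task_id"]}-{i}',
            "op": stage["op"],
            "complexity": stage.get("complexity", 1),
        })
    return subtasks

request = {
    "task_id": "recon42",
    "stages": [{"op": "filter"}, {"op": "backproject", "complexity": 8}],
}
for item in analyze_and_split(request):
    print(item["task_id"], item["op"], item["complexity"])
```

Each emitted item carries its own complexity, so the obtaining module can route the sub-tasks to different running devices independently.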
Compared with the prior art, the embodiment of the invention provides a task data scheduling apparatus. Task data to be scheduled is acquired, and a target running device is determined based on the processing complexity of the task data. The device function type of the target running device is analyzed, and a device running load flag sequence matched with the target running device is invoked based on the device function type, where the device running load flag sequence represents the occupation status of different execution units, and an occupied execution unit is marked with a corresponding occupation identifier. If a target execution unit matching the task data is found, based on the device running load flag sequence, to be unmarked with the occupation identifier, the task data is executed on that target execution unit and the occupation identifier is marked for it. This satisfies the requirement for accurate task scheduling while different running devices are under load, avoids the waste of scheduling resources caused by an uneven computational load, and improves both the accuracy and the efficiency of task data scheduling.
According to an embodiment of the present invention, a storage medium is provided. The storage medium stores at least one executable instruction, and the executable instruction causes a processor to perform the task data scheduling method of any of the above method embodiments.
Fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the particular implementation of the terminal.
As shown in fig. 5, the terminal may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
Wherein: the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically perform relevant steps in the foregoing scheduling method embodiment of task data.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The terminal includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 406 for storing a program 410. The memory 406 may comprise a high-speed RAM memory and may also include a non-volatile memory, such as at least one disk storage.
The program 410 may specifically be configured to cause the processor 402 to perform the following operations:
acquiring task data to be scheduled, and determining a target running device based on the processing complexity of the task data;
analyzing the device function type of the target running device, and invoking a device running load flag sequence matched with the target running device based on the device function type, where the device running load flag sequence represents the occupation status of different execution units, and an occupied execution unit is marked with a corresponding occupation identifier;
and if a target execution unit matching the task data is found, based on the device running load flag sequence, to be unmarked with the occupation identifier, executing the task data on the target execution unit and marking the occupation identifier for the target execution unit.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may each be fabricated as an individual integrated circuit module, or multiple of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description presents only preferred embodiments of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for scheduling task data, comprising:
acquiring task data to be scheduled, and determining a target running device based on the processing complexity of the task data;
analyzing the device function type of the target running device, and invoking a device running load marking sequence matched with the target running device based on the device function type, wherein the device running load marking sequence represents the occupation status of different execution units, and an occupied execution unit is marked with a corresponding occupation identifier;
and if a target execution unit matching the task data is found, based on the device running load marking sequence, to be unmarked with the occupation identifier, executing the task data on the target execution unit and marking the occupation identifier for the target execution unit.
2. The method of claim 1, wherein before the obtaining task data to be scheduled and determining a target running device based on processing complexity of the task data, the method further comprises:
determining the device function type of each running device, and establishing an abstract model of the running device based on the device function type, wherein the abstract model comprises at least two execution units;
defining a device running idle state and a bus running idle state based on each execution unit in the abstract model, wherein the execution units of the running device comprise a bus execution unit and a device execution unit;
and generating a device running load marking sequence based on the device running idle state and the bus running idle state, wherein the device running load marking sequence records whether each device execution unit is marked with the occupation identifier and whether each bus execution unit is marked with the occupation identifier.
3. The method of claim 2, wherein the device function types comprise a first device type and a second device type, and wherein building the abstract model of the running device based on the device function types comprises:
determining the number of abstracted hardware devices corresponding to each running device belonging to the first device type and the second device type;
dividing an execution area of the running device to obtain device execution units, which include a storage function and a computing function, and a bus execution unit;
establishing, based on the device execution units and the bus execution unit, a first device abstraction model and a second device abstraction model matching the number of abstracted hardware devices,
the first device type comprises a CPU, and the second device type comprises a GPU, an FPGA or a DSP.
4. The method according to claim 3, wherein if it is found that the target execution unit matching the task data is not marked with the occupation identifier based on the device running load marking sequence, before executing the task data based on the target execution unit and marking the occupation identifier for the target execution unit, the method further comprises:
if the device running load marking sequence invoked for the target running device is the one matched with the first device type, judging whether all computing execution units in the device running load marking sequence are in an occupied state, and when all of them are occupied, invoking the device running load marking sequence matched with the second device type to search for an idle computing execution unit; or,
if the device running load marking sequence invoked for the target running device is the one matched with the second device type, judging whether all bus execution units and/or all storage execution units in the device running load marking sequence are in an occupied state, and when all of them are occupied, invoking the device running load marking sequence matched with the first device type to search for an idle bus execution unit and/or storage execution unit.
5. The method of claim 2, wherein generating a device operational load flag sequence based on the device operational idle state and the bus operational idle state comprises:
when detecting that a device execution unit and/or a bus execution unit in the device running load flag sequence is executing task data, configuring the occupation identifier of that device execution unit and/or bus execution unit in the device running load flag sequence based on the task data.
6. The method of any of claims 1-5, wherein determining a target running device based on the processing complexity of the task data comprises:
determining the processing complexity based on data processing characteristics of the task data, wherein the data processing characteristics represent at least one of the time consumption, the occupied space, and the processing mode of the task data's data processing;
and selecting a scheduling strategy according to the processing complexity of the task data, and determining the target running device according to the selected scheduling strategy, wherein the scheduling strategy comprises a static scheduling strategy and a dynamic scheduling strategy: the static scheduling strategy determines the target running device by pre-analyzing the task data, and the dynamic scheduling strategy determines the target running device based on the occupation identifiers in the device running load marking sequence.
7. The method according to any of claims 1-6, wherein said obtaining task data to be scheduled comprises:
in response to a task data processing request, performing analyze-and-split processing on the task data, and determining at least one item of the task data obtained after the analyze-and-split processing as the task data to be scheduled, wherein the analyze-and-split processing divides the computation process of the task data into parts.
8. A task data scheduling apparatus, comprising:
an obtaining module, configured to obtain task data to be scheduled and determine a target running device based on the processing complexity of the task data;
an invoking module, configured to analyze the device function type of the target running device and invoke a device running load marking sequence matched with the target running device based on the device function type, wherein the device running load marking sequence represents the occupation status of different execution units, and an occupied execution unit is marked with a corresponding occupation identifier;
and an executing module, configured to, if a target execution unit matching the task data is found to be unmarked with the occupation identifier based on the device running load marking sequence, execute the task data on the target execution unit and mark the occupation identifier for the target execution unit.
9. A storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the scheduling method of task data according to any one of claims 1 to 6.
10. A terminal, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the scheduling method of the task data according to any one of claims 1-6.
CN202110991440.8A 2021-08-26 2021-08-26 Task data scheduling method and device Active CN113835852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110991440.8A CN113835852B (en) 2021-08-26 2021-08-26 Task data scheduling method and device


Publications (2)

Publication Number Publication Date
CN113835852A true CN113835852A (en) 2021-12-24
CN113835852B CN113835852B (en) 2024-04-12

Family

ID=78961442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110991440.8A Active CN113835852B (en) 2021-08-26 2021-08-26 Task data scheduling method and device

Country Status (1)

Country Link
CN (1) CN113835852B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018076238A1 (en) * 2016-10-27 2018-05-03 Huawei Technologies Co., Ltd. Heterogeneous system, computation task assignment method and device
CN110018893A (en) * 2019-03-12 2019-07-16 平安普惠企业管理有限公司 A kind of method for scheduling task and relevant device based on data processing
US20190227621A1 (en) * 2018-01-23 2019-07-25 Nec Corporation System management device
CN113127160A (en) * 2019-12-30 2021-07-16 阿里巴巴集团控股有限公司 Task scheduling method, system and equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Peilun Du, et al.: "Feature-aware task scheduling on CPU-FPGA heterogeneous platforms", 2019 IEEE 21st International Conference on High Performance Computing and Communications *
许明睿 (Xu Mingrui): "Research on clustering-based scheduling algorithms for distributed computing tasks in heterogeneous clusters", Information Technology, no. 2 *

Also Published As

Publication number Publication date
CN113835852B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
US10831562B2 (en) Method and system for operating a data center by reducing an amount of data to be processed
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
CN106569892B (en) Resource scheduling method and equipment
CN111506434B (en) Task processing method and device and computer readable storage medium
CN111708639A (en) Task scheduling system and method, storage medium and electronic device
CN116263701A (en) Computing power network task scheduling method and device, computer equipment and storage medium
CN107977275B (en) Task processing method based on message queue and related equipment
CN113010286A (en) Parallel task scheduling method and device, computer equipment and storage medium
CN112181613A (en) Heterogeneous resource distributed computing platform batch task scheduling method and storage medium
CN116089477B (en) Distributed training method and system
CN114816777A (en) Command processing device, method, electronic device and computer readable storage medium
CN111913816A (en) Implementation method, device, terminal and medium for clusters in GPGPU (general purpose graphics processing unit)
CN113835852B (en) Task data scheduling method and device
CN116260876A (en) AI application scheduling method and device based on K8s and electronic equipment
CN112130977B (en) Task scheduling method, device, equipment and medium
CN114741166A (en) Distributed task processing method, distributed system and first equipment
CN114724103A (en) Neural network processing system, instruction generation method and device and electronic equipment
CN113835953A (en) Statistical method and device of job information, computer equipment and storage medium
CN112114967A (en) GPU resource reservation method based on service priority
CN110955461A (en) Processing method, device and system of computing task, server and storage medium
CN115292053B (en) CPU, GPU and NPU unified scheduling method of mobile terminal CNN
CN113377439A (en) Heterogeneous computing method and device, electronic equipment and storage medium
CN113282383B (en) Task scheduling method, task processing method and related products
CN114721834B (en) Resource allocation processing method, device, equipment, vehicle and medium
CN111782482B (en) Interface pressure testing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant