CN111756802A - Method and system for scheduling data stream tasks on NUMA platform

Method and system for scheduling data stream tasks on NUMA platform

Info

Publication number
CN111756802A
Authority
CN
China
Prior art keywords
task
numa
data
node
scheduling
Prior art date
Legal status
Granted
Application number
CN202010456848.0A
Other languages
Chinese (zh)
Other versions
CN111756802B (en)
Inventor
都政
沙士豪
温志伟
舒继武
罗秋明
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202010456848.0A priority Critical patent/CN111756802B/en
Publication of CN111756802A publication Critical patent/CN111756802A/en
Application granted granted Critical
Publication of CN111756802B publication Critical patent/CN111756802B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/70 Admission control; Resource allocation
    • H04L 47/78 Architectures of resource allocation
    • H04L 47/783 Distributed allocation of resources, e.g. bandwidth brokers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Bus Control (AREA)

Abstract

The invention discloses a method and a system for scheduling data stream tasks on a NUMA platform. The method comprises the following steps: marking the dataflow graph and the data stream tasks according to the states of the data stream during the computation run; recording node information of the NUMA platform and the memory access bandwidth between node memories; allocating the initial task of the data stream to an idle processor core of any NUMA node; and selecting, from all currently idle processor cores, the processor core with the minimum data transmission time cost, and scheduling the newly ready task to run on that core. Based on the data storage characteristics of the NUMA platform and the characteristics of the data stream, the invention provides a dynamic scheduling method that selects a suitable NUMA node as the running node of each ready data stream task, so that the time consumed by inter-node data transmission is minimized and the overall computation execution efficiency is improved.

Description

Method and system for scheduling data stream tasks on NUMA platform
Technical Field
The invention relates to the technical field of multitask data processing, in particular to a method and a system for scheduling data stream tasks on a NUMA platform.
Background
At present, processor development has shifted from simply increasing execution speed toward multi-core designs, and large-scale distributed systems are increasingly common. Conventional programs are structured around sequentially executed commands; in this mode data is usually static and accessed repeatedly, so such programs are not particularly well supported by multi-core processors and large distributed systems. Data flow programming, by contrast, emphasizes data as the driving force and explicitly defines the connections between inputs and outputs. Instead of a command mode, the relevant operation executes as soon as its data are ready and its inputs are valid; data flow programming is therefore inherently parallel and can run well on multi-core processors and in large-scale distributed systems. NUMA employs a distributed memory model, except that the processors in all nodes can access all of the system's physical memory. However, in the prior art, the time a processor needs to access memory within its own node may be much less than the time it takes to access memory in some remote node.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the defect of low computation execution efficiency caused by long memory access times on a NUMA platform, thereby providing a method and a system for scheduling data stream tasks on a NUMA platform.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a method for scheduling a data stream task on a NUMA platform, including the following steps:
marking the data flow graph and the data flow task according to the state of the data flow in the calculation operation;
recording node information of the NUMA platform and memory access bandwidth of a memory between nodes;
allocating an initial task of the data stream to an idle processor core of any NUMA node;
and selecting, from all currently idle processor cores, the processor core with the minimum data transmission time cost, and scheduling the newly ready task to run on that processor core.
In one embodiment, the states of the data flow during a computation run include: a completed task is marked as state F; a running task is marked as state R; a task whose inputs are not yet ready is marked as state U. When the data flow computation starts, the initial task serves as the input of the entire dataflow graph and is in state R while all other dataflow tasks are in state U; as the computation advances, tasks in state R run to completion, making the input data of subsequent dataflow tasks ready so that those tasks enter state R.
In one embodiment, scheduling is required each time a data flow task transitions from the U state to the R state.
In one embodiment, each task records its own estimated computation time and the required storage sizes of its n input data, D_size = (D_0, D_1, ..., D_{n-1}); the value of n differs between tasks, with n = 0 for the initial task and n > 0 for all other nodes.
In an embodiment, the step of recording node information of the NUMA platform and memory access bandwidth of a memory between nodes includes:
marking each processor core of the NUMA platform as C_{i,j}, where i denotes the number of the NUMA node on which the processor core resides and j denotes the number given to the core within that node;

binding threads and data to different NUMA nodes and testing to obtain the cross-node memory access bandwidth;

obtaining a communication matrix M_cross recording the cross-access bandwidths of the NUMA platform with k nodes, where element B_{i,j} of the communication matrix records the memory access bandwidth when a processor core on NUMA node i accesses the memory on node j, the communication matrix being

$$M_{cross} = \begin{pmatrix} B_{0,0} & \cdots & B_{0,k-1} \\ \vdots & \ddots & \vdots \\ B_{k-1,0} & \cdots & B_{k-1,k-1} \end{pmatrix}$$
In an embodiment, the projected busy time T_busy is recorded for each of the processor cores on each node; T_busy = 0 characterizes the current processor core as idle.
In one embodiment, the data transmission time cost of a candidate processor core C_{c,k} is calculated as follows:

according to the set of NUMA node positions of the n predecessor tasks, A = (N_0, N_1, ..., N_{n-1}), and the amounts of data required by the current task, D_size = (D_0, D_1, ..., D_{n-1}), the data transmission time cost is calculated as:

$$TC = \sum_{i=0}^{n-1} \frac{D_i}{B_{C,N_i}}$$

where D_i is the amount of data required from the i-th predecessor task and B_{C,N_i} is the memory access bandwidth when a processor core on NUMA node C accesses the memory of node N_i.
In one embodiment, after all input data have been copied to the node of the corresponding processor core according to the scheduling result, the predicted busy time of the current processor core is updated with the computation duration of the data flow task, T_busy = Workload, which characterizes the current processor as busy.
In a second aspect, an embodiment of the present invention provides a system for scheduling a data stream task on a NUMA platform, including:
the task marking module is used for marking the data flow graph and the data flow task according to the state of the data flow in the calculation operation;
the node information and memory access bandwidth recording module is used for recording the node information of the NUMA platform and the memory access bandwidth of the memory between the nodes;
the initial task allocation module is used for allocating the initial tasks of the data stream to idle processor cores of any NUMA node;
and the task scheduling module is used for selecting, from all currently idle processor cores, the processor core with the minimum data transmission time cost, and scheduling the newly ready task to run on that processor core.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and the computer instructions are configured to cause the computer to execute the method for scheduling a data stream task on a NUMA platform according to the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer device, including: the scheduling method comprises a memory and a processor, wherein the memory and the processor are communicatively connected with each other, the memory stores computer instructions, and the processor executes the computer instructions to execute the scheduling method of the data stream task on the NUMA platform according to the first aspect of the embodiment of the present invention.
The technical scheme of the invention has the following advantages:
in order to complete the scheduling of data flow tasks, the method first labels the dataflow graph and the data flow tasks according to the states of the data flow during the computation run; it then records node information of the NUMA platform and the memory access bandwidth between node memories; the initial task of the data stream is allocated to an idle processor core of any NUMA node; and each newly ready task is scheduled to run on the processor core, among all currently idle cores, with the minimum data transmission time cost. Based on the data storage characteristics of the NUMA platform combined with the characteristics of the data flow, the invention provides a dynamic scheduling method that selects a suitable NUMA node as the running node of each ready data flow task, so that the time consumed by inter-node data transmission is minimized and the overall computation execution efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of the time taken by a processor to access local and non-local memory at different node distances in a NUMA architecture according to an embodiment of the present invention;
FIG. 2 is a flowchart of a specific example of a method for scheduling data stream tasks on a NUMA platform according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a dataflow graph in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a NUMA system having 18 cores in an embodiment of the invention;
FIG. 5 is a block diagram of a specific example of a scheduling system for dataflow tasks on a NUMA platform according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
Data flow programming is a high-performance parallel programming model that addresses the efficient utilization of multi-core processors. It differs markedly from traditional programming languages: execution is driven by data, the data to be processed is distributed to the individual cores, and computation is separated from communication. Through task scheduling and allocation, the parallel nature of software pipelining is exploited to uncover the potential parallelism in a stream program and to balance load among the cores. In the data flow model, the static structure of a data flow program is described as a directed graph in which nodes represent computing units and edges represent data transmission paths; data is transmitted between adjacent nodes along the edges, nodes consume data to perform computation, and the generated data is output to an input-output sequence as the input of the next computing unit.
Non-uniform memory access (NUMA) is an architectural model of how memory is accessed by multiple CPUs: a computer memory design for multiprocessor systems in which memory access time depends on the location of the memory relative to the processor. In the NUMA architecture, a physical CPU (generally comprising multiple logical CPUs, i.e. multiple cores) together with a group of memory slots constitutes a node; that is, a physical CPU and a block of memory form a node. Each CPU can access the memory under its own node and can also access the memory of other nodes.
NUMA employs a distributed memory model, except that the processors in all nodes can access all of the system's physical memory. However, the time a processor needs to access memory within its own node may be much less than the time it takes to access memory in some remote node. Under NUMA, a processor accesses its own local memory faster than non-local memory (memory local to another processor, or memory shared between processors), and the time taken to access non-local memory is also related to the distance between nodes; as shown in FIG. 1, the shorter the distance, the less time it takes.
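As a hypothetical numeric illustration of this effect (the bandwidth figures below are assumptions for this sketch, not measurements from the patent), copying the same task input over a remote link takes several times longer than a local copy:

```python
# Assumed bandwidths for illustration only: 40 GB/s local, 10 GB/s remote.
local_bw, remote_bw = 40e9, 10e9        # bytes per second
data = 8 * 2**20                        # an 8 MiB task input
print(f"local:  {data / local_bw * 1e3:.2f} ms")   # ~0.21 ms
print(f"remote: {data / remote_bw * 1e3:.2f} ms")  # ~0.84 ms
```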
The embodiment of the present invention provides a method for scheduling data stream tasks on a NUMA platform, as shown in FIG. 2. The method targets the data storage characteristics of the NUMA platform when data stream tasks are computed by multiple processors on that platform, and includes:
step S1: and marking the data flow graph and the data flow task according to the state of the data flow in the calculation operation.
A complete data stream computation is composed of multiple data stream tasks and data inputs. Each data stream task must wait for several pieces of preceding input data to become ready; once all of a task's preceding data are ready, the task is ready for execution. When a data stream task executes, it outputs new data, which serves as preceding input data for subsequent tasks.
In the embodiment of the present invention, in order to complete the scheduling of data flow tasks, the dataflow graph and the data flow tasks are labeled during the data flow computation run. Data flow tasks are labeled into three categories: a completed task is marked with type F; a running task is marked with type R; a task that is not yet ready is marked with type U. At the start of a computation run all tasks are in the U state, except for the initial task(s), which are in the R state. As the dataflow computation advances, R tasks run to completion, making the input data of subsequent dataflow tasks ready so that they enter the R state. Each task records its own estimated computation time Workload and the required storage sizes of its n input data, D_size = (D_0, D_1, ..., D_{n-1}); the value of n differs between tasks, with n = 0 for the initial task and n > 0 for all other nodes. The initial task, as the input of the entire dataflow graph, is ready by default.
In the dataflow graph shown in FIG. 3, each node represents a task; nodes A, B and C represent 3 tasks. The small squares represent data passed between dataflow tasks. Once the input data of a task are ready, execution of the task begins: the task is assigned to a computing unit, i.e. a CPU, for execution. R/F/U denote the states of the data flow tasks. As can be seen from FIG. 3, task C lacks only one datum, the one produced by task A, before it is ready. When task A ends, its state changes from R to F and it generates data for the subsequent task. At that point all the preceding input data of task C are ready, its state changes from U to R, and task scheduling begins.
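The F/R/U bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not code from the patent: the Task fields mirror the state marks, the Workload estimate, and the D_size vector described above, while the graph shape, workload values, and data sizes for the FIG. 3 example are invented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class State(Enum):
    F = "finished"    # task has completed
    R = "running"     # task is ready / being executed
    U = "unready"     # some preceding input data is still missing

@dataclass(eq=False)  # identity-based hashing so tasks can be used as dict keys
class Task:
    name: str
    workload: float                  # estimated computation time
    d_size: List[int]                # storage sizes of the n input data
    preds: List["Task"] = field(default_factory=list)  # predecessor tasks
    state: State = State.U

    @property
    def n(self) -> int:
        return len(self.d_size)      # n == 0 identifies an initial task

    def inputs_ready(self) -> bool:
        # a task becomes ready once every predecessor has finished
        return all(p.state is State.F for p in self.preds)

# The three-task graph of FIG. 3 (shape assumed): A and B feed C.
a = Task("A", workload=2.0, d_size=[])
b = Task("B", workload=3.0, d_size=[])
c = Task("C", workload=1.5, d_size=[4096, 8192], preds=[a, b])
a.state = b.state = State.R      # initial tasks are ready by default

a.state = State.F                # A finishes and emits its output
assert not c.inputs_ready()      # C still waits on B
b.state = State.F
assert c.inputs_ready()          # U -> R transition: schedule C now
```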
Step S2: and recording node information of the NUMA platform and memory access bandwidth of memories between the nodes.
In the embodiment of the invention, when a data stream computation runs on a NUMA platform, each data stream task, once ready, is allocated to a NUMA node and executed by a CPU belonging to that node; owing to the nature of data stream computation, data is transferred between different NUMA nodes.
In order to complete the scheduling of data stream tasks, the embodiment of the invention records node information of the NUMA platform and the memory access bandwidth between node memories. Each processor core is marked as C_{i,j}, where i denotes the number of the NUMA node on which the core resides and j denotes the core's number within that node.
The cross-access bandwidth is obtained by binding threads and data to different NUMA nodes and testing. For a NUMA system with k nodes it is recorded as a matrix M_cross, whose element B_{i,j} records the memory access bandwidth when a processor core on NUMA node i accesses the memory on node j:

$$M_{cross} = \begin{pmatrix} B_{0,0} & \cdots & B_{0,k-1} \\ \vdots & \ddots & \vdots \\ B_{k-1,0} & \cdots & B_{k-1,k-1} \end{pmatrix}$$
for several processor cores on each node, each records its predicted busy time TbusyWhen T isbusy0 means that the current processor core is idle.
As shown in FIG. 4, consider a NUMA platform with 9 nodes, each node having 2 processor cores and its own memory space, which together form an 18-core NUMA system. The nodes are numbered 0-8 and the processor cores are C_{i,j} (i = 0-8, j = 0-1). The node communication matrix provided by a typical existing system can be represented by the following table, in which each value is the access distance between two nodes; the larger the distance, the slower data transmission between the nodes.
[Table: inter-node access-distance matrix of the 18-core NUMA system; original values not reproduced]
The embodiment of the invention uses bandwidth rather than distance when calculating the priority. Based on actual tests or the hardware documentation, the communication matrix of cross-access bandwidths is shown in the following table; each value is the access bandwidth between two nodes, and the larger the bandwidth, the faster data transmission between the nodes (and, in general, the closer they are).
[Table: cross-access bandwidth matrix M_cross of the 18-core NUMA system; original values not reproduced]
Step S3: the initial task of the data flow is assigned to an idle processor core of any NUMA node, i.e. to a processor core whose current state is T_busy = 0.
Step S4: selecting, from all currently idle processor cores, the processor core with the minimum data transmission time cost, and scheduling the newly ready task to run on that processor core.
Since the access bandwidth of data differs between different NUMA nodes (the farther apart the nodes, the smaller the access bandwidth), different task scheduling decisions cause the time spent transmitting data between nodes to differ. According to the characteristics of the NUMA platform, the embodiment of the invention selects a suitable NUMA node as the running node of each ready data flow task, so that the time consumed by data transmission between nodes is minimized.
After the initial task has been allocated to an idle processor core of some NUMA node, each newly ready task selects, from all currently idle processor cores (T_busy = 0), the core with the minimum data transmission cost TC (transfer cost) and is scheduled to run on that core.
For a candidate processor core C_{c,k}, the embodiment of the invention calculates the data transmission time cost TC as follows: according to the set of NUMA node positions of the task's n predecessor tasks, A = (N_0, N_1, ..., N_{n-1}), and the amounts of data required by the current task, D_size = (D_0, D_1, ..., D_{n-1}), the data transmission time cost is

$$TC = \sum_{i=0}^{n-1} \frac{D_i}{B_{C,N_i}}$$

where D_i is the amount of data required from the i-th predecessor task and B_{C,N_i} is the memory access bandwidth when a processor core on NUMA node C accesses the memory of node N_i. In other words, the cost is the sum, over all inputs, of the required data volume divided by the corresponding access bandwidth. The newly ready task is then scheduled to the processor core with the minimum time cost, so that the time consumed by inter-node data transmission is minimized and the overall computation execution efficiency is improved.
In embodiments of the present invention, scheduling is required each time a task transitions from the U state to the R state. According to the scheduling result, after all input data have been copied to the local node (to account for the delay introduced by copying the data), the core's predicted busy time is updated with the task's computation time, T_busy = Workload, indicating that the processor is busy.
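Putting steps S3-S4 together with the sketches above gives the following hedged outline of the scheduling step. transfer_cost implements the TC sum from the formula above; placement (which NUMA node holds each predecessor's output) is a bookkeeping structure I have assumed for the sketch, not a name from the patent.

```python
def transfer_cost(core, task, placement, m_cross):
    """TC for running `task` on `core`: the sum over inputs of D_i / B_{C,N_i}."""
    c, _ = core                      # core is a (node, index) pair
    return sum(d / m_cross[c][placement[p]]
               for p, d in zip(task.preds, task.d_size))

def schedule(task, placement, m_cross):
    """Place a newly ready (U -> R) task on the idle core with minimum TC."""
    # Sketch assumption: at least one core is idle; a full scheduler would queue.
    best = min(idle_cores(),
               key=lambda core: transfer_cost(core, task, placement, m_cross))
    t_busy[best] = task.workload     # core stays busy for the Workload time
    placement[task] = best[0]        # the task's output will reside on this node
    task.state = State.R
    return best

# Hypothetical usage: A's output lives on node 0, B's on node 4.
placement = {a: 0, b: 4}
print(schedule(c, placement, m_cross))  # -> (4, 0): B's larger input makes node 4 cheapest
```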
The scheduling method for data stream tasks on a NUMA platform provided by the embodiment of the invention is a dynamic scheduling method based on the data storage characteristics of the NUMA platform combined with the characteristics of the data stream; by selecting a suitable NUMA node as the running node of each ready data stream task, it minimizes the time consumed by inter-node data transmission and improves the overall computation execution efficiency.
Example 2
An embodiment of the present invention provides a system for scheduling a data stream task on a NUMA platform, as shown in fig. 5, including:
the task marking module 1 is used for marking the data flow graph and the data flow task according to the state of the data flow in the calculation operation; this module executes the method described in step S1 in embodiment 1, and is not described herein again.
The node information and memory access bandwidth recording module 2 is used for recording the node information of the NUMA platform and the memory access bandwidth of the memory between the nodes; this module executes the method described in step S2 in embodiment 1, and is not described herein again.
An initial task allocation module 3, configured to allocate an initial task of a data stream to an idle processor core of any NUMA node; this module executes the method described in step S3 in embodiment 1, and is not described herein again.
And the task scheduling module 4 is used for selecting the processor core with the minimum data transmission time cost from all the idle processor cores at present, and scheduling the newly ready task to the processor core with the minimum transmission time cost for operation. This module executes the method described in step S4 in embodiment 1, and is not described herein again.
The scheduling system of the data stream task on the NUMA platform provided by the embodiment of the invention provides a dynamic scheduling method by combining the characteristics of the data stream according to the data storage characteristics of the NUMA platform, and the appropriate NUMA node is selected as the operation node of the ready data stream task, so that the time consumed by data transmission between the nodes is minimized, and the overall computation execution efficiency is improved.
Example 3
An embodiment of the present invention provides a computer device, as shown in fig. 6, the device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or in another manner, and fig. 6 takes the connection by the bus as an example.
The processor 51 may be a Central Processing Unit (CPU). The Processor 51 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 52, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the corresponding program instructions/modules in the embodiments of the present invention. The processor 51 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 52, namely, implementing the scheduling method of the data stream task on the NUMA platform in the above method embodiment.
The memory 52 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 51, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and these remote memories may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 52 and when executed by the processor 51, perform the scheduling method of dataflow tasks on a NUMA platform in embodiment 1.
The details of the computer device can be understood by referring to the corresponding related descriptions and effects in embodiment 1, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program that can be stored in a computer-readable storage medium and that when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications of the invention may be made without departing from the spirit or scope of the invention.

Claims (11)

1. A method for scheduling data stream tasks on a NUMA platform is characterized by comprising the following steps:
marking the data flow graph and the data flow task according to the state of the data flow in the calculation operation;
recording node information of the NUMA platform and memory access bandwidth of a memory between nodes;
allocating an initial task of the data stream to an idle processor core of any NUMA node;
and selecting, from all currently idle processor cores, the processor core with the minimum data transmission time cost, and scheduling the newly ready task to run on that processor core.
2. The method for scheduling data flow tasks on a NUMA platform according to claim 1, wherein the states of the data flow during a computation run include: a completed task is marked as state F; a running task is marked as state R; a task whose inputs are not yet ready is marked as state U; when the data flow computation starts, the initial task serves as the input of the entire dataflow graph and is in state R while all other dataflow tasks are in state U, and as the computation advances, tasks in state R run to completion, making the input data of subsequent dataflow tasks ready so that they enter state R.
3. The method of claim 2, wherein scheduling is required each time a dataflow task transitions from U state to R state.
4. A method for scheduling data flow tasks on a NUMA platform according to claim 1, wherein each task records its own estimated computation time and the required storage sizes of its n input data, D_size = (D_0, D_1, ..., D_{n-1}); the value of n differs between tasks, with n = 0 for the initial task and n > 0 for all other nodes.
5. The method for scheduling data stream tasks on a NUMA platform according to claim 4, wherein the step of recording node information of the NUMA platform and memory access bandwidth of a memory between nodes includes:
marking each processor core of the NUMA platform as C_{i,j}, wherein i denotes the number of the NUMA node on which the processor core resides and j denotes the number given to the core within that node;

binding threads and data to different NUMA nodes and testing to obtain the cross-node memory access bandwidth;

obtaining a communication matrix M_cross recording the cross-access bandwidths of the NUMA platform with k nodes, wherein element B_{i,j} of the communication matrix records the memory access bandwidth when a processor core on NUMA node i accesses the memory on node j, the communication matrix being

$$M_{cross} = \begin{pmatrix} B_{0,0} & \cdots & B_{0,k-1} \\ \vdots & \ddots & \vdots \\ B_{k-1,0} & \cdots & B_{k-1,k-1} \end{pmatrix}$$
6. The method for scheduling data flow tasks on a NUMA platform according to claim 5, wherein the predicted busy time T_busy of each processor core on each node is recorded, and T_busy = 0 characterizes the current processor core as idle.
7. The method for scheduling data stream tasks on a NUMA platform as claimed in claim 5, wherein the data transmission time cost of a candidate processor core C_{c,k} is calculated as follows:

according to the set of NUMA node positions of the n predecessor tasks, A = (N_0, N_1, ..., N_{n-1}), and the amounts of data required by the current task, D_size = (D_0, D_1, ..., D_{n-1}), the data transmission time cost is calculated as:

$$TC = \sum_{i=0}^{n-1} \frac{D_i}{B_{C,N_i}}$$

where D_i is the amount of data required from the i-th predecessor task and B_{C,N_i} is the memory access bandwidth when a processor core on NUMA node C accesses the memory of node N_i.
8. The method for scheduling data stream tasks on a NUMA platform as claimed in claim 7, wherein after all input data have been copied to the node of the corresponding processor core according to the scheduling result, the predicted busy time of the current processor core is updated with the computation duration of the data flow task, T_busy = Workload, which characterizes the current processor as busy.
9. A system for scheduling data streaming tasks on a NUMA platform, comprising:
the task marking module is used for marking the data flow graph and the data flow task according to the state of the data flow in the calculation operation;
the node information and memory access bandwidth recording module is used for recording the node information of the NUMA platform and the memory access bandwidth of the memory between the nodes;
the initial task allocation module is used for allocating the initial tasks of the data stream to idle processor cores of any NUMA node;
and the task scheduling module is used for selecting, from all currently idle processor cores, the processor core with the minimum data transmission time cost, and scheduling the newly ready task to run on that processor core.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of scheduling data flow tasks on a NUMA platform of any one of claims 1 to 8.
11. A computer device, comprising: a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory storing computer instructions, the processor executing the computer instructions to perform the method of scheduling data flow tasks on a NUMA platform as recited in any one of claims 1 to 8.
CN202010456848.0A 2020-05-26 2020-05-26 Method and system for scheduling data stream tasks on NUMA platform Active CN111756802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010456848.0A CN111756802B (en) 2020-05-26 2020-05-26 Method and system for scheduling data stream tasks on NUMA platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010456848.0A CN111756802B (en) 2020-05-26 2020-05-26 Method and system for scheduling data stream tasks on NUMA platform

Publications (2)

Publication Number Publication Date
CN111756802A true CN111756802A (en) 2020-10-09
CN111756802B CN111756802B (en) 2021-09-03

Family

ID=72674561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010456848.0A Active CN111756802B (en) 2020-05-26 2020-05-26 Method and system for scheduling data stream tasks on NUMA platform

Country Status (1)

Country Link
CN (1) CN111756802B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231099A (en) * 2020-10-14 2021-01-15 北京中科网威信息技术有限公司 Memory access method and device of processor
WO2023050712A1 (en) * 2021-09-30 2023-04-06 苏州浪潮智能科技有限公司 Task scheduling method for deep learning service, and related apparatus

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158927A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 EMS memory sharing system, device and method
CN102520915A (en) * 2011-11-25 2012-06-27 华为技术有限公司 Method and device for threading serial program in nonuniform memory access system
CN102831012A (en) * 2011-06-16 2012-12-19 日立(中国)研究开发有限公司 Task scheduling device and task scheduling method in multimode distributive system
CN103729248A (en) * 2012-10-16 2014-04-16 华为技术有限公司 Method and device for determining tasks to be migrated based on cache perception
CN105389211A (en) * 2015-10-22 2016-03-09 北京航空航天大学 Memory allocation method and delay perception-memory allocation apparatus suitable for memory access delay balance among multiple nodes in NUMA construction
CN105760220A (en) * 2016-01-29 2016-07-13 湖南大学 Task and data scheduling method and device based on hybrid memory
CN106095576A (en) * 2016-06-14 2016-11-09 上海交通大学 Under virtualization multi-core environment, nonuniformity I/O accesses resources of virtual machine moving method
US20180165120A1 (en) * 2015-08-26 2018-06-14 Netapp, Inc. Migration between cpu cores
CN109388490A (en) * 2017-08-07 2019-02-26 杭州华为数字技术有限公司 A kind of memory allocation method and server
US20190079805A1 (en) * 2017-09-08 2019-03-14 Fujitsu Limited Execution node selection method and information processing apparatus
CN109491785A (en) * 2018-10-24 2019-03-19 龙芯中科技术有限公司 Internal storage access dispatching method, device and equipment

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158927A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 EMS memory sharing system, device and method
CN102831012A (en) * 2011-06-16 2012-12-19 日立(中国)研究开发有限公司 Task scheduling device and task scheduling method in multimode distributive system
CN102520915A (en) * 2011-11-25 2012-06-27 华为技术有限公司 Method and device for threading serial program in nonuniform memory access system
CN103729248A (en) * 2012-10-16 2014-04-16 华为技术有限公司 Method and device for determining tasks to be migrated based on cache perception
US20180165120A1 (en) * 2015-08-26 2018-06-14 Netapp, Inc. Migration between cpu cores
CN105389211A (en) * 2015-10-22 2016-03-09 北京航空航天大学 Memory allocation method and delay perception-memory allocation apparatus suitable for memory access delay balance among multiple nodes in NUMA construction
CN105760220A (en) * 2016-01-29 2016-07-13 湖南大学 Task and data scheduling method and device based on hybrid memory
CN106095576A (en) * 2016-06-14 2016-11-09 上海交通大学 Under virtualization multi-core environment, nonuniformity I/O accesses resources of virtual machine moving method
CN109388490A (en) * 2017-08-07 2019-02-26 杭州华为数字技术有限公司 A kind of memory allocation method and server
US20190079805A1 (en) * 2017-09-08 2019-03-14 Fujitsu Limited Execution node selection method and information processing apparatus
CN109491785A (en) * 2018-10-24 2019-03-19 龙芯中科技术有限公司 Internal storage access dispatching method, device and equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231099A (en) * 2020-10-14 2021-01-15 北京中科网威信息技术有限公司 Memory access method and device of processor
WO2023050712A1 (en) * 2021-09-30 2023-04-06 苏州浪潮智能科技有限公司 Task scheduling method for deep learning service, and related apparatus

Also Published As

Publication number Publication date
CN111756802B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
US9665404B2 (en) Optimization of map-reduce shuffle performance through shuffler I/O pipeline actions and planning
WO2021254135A1 (en) Task execution method and storage device
US10241880B2 (en) Efficient validation/verification of coherency and snoop filtering mechanisms in computing systems
CN113918101B (en) Method, system, equipment and storage medium for writing data cache
CN103970520A (en) Resource management method and device in MapReduce framework and framework system with device
CN111190735B (en) On-chip CPU/GPU pipelining calculation method based on Linux and computer system
CN111756802B (en) Method and system for scheduling data stream tasks on NUMA platform
EP3662376B1 (en) Reconfigurable cache architecture and methods for cache coherency
CN111309805B (en) Data reading and writing method and device for database
CN105988856B (en) Interpreter memory access optimization method and device
US20210255793A1 (en) System and method for managing conversion of low-locality data into high-locality data
WO2020008392A2 (en) Predicting execution time of memory bandwidth intensive batch jobs
CN105740249B (en) Processing method and system in parallel scheduling process of big data job
WO2023124304A1 (en) Chip cache system, data processing method, device, storage medium, and chip
CN108763421B (en) Data searching method and system based on logic circuit
CN115878333A (en) Method, device and equipment for judging consistency between process groups
US20220067872A1 (en) Graphics processing unit including delegator and operating method thereof
CN117093335A (en) Task scheduling method and device for distributed storage system
KR20220142059A (en) In-memory Decoding Cache and Its Management Scheme for Accelerating Deep Learning Batching Process
US20240211302A1 (en) Dynamic provisioning of portions of a data processing array for spatial and temporal sharing
US11989581B2 (en) Software managed memory hierarchy
US20230004855A1 (en) Co-operative and adaptive machine learning execution engines
EP3985507A1 (en) Electronic device and method with scheduling
CN116680296A (en) Large-scale graph data processing system based on single machine
JP2021117577A (en) Information processing device, information processing method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant