CN116048759A - Data processing method, device, computer and storage medium for data stream

Data processing method, device, computer and storage medium for data stream

Info

Publication number
CN116048759A
Authority
CN
China
Prior art keywords
node
data
nodes
task
processor
Prior art date
Legal status
Pending
Application number
CN202310031867.2A
Other languages
Chinese (zh)
Inventor
王梅
李粤平
罗秋明
Current Assignee
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date
Filing date
Publication date
Application filed by Shenzhen Polytechnic
Priority to CN202310031867.2A
Publication of CN116048759A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 9/505: Allocation of resources considering the load
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/5018: Thread allocation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a data processing method for a data stream, which comprises the following steps: according to an embodiment of the invention, an operating system obtains, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their volumes; threads are allocated to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and the threads are scheduled according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume. The embodiment of the invention provides support for the data-flow execution mode at the operating-system level, greatly improving both efficiency and the room for optimization.

Description

Data processing method, device, computer and storage medium for data stream
Technical Field
The invention belongs to the technical field of data processing, and in particular relates to a data processing method, apparatus, computer and storage medium for a data stream.
Background
Processor development has shifted from simply raising clock speeds toward multi-core designs, and large-scale distributed systems have become increasingly common. Traditional programming uses sequentially executed commands, in which data is largely "static" and is accessed by a continuing stream of operations, so such programs do not support multi-core processors and large distributed systems particularly well. Data-flow programming, by contrast, treats data as the driving force and defines well-specified input and output connections between operations. Rather than executing commands in sequence, an operation runs as soon as its data is ready, i.e., as soon as its inputs become valid; data-flow programs are therefore inherently parallel and run well on multi-core processors and large distributed systems.
In today's massively parallel applications, data-flow computation outperforms the mainstream control-flow execution mode in both the programming model and the execution model. In the current processor environment, which is still control-flow, the data-flow execution mode can be implemented at the application level; for example, the internal execution engine of TensorFlow processes tasks in a data-flow manner. There are also specialized libraries (e.g., Taskflow) that implement the data-flow execution mode on top of existing control-flow processors, control-flow operating systems, and control-flow programming languages.
However, without support at the operating-system level, both efficiency and the room for optimization remain significantly limited.
Disclosure of Invention
To solve the above technical problem, an embodiment of the present invention provides a data processing method for a data stream, comprising:
obtaining, by an operating system, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes;
allocating threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and
scheduling the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume.
Further, the allocating threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG comprises:
sorting the ready nodes of the node tasks in the dependency DAG in descending order of edge count, and assigning node tasks to threads starting from the first-ranked online ready node.
Further, the scheduling the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume comprises the following steps:
counting the task data volume on each processor to obtain the total task data volume on that processor;
pre-scheduling tasks onto the processors one by one according to a preset scheduling algorithm;
calculating a total delay estimate for all processors from the pre-scheduling results and the total task volume; and
evaluating the various pre-scheduling results, and binding the thread of the ready node task to the pre-scheduled processor with the smallest total delay estimate for data processing.
Further, the calculating a total delay estimate for all processors from the pre-scheduling results and the total task volume comprises:
calculating, for the edges of the node task in the dependency DAG, the data-transfer time estimate Tedge = the sum of the times to copy all input data from the NUMA nodes of the predecessor nodes to the NUMA node of the processor;
obtaining the total data capacity of the node tasks on each processor core and that core's share of last-level cache capacity, yielding the ratio k of total data capacity to cache capacity; and
calculating, for each processor core, the total delay estimate Td = Tedge + total data capacity × k × x, where x is an empirical value.
Further, the nodes that remain without assigned threads are offline ready nodes and offline immediate successor nodes, and the method further comprises:
tracking, by the operating system, the ready state of online blocked nodes according to the precedence dependencies in the PCB, wherein the ready state of offline immediate successor nodes is tracked with the support of user code or a user-space runtime library.
A data processing apparatus for a data stream, comprising:
an acquisition module, configured to obtain, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes;
a processing module, configured to allocate threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and
an execution module, configured to schedule the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume.
Further, the processing module is further configured to sort the ready nodes of the node tasks in the dependency DAG in descending order of edge count and assign node tasks to threads starting from the first-ranked online ready node.
Further, the execution module comprises:
a first acquisition sub-module, configured to count the task data volume on each processor to obtain the total task data volume on that processor;
a first processing sub-module, configured to pre-schedule tasks onto the processors one by one according to a preset scheduling algorithm;
a second processing sub-module, configured to calculate a total delay estimate for all processors from the pre-scheduling results and the total task volume; and
a first execution sub-module, configured to evaluate the various pre-scheduling results and bind the thread of the ready node task to the pre-scheduled processor with the smallest total delay estimate for data processing.
Further, the first execution sub-module comprises:
a second acquisition sub-module, configured to calculate, for the edges of the node task in the dependency DAG, the data-transfer time estimate Tedge = the sum of the times to copy all input data from the NUMA nodes of the predecessor nodes to the NUMA node of the processor;
a third acquisition sub-module, configured to obtain the total data capacity of the node tasks on each processor core and that core's share of last-level cache capacity, yielding the ratio k of total data capacity to cache capacity; and
a second execution sub-module, configured to calculate, for each processor core, the total delay estimate Td = Tedge + total data capacity × k × x, where x is an empirical value.
Further, the nodes that remain without assigned threads are offline ready nodes and offline immediate successor nodes, and the execution module is further configured to track the ready state of online blocked nodes according to the precedence dependencies in the PCB, wherein the ready state of offline immediate successor nodes is tracked with the support of user code or a user-space runtime library.
A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the data processing method for a data stream described above.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data processing method for a data stream described above.
According to the embodiment of the invention, an operating system obtains, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their volumes; threads are allocated to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and the threads are scheduled according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume. The embodiment of the invention provides support for the data-flow execution mode at the operating-system level, greatly improving both efficiency and the room for optimization.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention; for a person skilled in the art, other drawings can be obtained from them without inventive effort.
Fig. 1 is a flow chart of a data processing method of a data stream according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data flow according to an embodiment of the present invention;
FIG. 3 is a basic block diagram of a data processing apparatus for data flow according to an embodiment of the present invention;
fig. 4 is a basic structural block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Some of the flows described in the specification, claims, and drawings of the present invention include operations that occur in a particular order, but it should be understood that these operations may be executed out of the order in which they appear, or in parallel. Sequence numbers such as 101 and 102 merely distinguish different operations and do not by themselves impose any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. It should be noted that the terms "first" and "second" herein distinguish different messages, devices, modules, etc.; they do not denote a sequence, nor do they require that "first" and "second" be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the present invention.
Referring to fig. 1, fig. 1 shows a data processing method for a data stream according to an embodiment of the present invention. As shown in fig. 1, the method specifically comprises the following steps:
S1, an operating system obtains, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes;
data stream programming is a high performance parallel programming model that solves the problem of efficient utilization of multi-core processors. The data flow programming is obviously different from the traditional programming language, the data flow programming is executed in a data driving mode, the data to be processed is distributed to each core, the calculation and the communication of the data are separated, and the potential parallelism in the flow program is fully mined by utilizing the parallel characteristic of software flow through task scheduling and distribution, so that the load among the cores is balanced. In the data flow paradigm, a static instance of a data flow program is described in terms of its structure as a directed graph DAG. As shown in fig. 2, the nodes in the figure represent the calculation units, and the edges represent the data transmission paths. And transmitting data between adjacent nodes through edges, calculating node consumption data, and outputting the generated data to an input-output sequence as the input of a next calculation unit.
It should be noted that, in the embodiment of the present invention, the data-flow task manages the overall data-flow computation as a directed acyclic graph (DAG). Data-flow tasks are executed with threads as carriers, and the information in the process control block (PCB) is extended: fields are added for the precedence dependencies among data-flow tasks, for recording the data size (in bytes) of each output edge, and for recording the stack frame length required by a task within its stack, along with a data-readiness count for each data-flow task node and a data-flow task activation flag.
S2, allocating threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG;
Specifically, step S2 sorts the ready nodes of the node tasks in the dependency DAG in descending order of edge count and assigns node tasks to threads starting from the first-ranked online ready node, as sketched below.
The operating system counts, for each processor core, the node tasks it is running, and sums the data sizes of the output edges required by these tasks and the stack frame lengths required by the tasks to obtain the total data capacity. The total data capacity on each core is tracked as the basis for scheduling new tasks. Counting the node tasks of the currently processed data from the end of the directed DAG and counting the total data capacity for executing node tasks on all processor cores comprises:
step one, searching the directed DAG for the current ready nodes associated with the target node holding the data of the current process; and
step two, summing the data sizes of the edges between the target node and the current ready nodes together with the required stack frame lengths to obtain the total data capacity on each processor core, as sketched after this list.
S3, scheduling the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume.
Specifically, step S3 comprises the following steps:
step one, counting the task data volume on each processor to obtain the total task data volume on that processor;
step two, pre-scheduling tasks onto the processors one by one according to a preset scheduling algorithm;
step three, calculating a total delay estimate for all processors from the pre-scheduling results and the total task volume;
in practical application, the third step includes: calculating the data transmission time estimation value edge of the node task on the dependency relationship DAG graph=the sum of the time when all input data are copied from the NUMA node where the predecessor node is located to the NUMA node where the processor is located; acquiring total data capacity and final cache capacity share of node tasks on each processor core to obtain a ratio k of the total data capacity to the cache; the total delay estimate td=ridge+total data capacity x k x is calculated for each processing core, where x is an empirical value.
Where there are multiple cores, the total data capacity and last level cache capacity share of the node tasks on each processor core need to be split equally.
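Reusing the df_node/df_edge shapes sketched in the fig. 2 discussion above, the Tedge term could be estimated as below; numa_copy_cost() and src_numa_of() are hypothetical lookups standing in for the kernel's NUMA-distance statistics, not functions named by the patent:

    /* Hypothetical helpers (assumptions, not from the patent): */
    double numa_copy_cost(int src_numa, int dst_numa); /* seconds per byte */
    int src_numa_of(const struct df_node *producer);   /* NUMA node holding the data */

    /* Tedge: total time to copy every input of node n to the candidate
     * core's NUMA node dst_numa. */
    double tedge_estimate(const struct df_node *n, int dst_numa)
    {
        double t = 0.0;
        for (int i = 0; i < n->in_count; i++) {
            const struct df_edge *e = n->in[i];
            t += (double)e->data_size
                 * numa_copy_cost(src_numa_of(e->src), dst_numa);
        }
        return t;
    }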
In one embodiment of the invention, the following are obtained from the control block (PCB) of the node task: the predecessor relationships and the corresponding data sizes, the differences in communication cost among processor cores, the last-level cache capacity of each processor core, and the total data capacity, i.e., the sum of the data sizes of the output edges of the node task's predecessor nodes, the data size of the node's own output edges, and the node task's stack frame length. All processor cores are traversed to obtain the total data capacity and the last-level cache capacity of the node tasks on each core, yielding the ratio k of total data capacity to cache capacity.
One embodiment of the present invention adds a structure member to the thread control block task_struct { }, e.g., in the Linux kernel:

    struct pre_suc {
        int pre_count;                   /* number of predecessor nodes */
        struct task_struct **pre_nodes;  /* pointer array of predecessor nodes */
        int suc_count;                   /* number of successor nodes */
        struct task_struct **suc_nodes;  /* pointer array of successor nodes */
    };
The stack frame length required by the node task is added to task_struct:

    int frame_size;          /* stack frame length required by the node task */
The node's data-readiness count and an activation flag are added to task_struct:

    int data_ready_count;    /* number of predecessor data items already ready */
    int activated;           /* set to 1 when data_ready_count == pre_count */
In the operating-system kernel, a per-CPU count of data-flow task data overhead is added:

    int current_size[NR_CPUS];   /* one counter per CPU core */

current_size[n] records the sum of the data overhead of all data-flow tasks on the processor core numbered n, including the sums of the output-edge data sizes and the stack frame lengths.
Taking the data-flow task shown in fig. 2 as an example:
After task a finishes its computation, the task_struct information of task c is updated: the count data_ready_count of ready predecessor data is incremented by one, and if data_ready_count == pre_count, the task is activated by setting activated = 1. The same operation is performed for task f.
If task c is activated at this point, scheduling is performed using the data newly added by this patent. An example of a possible scheduling scheme is as follows.
Assume task c is pre-scheduled to processor i; then calculate: the data-transfer time estimate Tedge on the DAG, and the total data capacity on processor core i including task c's data, yielding the ratio k of total data capacity to cache capacity. The total delay estimate is Td = Tedge + c's total data capacity × k × x, where x is an empirical value obtained statistically (e.g., 1 μs/kB). All processor cores are traversed and the calculation is completed one by one; the core m with the lowest Td is selected, and task c is scheduled to run on core m.
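The traversal just described could be sketched as the following loop; tedge_for(), capacity_with_task() and cache_share() are hypothetical helpers standing in for the per-core statistics (current_size[], last-level cache capacity) described above, and x is the empirical constant (e.g., 1 μs/kB):

    #include <stddef.h>

    /* Hypothetical helpers (assumptions, not from the patent): */
    double tedge_for(int cpu);                        /* Tedge if the task runs on cpu */
    double capacity_with_task(int cpu, size_t bytes); /* current_size[cpu] + task data */
    double cache_share(int cpu);                      /* last-level cache share of cpu */

    /* Pick the core with the smallest total delay estimate Td. */
    int pick_core(size_t task_bytes, int ncpus, double x)
    {
        int best = 0;
        double best_td = -1.0;
        for (int cpu = 0; cpu < ncpus; cpu++) {
            double cap = capacity_with_task(cpu, task_bytes);
            double k   = cap / cache_share(cpu);      /* data-to-cache ratio */
            double td  = tedge_for(cpu) + cap * k * x;
            if (best_td < 0 || td < best_td) {
                best_td = td;
                best = cpu;
            }
        }
        return best;   /* schedule the task to this core */
    }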
Step four, evaluating the various pre-scheduling results, and binding the thread of the ready node task to the pre-scheduled processor with the smallest total delay estimate for data processing.
In the embodiment of the invention, the nodes that remain without assigned threads are offline ready nodes and offline immediate successor nodes, wherein the ready state of online blocked nodes is tracked by the operating system according to the precedence dependencies in the PCB, and the ready state of offline immediate successor nodes is tracked with the support of user code or a user-space runtime library.
According to the embodiment of the invention, an operating system obtains, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their volumes; threads are allocated to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and the threads are scheduled according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume. The embodiment of the invention provides support for the data-flow execution mode at the operating-system level, greatly improving both efficiency and the room for optimization.
As shown in fig. 3, in order to solve the above problem, an embodiment of the present invention further provides a data processing apparatus for a data stream, comprising an acquisition module 2100, a processing module 2200, and an execution module 2300. The acquisition module 2100 is configured to obtain, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes; the processing module 2200 is configured to allocate threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and the execution module 2300 is configured to schedule the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume.
In some embodiments, the processing module is further configured to sort the ready nodes of the node tasks in the dependency DAG in descending order of edge count and assign node tasks to threads starting from the first-ranked online ready node.
In some embodiments, the execution module comprises: a first acquisition sub-module, configured to count the task data volume on each processor to obtain the total task data volume on that processor; a first processing sub-module, configured to pre-schedule tasks onto the processors one by one according to a preset scheduling algorithm; a second processing sub-module, configured to calculate a total delay estimate for all processors from the pre-scheduling results and the total task volume; and a first execution sub-module, configured to evaluate the various pre-scheduling results and bind the thread of the ready node task to the pre-scheduled processor with the smallest total delay estimate for data processing.
In some embodiments, the first execution sub-module comprises: a second acquisition sub-module, configured to calculate, for the edges of the node task in the dependency DAG, the data-transfer time estimate Tedge = the sum of the times to copy all input data from the NUMA nodes of the predecessor nodes to the NUMA node of the processor; a third acquisition sub-module, configured to obtain the total data capacity of the node tasks on each processor core and that core's share of last-level cache capacity, yielding the ratio k of total data capacity to cache capacity; and a second execution sub-module, configured to calculate, for each processor core, the total delay estimate Td = Tedge + total data capacity × k × x, where x is an empirical value.
In some embodiments, the nodes that remain without assigned threads are offline ready nodes and offline immediate successor nodes, and the execution module is further configured to track the ready state of online blocked nodes according to the precedence dependencies in the PCB, wherein the ready state of offline immediate successor nodes is tracked with the support of user code or a user-space runtime library.
The data processing apparatus for a data stream obtains, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their volumes; allocates threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and schedules the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume. The embodiment of the invention provides support for the data-flow execution mode at the operating-system level, greatly improving both efficiency and the room for optimization.
In order to solve the above technical problems, an embodiment of the present invention further provides a computer device. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to this embodiment.
As shown in fig. 4, the computer device comprises a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store a sequence of control information, and the computer-readable instructions, when executed by the processor, cause the processor to implement a data processing method for a data stream. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform the data processing method for a data stream. The network interface of the computer device is used for communicating with a terminal. Those skilled in the art will appreciate that the structure shown in fig. 4 is only a block diagram and does not limit the computer device to which the present solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor in this embodiment is configured to execute the specific contents of the acquisition module 2100, the processing module 2200, and the execution module 2300 in fig. 3, and the memory stores the program codes and the various types of data required to execute these modules. The network interface is used for data transmission with a user terminal or a server. The memory in this embodiment stores the program codes and data required to execute all the sub-modules of the data processing method for a data stream, and the server can call the program codes and data to execute the functions of all the sub-modules.
The embodiment of the invention provides a computer device in which an operating system obtains, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes; threads are allocated to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and the threads are scheduled according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume. The embodiment of the invention provides support for the data-flow execution mode at the operating-system level, greatly improving both efficiency and the room for optimization.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data processing method for a data stream in any of the embodiments described above.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or in alternation with other steps or with at least part of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make modifications and improvements without departing from the principles of the present invention, and such modifications and improvements shall also fall within the scope of protection of the present invention.

Claims (10)

1. A data processing method for a data stream, comprising:
obtaining, by an operating system, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes;
allocating threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and
scheduling the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume.
2. The data processing method according to claim 1, wherein the allocating threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG comprises:
sorting the ready nodes of the node tasks in the dependency DAG in descending order of edge count, and assigning node tasks to threads starting from the first-ranked online ready node.
3. The data processing method according to claim 1, wherein the scheduling the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume comprises:
counting the task data volume on each processor to obtain the total task data volume on that processor;
pre-scheduling tasks onto the processors one by one according to a preset scheduling algorithm;
calculating a total delay estimate for all processors from the pre-scheduling results and the total task volume; and
evaluating the various pre-scheduling results, and binding the thread of the ready node task to the pre-scheduled processor with the smallest total delay estimate for data processing.
4. The data processing method according to claim 3, wherein the calculating a total delay estimate for all processors from the pre-scheduling results and the total task volume comprises:
calculating, for the edges of the node task in the dependency DAG, the data-transfer time estimate Tedge = the sum of the times to copy all input data from the NUMA nodes of the predecessor nodes to the NUMA node of the processor;
obtaining the total data capacity of the node tasks on each processor core and that core's share of last-level cache capacity, yielding the ratio k of total data capacity to cache capacity; and
calculating, for each processor core, the total delay estimate Td = Tedge + total data capacity × k × x, where x is an empirical value.
5. The data processing method according to claim 1, wherein the nodes that remain without assigned threads are offline ready nodes and offline immediate successor nodes, the method further comprising:
tracking, by the operating system, the ready state of online blocked nodes according to the precedence dependencies in the PCB, wherein the ready state of offline immediate successor nodes is tracked with the support of user code or a user-space runtime library.
6. A data processing apparatus for a data stream, comprising:
an acquisition module, configured to obtain, from the process control block (PCB) of a program, a dependency DAG of node tasks together with their data traffic, wherein the nodes of the dependency DAG represent node tasks and the edges connecting those nodes represent the data transfers between node tasks and their data volumes;
a processing module, configured to allocate threads to node tasks according to the ready nodes, the immediate successor nodes, and the total number of system threads in the dependency DAG; and
an execution module, configured to schedule the threads according to the system load of the current processors, the communication relationships among node tasks, and the traffic volume.
7. The data processing apparatus according to claim 6, wherein
the processing module is further configured to sort the ready nodes of the node tasks in the dependency DAG in descending order of edge count and assign node tasks to threads starting from the first-ranked online ready node.
8. The data processing apparatus according to claim 6, wherein the execution module comprises:
a first acquisition sub-module, configured to count the task data volume on each processor to obtain the total task data volume on that processor;
a first processing sub-module, configured to pre-schedule tasks onto the processors one by one according to a preset scheduling algorithm;
a second processing sub-module, configured to calculate a total delay estimate for all processors from the pre-scheduling results and the total task volume; and
a first execution sub-module, configured to evaluate the various pre-scheduling results and bind the thread of the ready node task to the pre-scheduled processor with the smallest total delay estimate for data processing.
9. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the data processing method for a data stream according to any one of claims 1 to 5.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data processing method for a data stream according to any one of claims 1 to 5.
CN202310031867.2A 2023-01-10 2023-01-10 Data processing method, device, computer and storage medium for data stream Pending CN116048759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310031867.2A CN116048759A (en) 2023-01-10 2023-01-10 Data processing method, device, computer and storage medium for data stream


Publications (1)

Publication Number Publication Date
CN116048759A true CN116048759A (en) 2023-05-02

Family

ID=86121503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310031867.2A Pending CN116048759A (en) 2023-01-10 2023-01-10 Data processing method, device, computer and storage medium for data stream

Country Status (1)

Country Link
CN (1) CN116048759A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421052A (en) * 2023-11-02 2024-01-19 深圳大学 Hardware automatic execution method, system, equipment and medium for data stream task



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination