CN115525582A - Method and system for task management and data scheduling of page-based inline computing engine


Info

Publication number
CN115525582A
Authority
CN
China
Prior art keywords
page
data
task
computing engine
inline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211133234.4A
Other languages
Chinese (zh)
Inventor
李树青
王江
孙华锦
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202211133234.4A
Publication of CN115525582A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 - Cache access modes
    • G06F 12/0882 - Page mode
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 - Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/50 - Indexing scheme relating to G06F9/50
    • G06F 2209/5011 - Pool

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a method and a system for task management and data scheduling of a page-based inline computing engine, the method being applicable to inline computing engines. All tasks in the method use a shared address space to receive data from external devices, and the shared address space adopts paged management, dividing the whole address space into a number of subspaces in units of pages; when a task issues a data command to an external device, it must apply for data cache in units of pages. The mapping relationship between the external device data participating in an operation and the data in memory is stored in the information of each virtual address page. Usage of the data cache is likewise managed in units of pages: the amount of data in each page is counted, and once a page has received data equal to the page size, the page can be recycled and allocated to the next task, so that better performance is obtained in out-of-order scenarios.

Description

Method and system for task management and data scheduling of page-based inline computing engine
Technical Field
The invention relates to the technical field of chips, and in particular to a method and system for task management and data scheduling of a page-based inline computing engine.
Background
The inline computing engine is a computing engine located on the data flow path between a data source and memory. Compared with a traditional computing engine, it reduces the number of memory accesses and effectively lowers the demand on memory bandwidth.
Fig. 1 shows the data interaction process of a conventional compute engine, in which data 1 used for calculation is stored in memory and data 2 comes from an external device. Data 2 is first written from the external device into memory; the compute engine then reads data 1 and data 2 from memory, computes the result, and writes it back to memory. Under this most typical task, the memory must therefore be read twice and written twice.
When an inline compute engine is used, the data flow is as shown in Fig. 2. The external device writes data 2 directly to the inline compute engine, which appears to the external device as a memory device. While receiving data 2 from the external device, the inline compute engine reads data 1 from memory, computes the result, and writes it to memory. In this typical scenario, the memory therefore needs to be read and written only once each.
The internals of an inline computing engine can be implemented in many ways, but regardless of the concrete implementation, the scenario has the following common features from an abstract perspective:
1. Acquiring data from an external device has a certain latency: a certain time elapses between sending a command to the device and the device outputting data. To meet performance requirements, the system must keep sending commands to this or other devices during that interval rather than waiting for one task to complete before starting the next. Inline computing modules therefore need to support multiple concurrent tasks.
2. For high-performance devices, completion order across tasks is often out of order; in particular, when multiple external devices are used, ordering is even harder to guarantee. That is, the order in which tasks complete is often not the order in which they were issued. For some common protocols, such as PCIe, out-of-order delivery may also occur among the multiple data blocks within one task.
Therefore, because of this out-of-order, multi-task behavior, the inline computing engine can load the task information corresponding to the current data only upon receiving data written by the external device, and only then read the corresponding data 1 from memory. The engine cannot predict the relevant information of the next incoming data; this is one of the core problems facing inline computing engines. Before the task information and corresponding data are acquired, data written from the external device must be temporarily held in the engine's internal cache. Compared with an off-chip memory chip such as DRAM, on-chip cache is very expensive, so any increase in the time needed to acquire the task information and corresponding data directly increases chip cost.
The question is how to design the task management and address allocation mechanism so as to reduce this time while avoiding other side effects. As shown in Fig. 3, a common task management mechanism today gives each task a separate address space range, sized according to the maximum task size; the actual task occupies a certain range of virtual addresses, typically consecutive addresses starting from offset 0 of the virtual address range.
When a task is accepted, a virtual address range is allocated to it, and this range is communicated to the external device. When data sent by the external device is received, the task to which the data belongs can be determined from the virtual address range containing the data address. The advantage of this scheme is that addresses are statically bound to tasks, the mapping lookup is simple, and the data stream can be managed independently of tasks.
However, this scheme has a significant disadvantage: after the engine receives data, it must look up the task the data belongs to and then, using the data's offset within the virtual address range, look up the corresponding location of data 1 in the task information. Because each task supports a large address range, many location descriptors for data 1 are needed (data 1 in memory is generally managed page by page, with each page described by a separate address). Storing all data 1 location information for all tasks is clearly impractical, so one or a few pieces of location information are generally cached in order within the task information. If a task's data stream does not arrive in address order, the required data 1 location information will not be cached and must be read on the fly. Such on-demand reads greatly increase task latency, reduce the efficiency of the data cache, and in severe cases cause cache overflow.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a method and a system for task management and data scheduling of a page-based inline computing engine, applicable to inline computing engines, so as to solve the above problems.
In view of the above objects, in one aspect, the present invention provides a task management and data scheduling system for a page-based inline computing engine, the system comprising:
an external device control unit, configured to initialize tasks to the compute engine and to apply for page subspace resources from a page information management module of the compute engine;
the page information management module, configured to obtain from the page subspace resource pool as many resources as possible, up to the requested number, and to return the page subspace information to the external device control unit;
the external device control unit being further configured to calculate the address interval of each page subspace from its page subspace number, issue an IO (input/output) command to the external device, and designate the external device's data destination address as that address interval.
As a further aspect of the present invention, the external device control unit does not immediately issue an IO command to the external device after initializing a task to the computing engine; when initializing a task, it is further configured to configure the parameters of the corresponding task.
As a further aspect of the present invention, before issuing the IO command, the external device control unit applies for page subspace resources from the page information management module of the computing engine.
As a further aspect of the present invention, when the page information management module obtains resources to satisfy the requested number, if the current requested number cannot be satisfied, the external device control unit resubmits an application for the unsatisfied remainder; if no page subspace resources are currently available, the page information management module does not respond and the external device control unit waits.
As a further aspect of the present invention, after receiving the page subspace resources, the external device control unit is further configured to send a page configuration request to a task information management module of the compute engine and to notify the corresponding task information unit of the obtained page information; the task information management module loads the data mapping relationships of the specified number of pages from memory and configures them into the page information.
As a further aspect of the present invention, the external device controller only needs to issue IO commands to the external device in sequence, and after receiving an IO command the external device writes data into the address interval.
As a further aspect of the present invention, when external data is received, the page information management module is configured to monitor the data received by the write interface, determine the address of the current write, query the corresponding mapping information in the page information according to the address, and control the read interface to read the corresponding operation data from memory; the page information management module is further configured to monitor the amount of data received and record it in the page information.
As a further aspect of the present invention, after a page has received all its data and the data count in the task information management has been updated, if the counted value equals the data amount of the current task, the current task is complete, and the task information management module performs the subsequent handshake and transfer of the task information.
In another aspect of the present invention, a method for task management and data scheduling of a page-based inline computing engine is provided, applicable to the inline computing engine and comprising the following steps:
setting the minimum address range of the receiving port for receiving external device data to the size of the internal data cache of the inline computing engine;
dividing the address range into a number of subspaces in units of pages;
binding each page subspace in use to a task, where one task uses at least one page subspace and the page subspaces in use are created as data receiving addresses of the external device task;
and storing the task running information in units of tasks through a task information storage module.
As a further aspect of the present invention, the internal data cache of the inline computing engine is the cache used by the inline computing engine to receive external device data and temporarily store it within the engine.
As a further aspect of the present invention, the inline computing engine passively receives external device data; when data reaches the receiving port, the inline computing engine looks up, according to the data address, the address of the memory data participating in the operation, and reads that data from memory.
As a further aspect of the present invention, the data sent by the external device is temporarily stored until the memory data reaches the inline computing engine, and the minimum address range for receiving data is set to the cache size.
As a further aspect of the present invention, after the address range is divided into a number of subspaces in units of pages, the page size is set to be consistent with the page size of the memory storing the operation data; in the inline computing engine, a page information storage unit is provided for each page, which stores key runtime information of the page, including at least data mapping information, a task number, and a page data count.
As a further aspect of the present invention, the data mapping information records, for peripheral data entering through the page subspace, which data in memory it is operated with; the task number records the task to which the page subspace currently belongs; the page data count tallies the total amount of data received so far by the current page subspace.
As a further aspect of the present invention, the query relationship between the page information and the task information binds the mapping information to the page information and binds page allocation to cache allocation, including:
sharing the address space, establishing page information in units of pages, storing the data mapping relationship in the page information, and mapping to the task information through the task number in the page information.
In still another aspect of the present invention, a computer-readable storage medium is further provided, which stores computer program instructions, and when executed, the computer program instructions implement any one of the above methods for task management and data scheduling of a page-based inline computing engine according to the present invention.
In yet another aspect of the present invention, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the computer program when executed by the processor performing any of the above methods for task management and data scheduling of a page-based inline computing engine according to the present invention.
In another aspect of the present invention, there is provided a chip for performing flow control according to any one of the above methods for task management and data scheduling of a page-based inline computing engine, the architecture of the chip having a CPU reset vector register, a CPU release control pin, a CPU release control register, and a debug interface, wherein
The CPU reset vector register is used for controlling the address of an instruction which is read and executed after the CPU is released;
the CPU release control register is used for controlling CPU release when the chip is electrified;
the CPU release control pin is used for controlling the validity of the CPU release control register;
the debugging interface is used for reading and writing the on-chip RAM and each register to execute the flow control of the chip.
Compared with the traditional implementation, the invention has the following main advantages:
1. The invention stores the data mapping relationships in the page information rather than in the task information, and all tasks share one address space; the amount of stored information is therefore reduced to 1/N of the traditional method, where N is the maximum number of supported tasks.
2. The invention keeps the mapping relationships of the whole address space's pages in the cache and binds cache allocation to pages; the data to be received by all current tasks therefore always has its mapping present, and no cache miss or mapping reload occurs whether or not task data arrives out of order. The invention thus obtains better performance in out-of-order scenarios.
3. The invention uses one shared address space instead of a separate address space per task, so the required address space is at most 1/N of the traditional method; moreover, the traditional method must reserve each task's address space according to the maximum data size, whereas this method reserves only according to the cache size, so the address space is in practice reduced further. Although address space is not a direct hardware resource, wider signals are needed to represent larger addresses, which indirectly incurs some hardware resource overhead.
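As a rough worked example of the scale of these savings (all figures here are assumptions for illustration, not values from the embodiment): with N = 128 concurrent tasks, a maximum task size of 1 MB, a 4 KB page, and a 256 KB data cache, the number of mapping descriptors that must be held at once is

```latex
D_{\text{trad}} = N \cdot \frac{S_{\max}}{P} = 128 \cdot \frac{1\,\text{MB}}{4\,\text{KB}} = 32768,
\qquad
D_{\text{page}} = \frac{C}{P} = \frac{256\,\text{KB}}{4\,\text{KB}} = 64 .
```

The first factor-of-N saving comes from sharing one address space across tasks; the remaining factor comes from reserving descriptors by cache size rather than by maximum task size.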
These and other aspects of the present application will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
In the figure:
FIG. 1 is a data flow diagram of a conventional compute engine;
FIG. 2 is a data flow diagram of an inline compute engine;
FIG. 3 is a diagram illustrating a task mapping by independent virtual addresses according to a conventional task management mechanism;
FIG. 4 is a task management and data scheduling flow diagram of the task management and data scheduling system of the page-based inline computing engine of the present invention;
FIG. 5 is a schematic diagram illustrating a processing flow of page subspaces, page information and task information in a method for task management and data scheduling of a page-based inline computing engine according to the present invention;
FIG. 6 is a schematic diagram of an embodiment of a computer-readable storage medium embodying a method for task management and data scheduling for a page-based inline computing engine of the present invention;
FIG. 7 is a hardware block diagram of an embodiment of a computer device for implementing a method for task management and data scheduling for a page-based inline computing engine according to the present invention;
fig. 8 is a schematic diagram of a frame of an embodiment of a chip of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities with the same name or different parameters; "first" and "second" are merely for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those steps or elements but may include other steps or elements not expressly listed or inherent to it.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments and features of the embodiments described below can be combined with each other without conflict.
After the engine receives data, it must look up the task the data belongs to and then, using the data's offset within the virtual address range, look up the corresponding location of data 1 in the task information. Because each task supports a large address range, many location descriptors for data 1 are needed (data 1 in memory is generally managed page by page, with each page described by a separate address). Storing all data 1 location information for all tasks is clearly impractical, so one or a few pieces of location information are generally cached in order within the task information. If a task's data stream does not arrive in address order, the required data 1 location information will not be cached and must be read on the fly. Such on-demand reads greatly increase task latency, reduce the utilization efficiency of the data cache, and in severe cases cause cache overflow.
In view of this, the embodiment of the present invention provides a method and system for task management and data scheduling of a page-based inline computing engine, whose main characteristics are:
All tasks use a shared address space to receive data from external devices, rather than each task using a separate address space as in the traditional method.
The shared address space adopts paged management, dividing the whole address space into a number of subspaces in units of pages. When a task issues a data command to an external device, it must apply for data cache in units of pages.
The mapping relationship between the external device data participating in an operation and the data in memory is stored in the information of each virtual address page, rather than in the task information as in the traditional method.
Usage of the data cache is likewise managed in units of pages: the amount of data in each page is counted, and once a page has received data equal to the page size, the page can be recycled and allocated to the next task. The data structures implied by these characteristics are sketched below.
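A minimal C sketch of these data structures follows; all names, field widths, and constants are assumptions for illustration, since the patent does not fix a concrete layout:

```c
#include <stdint.h>

#define PAGE_SIZE  4096u   /* assumed page size, matching the memory page size  */
#define NUM_PAGES  64u     /* cache size / page size; illustrative value        */
#define MAX_TASKS  128u    /* maximum number of concurrent tasks; illustrative  */

/* Per-page information unit: one per page subspace of the shared address
 * space. The data mapping lives here, not in the task information. */
struct page_info {
    uint64_t data1_addr;   /* where in memory the operand (data 1) lives        */
    uint64_t result_addr;  /* where the operation result is written back        */
    uint16_t task_no;      /* task this page subspace is currently bound to     */
    uint32_t byte_count;   /* bytes received so far; page recycled at PAGE_SIZE */
    uint8_t  in_use;       /* 0 = free, i.e. in the page subspace resource pool */
};

/* Per-task information: holds no per-page mapping descriptors. */
struct task_info {
    uint64_t total_bytes;  /* task data size; task completes at this count      */
    uint64_t byte_count;   /* running count, fed by page_info updates           */
    /* compute parameters, call-chain context, ... (outside this sketch)        */
};

struct page_info page_table[NUM_PAGES];  /* page subspace i <-> page_table[i]   */
struct task_info task_table[MAX_TASKS];
```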
In some embodiments of the present invention, referring to fig. 4, a task management and data scheduling system of a page-based inline computing engine is provided. The system includes an external device control unit, a page information management module, and a task information management module; the key parts inside the engine can be abstracted into the structure in fig. 4.
The external device control unit is used to initialize tasks to the computing engine, and also to apply for page subspace resources from the page information management module of the computing engine.
The page information management module is used to obtain from the page subspace resource pool as many resources as possible, up to the requested number, and to return the page subspace information to the external device control unit.
The external device control unit is further used to calculate the address interval of each page subspace from its page subspace number, issue an IO (input/output) command to the external device, and designate the external device's data destination address as that address interval.
The external device control unit does not immediately issue an IO command to the external device after initializing a task to the computing engine; during task initialization it also configures the parameters of the corresponding task.
In some embodiments, before issuing the IO command, the external device control unit applies for a page subspace resource from a page information management module of the compute engine.
In this embodiment, the external device control unit first initializes a task to the computing engine in order to configure the task's parameters; this is the same as in the traditional method, and the detailed implementation is not discussed in the present invention. After task initialization completes, however, the external device control unit cannot immediately issue an IO command to the external device.
In some embodiments, when the page information management module cannot satisfy the current requested number, the external device control unit resubmits an application for the unsatisfied remainder; if no page subspace resources are currently available, the page information management module does not respond, and the external device control unit waits.
Before issuing an IO command, the external device control unit must apply for page subspace resources from the page information management module of the computing engine. Generally, the page subspace resource pool is maintained in the page information management module inside the compute engine, because the compute engine holds all the information needed to maintain the pool. If the compute engine passes the necessary information to the external device control unit, the pool may instead be maintained in the external device control unit; in that case the interaction with the compute engine for applying for resources can be omitted, but a resource application process still exists in essence, and this also belongs to the embodiments of the present invention.
In some embodiments, after receiving the page subspace resources, the external device control unit is further configured to send a page configuration request to the task information management module of the compute engine and to notify the corresponding task information unit of the obtained page information; the task information management module then loads the data mapping relationships of the specified number of pages from memory and configures them into the page information.
The page information management module obtains from the page subspace resource pool as many resources as possible, up to the requested number, and returns the page subspace information to the external device control unit. If the current requested number cannot be satisfied, the external device control unit generally resubmits an application for the unsatisfied remainder. If no page subspace resources are currently available, the page information management module does not respond, and the external device control unit must wait; a sketch of this allocation policy follows.
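Continuing the data structures sketched above, a hedged model of this "grant as many as possible, resubmit the remainder" policy (the linear free scan and the name pool_alloc are assumptions):

```c
/* Try to grant up to `want` free pages; returns how many were granted and
 * writes their page numbers to `out`. A return value of 0 models the
 * no-response case: the external device control unit must wait and retry. */
static unsigned pool_alloc(unsigned want, unsigned out[])
{
    unsigned granted = 0;
    for (unsigned i = 0; i < NUM_PAGES && granted < want; i++) {
        if (!page_table[i].in_use) {
            page_table[i].in_use = 1;
            out[granted++] = i;
        }
    }
    return granted;  /* caller resubmits an application for want - granted */
}
```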
In some embodiments, the external device controller only needs to issue IO commands to the external device in sequence; after receiving an IO command, the external device writes data into the address interval.
After receiving the page subspace resources, the external device control unit sends a page configuration request to the task information management module of the computing engine and notifies the corresponding task information unit of the page information just obtained. The task information management module then loads the data mapping relationships of the specified number of pages from memory and configures them into the page information. It should be emphasized that this interaction is only one possible implementation; the essence of the invention is to complete the configuration of the data mapping into the page information and the binding of the page information to the task information.
The external device control unit then calculates the address interval of each page subspace from its page subspace number, issues an IO command to the external device, and designates the external device's data destination address as that address interval; a minimal sketch of the calculation follows. In the traditional method, a task generally has exclusive use of its own address space, so the external device controller can, and need only, issue IO commands to the external device sequentially; this is one of the differences between the present invention and the traditional method.
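A minimal sketch of the interval calculation (SHARED_BASE, an assumed base offset of the shared receive space, is illustrative):

```c
#define SHARED_BASE 0x0ull  /* assumed offset of the shared address space */

/* Address interval [lo, hi] of page subspace n within the shared space,
 * e.g. page 1 -> [4 KB, 8 KB - 1]. */
static void page_interval(unsigned n, uint64_t *lo, uint64_t *hi)
{
    *lo = SHARED_BASE + (uint64_t)n * PAGE_SIZE;
    *hi = *lo + PAGE_SIZE - 1;
}
```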
After receiving the IO command, the external device writes the data into the address interval. Because of concurrency and out-of-order behavior, data received for different address intervals may be interleaved.
In some embodiments, when external data is received, the page information management module is configured to monitor the data received by the write interface, determine the address of the current write, query the corresponding mapping information in the page information according to the address, and control the read interface to read the corresponding operation data from memory; the page information management module is also configured to monitor the amount of data received and record it in the page information.
Thus, when external data is received, the page information management module is responsible for monitoring the data received by the write interface and determining the address of the current write. It then queries the corresponding mapping information in the page information according to the address and controls the read interface to read the corresponding operation data from memory. Meanwhile, the page information management module tracks the amount of data already received and records it in the page information.
When the address interval of the data entering through the write interface changes, that is, when data reception for the current page subspace pauses, the page information management module checks whether the amount of data received equals the page size. If it does, all data for the page has been received, and the page is returned to the page subspace resource pool; if it is smaller than the page size, data belonging to this page subspace will still arrive later and the page cannot yet be recycled, so the page information management merely updates the page information of the page subspace. Whether or not the page has collected all its data, the page information must at this point send the related information to the task information management module; a sketch of this receive path follows.
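Continuing the sketch, a hedged model of the receive path (read_operand() stands in for the engine's read-interface control and task_account() for the report to task information management; both are assumed helpers, and a burst is assumed not to cross a page boundary):

```c
extern void read_operand(uint64_t data1_addr, uint32_t len);  /* read interface */
extern void task_account(uint16_t task_no, uint32_t len);     /* task info mgmt */

/* Called for each data burst arriving on the write interface. */
void on_write_data(uint64_t addr, uint32_t len)
{
    unsigned n = (unsigned)((addr - SHARED_BASE) / PAGE_SIZE);  /* page lookup */
    struct page_info *p = &page_table[n];

    read_operand(p->data1_addr, len);  /* fetch the matching data 1 from memory */
    p->byte_count += len;
    task_account(p->task_no, len);     /* forward the count to task information */

    if (p->byte_count == PAGE_SIZE) {  /* whole page received */
        p->byte_count = 0;
        p->in_use = 0;                 /* return the page to the resource pool  */
    }
    /* otherwise more fragments of this page subspace are still expected */
}
```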
In some embodiments, after a page has received all its data and the data count in the task information management has been updated, if the counted value equals the data amount of the current task, the current task is complete, and the task information management module performs the subsequent handshake and transfer of the task information.
That is, if a page receives all its data and the updated data count in the task information management equals the data amount of the current task, the current task is complete, and the task information management module performs the subsequent handshake and task information transfer; this process is similar to the traditional method. A sketch of the accounting follows.
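A possible shape for the task-side accounting used above (task_complete() is an assumed stand-in for the handshake and task information transfer):

```c
extern void task_complete(uint16_t task_no);  /* handshake + task info transfer */

void task_account(uint16_t task_no, uint32_t len)
{
    struct task_info *t = &task_table[task_no];
    t->byte_count += len;
    if (t->byte_count == t->total_bytes)  /* all data of the task received */
        task_complete(task_no);
}
```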
In the embodiment of the present invention, only one address space is used as the data receiving space, shared by all tasks and divided into subspaces in units of pages. The invention maps data to a page subspace, saves the page's data mapping relationship in the page information, and maps the page to the corresponding task. As for the page allocation and recycling mechanism, a task must not only acquire task resources and be configured, but also acquire page subspace resources before the task is issued; when a page's data has been fully collected, the page can be recycled and allocated to another task.
In a second aspect of the embodiments of the present invention, referring to fig. 5, a method for task management and data scheduling of a page-based inline computing engine is provided, applicable to the inline computing engine and comprising the following steps:
setting the minimum address range of the receiving port for receiving external device data to the size of the internal data cache of the inline computing engine;
dividing the address range into a number of subspaces in units of pages;
binding each page subspace in use to a task, where one task uses at least one page subspace and the page subspaces in use are created as data receiving addresses of the external device task;
and storing the task running information in units of tasks through the task information storage module.
In some embodiments, the inline compute engine's internal data cache is the cache used by the engine to receive external device data and temporarily store it within the engine.
In some embodiments, the inline computing engine passively receives external device data; when data reaches the receiving port, the engine looks up, according to the data address, the address of the memory data participating in the operation, and reads that data from memory.
In some embodiments, the data sent by the external device is temporarily stored until the memory data reaches the inline computing engine, and the minimum address range for receiving data is set to the cache size.
In the embodiment of the invention, the minimum address range of the receiving port through which the inline computing engine receives external device data is set to the size of the engine's internal data cache. The cache here is the one the inline computing engine uses to receive external device data and hold it temporarily inside the engine: the engine receives external device data passively rather than scheduling data actively, so it cannot know in advance which data will arrive at the receiving port. After data reaches the receiving port, the engine must find, from the data address, the address of the memory data participating in the operation and then read that data from memory. Until the memory data reaches the inline computing engine, the data sent by the external device must be held temporarily. Therefore, however the inline compute engine is implemented, a cache is needed to provide this function. As noted above, the address range for receiving data must be at least the cache size, otherwise part of the cache could never be used; the address range may, however, be larger than the cache size.
In some embodiments, after the address range is divided into a number of subspaces in units of pages, the page size is set to be consistent with the page size of the memory storing the operation data; in the inline computing engine, a page information storage unit is provided for each page, which stores key runtime information of the page, including at least data mapping information, a task number, and a page data count.
The address range is divided into a number of subspaces in units of pages. For example, if the page size is set to 4 KB, the address space is divided into a number of 4 KB subspaces: the address range of the page 0 subspace is 0 to 4KB-1, that of the page 1 subspace is 4KB to 8KB-1, and so on (the upper and lower address bounds of each subspace refer to offsets within the whole address space).
For ease of system management, the page size is typically set to coincide with the page size of the memory in which the operation data is stored.
In the inline computing engine, a page information storage unit is provided for each page: the page 0 subspace corresponds to page information 0, the page 1 subspace corresponds to page information 1, and so on.
In some embodiments, the data mapping information is used for storing peripheral data entering from the page subspace and operating with data located in a memory; the task number is used for storing the task to which the page subspace belongs currently; the page data count is used to count the total number of data that has been received by the current page subspace.
In an embodiment of the present invention, the page information storage unit internally stores key runtime information of the page, including at least data mapping information, a task number, and a page data count.
The data mapping information records, for peripheral data entering through the page subspace, which data in memory the operation is performed with and where in memory the result must be stored, that is, the data 1 address and the result address.
The task number records the task to which the page subspace currently belongs. When data enters the compute engine through an address of the page subspace, part of the information in the page information storage unit must be transferred to the task information unit; for example, the amount of current data must be reported to the task information for task counting.
The page data count tallies the total amount of data received so far by the current page subspace. Because of data disorder, data belonging to different page subspaces may enter the compute engine as interleaved fragments. Hence, when a data fragment for the current page subspace finishes, if the count value is smaller than the page size, it must be kept to await further data; when the count value equals the page size, the whole page has been received, and the page may be reclaimed and allocated to the next task.
In some embodiments, the query relationship between the page information and the task information binds the mapping information to the page information and binds page allocation to cache allocation, including:
sharing the address space, establishing page information in units of pages, storing the data mapping relationship in the page information, and mapping to the task information through the task number in the page information.
In this embodiment, each page subspace in use is bound to one task, and one task may use multiple page subspaces at the same time: for example, when a task's data size is larger than one page and multiple page subspaces are free at task creation, the task may apply for several page subspaces, which are all created as data receiving addresses of the external device task. Given that external devices typically send data concurrently and out of order, these page subspaces are used by the task simultaneously.
The task information storage module stores runtime information in units of tasks, such as the computing parameters of the compute engine and the context information of the task's call chain in the system, which are outside the scope of this discussion. Generally, the task information also includes a task data count, fed by the page information, which tallies the total amount of data received by the task.
It is emphasized that the query relationship between the page information and the task information is one of the points distinguishing the present invention from the traditional method. In the traditional method, task information is the core: the received data stream is mapped to tasks through per-task independent address spaces, and the data mapping relationships, task parameters, and so on are then looked up in the task information. The present invention shares the address space, establishes page information in units of pages, saves the data mapping relationships in the page information, and maps to the task information through the task number in the page information. Because the number of concurrent tasks is large and each task has many data mapping relationships, the traditional method necessarily stores a large amount of data, yet not all of that mapping data is in use at any moment. The present invention binds the mapping information to the page information and binds page allocation to cache allocation, so only the mapping information of the data actually in use needs to be stored; this is the core point from which the present invention gains its advantage over the traditional approach, as the lookup sketch below illustrates.
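A last sketch contrasting the two query chains (task_of() is an assumed helper; the traditional path is shown only as a comment):

```c
/* Query chain of the invention: data address -> page information -> task
 * information. The mapping itself already sits in the page_info entry, so
 * no per-task descriptor cache can miss when data arrives out of order.
 * (Traditional chain: data address -> task -> cached data 1 descriptors,
 * which may miss and force an on-demand read.) */
static struct task_info *task_of(uint64_t addr)
{
    struct page_info *p = &page_table[(addr - SHARED_BASE) / PAGE_SIZE];
    return &task_table[p->task_no];
}
```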
It should be understood that although the steps above are described in a certain order, they are not necessarily performed in that order; unless explicitly stated otherwise, there is no strict restriction on their execution order, and they may be performed in other orders. Moreover, some steps of this embodiment may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In some embodiments, the page-based management scheme makes the flow clearer when the data stream is normal. However, because multiple tasks share the set of pages, error localization and isolation can be problematic when an external device encounters an error or a link fails. The bottom line is that a faulty page may affect the current task but must not affect other tasks; guaranteeing this under a mechanism in which pages are shared by multiple tasks is a challenge.
In a third aspect of the embodiment of the present invention, a computer-readable storage medium is further provided, and fig. 6 is a schematic diagram of a computer-readable storage medium of a task management and data scheduling method for a page-based inline computing engine according to the embodiment of the present invention. As shown in fig. 6, the computer-readable storage medium 300 stores computer program instructions 310, the computer program instructions 310 being executable by a processor. The computer program instructions 310, when executed, implement the method of any of the embodiments described above.
It should be understood that all of the embodiments, features and advantages set forth above with respect to the method for task management and data scheduling of a page-based inline computing engine according to the present invention are equally applicable to the system for task management and data scheduling and storage medium of a page-based inline computing engine according to the present invention, without conflicting therewith.
In a fourth aspect of the embodiments of the present invention, there is further provided a computer device 400, comprising a memory 420 and a processor 410, the memory storing a computer program which, when executed by the processor, implements the method of any one of the above embodiments, comprising the following steps:
setting the minimum address range of the receiving port for receiving external device data to the size of the internal data cache of the inline computing engine;
dividing the address range into a number of subspaces in units of pages;
binding each page subspace in use to a task, where one task uses at least one page subspace and the page subspaces in use are created as data receiving addresses of the external device task;
and storing the task running information in units of tasks through the task information storage module.
Fig. 7 is a schematic hardware structural diagram of an embodiment of a computer device for executing a method for task management and data scheduling of a page-based inline computing engine according to the present invention. Taking the computer device 400 shown in fig. 7 as an example, the computer device includes a processor 410 and a memory 420, and may further include: an input device 430 and an output device 440. The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or other means, as exemplified by the bus connection in fig. 7. Input device 430 may receive input numeric or character information and generate signal inputs related to task management and data scheduling for a page-based inline computing engine. The output device 440 may include a display device such as a display screen.
Memory 420, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the task management and data scheduling methods of the page-based inline computing engine in the embodiments of the present application. The memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of a task management and data scheduling method of the page-based inline computing engine, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 420 may optionally include memory located remotely from processor 410, which may be connected to local modules via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 410 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 420, thereby implementing the task management and data scheduling method of the page-based inline computing engine of the above method embodiments, comprising the following steps:
setting the minimum address range of the receiving port for receiving external device data to the size of the internal data cache of the inline computing engine;
dividing the address range into a number of subspaces in units of pages;
binding each page subspace in use to a task, where one task uses at least one page subspace and the page subspaces in use are created as data receiving addresses of the external device task;
and storing the task running information in units of tasks through the task information storage module.
A fifth aspect of the embodiment of the present invention further provides a chip 500 for performing flow control according to any one of the above methods for task management and data scheduling of a page-based inline computing engine according to the present invention. Fig. 8 is a schematic diagram of a frame of a chip 500 according to the present invention. As shown in FIG. 8, in this embodiment, the chip 500 has a CPU reset vector register 510, a CPU release control pin 520, a CPU release control register 530, and a debug interface 540 in its architecture, wherein
The CPU reset vector register 510 is used to control the address of an instruction that is read and executed after the CPU is released;
the CPU release control register 520 is used to control CPU release when the chip 500 is powered on;
the CPU release control pin 530 is used to control the validity of the CPU release control register 520;
the debug interface 540 is used to read and write the on-chip RAM and the registers to perform the flow control of the chip.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
Finally, it should be noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The invention provides a method and a system for task management and data scheduling of an inline computing engine based on pages, wherein the data mapping relation is stored in page information instead of task information, and all tasks share one address space, so that the amount of stored information is reduced to 1/N of the traditional method, wherein N is the maximum number of supported tasks.
The invention stores the page mapping relation of the whole address space in the cache and binds cache allocation to pages; the data to be received for all current tasks therefore already resides in the cache, and no cache miss or reloading of the mapping relation occurs regardless of whether task data arrives out of order, so the invention achieves better performance in out-of-order scenarios.
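Because every page of the shared space keeps its mapping resident, an arriving beat of data can be resolved by its page index alone; a minimal sketch of that lookup, reusing the hypothetical page_table above and assuming 4 KiB pages and a helper read_operand that does not appear in the disclosure:

#define PAGE_SHIFT 12                      /* assumed 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

extern void read_operand(uint64_t addr, uint32_t len); /* hypothetical helper */

/* Invoked for every write observed on the receive interface. Arrival order
 * is irrelevant: the page index selects a mapping that is always resident,
 * so no miss or mapping reload can occur. */
static void on_data_received(uint64_t addr, uint32_t len)
{
    struct page_info *pg = &page_table[(addr >> PAGE_SHIFT) % NUM_PAGES];
    uint64_t operand = pg->mapping + (addr & PAGE_MASK);
    read_operand(operand, len);  /* fetch the matching operand from memory  */
    pg->bytes_received += len;   /* record progress in the page information */
}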
The invention uses a shared address space instead of an independent address space per task, so at most 1/N of the address space of the traditional method is required. Moreover, the traditional method must reserve each task's address space according to the maximum data size, whereas this method only needs to reserve space matching the cache size, which shrinks the address space further in practice. Although address space is not a direct hardware resource, wider signals are required to represent larger addresses, so an oversized address space indirectly incurs a certain hardware resource overhead.
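A worked numeric illustration of this sizing argument, with every figure assumed for the example rather than taken from the disclosure:

#include <stdint.h>
#include <stdio.h>

/* Bits needed to address a span of the given number of bytes. */
static unsigned addr_bits(uint64_t span)
{
    unsigned bits = 0;
    while ((1ull << bits) < span)
        bits++;
    return bits;
}

int main(void)
{
    uint64_t n_tasks  = 256;          /* N, assumed                   */
    uint64_t max_task = 16ull << 20;  /* worst-case task data: 16 MiB */
    uint64_t cache_sz = 1ull << 20;   /* engine internal cache: 1 MiB */

    /* Per-task spaces must each cover the worst case: 4 GiB in total. */
    printf("per-task spaces: %u address bits\n", addr_bits(n_tasks * max_task)); /* 32 */
    /* A shared space only needs to cover the cache. */
    printf("shared space:    %u address bits\n", addr_bits(cache_sz));           /* 20 */
    return 0;
}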
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed herein are merely for description and do not represent the relative merits of the embodiments.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the embodiments of the invention, technical features in the above embodiments or in different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A system for task management and data scheduling of a page-based inline computing engine, characterized by comprising an external device control unit, a page information management module and a task information management module;
the external device control unit is used for initializing tasks to the computing engine, and is also used for applying for page subspace resources from the page information management module of the computing engine;
the page information management module is used for acquiring, from the page subspace resource pool, as many resources as possible up to the requested amount, and returning the page subspace information to the external device control unit;
the external device control unit is used for calculating an address interval of a page subspace according to each page subspace number, issuing an IO command to the external device, and designating the address of the data target of the external device as the address interval.
2. The system for task management and data scheduling for a page-based inline computing engine of claim 1, wherein the external device control unit is further configured to configure parameters of a corresponding task upon initialization of the task to the computing engine.
3. The system for task management and data scheduling of a page-based inline computing engine of claim 2, wherein the external device control unit applies for page subspace resources from a page information management module of the computing engine before issuing the IO command.
4. The system of claim 3, wherein, when the page information management module acquires resources to satisfy the requested amount, if the current request cannot be fully satisfied, the external device control unit re-applies for the unsatisfied remainder; and if no page subspace resources are currently available, the page information management module does not generate a response and the external device control unit waits.
5. The system for task management and data scheduling of a page-based inline computing engine of claim 4, wherein the external device control unit, after receiving the page subspace resources, is further configured to send a page-configuration request to the task information management module of the computing engine and notify the corresponding task information unit of the obtained page information, and the task information management module loads the data mapping relation of the specified number of pages from the memory and configures it into the page information.
6. The system for task management and data scheduling of a page-based inline computing engine of claim 5, wherein the external device control unit only needs to issue IO commands to the external devices in sequence, and the external devices write data to the address intervals after receiving the IO commands.
7. The system for task management and data scheduling of a page-based inline computing engine according to claim 6, wherein, when external data is received, the page information management module is configured to monitor the data received on the write interface, determine the address of the currently written data, query the corresponding mapping information in the page information according to that address, and control the read interface to read the corresponding operation data from the memory; the page information management module is further configured to monitor the amount of data received and record it in the page information.
8. The system of claim 1, wherein, when a page has received all of its data, the data count in the task information management module is updated; if the counted value equals the data size of the current task, the current task is completed, and the task information management module performs the subsequent handshaking and task information transfer.
9. A method for task management and data scheduling of a page-based inline computing engine, based on the system for task management and data scheduling of a page-based inline computing engine according to any one of claims 1 to 8 and adapted to an inline computing engine, the method comprising the steps of:
setting, in the inline computing engine, the minimum address range of the receiving port for receiving external device data to the size of the inline computing engine's internal data cache;
dividing the address range into a plurality of subspaces taking a page as a unit;
binding each page subspace in use to a task, wherein one task uses at least one page subspace, and the page subspaces in use serve as the data reception addresses of the external device's task;
and storing, through the task information storage module, the task running information on a per-task basis.
10. The method for task management and data scheduling of a page-based inline computing engine of claim 9, wherein the inline computing engine's internal data cache is the cache used by the inline computing engine to receive external device data and temporarily store it inside the engine;
the inline computing engine passively receives external device data, and when data reaches the receiving port, the inline computing engine looks up, according to the data's address, the address of the corresponding operand in the memory participating in the operation and reads that data from the memory.
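Taken together, claims 1 to 8 describe an apply/configure/receive loop; the following C sketch illustrates that flow, where every function name and the data layout are invented for illustration and the handshake details are assumptions:

#include <stdint.h>

#define MAX_PAGES 64                 /* assumed upper bound on pages per task */

struct task;                         /* details omitted */

/* Hypothetical primitives standing in for the claimed modules: */
extern uint32_t apply_page_subspaces(uint32_t want, uint32_t *out);              /* claims 1, 4 */
extern void     configure_pages(struct task *t, const uint32_t *p, uint32_t n);  /* claim 5 */
extern void     issue_io_to_interval(struct task *t, uint32_t page_no);          /* claims 1, 6 */
extern int      wait_task_complete(struct task *t);                              /* claims 7, 8 */

int run_task(struct task *t, uint32_t pages_needed)
{
    uint32_t pages[MAX_PAGES];
    uint32_t got = 0;

    /* Apply for page subspaces; if the pool cannot satisfy the whole
     * request, re-apply for the unsatisfied remainder (claim 4). */
    while (got < pages_needed)
        got += apply_page_subspaces(pages_needed - got, pages + got);

    /* Bind the granted pages to the task and load their data mappings
     * (claim 5). */
    configure_pages(t, pages, got);

    /* Derive each page's address interval from its number and point the
     * external device's IO at it (claims 1 and 6). */
    for (uint32_t i = 0; i < got; i++)
        issue_io_to_interval(t, pages[i]);

    /* The page information manager counts received data; the task completes
     * when the count reaches the task's data size (claims 7 and 8). */
    return wait_task_complete(t);
}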
CN202211133234.4A 2022-09-16 2022-09-16 Method and system for task management and data scheduling of page-based inline computing engine Pending CN115525582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211133234.4A CN115525582A (en) 2022-09-16 2022-09-16 Method and system for task management and data scheduling of page-based inline computing engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211133234.4A CN115525582A (en) 2022-09-16 2022-09-16 Method and system for task management and data scheduling of page-based inline computing engine

Publications (1)

Publication Number Publication Date
CN115525582A true CN115525582A (en) 2022-12-27

Family

ID=84697903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211133234.4A Pending CN115525582A (en) 2022-09-16 2022-09-16 Method and system for task management and data scheduling of page-based inline computing engine

Country Status (1)

Country Link
CN (1) CN115525582A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573378A * 2024-01-15 2024-02-20 摩尔线程智能科技(北京)有限责任公司 Memory management method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US7584345B2 (en) System for using FPGA technology with a microprocessor for reconfigurable, instruction level hardware acceleration
US7702826B2 (en) Method and apparatus by utilizing platform support for direct memory access remapping by remote DMA (“RDMA”)-capable devices
CN113918101B (en) Method, system, equipment and storage medium for writing data cache
US7437617B2 (en) Method, apparatus, and computer program product in a processor for concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers
US20160062802A1 (en) A scheduling method for virtual processors based on the affinity of numa high-performance network buffer resources
US10778815B2 (en) Methods and systems for parsing and executing instructions to retrieve data using autonomous memory
CN110119304B (en) Interrupt processing method and device and server
WO2012058252A1 (en) A method for process synchronization of embedded applications in multi-core systems
US20060150023A1 (en) Debugging apparatus
EP3407184A2 (en) Near memory computing architecture
US10013199B2 (en) Translation bypass by host IOMMU for systems with virtual IOMMU
CN115525582A (en) Method and system for task management and data scheduling of page-based inline computing engine
KR20050076702A (en) Method for transferring data in a multiprocessor system, multiprocessor system and processor carrying out this method
CN116755902A (en) Data communication method and device, processing system, electronic equipment and storage medium
JP6294732B2 (en) Data transfer control device and memory built-in device
CN116360925A (en) Paravirtualization implementation method, device, equipment and medium
CN112328514B (en) Method and device for generating independent process identification by multithreaded processor system
CN115238642A (en) FPGA-based crossbar design system and method for peripheral bus
CN113360130A (en) Data transmission method, device and system
US7000148B2 (en) Program-controlled unit
CN107807888B (en) Data prefetching system and method for SOC architecture
US9400654B2 (en) System on a chip with managing processor and method therefor
CN115525599A (en) High-efficient computing device
CN114116556B (en) Method, system, storage medium and equipment for dynamically distributing queue cache
CN117009265B (en) Data processing device applied to system on chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination