CN117971317B - Method and device for interaction between central processing unit and accelerator and electronic equipment - Google Patents


Info

Publication number: CN117971317B (application number CN202410365390.6A)
Authority: CN (China)
Prior art keywords: command, accelerator, queue, central processing unit
Priority date: the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117971317A
Inventors: 李祖松, 郇丹丹
Current and original assignee: Beijing Micro Core Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)

Events

  • Application filed by Beijing Micro Core Technology Co., Ltd.; priority to CN202410365390.6A
  • Publication of CN117971317A
  • Application granted; publication of CN117971317B

Landscapes

  • Advance Control (AREA)

Abstract

The disclosure provides a method, a device, and electronic equipment for interaction between a central processing unit and an accelerator. The method includes: when it is determined that a data transmission requirement exists, sending a command through a command interface to a register of the accelerator so that the accelerator executes the command, where the register is local random access memory inside the accelerator; and receiving the execution result of the accelerator executing the command. The central processing unit can thus send commands directly to the accelerator through the command interface and receive the execution results fed back by the accelerator, completing the data transmission and improving the data transmission speed and efficiency.

Description

Method and device for interaction between central processing unit and accelerator and electronic equipment
Technical Field
The disclosure relates to the technical field of computer processors, and in particular relates to a method and a device for interaction between a central processing unit and an accelerator, and electronic equipment.
Background
With the rapid development of computation-intensive application fields such as artificial intelligence, scientific computing, video processing, network transmission, and signal processing, the demands on the data processing capability of chips keep increasing. In addition to fully exploiting instruction-level and thread-level parallelism with aggressive superscalar, multi-core, and multi-threaded techniques, many chips employ AI (Artificial Intelligence) processors, graphics processors (GPU, Graphics Processing Unit), digital signal processors (DSP), data processors (DPU, Data Processing Unit), stream processors, and the like as accelerators alongside a general-purpose CPU (Central Processing Unit), forming the heterogeneous cores of a multi-core processor and further exploiting data-level parallelism (DLP, Data Level Parallelism).
Data transmission between the CPU and the accelerator is implemented through DDR (Double Data Rate SDRAM, double-rate synchronous dynamic random access memory), whose transmission speed is slow and affects the efficiency of data transmission; this is a problem to be solved.
Disclosure of Invention
The disclosure provides a method, a device, and an electronic device for interaction between a central processing unit and an accelerator, aiming to solve, at least to a certain extent, one of the technical problems in the related art.
In a first aspect, an embodiment of the present disclosure provides a method for interacting between a central processing unit and an accelerator, the method including:
when the data transmission requirement is determined to exist, a command is sent to a register of the accelerator through a command interface so that the accelerator executes the command, wherein the register is a local random access memory in the accelerator;
and receiving an execution result of the accelerator executing the command.
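As a minimal sketch of these two steps, the following simulation models the command interface in Python; the class and function names (`Accelerator`, `cpu_send_command`) and the command tuple format are illustrative assumptions, not part of the patent:

```python
class Accelerator:
    """Toy accelerator whose command 'register' is local random access
    memory (SRAM), as in the first step of the method."""
    def __init__(self):
        self.sram = {}  # local RAM acting as the register

    def execute(self, command):
        # Execute the received command and produce an execution result.
        op, key, value = command
        if op == "store":
            self.sram[key] = value
            return ("stored", key)
        if op == "load":
            return ("data", self.sram.get(key))
        return ("error", "unknown command")

def cpu_send_command(accel, command):
    """CPU side: send the command through the command interface, then
    receive the execution result fed back by the accelerator."""
    return accel.execute(command)
```

Under these assumptions, a store followed by a load completes the round trip without any intermediate memory copy.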
In one possible implementation of the method provided by the embodiments of the present disclosure, the command supports a blocking mode and a non-blocking mode.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, the method further includes:
setting a consistency access channel between the central processing unit and the accelerator, so that the accelerator directly accesses the memory of the central processing unit through the consistency access channel.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, when it is determined that there is a data transmission requirement, a command is sent to a register of an accelerator through a command interface, including:
in response to a first control instruction, a first command is sent to the register through the command interface, the first command being for indicating to store first data into the accelerator.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, when it is determined that there is a data transmission requirement, a command is sent to a register of an accelerator through a command interface, including:
In response to the second control instruction, a second command is sent to the register through the command interface, the second command being for instructing the retrieval of the second data from the accelerator.
In a possible implementation manner, in the method provided by the embodiment of the present disclosure, when a specific address space and a command queue are set in a register, and it is determined that a data transmission requirement exists, a command is sent to a register of an accelerator through a command interface, including:
storing the command into a specific address space of the accelerator in response to a third control instruction; or
storing the command into a command queue in response to the third control instruction; or
in response to a fourth control instruction, sending a command to the accelerator after determining that the accelerator has executed the current command.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, after the command is deposited to the command queue in response to the third control instruction, the method further includes:
when no empty item exists in the command queue, determining that the command queue is full; and
stopping sending the command, continuously monitoring the command queue until an empty item exists in the command queue, and then storing the command into the command queue.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, when there is no empty item in the command queue, determining that the command queue is full includes:
when an exception indication fed back by the accelerator is received, determining that the command queue is full; or
receiving a return value of the command, and if the return value is a first preset value indicating that the command was not successfully executed, determining that the command queue is full; or
if the set judging instruction for checking whether the queue has an empty item indicates that the queue has no empty item, determining that the command queue is full; or
if the execution duration of the command is greater than or equal to a preset first command timeout duration, determining that the command queue is full.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, continuously monitoring the command queue until an empty item exists in the command queue, and storing the command into the command queue, includes:
when the exception indication is no longer received, determining that the command queue has an empty item, and storing the command into the command queue; or
receiving a return value, and if the return value is a second preset value indicating that the command was successfully executed, determining that the command queue has an empty item and storing the command into the command queue; or
if the set judging instruction for checking whether the queue has an empty item indicates that an empty item exists, determining that the command queue has an empty item and storing the command into the command queue; or
if the execution duration of the command is less than the first command timeout duration, determining that the command queue has an empty item and storing the command into the command queue.
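The full-queue detection and retry behaviour described above can be sketched as follows; the queue depth, the boolean stand-in for the first/second preset return values, and the `accelerator_step` callback are all hypothetical modeling choices:

```python
from collections import deque

class CommandQueue:
    """Toy model of the command queue in the accelerator's register space."""
    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def has_empty_slot(self):
        # Stand-in for the "judging instruction" that checks for an empty item.
        return len(self.items) < self.depth

    def try_push(self, command):
        # The boolean return value plays the role of the preset values:
        # False = not executed (queue full), True = command deposited.
        if not self.has_empty_slot():
            return False
        self.items.append(command)
        return True

def send_with_retry(queue, command, accelerator_step):
    """Stop sending while the queue is full and keep monitoring it until
    an empty item appears, then deposit the command into the queue."""
    while not queue.try_push(command):
        accelerator_step(queue)  # wait: the accelerator drains an entry
    return True
```

In this sketch `accelerator_step` stands in for whatever drains the queue on real hardware; the CPU merely re-checks until `try_push` succeeds.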
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, in response to the fourth control instruction, in a case that it is determined that the accelerator has performed the current command, sending the command to the accelerator includes:
if the execution duration of the command is greater than or equal to a preset second command timeout duration, stopping sending the command; or
receiving a return value of the command, and stopping sending the command if the return value is a third preset value indicating that the command was not successfully executed.
In one possible implementation manner, in the method provided by the embodiment of the present disclosure, the method further includes:
if the execution duration of the command is less than the second command timeout duration, resuming sending the command; or
receiving a return value of the command, and if the return value is a fourth preset value indicating that the command was successfully executed, resuming sending the command.
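The stop/resume decision for the blocking (wait-for-finish) case reduces to two conditions, execution duration versus the second command timeout and the command's return value; a sketch (the constants `FAILURE` and `SUCCESS` are hypothetical stand-ins for the third and fourth preset values):

```python
FAILURE, SUCCESS = 0, 1  # stand-ins: FAILURE ~ third preset value, SUCCESS ~ fourth

def next_action(elapsed, timeout, return_value):
    """Decide whether to stop or resume sending, from the execution
    duration of the current command and its return value."""
    if elapsed >= timeout or return_value == FAILURE:
        return "stop"    # timeout reached or command not successfully executed
    return "resume"      # command finished in time and succeeded
```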
In a second aspect, embodiments of the present disclosure provide a central processing unit and accelerator interaction device, including:
a sending unit, configured to send a command to a register of the accelerator through a command interface when it is determined that a data transmission requirement exists, so that the accelerator executes the command, wherein the register is local random access memory inside the accelerator; and
a receiving unit, configured to receive the execution result of the accelerator executing the command.
In one possible implementation of the device provided by the embodiments of the present disclosure, the command supports a blocking mode and a non-blocking mode.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
set a consistency access channel between the central processing unit and the accelerator, so that the accelerator directly accesses the memory of the central processing unit through the consistency access channel.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
in response to a first control instruction, a first command is sent to the register through the command interface, the first command being for indicating to store first data into the accelerator.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
In response to the second control instruction, a second command is sent to the register through the command interface, the second command being for instructing the retrieval of the second data from the accelerator.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, a specific address space and a command queue are set in a register, and the sending unit is specifically configured to:
store the command into a specific address space of the accelerator in response to a third control instruction; or
store the command into a command queue in response to the third control instruction; or
in response to a fourth control instruction, send a command to the accelerator after determining that the accelerator has executed the current command.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
determine that the command queue is full when no empty item exists in the command queue; and
stop sending the command, continuously monitor the command queue until an empty item exists in the command queue, and then store the command into the command queue.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
determine that the command queue is full when an exception indication fed back by the accelerator is received; or
receive a return value of the command, and if the return value is a first preset value indicating that the command was not successfully executed, determine that the command queue is full; or
if the set judging instruction for checking whether the queue has an empty item indicates that the queue has no empty item, determine that the command queue is full; or
if the execution duration of the command is greater than or equal to a preset first command timeout duration, determine that the command queue is full.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
determine that the command queue has an empty item when the exception indication is no longer received, and store the command into the command queue; or
receive a return value, and if the return value is a second preset value indicating that the command was successfully executed, determine that the command queue has an empty item and store the command into the command queue; or
if the set judging instruction for checking whether the queue has an empty item indicates that an empty item exists, determine that the command queue has an empty item and store the command into the command queue; or
if the execution duration of the command is less than the first command timeout duration, determine that the command queue has an empty item and store the command into the command queue.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
if the execution duration of the command is greater than or equal to a preset second command timeout duration, stop sending the command; or
receive a return value of the command, and stop sending the command if the return value is a third preset value indicating that the command was not successfully executed.
In a possible implementation manner, in the device provided by the embodiment of the present disclosure, the sending unit is specifically configured to:
if the execution duration of the command is less than the second command timeout duration, resume sending the command; or
receive a return value of the command, and if the return value is a fourth preset value indicating that the command was successfully executed, resume sending the command.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method as provided by the first aspect of the embodiments of the present disclosure.
In a fourth aspect, the presently disclosed embodiments provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as provided by the first aspect of the presently disclosed embodiments.
In the embodiment of the disclosure, when a data transmission requirement exists between the central processing unit and the accelerator, the central processing unit sends a command to the accelerator through the command interface and receives an execution result of the accelerator execution command. Therefore, the central processing unit can directly send a command to the accelerator through the command interface, and receive an execution result fed back by the accelerator, so that data transmission is completed, and the data transmission speed and efficiency can be improved.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method of CPU and accelerator interaction provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a central processing unit interacting with an accelerator provided by an embodiment of the present disclosure;
FIG. 3 is a block diagram of a central processing unit and accelerator interaction device provided by an embodiment of the present disclosure;
fig. 4 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
The rapid growth of semiconductor technology has led to ever-increasing microprocessor speed and integration, and to growth in the number and variety of transistor resources that processor designers can use to implement a chip. From an architectural perspective, more and more processor designs are evolving toward on-chip multiprocessors, with an increasing number of processor cores integrated on a chip. From the application point of view, with the rapid development of computation-intensive application fields such as artificial intelligence, scientific computing, video processing, network transmission, and signal processing, the demands on the data processing capability of chips keep increasing. In addition to fully exploiting instruction-level and thread-level parallelism with aggressive superscalar, multi-core, and multi-threaded techniques, many chips employ accelerators together with general-purpose processors (CPUs) as the heterogeneous cores of multi-core processors, further exploiting data-level parallelism (DLP, Data Level Parallelism). An accelerator generally adopts multiple functional units and a wide data path structure to improve its computing capability and mine data-level parallelism, and requires a CPU to control it. A problem with heterogeneous architectures is that the interaction of control flows and data flows can become a bottleneck for the overall architecture.
Accelerators include domain processors, functional accelerators, and application scene accelerators, among others. For example, AI processors, graphics processors GPUs, digital signal processors DSPs, data processors DPUs, stream processors, network processors, image signal processors, display processors, encryption and decryption accelerators, audio accelerators, video codec accelerators, various mathematical operation accelerators, convolution operation accelerators, matrix operation accelerators, and the like.
Memory is one of the important components of a computer, namely the internal or main memory, used to temporarily hold operational data for the CPU (Central Processing Unit) and to exchange data with external storage such as a hard disk. Memory is the bridge through which the CPU communicates with peripherals, and all programs in a computer run in memory. Once the computer starts running, the operating system loads the data to be operated on from memory into the CPU for computation; when the computation is completed, the CPU sends the result back to memory.
The CPU (Central Processing Unit), the accelerator (Accelerator), DDR (Double Data Rate SDRAM, double-rate synchronous dynamic random access memory), and PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) are connected through a bus, and an SSD (Solid State Disk) is connected to PCIe.
Taking a GPU (Graphics Processing Unit) as an example of the accelerator: when the CPU and the GPU jointly process data, the data is first read from the SSD through PCIe and stored in DDR; the GPU then reads the data it needs from DDR, processes it, and writes the processed data back to DDR; the CPU reads the GPU-processed data from DDR and writes its own results back to DDR. After processing is finished, the data is transmitted back to the SSD through PCIe for storage.
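The round trips through DDR described above can be sketched as a toy pipeline in which every hand-off is a copy through the `ddr` dictionary; the GPU/CPU processing steps (doubling and incrementing) are placeholders, not operations from the patent:

```python
def legacy_pipeline(ssd_data):
    """Traditional flow: every hand-off between CPU and GPU goes through
    DDR, which is the bottleneck this patent targets.
    SSD -(PCIe)-> DDR -> GPU -> DDR -> CPU -> DDR -(PCIe)-> SSD."""
    ddr = dict(ssd_data)                          # SSD -> DDR over PCIe
    gpu_out = {k: v * 2 for k, v in ddr.items()}  # GPU reads DDR and processes
    ddr.update(gpu_out)                           # GPU result written back to DDR
    cpu_out = {k: v + 1 for k, v in ddr.items()}  # CPU reads DDR and processes
    ddr.update(cpu_out)                           # CPU result written back to DDR
    return ddr                                    # DDR -> SSD over PCIe
```

Four of the five arrows above are memory-level copies, which is why relaying through DDR dominates the transfer time.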
Because data transmission between the CPU and the accelerator is relayed through DDR, that is, data interaction at the memory level, the transmission speed is slow, which affects data transmission efficiency.
Based on the above, the embodiments of the present disclosure provide a method, a device, and an electronic device for interaction between a central processing unit and an accelerator: in response to a data transmission requirement between the central processing unit and the accelerator, the central processing unit sends a command to the accelerator through a command interface and receives the execution result fed back by the accelerator executing the command. In this way, the central processing unit can send commands directly to the accelerator through the command interface and receive the execution results fed back by the accelerator, completing the data transmission and improving the data transmission speed and efficiency.
It should be noted that, the execution body of the method for interaction between the central processing unit and the accelerator in this embodiment may be a device for interaction between the central processing unit and the accelerator, where the device may be implemented by software and/or hardware, and the device may be configured in an electronic device, and the electronic device may include, but is not limited to, a terminal, a server, and so on.
Fig. 1 is a flow chart of a method for interaction between a central processing unit and an accelerator according to an embodiment of the disclosure, as shown in fig. 1, the method includes:
s101: when it is determined that there is a data transfer requirement, a command is sent to a register of the accelerator through a command interface.
In the embodiment of the disclosure, in the case that a data transmission requirement exists between the central processing unit and the accelerator, the central processing unit can directly send a command to a register inside the accelerator through a command interface.
The data transmission requirement exists between the central processing unit and the accelerator, which may be that the central processing unit needs to store data into the accelerator, or that the central processing unit needs to read data from the accelerator, or that the central processing unit needs to store data into the accelerator while reading data from the accelerator, etc., which is not particularly limited in the embodiments of the present disclosure.
For example, the central processor may execute a first control instruction, the custm_load instruction; in this case, the command sent by the central processor instructs fetching data from the accelerator's local random access memory (SRAM) into a vector/fixed-point/floating-point register. The address space corresponding to the instruction is exclusive, and the CPU can only access the corresponding accelerator.
For example, the central processor may execute a second control instruction, the custm_store instruction; in this case, the command sent by the central processor instructs storing data from the vector/fixed-point/floating-point registers into the accelerator's SRAM. The address space corresponding to the instruction is exclusive, and the CPU can only access the corresponding accelerator.
For example, the central processor may execute a third control instruction, the custm_sendcmd instruction; in this case, the central processor can send a command to the accelerator without waiting for the commands already in the accelerator to complete.
Illustratively, the central processor may execute a fourth control instruction, the custm_sendcmdwaitfinish instruction; in this case, the central processor sends a command to the accelerator and must wait for the accelerator to finish executing the command (a timeout exception may also be set).
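The four control instructions can be sketched as Python stand-ins operating on a toy accelerator model; the `Accel` class and its `pending`/`finish_all` members are illustrative assumptions, not the patent's hardware interface:

```python
class Accel:
    """Minimal accelerator model: local SRAM plus a pending-command list."""
    def __init__(self):
        self.sram = {}
        self.pending = []

    def finish_all(self):
        self.pending.clear()  # pretend every queued command completes

def custm_load(accel, addr):
    # Fetch from the accelerator's local SRAM into a CPU
    # vector/fixed-point/floating-point register (modeled as a return value).
    return accel.sram[addr]

def custm_store(accel, addr, reg_value):
    # Store a CPU register value into the accelerator's SRAM.
    accel.sram[addr] = reg_value

def custm_sendcmd(accel, cmd):
    # Non-blocking: hand the command over without waiting for completion.
    accel.pending.append(cmd)

def custm_sendcmdwaitfinish(accel, cmd):
    # Blocking: send the command and wait until the accelerator finishes
    # (a real implementation could raise a timeout exception here).
    accel.pending.append(cmd)
    accel.finish_all()
```

The contrast to note is between the last two functions: `custm_sendcmd` returns while work is still pending, while `custm_sendcmdwaitfinish` only returns once the pending list is drained.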
In some embodiments, a command interface (Command interface) is provided on the central processor, through which the central processor can send commands directly to the accelerator.
In the embodiment of the disclosure, the central processing unit can directly send the command to the accelerator through the command interface arranged on the central processing unit, so that an efficient data path and a control path can be established between the central processing unit and the accelerator, data transmission is realized, and the data transmission speed and efficiency are improved.
Illustratively, the control path: the central processor can control the accelerator through commands, and the reserved portion of the instruction set can carry custom functions supporting a blocking mode and/or a non-blocking mode. In the blocking mode, the central processing unit waits for the preceding command in the accelerator to finish executing before sending the next command to the accelerator. In the non-blocking mode, the central processing unit can send the next command to the accelerator without waiting for the preceding command to finish.
Illustratively, the data path: the central processor can support direct data interaction with the accelerator at the register and SPM (Scratchpad Memory) level, setting a customized CPU interface for accessing DSA memory, and providing a coherent access channel that allows the accelerator to access the coherent memory region.
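The coherent access channel of the data path can be sketched as a shared view of CPU memory that the accelerator reads and writes directly instead of copying through DDR; the class and method names here are hypothetical:

```python
class CoherentChannel:
    """Hypothetical model: the accelerator accesses the CPU's memory
    region directly through a coherent channel, so both sides observe
    the same data without an intermediate DDR copy."""
    def __init__(self, cpu_memory):
        self.mem = cpu_memory  # shared, coherent view of CPU memory

    def accel_read(self, addr):
        return self.mem[addr]

    def accel_write(self, addr, value):
        self.mem[addr] = value
```

Because `mem` is the CPU's own memory object rather than a copy, an accelerator write is immediately visible to the CPU, which is the point of the coherency channel.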
S102: and receiving an execution result of the accelerator execution command.
In the embodiment of the disclosure, in the case that a data transmission requirement exists between the central processing unit and the accelerator, the central processing unit can send a command to a register of the accelerator through the command interface, and can also receive an execution result fed back by the accelerator executing the command.
In some embodiments, the central processing unit executes a first control instruction, and the command it sends instructs storing first data into the accelerator. In this case, receiving the execution result fed back by the accelerator executing the command includes: receiving an execution result indicating that the first data has been stored; or receiving an execution result indicating that storing the first data failed; or receiving an execution result indicating that memory was insufficient to store the first data.
The first control instruction may be a custm _store instruction.
In an embodiment of the disclosure, the central processor executes the first control instruction, and the command sent by the central processor to the accelerator may be used to instruct to store the first data into the accelerator, in which case, after the accelerator receives the command, the accelerator may execute the command and store the first data.
Wherein the accelerator may successfully store the first data or the accelerator may fail to store the first data if the accelerator memory space is insufficient to store the first data.
It will be appreciated that in the case where the first data is successfully stored, the accelerator feeds back the execution result to the central processor, and the execution result of the stored first data may be fed back to the central processor, so that the central processor may receive the execution result of the stored first data fed back by the accelerator execution command.
It may be further understood that, in the case of the first data storage failure, the accelerator feeds back the execution result to the central processing unit, and may feed back the execution result of the first data storage failure to the central processing unit, so that the central processing unit may receive the execution result of the first data storage failure fed back by the accelerator executing command.
It may be further understood that, in the case of failure of storing the first data in the insufficient memory, the accelerator feeds back the execution result to the central processor, and may feed back the execution result of failure of storing the first data in the insufficient memory to the central processor, so that the central processor may receive the execution result of failure of storing the first data in the insufficient memory fed back by the accelerator executing the command.
For example, when the central processor executes the first control instruction (the custm_store instruction) and sends a command to the accelerator, the accelerator, upon receiving the command, may feed back that the specific data in the vector/fixed-point/floating-point register has been stored, or that storing the specific data failed, or that memory was insufficient to store it, and so on.
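The store outcomes described above can be sketched from the accelerator side; the capacity check is a simplified stand-in for real SRAM exhaustion, and the string results stand in for the patent's feedback values:

```python
def handle_store_result(accel_sram, capacity, key, value):
    """Accelerator side of a store command: feed back whether the first
    data was stored, or that memory was insufficient to store it.
    (A generic 'storage failed' outcome would cover other error causes.)"""
    if len(accel_sram) >= capacity and key not in accel_sram:
        return "insufficient_memory"
    accel_sram[key] = value
    return "stored"
```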
In some embodiments, the central processing unit executes a second control instruction, and the command it sends instructs acquiring second data from the accelerator. In this case, receiving the execution result fed back by the accelerator executing the command includes: receiving an execution result containing the second data; or receiving an execution result indicating that acquiring the second data failed; or receiving an execution result indicating that the second data was not read.
The second control instruction may be a custm_load instruction.
In an embodiment of the disclosure, the command sent by the central processor to the accelerator may be used to instruct to acquire the second data from the accelerator, in which case, after the accelerator receives the command, the command may be executed to read the second data.
If the second data is stored in the accelerator, the accelerator executes the command, and the second data may be read, and if the second data is not stored in the accelerator, the accelerator may fail to read the second data.
It will be appreciated that in the case of successful reading of the second data, the accelerator feeds back the execution result to the central processor, and the execution result including the second data may be fed back to the central processor, whereby the central processor may receive the execution result including the second data fed back by the accelerator execution command.
It may be further understood that, if reading the second data fails, the accelerator feeds back to the central processor an execution result indicating that acquiring the second data failed, so that the central processor may receive the execution result of second-data acquisition failure fed back by the accelerator executing the command.
It is also understood that, in the case where the second data is not read, the accelerator feeds back the execution result to the central processor, and the execution result of the second data which is not read may be fed back to the central processor, so that the central processor may receive the execution result of the second data which is not read and is fed back by the accelerator execution command.
For example, when the central processor executes the second control instruction, where the second control instruction is a custm_load instruction, the central processor sends a command to the accelerator indicating that specific data is to be read from the accelerator; after receiving the command, the accelerator may feed back the specific data to the central processor, or feed back that the specific data failed to be acquired, or feed back that the specific data was not read, and so on.
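The load command's feedback paths (data returned, acquisition failed, or data not read) can be sketched in the same style. The `LoadResult` names and the `contains`/`read` methods are hypothetical, introduced only for illustration:

```python
from enum import Enum, auto

class LoadResult(Enum):
    OK = auto()            # second data read successfully and fed back
    NOT_READ = auto()      # second data is not stored in the accelerator
    READ_FAILED = auto()   # the data exists but the read itself failed

def custm_load(accelerator, key):
    # Model of the load command: feed back the data, or one of two failure results.
    if not accelerator.contains(key):
        return LoadResult.NOT_READ, None
    data = accelerator.read(key)
    if data is None:
        return LoadResult.READ_FAILED, None
    return LoadResult.OK, data
```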
By implementing the embodiment of the disclosure, in response to the data transmission requirement between the central processing unit and the accelerator, the central processing unit sends a command to the accelerator through a command interface; the CPU receives the execution result of the accelerator execution command feedback. Therefore, the central processing unit can directly send a command to the accelerator through the command interface, and receive an execution result fed back by the accelerator, so that data transmission is completed, and the data transmission speed and efficiency can be improved.
In some embodiments, in order to transfer data between the central processing unit and the accelerator, a command queue may be set in a register of the accelerator, or a space may be opened up in the register of the accelerator to store commands. The central processing unit may then send multiple commands to the accelerator without having to wait, after sending one command, for the accelerator before sending the next, which can improve processing efficiency.
In some embodiments, sending a command to the accelerator includes:
Executing a third control instruction by the central processing unit, and storing the command into a specific address space; or
Executing a third control instruction by the central processing unit, and storing the command into a command queue; or
The central processing unit executes the fourth control instruction, and sends a command to the accelerator under the condition that the accelerator is determined to have executed the current command.
In the embodiment of the disclosure, the central processor sends a command to the accelerator through the command interface; in the case that the central processor executes a third control instruction, where the third control instruction is a custm_sendcmd instruction, the central processor may store the command to a specific address space.
In some embodiments, the particular address space may be a block of space in the accelerator's private address space that may hold commands.
In some embodiments, the accelerator may execute the commands stored in the specific address space in the order in which they were stored; or in order of command priority from high to low; or by both criteria together, executing from high priority to low and, within the same priority, from first stored to last stored; and so on.
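The combined ordering (priority from high to low, storage order within equal priority) can be sketched with a priority heap. The class and method names here are illustrative, not from the disclosure:

```python
import heapq
import itertools

class SpecificAddressSpace:
    """Sketch: commands execute from high priority to low; within equal
    priority, in the order they were stored (first to last)."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # arrival index breaks priority ties

    def store(self, command, priority=0):
        # heapq is a min-heap, so negate priority to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._order), command))

    def next_command(self):
        return heapq.heappop(self._heap)[2]
```

Storing commands a (priority 1), b (priority 2), c (priority 2) would yield the execution order b, c, a: higher priority first, and FIFO among equals.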
In the embodiment of the disclosure, the central processor sends a command to the accelerator through the command interface; in the case that the central processor executes a third control instruction, where the third control instruction is a custm_sendcmd instruction, the central processor may store the command to the command queue.
In some embodiments, as shown in FIG. 2, a command queue is provided between the central processor and the accelerator, and commands sent by the central processor to the accelerator may be deposited into the command queue.
In the embodiment of the disclosure, the central processor sends a command to the accelerator through the command interface; in the case that the central processor executes the fourth control instruction, where the fourth control instruction is a custm_sendcmdwaitfinish instruction, the central processor may send the command to the accelerator upon determining that the accelerator has executed the current command.
In some embodiments, the central processor executes a third control instruction, storing the command to a particular address space, comprising: the central processing unit determines that the specific address space is not full; the central processor stores the command to a specific address space.
In the embodiment of the disclosure, the central processing unit stores the command to the specific address space, and may store the command to the specific address space if it is determined that the specific address space is not full.
It will be appreciated that the size of the specific address space is limited, so it can store only a limited number of commands; in the event that the specific address space is determined not to be full, commands may be stored to it.
In some embodiments, the specific address space is not full when there is room in it, or when its remaining space is sufficient to hold the command.
In some embodiments, the central processor executes a third control instruction, depositing the command into a command queue, comprising: the central processing unit determines that the command queue is not full; the CPU stores the commands to the command queue.
In the embodiment of the disclosure, the central processing unit stores the command in the command queue, and can store the command in the command queue under the condition that the command queue is determined to be not full.
It will be appreciated that the number of commands a command queue can store is limited; in the event that the command queue is determined not to be full, commands can be stored to it.
In some embodiments, the central processor determining that the command queue is not full comprises: the central processing unit determines that an empty item exists in the command queue and determines that the command queue is not full.
In the embodiment of the disclosure, the central processing unit determines that the command queue is not full, and may determine that the command queue is not full if it is determined that an empty item exists in the command queue, or may also determine that the command queue is not full if it is determined that the empty item in the command queue is sufficient to store the command.
In some embodiments, the above method further comprises: the central processing unit determines that the command queue is full and determines to stop sending commands.
In the embodiment of the disclosure, the central processor may determine to stop sending the command in case it is determined that the command queue is full.
In some embodiments, the central processor determining that the command queue is full comprises: determining that no empty item exists in the command queue, and determining that the command queue is full; or determining that no empty item exists in the command queue and that the command has waited a preset time to be stored in the command queue, and determining that the command queue is full.
In the embodiment of the disclosure, the central processing unit determines that the command queue is full, and can determine that the command queue is full under the condition that no empty item exists in the command queue.
In the embodiment of the disclosure, the central processing unit determines that the command queue is full, and may determine that the command queue is full if it is determined that no empty item exists in the command queue and the command has waited a preset time to be stored in the command queue.
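The two full-queue criteria above (no empty entry, or no empty entry after waiting a preset time) can be combined in one sketch. The function name and the list-based queue are illustrative assumptions:

```python
import time

def store_to_queue(queue, capacity, command, preset_wait_s=0.0):
    """Sketch: store the command only if an empty entry exists; after waiting
    the preset time with no empty entry, treat the command queue as full."""
    deadline = time.monotonic() + preset_wait_s
    while len(queue) >= capacity:          # no empty entry in the command queue
        if time.monotonic() >= deadline:
            return False                   # queue determined to be full
        time.sleep(0.0005)                 # wait for the accelerator to drain an entry
    queue.append(command)
    return True
```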
Illustratively, a command queue (command FIFO) is provided between the central processor and the accelerator, and commands are directed into this queue by default. Without waiting for commands in the accelerator to finish executing, the central processing unit sends commands to the command queue until the queue is full.
The central processing unit determines whether the command queue is full. The judging conditions are as follows: the command queue has no empty item, or the number of commands still to be stored is larger than the number of empty items in the command queue; in either case the command queue is judged to be full.
When the command queue is not full, the command is sent to the command queue.
When the command queue is full, the following processing method can be adopted:
A. The instruction reports an exception. The instruction returns an exception to the central processor, which stops sending commands until the command queue has an empty item.
B. The command is designed to return a value to a given fixed-point register. The return values are a first return value and a second return value; specifically, 0 and 1 may respectively indicate whether the command was set successfully, or other values may be used, and the embodiment of the disclosure is not limited here. In one example, if the command queue is full, the command returns to the central processor with a return value of 0, indicating that the command was not set successfully; if the command queue is not full, the command returns with a return value of 1, indicating that the command was set successfully.
C. A judgment instruction is designed to check whether the command queue has an empty item. If an empty item exists, the central processing unit sends the command to the command queue; if no empty item exists, the central processing unit stops sending commands until the command queue has an empty item;
D. A first timeout period is set for the command: the command waits until it can be sent to the command queue; if the wait exceeds the first timeout period, an exception is reported and returned to the central processing unit, and the central processing unit stops sending commands until the command queue has an empty item.
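The four handling strategies A through D can be sketched together; the labels in the comments follow the list above, and the function name, exception class, and list-based FIFO are illustrative assumptions:

```python
import time

class QueueFullException(Exception):
    """Strategy A/D: an exception reported back to the central processor."""

def send_command(fifo, capacity, command, strategy="B", first_timeout_s=0.0):
    # Sketch of the four full-queue strategies described above.
    if strategy == "D":                      # D: wait up to the first timeout period
        deadline = time.monotonic() + first_timeout_s
        while len(fifo) >= capacity:
            if time.monotonic() >= deadline:
                raise QueueFullException("wait exceeded the first timeout period")
            time.sleep(0.0005)
    if len(fifo) >= capacity:
        if strategy == "A":                  # A: the instruction reports an exception
            raise QueueFullException("command queue is full")
        if strategy == "C":                  # C: the check found no empty item
            return None                      # caller stops sending commands
        return 0                             # B: return value 0, not set successfully
    fifo.append(command)
    return 1                                 # B: return value 1, set successfully
```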
In some embodiments, the above method further comprises: the central processing unit determines that a feedback instruction from the accelerator has been received and that the feedback instruction indicates the current command has finished executing, and thereby determines that the accelerator has executed the current command.
In the embodiment of the disclosure, the central processing unit sends a command to the accelerator under the condition that the accelerator is determined to have executed the current command, wherein under the condition that the accelerator is determined to receive a feedback instruction for executing the current command and the feedback instruction indicates that the current command is executed, the accelerator can be determined to have executed the current command.
In some embodiments, the above method further comprises: the central processing unit determines that a feedback instruction of executing the current command by the accelerator is not received within a specific time after the current command is sent out, and determines that the current command is executed abnormally.
In the embodiment of the disclosure, the central processing unit may determine that the current command is executed abnormally when it is determined that the feedback instruction of the accelerator executing the current command is not received within a specific time after the current command is issued.
For example, the central processing unit needs to wait for the command in the accelerator to finish executing before sending the next command to the accelerator. If the command in the accelerator has not finished executing, the central processor waits.
The following treatments may also be added:
1. Set a second command timeout for execution: if the time a command waits to execute exceeds the second command timeout, an exception is reported and returned to the central processing unit.
2. The command is designed to return a value to a given fixed-point register. The return values are a third return value and a fourth return value; specifically, 0 and 1 may respectively indicate whether the command was set successfully, or other values may be used, and the embodiment of the disclosure is not limited here. In one example, if the command in the accelerator has not finished executing, the command returns to the central processor with a return value of 0, indicating that the command was not set successfully; if the command in the accelerator has finished executing, the command returns with a return value of 1, indicating that the command was set successfully.
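The wait-until-finished send with a second command timeout and a success/failure return value can be sketched as follows. The function name and the `current_command_done`/`submit` methods are hypothetical:

```python
import time

def sendcmd_wait_finish(accelerator, command, second_timeout_s=0.01):
    """Sketch: send the command only after the accelerator has executed the
    current one; if the wait exceeds the second command timeout, return 0
    (not set successfully) instead of sending."""
    deadline = time.monotonic() + second_timeout_s
    while not accelerator.current_command_done():
        if time.monotonic() >= deadline:
            return 0       # e.g. third return value: command not set successfully
        time.sleep(0.0005)
    accelerator.submit(command)
    return 1               # e.g. fourth return value: command set successfully
```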
By implementing the embodiment of the disclosure, the central processing unit can directly send the command to the accelerator through the command interface, and receive the execution result fed back by the accelerator, so as to complete data transmission, and improve the data transmission speed and efficiency.
In order to implement the above embodiment, the present disclosure further proposes a device for interaction between a central processing unit and an accelerator.
As shown in fig. 3, the device for interaction between the central processing unit and the accelerator includes: a transmitting unit 301 and a receiving unit 302.
A sending unit 301, configured to send a command to a register of the accelerator through a command interface when it is determined that a data transmission requirement exists, so that the accelerator executes the command, where the register is a local random access memory inside the accelerator;
a receiving unit 302, configured to receive an execution result of the accelerator execution command.
In one possible implementation of the apparatus provided by the embodiments of the present disclosure, the command includes a blocking mode and a non-blocking mode.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
in response to a first control instruction, a first command is sent to the register through the command interface, the first command being for indicating to store first data into the accelerator.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
In response to the second control instruction, a second command is sent to the register through the command interface, the second command being for instructing the retrieval of the second data from the accelerator.
In one possible implementation manner, in the apparatus provided in the embodiments of the present disclosure, a specific address space and a command queue are set in a register, and then the sending unit 301 is specifically configured to:
and in response to the second control instruction, sending a second command to the accelerator through the command interface, wherein the second command is used for indicating that the second data is acquired from the accelerator.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
storing the command to a specific address space of the accelerator in response to the third control instruction; or
responding to a third control instruction, and storing the command into a command queue; or
In response to the fourth control instruction, a command is sent to the accelerator in a case where it is determined that the accelerator has executed the current command.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
when no empty item exists in the command queue, determining that the command queue is full;
Stopping sending the command, continuously monitoring the command queue until an empty item exists in the command queue, and storing the command into the command queue.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
When an abnormal instruction fed back by the accelerator is received, determining that the command queue is full; or
receiving a return value of a command, and if the return value is a first preset value indicating that the command was not successfully executed, determining that the command queue is full; or
if the set judging instruction for checking whether the queue has empty items indicates that the queue has no empty items, determining that the command queue is full; or
if the execution duration of the command is greater than or equal to the preset first command timeout duration, determining that the command queue is full.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
when receipt of the abnormal instruction stops, determining that the command queue has an empty item, and storing the command into the command queue; or
receiving a return value, and if the return value is a second preset value indicating that the command was successfully executed, determining that an empty item exists in the command queue, and storing the command into the command queue; or
if the set judging instruction for checking whether the queue has an empty item indicates that an empty item exists, determining that the command queue has an empty item, and storing the command into the command queue; or
if the execution duration of the command is smaller than the first command timeout duration, determining that the command queue has an empty item, and storing the command into the command queue.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
If the execution duration of the command is greater than or equal to the preset second command timeout duration, stopping sending the command; or
receiving a return value of the command, and stopping sending the command if the return value is a third preset value indicating that the command was not successfully executed.
In one possible implementation manner, in the apparatus provided by the embodiments of the present disclosure, the sending unit 301 is specifically configured to:
If the execution duration of the command is less than the overtime duration of the second command, the command is restored to be sent; /or
And receiving a return value of the command, and if the return value is a fourth preset value indicating that the command is successfully executed, recovering to send the command.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
To achieve the above embodiments, the present disclosure also proposes a computer program product which, when its instructions are executed by a processor, performs the method of interaction between a central processing unit and an accelerator as proposed by the foregoing embodiments of the present disclosure.
Fig. 4 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure. The electronic device 12 shown in fig. 4 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 4, the electronic device 12 is in the form of a general purpose computing device. Components of the electronic device 12 may include, but are not limited to: one or more processors 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processors 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (hereinafter ISA) bus, the Micro Channel Architecture (hereinafter MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (hereinafter VESA) local bus, and the Peripheral Component Interconnect (hereinafter PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter: RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive").
Although not shown in fig. 4, a disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a compact disc read-only memory (Compact Disc Read Only Memory; hereinafter CD-ROM), a digital versatile disc read-only memory (Digital Versatile Disc Read Only Memory; hereinafter DVD-ROM), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods in the embodiments described in this disclosure.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a local area network (Local Area Network; hereinafter: LAN), a wide area network (Wide Area Network; hereinafter: WAN), and/or a public network, such as the Internet, through the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16 executes various functional applications, such as implementing the central processor and accelerator interaction method mentioned in the foregoing embodiments, by running programs stored in the system memory 28.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any adaptations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
It should be noted that in the description of the present disclosure, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present disclosure may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present disclosure, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present disclosure.

Claims (12)

1. A method for interaction between a central processing unit and an accelerator, the method comprising:
when it is determined that a data transmission requirement exists, sending, by the central processing unit, a command to a register of the accelerator through a command interface so that the accelerator executes the command, wherein the register is a local random access memory in the accelerator, and a data path and a control path are established between the central processing unit and the accelerator when the central processing unit sends the command to the accelerator through the command interface provided on the central processing unit; and
obtaining, by the central processing unit, an execution result of the command from the register of the accelerator;
the method further comprising:
providing a coherent access channel between the central processing unit and the accelerator so that the accelerator directly accesses a memory of the central processing unit through the coherent access channel;
wherein sending, by the central processing unit, the command to the register of the accelerator through the command interface when it is determined that the data transmission requirement exists comprises:
in response to a second control instruction, sending a second command to the register through the command interface, wherein the second command is used for instructing acquisition of second data from the accelerator; and
when the command is the second command and the second data is successfully read from the register of the accelerator, the execution result comprises the second data.
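The flow in claim 1 can be illustrated with a minimal software model: the CPU writes a command into the accelerator's local register (modeled here as a small RAM), the accelerator executes it, and the CPU reads the execution result back from the same register file. All class and field names below are invented for illustration; real hardware would use memory-mapped I/O rather than Python objects.

```python
class Accelerator:
    """Toy accelerator whose 'register' stands in for its local RAM."""

    def __init__(self):
        self.register = {}  # local random access memory in the accelerator

    def execute(self):
        cmd = self.register.get("command")
        if cmd and cmd["op"] == "store":      # first command: store data into the accelerator
            self.register["data"] = cmd["payload"]
            self.register["result"] = "ok"
        elif cmd and cmd["op"] == "load":     # second command: fetch data from the accelerator
            self.register["result"] = self.register.get("data")


class CPU:
    """Toy CPU side of the command interface."""

    def send_command(self, accel, op, payload=None):
        # Write the command into the accelerator's register (the command interface),
        # let the accelerator run it, then read the execution result back.
        accel.register["command"] = {"op": op, "payload": payload}
        accel.execute()
        return accel.register["result"]


cpu, accel = CPU(), Accelerator()
result = cpu.send_command(accel, "store", [1, 2, 3])  # -> "ok"
second_data = cpu.send_command(accel, "load")         # -> [1, 2, 3]
```

The key property of the claimed scheme is that both the command and its result live in the accelerator's own register file, so the CPU never round-trips through main memory to exchange them.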
2. The method of claim 1, wherein the command supports a blocking mode and a non-blocking mode.
3. The method of claim 1, wherein sending, by the central processing unit, the command to the register of the accelerator through the command interface when it is determined that the data transmission requirement exists comprises:
in response to a first control instruction, sending a first command to the register through the command interface, wherein the first command is used for instructing storage of first data into the accelerator.
4. The method according to claim 3, wherein the register has a specific address space and a command queue, and wherein sending, by the central processing unit, the command to the register of the accelerator through the command interface when it is determined that the data transmission requirement exists comprises:
in response to a third control instruction, storing the command into the specific address space; or
in response to the third control instruction, storing the command into the command queue; or
in response to a fourth control instruction, sending the command to the accelerator if it is determined that the accelerator has executed the current command.
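Claim 4 names three alternative delivery paths for a command: a write into a specific address space, an append to the command queue, or a send deferred until the accelerator has finished its current command. A sketch of that dispatch, with all names invented for illustration:

```python
def deliver_command(cmd, mode, accel):
    """Deliver a command via one of the three alternatives in claim 4.

    Returns True if the command was delivered, False if delivery must wait
    (the accelerator is still executing its current command).
    """
    if mode == "address_space":      # third control instruction, variant 1
        accel["addr_space"] = cmd
    elif mode == "queue":            # third control instruction, variant 2
        accel["queue"].append(cmd)
    elif mode == "when_idle":        # fourth control instruction
        if accel["busy"]:
            return False             # current command has not finished yet
        accel["current"] = cmd
    return True


accel = {"addr_space": None, "queue": [], "busy": True, "current": None}
assert deliver_command("c1", "address_space", accel)
assert deliver_command("c2", "queue", accel)
assert deliver_command("c3", "when_idle", accel) is False  # accelerator busy
accel["busy"] = False                                      # accelerator finishes
assert deliver_command("c3", "when_idle", accel)
```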
5. The method of claim 4, wherein after storing the command into the command queue in response to the third control instruction, the method further comprises:
determining that the command queue is full when no empty entry exists in the command queue; and
stopping sending the command, continuously monitoring the command queue until an empty entry exists in the command queue, and storing the command into the command queue.
6. The method of claim 5, wherein determining that the command queue is full when no empty entry exists in the command queue comprises:
determining that the command queue is full when an exception instruction fed back by the accelerator is received; or
receiving a return value of the command, and determining that the command queue is full if the return value is a first preset value indicating that the command was not successfully executed; or
determining that the command queue is full if a set judgment instruction for checking whether the queue has an empty entry indicates that the queue has no empty entry; or
determining that the command queue is full if the execution duration of the command is greater than or equal to a preset first command timeout duration.
7. The method of claim 6, wherein continuously monitoring the command queue until an empty entry exists in the command queue, and storing the command into the command queue, comprises:
determining that the command queue has an empty entry and storing the command into the command queue when the exception instruction is no longer received; or
receiving the return value, and if the return value is a second preset value indicating that the command was successfully executed, determining that an empty entry exists in the command queue and storing the command into the command queue; or
if the set judgment instruction for checking whether the queue has an empty entry indicates that an empty entry exists, determining that the command queue has an empty entry and storing the command into the command queue; or
if the execution duration of the command is less than the first command timeout duration, determining that an empty entry exists in the command queue and storing the command into the command queue.
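The back-pressure behavior of claims 5 through 7 — stop sending when the queue has no empty entry, keep monitoring, and store the command once a slot frees up — can be sketched with a bounded queue. This is a simplified model under assumed semantics; the class and method names are illustrative, and the hardware-level signals (exception instruction, return value, judgment instruction, timeout) are all collapsed into a single `is_full` check.

```python
from collections import deque


class CommandQueue:
    """Bounded command queue modeling the full/empty-entry logic of claims 5-7."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def is_full(self):
        # "No empty entry exists in the command queue" -> the queue is full.
        return len(self.entries) >= self.capacity

    def try_send(self, command):
        # In hardware this condition might instead surface as an exception
        # instruction, a failing return value, or a command timeout.
        if self.is_full():
            return False  # stop sending; caller keeps monitoring the queue
        self.entries.append(command)
        return True


q = CommandQueue(capacity=2)
assert q.try_send("cmd0") and q.try_send("cmd1")
assert not q.try_send("cmd2")  # queue full: stop sending, keep monitoring
q.entries.popleft()            # accelerator consumes one command -> empty entry
assert q.try_send("cmd2")      # empty entry appeared: store the command
```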
8. The method of claim 4, wherein sending the command to the accelerator in response to the fourth control instruction if it is determined that the accelerator has executed the current command comprises:
stopping sending the command if the execution duration of the command is greater than or equal to a preset second command timeout duration; or
receiving a return value of the command, and stopping sending the command if the return value is a third preset value indicating that the command was not successfully executed.
9. The method of claim 8, wherein the method further comprises:
resuming sending the command if the execution duration of the command is less than the second command timeout duration; or
receiving a return value of the command, and resuming sending the command if the return value is a fourth preset value indicating that the command was successfully executed.
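The stop/resume decision of claims 8 and 9 reduces to comparing the command's execution duration against the second command timeout duration and inspecting its return value. A sketch of that rule, with hypothetical names and the failure sentinel chosen arbitrarily for illustration:

```python
def should_stop_sending(exec_duration_s, timeout_s, return_value=None,
                        failure_value="fail"):
    """Return True to stop sending the command, False to resume sending.

    Stop when the command ran for at least the (second) command timeout
    duration, or when its return value is the preset value that signals an
    unsuccessful execution; otherwise sending may resume.
    """
    if exec_duration_s >= timeout_s:
        return True          # claim 8: timed out -> stop sending
    if return_value == failure_value:
        return True          # claim 8: failure return value -> stop sending
    return False             # claim 9: under the timeout and not failed -> resume


assert should_stop_sending(2.5, 2.0) is True          # timed out: stop
assert should_stop_sending(0.5, 2.0, "fail") is True  # failure return value: stop
assert should_stop_sending(0.5, 2.0, "ok") is False   # resume sending
```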
10. A device for interaction between a central processing unit and an accelerator, the device comprising:
a sending unit, configured for the central processing unit to send, when it is determined that a data transmission requirement exists, a command to a register of the accelerator through a command interface so that the accelerator executes the command, wherein the register is a local random access memory in the accelerator, and a data path and a control path are established between the central processing unit and the accelerator when the central processing unit sends the command to the accelerator through the command interface provided on the central processing unit; and
a receiving unit, configured for the central processing unit to obtain an execution result of the command from the register of the accelerator;
wherein the sending unit is specifically configured to:
provide a coherent access channel between the central processing unit and the accelerator so that the accelerator directly accesses a memory of the central processing unit through the coherent access channel;
in response to a second control instruction, send a second command to the register through the command interface, the second command being used for instructing acquisition of second data from the accelerator; and
when the command is the second command and the second data is successfully read from the register of the accelerator, the execution result comprises the second data.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 9.
CN202410365390.6A 2024-03-28 2024-03-28 Method and device for interaction between central processing unit and accelerator and electronic equipment Active CN117971317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410365390.6A CN117971317B (en) 2024-03-28 2024-03-28 Method and device for interaction between central processing unit and accelerator and electronic equipment


Publications (2)

Publication Number Publication Date
CN117971317A CN117971317A (en) 2024-05-03
CN117971317B 2024-07-02

Family

ID=90846315


Country Status (1)

Country Link
CN (1) CN117971317B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011553A (en) * 2019-12-20 2021-06-22 三星电子株式会社 Accelerator, method of operating an accelerator, and apparatus including an accelerator

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20070168646A1 (en) * 2006-01-17 2007-07-19 Jean-Francois Collard Data exchange between cooperating processors
US9703603B1 (en) * 2016-04-25 2017-07-11 Nxp Usa, Inc. System and method for executing accelerator call
CN112949847B (en) * 2021-03-29 2023-07-25 上海西井科技股份有限公司 Neural network algorithm acceleration system, scheduling system and scheduling method
CN116483556A (en) * 2023-03-08 2023-07-25 阿里巴巴(中国)有限公司 Data processing method and accelerator system



Similar Documents

Publication Publication Date Title
US10877766B2 (en) Embedded scheduling of hardware resources for hardware acceleration
KR101392109B1 (en) Providing state storage in a processor for system management mode
JP5385272B2 (en) Mechanism for broadcasting system management interrupts to other processors in a computer system
US7093116B2 (en) Methods and apparatus to operate in multiple phases of a basic input/output system (BIOS)
US5682551A (en) System for checking the acceptance of I/O request to an interface using software visible instruction which provides a status signal and performs operations in response thereto
KR20100058670A (en) Apparatus, system, and method for cross-system proxy-based task offloading
JP2011118871A (en) Method and device for improving turbo performance for event processing
TW200305802A (en) Power conservation techniques for a digital computer
US20210029219A1 (en) Data storage system with processor scheduling using distributed peek-poller threads
TW201305821A (en) Flexible flash commands
JP2003296191A (en) Integrated circuit operable as general purpose processor and processor of peripheral device
US20240143392A1 (en) Task scheduling method, chip, and electronic device
US5371857A (en) Input/output interruption control system for a virtual machine
US8224884B2 (en) Processor communication tokens
US8260995B2 (en) Processor interrupt command response system
US20100161914A1 (en) Autonomous memory subsystems in computing platforms
TW201303870A (en) Effective utilization of flash interface
US6615296B2 (en) Efficient implementation of first-in-first-out memories for multi-processor systems
US20070038429A1 (en) System simulation method
CN117971317B (en) Method and device for interaction between central processing unit and accelerator and electronic equipment
US9965321B2 (en) Error checking in out-of-order task scheduling
US20090013331A1 (en) Token protocol
US11636056B1 (en) Hierarchical arbitration structure
US11803467B1 (en) Request buffering scheme
US10884477B2 (en) Coordinating accesses of shared resources by clients in a computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant