WO2019223383A1 - Direct memory access method and device, dedicated computing chip and heterogeneous computing system

Publication number
WO2019223383A1
Authority
WO
WIPO (PCT)
Prior art keywords
dma control
length
control block
output data
input data
Application number
PCT/CN2019/076252
Other languages
French (fr)
Chinese (zh)
Inventor
廖恬瑜
潘国振
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019223383A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Definitions

  • One or more embodiments of the present specification relate to the field of computer technology, and in particular, to a direct memory access method, device, dedicated computing chip, and heterogeneous computing system.
  • Heterogeneous computing refers to the control of the overall process of data processing by a general-purpose central processing unit (CPU).
  • CPU central processing unit
  • the general-purpose CPU calls a special-purpose computing chip for calculation.
  • a general-purpose CPU needs to call a Direct Memory Access (DMA) method (a method of transferring memory data through a dedicated hardware module) to transfer input data of a dedicated calculation from a system memory to a device memory. After the special calculation chip completes the calculation, the output data is transmitted back to the system memory.
  • DMA Direct Memory Access
  • the transmission process of the input data may be: 1) accessing a pointer of the queue where the DMA descriptor is located to read the DMA descriptor of the input data (used to describe the address and length of the input data). 2) Access the DMA descriptor of the input data to read the address and length of the input data. 3) Read the input data according to the address and length of the input data.
  • the transmission process of the output data can be: 1) access the pointer of the queue where the DMA descriptor is located to read the DMA descriptor of the output data (used to describe the address and length of the output data). 2) Access the DMA descriptor of the output data to read the address and length of the output data; 3) Write the output data according to the address and length of the output data.
  • a process of heterogeneous computing needs to perform six access operations.
  • One or more embodiments of the present specification describe a direct memory access method, a device, a dedicated computing chip, and a heterogeneous computing system, which can reduce the number of data accesses in a DMA transfer, thereby improving the performance of heterogeneous computing.
  • a direct memory access method including:
  • a corresponding DMA control block is determined in system memory, and the content of the DMA control block includes DMA control information and input data;
  • the system memory refers to the storage space used to store data used by the general-purpose central processing unit (CPU);
  • the device memory refers to a storage space for storing data of a dedicated computing chip
  • a dedicated computing chip including: a direct memory access DMA length register, a DMA control block pointer queue, a DMA data transmission module, and a dedicated computing module;
  • the DMA length register is used to store the length of the input data and the length of the output data
  • the DMA control block pointer queue is used to store multiple DMA control block pointers; the DMA control block pointer points to a DMA control block in system memory; the content of the DMA control block includes DMA control information and input data;
  • a DMA data transmission module configured to move the DMA control information and the input data from system memory to device memory according to the length of the input data, the length of the DMA control information, and the DMA control block pointer; and further configured to move the output data from the device memory to the system memory according to the DMA control information and the length of the output data;
  • the dedicated calculation module is configured to calculate the input data and obtain the output data.
  • a heterogeneous computing system including: a general-purpose central processing unit CPU, system memory, a dedicated computing chip and device memory as provided in the second aspect above;
  • the general-purpose CPU is configured to call the dedicated computing chip for heterogeneous computing
  • the system memory is used to store data used by the general-purpose CPU
  • the device memory is configured to store data used by the dedicated computing chip.
  • a direct memory access device including:
  • a determining unit configured to determine a corresponding DMA control block in system memory according to the DMA control block pointer read by the reading unit, and the content of the DMA control block includes DMA control information and input data;
  • the system memory refers to the storage space used to store data used by the general-purpose central processing unit (CPU);
  • the determining unit is further configured to determine a total length of the DMA control information and the input data
  • a moving unit configured to move the DMA control information and the input data to a device memory according to the DMA control block pointer read by the reading unit and the total length determined by the determining unit;
  • Device memory refers to the storage space used to store data for dedicated computing chips;
  • a calculation unit configured to perform corresponding calculation on the input data to obtain output data
  • a writing unit configured to write the output data calculated by the calculation unit into the device memory
  • An obtaining unit configured to obtain a length of the output data
  • the moving unit is further configured to move the output data from the device memory to the DMA control block according to the DMA control information and a length of the output data obtained by the obtaining unit.
  • the direct memory access method, device, dedicated computing chip, and heterogeneous computing system read a DMA control block pointer from a DMA control block pointer queue.
  • the corresponding DMA control block is determined in the system memory, and the total length of the DMA control information and input data in the DMA control block is determined.
  • the DMA control information and input data are moved to the device memory. Corresponding calculations are performed on the input data to obtain the output data. The output data is written to the device memory, and the length of the output data is obtained.
  • the output data is moved from the device memory to the DMA control block.
  • the scheme provided in this specification performs the following two access operations during the input data transmission process: first, the DMA control block pointer queue is accessed to read the DMA control block pointer; second, the DMA control block that the pointer points to is accessed to read the DMA control information and input data.
  • the access to the DMA descriptor is reduced.
  • the output data can be moved to the DMA control block directly according to the DMA control information. That is to say, the output data transmission process only needs to perform access to the output data once, and it is no longer necessary to execute the pointer operation of the queue where the DMA descriptor is located and the access operation of the output data descriptor.
  • across the two DMA transfers of one heterogeneous calculation, three access operations are saved (three are performed instead of six). This can greatly improve the DMA transmission efficiency of the data, which in turn can improve the performance of heterogeneous computing.
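  • The access counts stated above can be tallied with a short sketch. The operation labels are paraphrased from the text; the dictionary layout and the helper name `count` are illustrative assumptions.

```python
# Access operations per DMA transfer, as enumerated in the text.
traditional = {
    "input": ["descriptor queue pointer", "input data descriptor", "input data"],
    "output": ["descriptor queue pointer", "output data descriptor", "output data"],
}
control_block_scheme = {
    "input": ["control block pointer queue", "control block (control info + input data)"],
    "output": ["output data"],
}


def count(scheme):
    """Total number of access operations for one heterogeneous calculation."""
    return sum(len(operations) for operations in scheme.values())


saved = count(traditional) - count(control_block_scheme)
print(count(traditional), count(control_block_scheme), saved)
```

  • One access is saved on the input side (the descriptor read) and two on the output side (the queue pointer read and the descriptor read).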
  • FIG. 1 is a schematic structural diagram of a heterogeneous computing system provided in this specification
  • FIG. 2 is a flowchart of a direct memory access method according to an embodiment of the present specification
  • FIG. 3 is a schematic diagram of a direct memory access device according to an embodiment of the present specification.
  • the direct memory access method provided by an embodiment of this specification can be applied to a heterogeneous computing system as shown in FIG. 1.
  • the heterogeneous computing system may include a general-purpose CPU 10, a system memory 20, a dedicated computing chip 30, and a device memory 40. The general-purpose CPU 10 and the dedicated computing chip 30 may also be referred to as the two computing units of the heterogeneous computing system.
  • the general-purpose CPU 10 is used to control the main data processing flow of heterogeneous computing. This specifically includes: a. preprocessing and preparing the input data of the heterogeneous calculation; b. calling the dedicated computing chip to perform the heterogeneous calculation; c. querying and returning the heterogeneous calculation result (also called output data); d. post-processing and outputting the output data of the heterogeneous calculation.
  • the system memory 20 is used to store data used by the general-purpose CPU 10.
  • it can store the used data in the form of a DMA control block (a data structure), which occupies a physically continuous address space in the system memory 20.
  • the content of the DMA control block may include DMA control information, input data, and output data.
  • the space occupied by the input data may also be referred to as an input data block.
  • the space occupied by output data can also be referred to as an output data block.
  • the general-purpose CPU 10 may determine the length of the input data and the length of the output data according to the current heterogeneous calculation method.
  • the corresponding input data, the length of the input data, and the length of the output data are usually determined.
  • the DMA control information can be constructed later, and the specific construction process will be described later.
  • part of the content of the DMA control block (DMA control information and input data) is obtained.
  • the general-purpose CPU 10 can write the part of the content into a physically continuous address space of the system memory 20. It can be understood that since the length of the output data is also determined, after the above part of the content, an address space of the above length is usually reserved continuously for writing output data.
  • the above-mentioned DMA control block is constituted by an address space in which a part of content is written in the system memory 20 and a reserved address space.
  • a DMA control block can be formed in the system memory 20.
  • the above DMA control information may include: an offset address of input data, an offset address of output data, and a calculation completion flag.
  • the offset address of the input data can occupy 32 bits (that is, 4 bytes), which can refer to the offset of the space occupied by the input data (or the input data block) relative to the start address of the DMA control block.
  • the actual address of the input data in the system memory 20 may be determined according to the DMA control block start address and the offset address.
  • the definition of the offset address of the output data is the same as the definition of the offset address of the input data.
  • the calculation completion flag can occupy 1 bit (expanded to 4 bytes). It can be cleared by the general-purpose CPU 10 before the heterogeneous calculation; after the heterogeneous calculation is completed, the dedicated computing chip 30 rewrites the flag to 1.
  • the CPU 10 polls the calculation completion flag to confirm whether the heterogeneous calculation is completed.
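  • A minimal sketch of the DMA control block layout described above. The field order, the little-endian encoding, the 32-bit width of the output offset, and the names `CONTROL_INFO_FORMAT`, `build_control_block` and `actual_address` are assumptions made for illustration; the text only fixes the 4-byte input offset and the flag expanded to 4 bytes.

```python
import struct

# Assumed layout (little-endian): three 4-byte fields in this order.
CONTROL_INFO_FORMAT = "<III"  # input offset, output offset, completion flag
CONTROL_INFO_LEN = struct.calcsize(CONTROL_INFO_FORMAT)  # 12 bytes


def build_control_block(input_data: bytes, output_len: int) -> bytearray:
    """Return one contiguous block: control info, input data, reserved output space."""
    input_offset = CONTROL_INFO_LEN                 # input follows the control info
    output_offset = input_offset + len(input_data)  # output follows the input
    control_info = struct.pack(CONTROL_INFO_FORMAT, input_offset, output_offset, 0)
    return bytearray(control_info + input_data + bytes(output_len))


def actual_address(block_start: int, offset: int) -> int:
    """Actual address = start address of the DMA control block + offset address."""
    return block_start + offset


block = build_control_block(b"\x01\x02\x03\x04", output_len=8)
input_offset, output_offset, done_flag = struct.unpack_from(CONTROL_INFO_FORMAT, block, 0)
```

  • Because both offsets are relative to the start of the block, the same control information stays valid after the block is copied to a different start address in device memory.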
  • the dedicated computing chip 30 is configured to cooperate with the general-purpose CPU 10 to perform special-purpose calculation functions (for example, matrix multiplication and large number multiplication).
  • the dedicated computing chip 30 may be, for example, a Field Programmable Gate Array (FPGA) chip, an Application Specific Integrated Circuit (ASIC) chip, a Graphics Processing Unit (GPU) chip, or the like.
  • for such special-purpose calculations, the general-purpose CPU 10 has lower calculation efficiency, whereas using the dedicated computing chip 30 offers better cost performance.
  • the device memory 40 is used to store data of the dedicated computing chip 30. Specifically, when the heterogeneous calculation is started, the dedicated computing chip 30 may read the input data from the device memory 40. When the heterogeneous calculation is completed, the output data may be written into the device memory 40.
  • the dedicated computing chip 30 may specifically include: a DMA length register 31, a DMA control block pointer queue 32, a DMA data transmission module 33, and a dedicated computing module 34.
  • the DMA length register 31 is used to store the length of the input data and the length of the output data. Usually for a specific heterogeneous calculation, the length of the input data and the length of the output data are fixed. That is, the general-purpose CPU 10 may determine the foregoing length according to the heterogeneous calculation method currently performed.
  • the DMA control block pointer queue 32 is used to store a plurality of DMA control block pointers.
  • each DMA control block pointer points to a DMA control block in the system memory 20 and can occupy 32 bits. Specifically, each time a DMA control block is formed in the system memory 20, the general-purpose CPU 10 can write a DMA control block pointer corresponding to the DMA control block to the DMA control block pointer queue 32. Since one DMA control block can be formed for one heterogeneous calculation, the DMA control block pointer also corresponds to one heterogeneous calculation.
  • multiple DMA control block pointers in the DMA control block pointer queue 32 can be read at the same time, so that multiple heterogeneous calculations can be performed in parallel, which greatly improves the efficiency of heterogeneous computing. It should be noted that the multiple heterogeneous calculations here belong to the same type; for example, they are all encryption calculations.
  • the DMA data transmission module 33 is used to move the DMA control information and input data from the system memory 20 to the device memory 40 according to the length of the input data, the length of the DMA control information, and the DMA control block pointer; it is also used to move the output data from the device memory 40 to the system memory 20 according to the DMA control information and the length of the output data.
  • the special-purpose calculation module 34 is used to implement a special-purpose calculation function. Specifically, it is used to calculate input data and obtain output data.
  • the general-purpose CPU 10 in FIG. 1 can call the dedicated computing chip 30 to perform heterogeneous calculations.
  • this specification improves the direct memory access method. Before executing the direct memory access method for heterogeneous computing provided in this specification, the following steps can be performed first:
  • the general-purpose CPU 10 determines the length of the input data and the length of the output data according to the current heterogeneous calculation method, and writes the length of the input data and the length of the output data into the DMA length register 31.
  • the general-purpose CPU 10 prepares input data for heterogeneous calculation and constructs DMA control information.
  • the DMA control information may include: an offset address of input data, an offset address of output data, and a calculation completion flag.
  • the offset address of the input data can be determined according to the length of the DMA control information.
  • an address space of the above length can be continuously reserved for writing the output data.
  • the address space in which data is written in the system memory 20 and the reserved address space constitute a DMA control block.
  • the general-purpose CPU 10 writes a DMA control block pointer into the DMA control block pointer queue 32, and the DMA control block pointer points to the start address of the formed DMA control block.
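  • the first step (writing the lengths into the DMA length register 31) needs to be performed only once, while the second and third steps (forming a DMA control block and writing its pointer into the DMA control block pointer queue 32) can be executed multiple times, according to the number of heterogeneous calculations.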
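  • The preparation steps performed by the general-purpose CPU can be sketched as follows. This is a software stand-in only: the bytearray, dictionary and deque model the system memory 20, the DMA length register 31 and the DMA control block pointer queue 32, and the bump allocator `next_free`, the field order and the function names are hypothetical.

```python
import struct
from collections import deque

CONTROL_INFO_LEN = 12  # assumed: two 4-byte offsets plus a 4-byte completion flag

system_memory = bytearray(4096)  # stand-in for system memory 20
length_register = {}             # stand-in for DMA length register 31
pointer_queue = deque()          # stand-in for DMA control block pointer queue 32
next_free = 0                    # hypothetical allocator for contiguous blocks


def set_lengths(input_len: int, output_len: int) -> None:
    """Write the input and output lengths once per type of calculation."""
    length_register["input"] = input_len
    length_register["output"] = output_len


def submit(input_data: bytes) -> int:
    """Form one DMA control block and enqueue its pointer; returns the pointer."""
    global next_free
    if len(input_data) != length_register["input"]:
        raise ValueError("input length must match the DMA length register")
    start = next_free
    output_offset = CONTROL_INFO_LEN + len(input_data)
    # Control info: input offset, output offset, completion flag cleared to 0.
    struct.pack_into("<III", system_memory, start, CONTROL_INFO_LEN, output_offset, 0)
    system_memory[start + CONTROL_INFO_LEN:start + output_offset] = input_data
    # Reserve the output space directly after the input data.
    next_free = start + output_offset + length_register["output"]
    pointer_queue.append(start)  # pointer to the start address of the block
    return start


set_lengths(input_len=4, output_len=4)
first = submit(b"\x01\x02\x03\x04")
second = submit(b"\x05\x06\x07\x08")
```

  • Each call to `submit` enqueues a single small integer, which mirrors the point made in the text that only a pointer, not a full descriptor, is written to the queue per calculation.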
  • FIG. 2 is a flowchart of a direct memory access method according to an embodiment of the present specification.
  • the execution subject of the method may be a dedicated computing chip 30 in FIG. 1.
  • the method may specifically include:
  • Step 210 Read the DMA control block pointer from the DMA control block pointer queue.
  • the DMA control block pointer here points to the start address of the DMA control block, so that the content of the DMA control block can be directly accessed according to the pointer, thereby reducing the number of accesses to the system memory 20, and thus reducing the DMA transfer delay.
  • the dedicated computing chip 30 may poll to check whether the DMA control block pointer queue 32 is empty. If it is not empty, the DMA control block pointer can be read from the head of the queue. Since the DMA control block pointer queue 32 provided in this specification can store multiple DMA control block pointers at the same time, it can more conveniently support DMA asynchronous operations, more conveniently support multiple processes to perform DMA operations independently, and improve DMA transmission efficiency.
  • Step 220 Determine the corresponding DMA control block in the system memory according to the DMA control block pointer.
  • for example, when the DMA control block pointer A is read, the DMA control block A can be determined; when the DMA control block pointer B is read, the DMA control block B can be determined. It can be understood that the content of the DMA control block read in this step only includes DMA control information and input data.
  • Step 230 Determine the total length of the DMA control information and the input data.
  • the DMA control information in this specification may include: an offset address of input data, an offset address of output data, and a calculation completion flag, and it has a fixed length.
  • the fixed length is the sum of the lengths of the three.
  • the above-mentioned determination process of the total length may be: reading the length of the input data from the DMA length register 31. Determine the total length based on the fixed length and the length of the input data.
  • Step 240 Move the DMA control information and input data to the device memory according to the DMA control block pointer and the total length.
  • the DMA data transmission module 33 may move the DMA control information and input data to the device memory 40 according to the DMA control block pointer and the total length.
  • a physically continuous address space may be divided in the device memory 40 first. After that, the DMA control information and input data can be read from the corresponding DMA control block according to the DMA control block pointer and the total length.
  • the above-mentioned read operation can also be understood as a continuous read operation.
  • the DMA control information and the input data are written into a preliminarily divided physically continuous address space in the device memory 40. It can be understood that, after the foregoing write operation is performed, the start addresses of the DMA control information and the input data in the device memory 40 are determined.
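  • Steps 230 and 240 can be sketched as a length computation followed by one contiguous copy. The memories are modeled as bytearrays, and the 12-byte control information length and the function names are assumptions for illustration.

```python
CONTROL_INFO_LEN = 12  # assumed fixed length of the DMA control information


def total_length(input_len: int) -> int:
    """Step 230: fixed control-info length plus the input length from the register."""
    return CONTROL_INFO_LEN + input_len


def move_to_device(system_memory, device_memory, block_ptr, device_start, input_len):
    """Step 240: one contiguous read from system memory, one write to device memory."""
    n = total_length(input_len)
    device_memory[device_start:device_start + n] = system_memory[block_ptr:block_ptr + n]
    return device_start  # start address of control info and input data on the device


system_memory = bytearray(range(64))
device_memory = bytearray(64)
start = move_to_device(system_memory, device_memory,
                       block_ptr=8, device_start=32, input_len=4)
```

  • The returned start address is what later steps combine with the offsets in the control information to locate the input and output data in device memory.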
  • Step 250 Perform corresponding calculations on the input data to obtain output data.
  • the dedicated calculation module 34 may be called to perform corresponding calculations on the input data.
  • the actual address of the input data in the device memory 40 may be determined according to the start address determined in step 240 and the offset address of the input data in the DMA control information. After that, the input data can be read from the device memory 40 according to the actual address, and the dedicated calculation module 34 is called to perform corresponding calculations on the input data.
  • Step 260 Write the output data to the device memory.
  • the actual address of the output data in the device memory 40 may be determined according to the start address determined in step 240 and the offset address of the output data in the DMA control information. After that, the output data can be written into the storage space corresponding to the actual address in the device memory 40.
  • Step 270 Obtain the length of the output data.
  • the length of the output data may be read from the DMA length register 31.
  • Step 280 Move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data.
  • the output data may be moved from the device memory 40 to the DMA control block according to the offset address of the output data and the length of the output data in the DMA control information.
  • the moving process may specifically include: obtaining the start address of the DMA control information and the input data in the device memory 40.
  • the first actual address of the output data in the device memory 40 is determined according to the offset address and the start address.
  • the second actual address of the output data in the DMA control block is determined.
  • the dedicated computing chip 30 may rewrite the calculation completion flag in the DMA control information, for example, the calculation completion flag may be rewritten to 1.
  • the general-purpose CPU 10 may poll the calculation completion flag. When the calculation completion flag is 1, it indicates that the heterogeneous calculation is completed, and the output data in the system memory 20 may be used.
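  • Steps 250 to 280 and the completion flag can be sketched as below. The sketch assumes that the second actual address is the DMA control block pointer plus the output offset, that the flag is the third 4-byte field of the control information, and that the calculation is passed in as a plain function; none of these details are fixed by the text.

```python
import struct

CONTROL_INFO_FORMAT = "<III"  # assumed: input offset, output offset, completion flag
FLAG_OFFSET = 8               # assumed position of the flag inside the control info


def compute_and_write_back(system_memory, device_memory, block_ptr, device_start,
                           input_len, output_len, compute):
    in_off, out_off, _ = struct.unpack_from(CONTROL_INFO_FORMAT, device_memory, device_start)
    # Step 250: locate the input in device memory and run the calculation.
    in_addr = device_start + in_off
    result = compute(bytes(device_memory[in_addr:in_addr + input_len]))
    if len(result) != output_len:
        raise ValueError("output length must match the DMA length register")
    # Step 260: write the output to its first actual address (device memory).
    dev_out = device_start + out_off
    device_memory[dev_out:dev_out + output_len] = result
    # Steps 270-280: move the output to its second actual address (control block).
    sys_out = block_ptr + out_off
    system_memory[sys_out:sys_out + output_len] = device_memory[dev_out:dev_out + output_len]
    # Set the completion flag that the general-purpose CPU polls.
    struct.pack_into("<I", system_memory, block_ptr + FLAG_OFFSET, 1)


# Hypothetical setup: a control block at address 16 whose control info and
# input data have already been moved to address 0 of the device memory.
system_memory = bytearray(64)
device_memory = bytearray(64)
struct.pack_into(CONTROL_INFO_FORMAT, system_memory, 16, 12, 16, 0)
system_memory[28:32] = b"\x01\x02\x03\x04"
device_memory[0:16] = system_memory[16:32]

compute_and_write_back(system_memory, device_memory, block_ptr=16, device_start=0,
                       input_len=4, output_len=4,
                       compute=lambda data: bytes(b * 2 for b in data))
flag = struct.unpack_from("<I", system_memory, 16 + FLAG_OFFSET)[0]
```

  • The flag is written only after the output data has been copied, so a CPU that observes the flag set to 1 can read the output from the control block.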
  • the direct memory access method provided in the embodiment of the present specification avoids a separate system memory access for the DMA transfer of the output data: the offset address of the output data is obtained together with the input data when the latter is read from the system memory. Therefore, after the heterogeneous calculation is completed, the output data is moved directly according to the offset address, without an operation by the general-purpose CPU, which reduces the delay of the entire heterogeneous calculation.
  • the DMA control block pointer queue provided in this specification only requires a 32-bit DMA control block pointer to be written at a time. The amount of data is very small and corresponds directly to one atomic write operation of a general-purpose CPU, which improves the efficiency of concurrent operations by multiple processes.
  • an embodiment of the present specification further provides a direct memory access device.
  • the device may include:
  • the reading unit 301 is configured to read a DMA control block pointer from a direct memory access DMA control block pointer queue.
  • the determining unit 302 is configured to determine a corresponding DMA control block in the system memory according to the DMA control block pointer read by the reading unit 301, and the content of the DMA control block includes DMA control information and input data.
  • the above system memory refers to a storage space for storing data used by a general-purpose central processing unit CPU.
  • the determining unit 302 is further configured to determine a total length of the DMA control information and the input data.
  • the DMA control information may have a fixed length.
  • the determining unit 302 may be specifically configured to:
  • the length of the input data is determined by the general-purpose CPU according to the heterogeneous calculation method currently performed.
  • the moving unit 303 is configured to move the DMA control information and the input data to the device memory according to the DMA control block pointer read by the reading unit 301 and the total length determined by the determining unit 302.
  • the device memory refers to the storage space used to store data for a dedicated computing chip.
  • the moving unit 303 here may be implemented by the DMA data transmission module 33 in FIG. 1.
  • the calculation unit 304 is configured to perform corresponding calculation on the input data to obtain output data.
  • the calculation unit 304 here may be implemented by a dedicated calculation module 34 in FIG. 1.
  • the writing unit 305 is configured to write output data calculated by the calculation unit 304 into a device memory.
  • the obtaining unit 306 is configured to obtain a length of the output data.
  • the moving unit 303 is further configured to move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data obtained by the obtaining unit 306.
  • the above DMA control information may include an offset address of output data.
  • the moving unit 303 can be specifically used for:
  • the output data is moved from the device memory to the DMA control block.
  • the reading unit 301 reads a DMA control block pointer from a direct memory access DMA control block pointer queue.
  • the determining unit 302 determines a corresponding DMA control block in the system memory according to the DMA control block pointer.
  • the determining unit 302 is further configured to determine a total length of the DMA control information and the input data.
  • the moving unit 303 moves the DMA control information and the input data to the device memory according to the DMA control block pointer and the total length.
  • the calculation unit 304 performs corresponding calculations on the input data to obtain output data.
  • the writing unit 305 writes the output data into the device memory.
  • the obtaining unit 306 obtains the length of the output data.
  • the moving unit 303 moves the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data. This improves the performance of heterogeneous computing.
  • the direct memory access device provided in the embodiment of the present specification may be a module or a unit in the dedicated computing chip 30 in FIG. 1.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

A direct memory access method and device, a dedicated computing chip and a heterogeneous computing system, the method comprising: reading a DMA control block pointer from a DMA control block pointer queue (S210); determining a corresponding DMA control block in a system memory according to the DMA control block pointer (S220); determining the total length of DMA control information and input data in a DMA control block (S230); moving the DMA control information and the input data to a device memory according to the DMA control block pointer and the total length (S240); carrying out corresponding calculations on the input data to obtain output data (S250); writing the output data into the device memory (S260); acquiring the length of the output data (S270); and moving the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data (S280).

Description

Direct memory access method, device, dedicated computing chip and heterogeneous computing system
Technical field
One or more embodiments of the present specification relate to the field of computer technology, and in particular, to a direct memory access method, device, dedicated computing chip, and heterogeneous computing system.
Background
Heterogeneous computing means that a general-purpose central processing unit (CPU) controls the overall data processing flow and, when a special-purpose calculation is required, calls a dedicated computing chip to perform it. Specifically, the general-purpose CPU needs to call a Direct Memory Access (DMA) method (a method of moving memory data through a dedicated hardware module) to transfer the input data of the special-purpose calculation from system memory to device memory. After the dedicated computing chip completes the calculation, the output data is transferred back to the system memory. It can be seen that one heterogeneous calculation requires two separate DMA transfers: the transfer of the input data and the transfer of the output data.
Specifically, the transmission process of the input data may be: 1) access the pointer of the queue where the DMA descriptor is located, to read the DMA descriptor of the input data (used to describe the address and length of the input data); 2) access the DMA descriptor of the input data, to read the address and length of the input data; 3) read the input data according to the address and length of the input data. The transmission process of the output data may be: 1) access the pointer of the queue where the DMA descriptor is located, to read the DMA descriptor of the output data (used to describe the address and length of the output data); 2) access the DMA descriptor of the output data, to read the address and length of the output data; 3) write the output data according to the address and length of the output data. In summary, in the traditional technology, one heterogeneous calculation needs to perform six access operations.
Summary of the Invention
One or more embodiments of the present specification describe a direct memory access method, a device, a dedicated computing chip, and a heterogeneous computing system, which can reduce the number of data accesses in a DMA transfer, thereby improving the performance of heterogeneous computing.
In a first aspect, a direct memory access method is provided, including:
reading a DMA control block pointer from a direct memory access (DMA) control block pointer queue;
determining, according to the DMA control block pointer, a corresponding DMA control block in system memory, where the content of the DMA control block includes DMA control information and input data, and the system memory refers to the storage space used to store data used by the general-purpose central processing unit (CPU);
determining a total length of the DMA control information and the input data;
moving the DMA control information and the input data to a device memory according to the DMA control block pointer and the total length, where the device memory refers to the storage space used to store data of a dedicated computing chip;
performing corresponding calculation on the input data to obtain output data;
writing the output data into the device memory;
obtaining the length of the output data;
moving the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data.
In a second aspect, a dedicated computing chip is provided, including a direct memory access (DMA) length register, a DMA control block pointer queue, a DMA data transmission module, and a dedicated computing module;
the DMA length register is configured to store the length of input data and the length of output data;
the DMA control block pointer queue is configured to store multiple DMA control block pointers, where each DMA control block pointer points to a DMA control block in system memory, and the content of the DMA control block includes DMA control information and input data;
the DMA data transmission module is configured to move the DMA control information and the input data from the system memory to device memory according to the length of the input data, the length of the DMA control information, and the DMA control block pointer, and is further configured to move the output data from the device memory to the system memory according to the DMA control information and the length of the output data;
the dedicated computing module is configured to perform computation on the input data to obtain the output data.
In a third aspect, a heterogeneous computing system is provided, including a general-purpose central processing unit (CPU), system memory, the dedicated computing chip provided in the second aspect, and device memory;
the general-purpose CPU is configured to call the dedicated computing chip to perform heterogeneous computing;
the system memory is configured to store data used by the general-purpose CPU;
the device memory is configured to store data used by the dedicated computing chip.
In a fourth aspect, a direct memory access device is provided, including:
a reading unit, configured to read a DMA control block pointer from a direct memory access (DMA) control block pointer queue;
a determining unit, configured to determine, according to the DMA control block pointer read by the reading unit, a corresponding DMA control block in system memory, where the content of the DMA control block includes DMA control information and input data, and the system memory refers to storage space for data used by a general-purpose central processing unit (CPU);
the determining unit being further configured to determine a total length of the DMA control information and the input data;
a moving unit, configured to move the DMA control information and the input data to device memory according to the DMA control block pointer read by the reading unit and the total length determined by the determining unit, where the device memory refers to storage space for data of a dedicated computing chip;
a calculation unit, configured to perform a corresponding computation on the input data to obtain output data;
a writing unit, configured to write the output data computed by the calculation unit into the device memory;
an obtaining unit, configured to obtain a length of the output data;
the moving unit being further configured to move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data obtained by the obtaining unit.
In the direct memory access method and device, dedicated computing chip, and heterogeneous computing system provided by one or more embodiments of this specification, a DMA control block pointer is read from the DMA control block pointer queue. The corresponding DMA control block is determined in system memory according to the DMA control block pointer. The total length of the DMA control information and the input data in the DMA control block is determined. The DMA control information and the input data are moved to device memory according to the DMA control block pointer and the total length. A corresponding computation is performed on the input data to obtain output data. The output data is written into the device memory, and the length of the output data is obtained. The output data is then moved from the device memory to the DMA control block according to the DMA control information and the length of the output data.
It can be seen that, in the solution provided in this specification, transferring the input data takes two access operations: first, the DMA control block pointer queue is accessed to read the DMA control block pointer; second, the location indicated by the DMA control block pointer is accessed to read the DMA control information and the input data. Compared with the traditional solution, the access to the DMA descriptor is eliminated. After the computation on the input data, the output data can be moved to the DMA control block directly according to the DMA control information. That is, transferring the output data takes only one access, for the output data itself; the pointer of the queue holding the DMA descriptors and the descriptor of the output data no longer need to be accessed. In summary, compared with the traditional technique (six access operations), the two DMA transfers together take three access operations, a reduction of three. This can greatly improve the efficiency of DMA data transfer and, in turn, the performance of heterogeneous computing.
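The access-count comparison above can be tallied in a few lines. This is only an illustrative sketch: the step descriptions are paraphrased from the text, and the dictionary names and the `count_accesses` helper are hypothetical, not part of any real driver interface.

```python
# Illustrative tally of access operations per heterogeneous computation.
# Step names paraphrase the text; all identifiers here are hypothetical.

TRADITIONAL = {
    "input": [
        "read descriptor-queue pointer",
        "read input data descriptor",
        "read input data",
    ],
    "output": [
        "read descriptor-queue pointer",
        "read output data descriptor",
        "write output data",
    ],
}

CONTROL_BLOCK = {
    "input": [
        "read control block pointer from queue",
        "read control information and input data",
    ],
    "output": [
        "write output data",
    ],
}


def count_accesses(scheme):
    """Total access operations across the input and output transfers."""
    return sum(len(steps) for steps in scheme.values())


traditional_total = count_accesses(TRADITIONAL)
control_block_total = count_accesses(CONTROL_BLOCK)
saved = traditional_total - control_block_total
print(traditional_total, control_block_total, saved)
```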
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this specification more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of the heterogeneous computing system provided in this specification;
FIG. 2 is a flowchart of a direct memory access method according to an embodiment of this specification;
FIG. 3 is a schematic diagram of a direct memory access device according to an embodiment of this specification.
Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
The direct memory access method provided by an embodiment of this specification can be applied to the heterogeneous computing system shown in FIG. 1. The heterogeneous computing system may include a general-purpose CPU 10, a system memory 20, a dedicated computing chip 30, and a device memory 40. The general-purpose CPU 10 and the dedicated computing chip 30 may also be referred to as the two computing units of the heterogeneous computing system.
The general-purpose CPU 10 controls the main data-processing flow of heterogeneous computing. Specifically, this includes: a. preprocessing and preparing the input data for heterogeneous computing; b. calling the dedicated computing chip to perform heterogeneous computing; c. querying for the returned heterogeneous computing result (also called the output data); d. post-processing the output data and outputting the result.
The system memory 20 is used to store data used by the general-purpose CPU 10. In one implementation, this data may be stored in the form of a DMA control block (a data structure) that occupies a physically contiguous address space in the system memory 20. Taking the case where the data includes input data and output data as an example, the content of the DMA control block may include DMA control information, input data, and output data. The space occupied by the input data may also be called the input data block; likewise, the space occupied by the output data may be called the output data block. Specifically, when performing heterogeneous computing, the general-purpose CPU 10 may determine the length of the input data and the length of the output data according to the current heterogeneous computing method. Note that, in this specification, for one particular heterogeneous computation the input data, its length, and the length of the output data are usually determined in advance. The DMA control information can then be constructed; the construction process is described later. This yields part of the content of the DMA control block (the DMA control information and the input data). The general-purpose CPU 10 may then write this partial content into a physically contiguous address space of the system memory 20. Because the length of the output data is also known, an address space of that length is usually reserved immediately after the partial content for the output data to be written into. The address space holding the partial content, together with the reserved address space, constitutes the DMA control block. In short, one DMA control block can be formed in the system memory 20 for each heterogeneous computation.
Note that the DMA control information may include an offset address of the input data, an offset address of the output data, and a calculation completion flag. The offset address of the input data may occupy 32 bits (that is, 4 bytes) and refers to the offset of the space occupied by the input data (the input data block) relative to the start address of the DMA control block. Specifically, the actual address of the input data in the system memory 20 can be determined from the start address of the DMA control block and this offset address. The offset address of the output data is defined in the same way. The calculation completion flag may occupy 1 bit (expanded to 4 bytes); it may be cleared to 0 by the general-purpose CPU 10 before the heterogeneous computation. After the heterogeneous computation is completed, the dedicated computing chip 30 rewrites the flag to 1. The CPU 10 polls the calculation completion flag to check whether the heterogeneous computation has completed.
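The three fields of the DMA control information can be modeled as a fixed 12-byte record. The sketch below is an assumption-laden illustration: the text gives the field sizes but not the byte order or an exact field order in memory, so little-endian encoding and the listed order are assumed, and the names `CONTROL_INFO`, `pack_control_info`, and `unpack_control_info` are hypothetical.

```python
import struct

# Assumed layout: three 4-byte unsigned fields in the order the text lists them.
# Little-endian byte order is an assumption; the text does not specify it.
CONTROL_INFO = struct.Struct("<III")


def pack_control_info(input_offset, output_offset, done_flag=0):
    """Encode the two offset addresses and the completion flag as 12 bytes."""
    return CONTROL_INFO.pack(input_offset, output_offset, done_flag)


def unpack_control_info(raw):
    """Decode the first 12 bytes of a control block into named fields."""
    input_offset, output_offset, done_flag = CONTROL_INFO.unpack(raw[:CONTROL_INFO.size])
    return {
        "input_offset": input_offset,
        "output_offset": output_offset,
        "done_flag": done_flag,
    }


# Example values only: offsets of 12 and 112 bytes, flag cleared.
info = pack_control_info(12, 112)
fields = unpack_control_info(info)
print(len(info), fields)
```

Storing the 1-bit flag in a full 4-byte field, as the text describes, keeps every field word-aligned, which is one plausible reason for the expansion.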
The dedicated computing chip 30 works with the general-purpose CPU to perform dedicated computations (for example, matrix multiplication and large-number modular multiplication). The dedicated computing chip 30 may be, for example, a field-programmable gate array (FPGA) chip, an application-specific integrated circuit (ASIC) chip, or a graphics processing unit (GPU) chip. The general-purpose CPU 10 is less efficient at such computations, so performing them on the dedicated computing chip 30 is more cost-effective.
The device memory 40 is used to store data of the dedicated computing chip 30. Specifically, when a heterogeneous computation starts, the dedicated computing chip 30 may read the input data from the device memory 40; when the heterogeneous computation completes, it may write the output data into the device memory 40.
In FIG. 1, the dedicated computing chip 30 may specifically include a DMA length register 31, a DMA control block pointer queue 32, a DMA data transmission module 33, and a dedicated computing module 34.
The DMA length register 31 is used to store the length of the input data and the length of the output data. For one particular heterogeneous computation, the length of the input data and the length of the output data are usually fixed; that is, the general-purpose CPU 10 can determine these lengths according to the heterogeneous computing method currently being executed.
The DMA control block pointer queue 32 is used to store multiple DMA control block pointers. Each DMA control block pointer points to a DMA control block in the system memory 20 and may occupy 32 bits. Specifically, each time a DMA control block is formed in the system memory 20, the general-purpose CPU 10 may write the corresponding DMA control block pointer into the DMA control block pointer queue 32. Because one DMA control block is formed for each heterogeneous computation, each DMA control block pointer also corresponds to one heterogeneous computation. When multiple processes in the heterogeneous computing system are handling tasks, multiple DMA control block pointers in the DMA control block pointer queue 32 can be read at the same time, so multiple heterogeneous computations can proceed in parallel, which can greatly improve the efficiency of heterogeneous computing. Note that the multiple heterogeneous computations here are of the same type, for example, all encryption computations.
The DMA data transmission module 33 is used to move the DMA control information and the input data from the system memory 20 to the device memory 40 according to the length of the input data, the length of the DMA control information, and the DMA control block pointer. It is also used to move the output data from the device memory 40 to the system memory 20 according to the DMA control information and the length of the output data.
The dedicated computing module 34 implements the dedicated computing function. Specifically, it performs computation on the input data to obtain the output data.
As described above, the general-purpose CPU 10 in FIG. 1 can call the dedicated computing chip 30 to perform heterogeneous computing. To improve the performance of heterogeneous computing, this specification improves the direct memory access method. Before the direct memory access method for heterogeneous computing provided in this specification is executed, the following steps may be performed:
1) The general-purpose CPU 10 determines the length of the input data and the length of the output data according to the current heterogeneous computing method, and writes both lengths into the DMA length register 31.
2) The general-purpose CPU 10 prepares the input data for the heterogeneous computation and constructs the DMA control information. As described above, the DMA control information may include the offset address of the input data, the offset address of the output data, and the calculation completion flag. When the content of the DMA control block is stored in the order DMA control information, input data, output data, the offset address of the input data can be determined from the length of the DMA control information. The length of the DMA control information = length of the input data offset address + length of the output data offset address + length of the calculation completion flag = 4 + 4 + 4 = 12 bytes. The offset address of the input data can therefore be 12 (in decimal). The offset address of the output data can be determined from the total length of the DMA control information and the input data. Assuming the input data is 100 bytes long, the offset address of the output data is 12 + 100 = 112 (in decimal). After this construction is complete, the DMA control information and the input data can be written into a physically contiguous address space of the system memory 20, and the calculation completion flag is cleared to 0.
It can be understood that, because the length of the output data is also known, an address space of that length can be reserved immediately after the written content for the output data to be written into. The address space in the system memory 20 holding the written data, together with the reserved address space, constitutes one DMA control block.
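The construction described in steps 1) and 2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the little-endian encoding, the 64-byte output length, and the function name `build_control_block` are assumptions; the 12-byte control information and the 100-byte input come from the worked example in the text.

```python
import struct

CONTROL_INFO_LEN = 4 + 4 + 4  # two offset addresses plus the completion flag


def build_control_block(input_data, output_len):
    """Lay out control information, input data, and reserved output space contiguously."""
    input_offset = CONTROL_INFO_LEN
    output_offset = CONTROL_INFO_LEN + len(input_data)
    # Completion flag starts at 0; byte order is an assumption.
    control_info = struct.pack("<III", input_offset, output_offset, 0)
    reserved_output = bytes(output_len)  # zero-filled space for the result
    return bytearray(control_info + input_data + reserved_output)


# Worked example: 100 bytes of input, 64 bytes reserved for output (assumed value).
block = build_control_block(bytes(100), output_len=64)
input_offset, output_offset, done_flag = struct.unpack_from("<III", block, 0)
print(input_offset, output_offset, done_flag, len(block))
```

Because both offsets are relative to the start of the block, the same control information stays valid after the block is copied to a different base address in device memory.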
3) The general-purpose CPU 10 writes a DMA control block pointer into the DMA control block pointer queue 32; the pointer points to the start address of the DMA control block that was formed.
Note that, for multiple heterogeneous computations of the same type (for example, all encryption computations), step 1) may be performed only once, while steps 2) and 3) are repeated once per heterogeneous computation.
It can be understood that, once the DMA control block pointer has been written into the DMA control block pointer queue 32, the direct memory access method for heterogeneous computing provided in this specification can be executed.
FIG. 2 is a flowchart of a direct memory access method according to an embodiment of this specification. The method may be executed by the dedicated computing chip 30 in FIG. 1. As shown in FIG. 2, the method may specifically include:
Step 210: Read a DMA control block pointer from the DMA control block pointer queue.
The DMA control block pointer here points to the start address of the DMA control block, so the content of the DMA control block can be accessed directly through the pointer. This reduces the number of accesses to the system memory 20 and thus the DMA transfer latency.
Specifically, the dedicated computing chip 30 may poll the DMA control block pointer queue 32 to check whether it is empty. If it is not empty, the DMA control block pointer at the head of the queue can be read. Because the DMA control block pointer queue 32 provided in this specification can hold multiple DMA control block pointers at once, it makes it easier to support asynchronous DMA operations and to let multiple processes perform DMA operations independently, which improves DMA transfer efficiency.
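The polling behavior of step 210 can be modeled in software with a simple FIFO. This is a sketch under stated assumptions: the real queue 32 is a hardware structure on the chip, and the class `ControlBlockPointerQueue`, its methods, and `poll_once` are hypothetical names used only for illustration.

```python
from collections import deque


class ControlBlockPointerQueue:
    """Software stand-in for queue 32: a FIFO of 32-bit control block pointers."""

    def __init__(self):
        self._pointers = deque()

    def push(self, pointer):
        # The text says a pointer may occupy 32 bits, so reject wider values.
        if not 0 <= pointer <= 0xFFFFFFFF:
            raise ValueError("pointer must fit in 32 bits")
        self._pointers.append(pointer)

    def is_empty(self):
        return not self._pointers

    def pop_head(self):
        return self._pointers.popleft()


def poll_once(queue):
    """One polling pass: return the head pointer, or None if the queue is empty."""
    return None if queue.is_empty() else queue.pop_head()


queue = ControlBlockPointerQueue()
first = poll_once(queue)   # nothing queued yet
queue.push(0x1000)         # CPU enqueues two control block pointers
queue.push(0x2000)
second = poll_once(queue)  # chip reads from the head of the queue
third = poll_once(queue)
print(first, hex(second), hex(third))
```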
Step 220: Determine the corresponding DMA control block in the system memory according to the DMA control block pointer.
Taking FIG. 1 as an example, when DMA control block pointer A is read, DMA control block A can be determined; when DMA control block pointer B is read, DMA control block B can be determined. It can be understood that the content of the DMA control block read in this step includes only the DMA control information and the input data.
Step 230: Determine the total length of the DMA control information and the input data.
As described above, the DMA control information in this specification may include the offset address of the input data, the offset address of the output data, and the calculation completion flag, and it has a fixed length equal to the sum of the lengths of these three fields. When the length of the DMA control information is fixed, the total length can be determined as follows: read the length of the input data from the DMA length register 31, then determine the total length from the fixed length and the length of the input data.
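Step 230 reduces to one addition. The sketch below assumes the total length is the fixed control information length plus the input length read from the length register, which is how the surrounding text describes it; `DmaLengthRegister` and `total_transfer_length` are hypothetical names, and the register contents are example values.

```python
from dataclasses import dataclass

CONTROL_INFO_LEN = 4 + 4 + 4  # fixed length of the DMA control information, in bytes


@dataclass
class DmaLengthRegister:
    """Stand-in for register 31, written once by the CPU per computation type."""
    input_len: int
    output_len: int


def total_transfer_length(register):
    """Bytes to move from system memory: control information plus input data."""
    return CONTROL_INFO_LEN + register.input_len


register = DmaLengthRegister(input_len=100, output_len=64)
total = total_transfer_length(register)
print(total)
```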
Step 240: Move the DMA control information and the input data to the device memory according to the DMA control block pointer and the total length.
Here, the DMA data transmission module 33 may move the DMA control information and the input data to the device memory 40 according to the DMA control block pointer and the total length.
In one implementation, before the move is performed, a physically contiguous address space may first be allocated in the device memory 40. The DMA control information and the input data can then be read from the corresponding DMA control block according to the DMA control block pointer and the total length. Because the DMA control block occupies a physically contiguous address space in the system memory 20, this read can be understood as a single continuous read operation. The DMA control information and the input data are then written into the pre-allocated physically contiguous address space in the device memory 40. It can be understood that, after this write, the start address of the DMA control information and the input data in the device memory 40 is known.
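The move in step 240 can be illustrated with two byte arrays standing in for the two memories. This is only a model: a real transfer is performed by the DMA data transmission module over a bus, and the function name `move_to_device`, the memory sizes, and the addresses `0x100` and `0x40` are all assumed for the example.

```python
def move_to_device(system_memory, block_ptr, total_len, device_memory, device_base):
    """Copy control information plus input data as one contiguous range."""
    if block_ptr + total_len > len(system_memory):
        raise ValueError("source range exceeds the system memory model")
    if device_base + total_len > len(device_memory):
        raise ValueError("destination range exceeds the device memory model")
    # One contiguous read, then one contiguous write.
    chunk = system_memory[block_ptr:block_ptr + total_len]
    device_memory[device_base:device_base + total_len] = chunk
    return device_base  # start address of the copied content in device memory


system_memory = bytearray(1024)
device_memory = bytearray(512)
block_ptr = 0x100

# Fill the control block region with recognizable example bytes.
system_memory[block_ptr:block_ptr + 12] = bytes(range(12))
system_memory[block_ptr + 12:block_ptr + 112] = b"\xAB" * 100

start_address = move_to_device(system_memory, block_ptr, 112, device_memory, device_base=0x40)
print(hex(start_address))
```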
Step 250: Perform the corresponding computation on the input data to obtain the output data.
Here, the dedicated computing module 34 may be called to perform the computation on the input data. Specifically, the actual address of the input data in the device memory 40 can be determined from the start address determined in step 240 and the offset address of the input data in the DMA control information. The input data can then be read from the device memory 40 at that actual address, and the dedicated computing module 34 is called to perform the computation on it.
Step 260: Write the output data into the device memory.
Specifically, the actual address of the output data in the device memory 40 can be determined from the start address determined in step 240 and the offset address of the output data in the DMA control information. The output data can then be written into the storage space at that actual address in the device memory 40.
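The address calculation used in steps 250 and 260 is a base-plus-offset lookup. The following sketch assumes the little-endian 12-byte control information layout used for illustration elsewhere (an assumption, not stated in the text); `device_addresses` and the start address `0x40` are hypothetical.

```python
import struct


def device_addresses(device_memory, start_address):
    """Resolve the input and output locations from the copied control information."""
    input_offset, output_offset, _flag = struct.unpack_from("<III", device_memory, start_address)
    return start_address + input_offset, start_address + output_offset


device_memory = bytearray(512)
start_address = 0x40  # where step 240 placed the copied content (example value)

# Control information as it would arrive from system memory: offsets 12 and 112.
struct.pack_into("<III", device_memory, start_address, 12, 112, 0)

input_address, output_address = device_addresses(device_memory, start_address)
print(input_address, output_address)
```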
Step 270: Obtain the length of the output data.
For example, the length of the output data may be read from the DMA length register 31.
Step 280: Move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data.
Specifically, the output data may be moved from the device memory 40 to the DMA control block according to the offset address of the output data in the DMA control information and the length of the output data. The move may proceed as follows: obtain the start address of the DMA control information and the input data in the device memory 40; determine the first actual address, that of the output data in the device memory 40, from the offset address and this start address; determine the second actual address, that of the output data in the DMA control block, from the offset address and the DMA control block pointer; read the output data from the device memory 40 according to the first actual address and the length of the output data; and write the output data to the location corresponding to the second actual address in the DMA control block. Because the DMA control block is located in the system memory 20, this last step can equivalently be described as writing the output data to the location corresponding to the second actual address in the system memory 20.
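The two address calculations in step 280 can be sketched with byte arrays standing in for the two memories. All concrete values (memory sizes, `0x40`, `0x100`, a 16-byte output) and the function name `move_output_to_control_block` are assumptions for illustration; the same offset is applied to two different bases, as the text describes.

```python
def move_output_to_control_block(device_memory, device_start,
                                 system_memory, block_ptr,
                                 output_offset, output_len):
    """Copy the output data from device memory back into the control block."""
    first_address = device_start + output_offset   # location in device memory
    second_address = block_ptr + output_offset     # location in system memory
    output = device_memory[first_address:first_address + output_len]
    if len(output) != output_len or second_address + output_len > len(system_memory):
        raise ValueError("output range falls outside a memory model")
    system_memory[second_address:second_address + output_len] = output
    return first_address, second_address


device_memory = bytearray(512)
system_memory = bytearray(1024)
device_start, block_ptr = 0x40, 0x100
output_offset, output_len = 112, 16

# Pretend the computing module already wrote its result into device memory.
result_at = device_start + output_offset
device_memory[result_at:result_at + output_len] = b"\x5A" * output_len

first_address, second_address = move_output_to_control_block(
    device_memory, device_start, system_memory, block_ptr, output_offset, output_len)
print(first_address, second_address)
```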
After step 280 is completed, the dedicated computing chip 30 may rewrite the calculation completion flag in the DMA control information, for example, to 1. The general-purpose CPU 10 may poll the calculation completion flag; when the flag is 1, the heterogeneous computation is complete and the output data in the system memory 20 can be used.
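The completion handshake can be modeled as a write and a read of one 4-byte field. This sketch assumes the flag is the third field of the control information (byte offset 8) and is stored little-endian; neither detail is stated in the text, and `write_flag` and `is_complete` are hypothetical helpers.

```python
import struct

FLAG_OFFSET = 8  # assumed: third 4-byte field, after the two offset addresses


def write_flag(system_memory, block_ptr, value):
    """Store the completion flag in the control block's control information."""
    struct.pack_into("<I", system_memory, block_ptr + FLAG_OFFSET, value)


def is_complete(system_memory, block_ptr):
    """One polling read by the CPU: True once the chip has set the flag to 1."""
    (flag,) = struct.unpack_from("<I", system_memory, block_ptr + FLAG_OFFSET)
    return flag == 1


system_memory = bytearray(256)
block_ptr = 0x20

write_flag(system_memory, block_ptr, 0)         # CPU clears the flag before the computation
before = is_complete(system_memory, block_ptr)  # CPU polls: not finished yet
write_flag(system_memory, block_ptr, 1)         # chip rewrites the flag after step 280
after = is_complete(system_memory, block_ptr)   # CPU polls again: output can be used
print(before, after)
```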
In summary, the direct memory access method provided in the embodiments of this specification avoids a separate access to system memory for the DMA transfer of the output data: the offset address of the output data is obtained at the same time as the input data is fetched from system memory. After the heterogeneous computation completes, the output data is moved directly according to that offset address. This avoids involving the general-purpose CPU in the process and reduces the latency of the whole heterogeneous computation. In addition, only one 32-bit DMA control block pointer needs to be written into the DMA control block pointer queue each time. This small amount of data corresponds directly to one atomic write operation of a general-purpose CPU, which improves the efficiency of concurrent operations by multiple processes.
Corresponding to the direct memory access method described above, an embodiment of this specification further provides a direct memory access device. As shown in FIG. 3, the device may include:
A reading unit 301, configured to read a DMA control block pointer from a direct memory access (DMA) control block pointer queue.
A determining unit 302, configured to determine, according to the DMA control block pointer read by the reading unit 301, the corresponding DMA control block in system memory. The content of the DMA control block includes DMA control information and input data. System memory here refers to the storage space that holds data used by the general-purpose central processing unit (CPU).
The determining unit 302 is further configured to determine the total length of the DMA control information and the input data.
Optionally, the DMA control information may have a fixed length. In that case, the determining unit 302 may be specifically configured to:
Read the length of the input data from the DMA length register. The length of the input data is determined by the general-purpose CPU according to the heterogeneous computing method currently being executed.
Determine the total length according to the fixed length and the length of the input data.
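The two steps above can be sketched as follows. This is a minimal illustration that assumes the total length is the sum of the fixed control-information length and the input data length. The 16-byte value of `CONTROL_INFO_LENGTH` and the `DmaLengthRegister` class are hypothetical and are not specified in this document.

```python
# Hypothetical fixed length of the DMA control information, in bytes.
CONTROL_INFO_LENGTH = 16


class DmaLengthRegister:
    """Hypothetical model of the DMA length register.

    The general-purpose CPU writes input_length before starting a job.
    """

    def __init__(self):
        self.input_length = 0
        self.output_length = 0


def total_transfer_length(register, control_info_length=CONTROL_INFO_LENGTH):
    """Return the number of bytes to move from system memory to device memory.

    Assumption: the control information and the input data are contiguous,
    so the total is a simple sum of the two lengths.
    """
    if register.input_length < 0:
        raise ValueError("input length must be non-negative")
    return control_info_length + register.input_length
```

For example, with a 16-byte control header and 4096 bytes of input data, a single transfer of 4112 bytes would cover both.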
A moving unit 303, configured to move the DMA control information and the input data to device memory according to the DMA control block pointer read by the reading unit 301 and the total length determined by the determining unit 302. Device memory refers to the storage space that holds data for the dedicated computing chip.
The moving unit 303 may be implemented by the DMA data transmission module 33 in FIG. 1.
A calculation unit 304, configured to perform the corresponding calculation on the input data to obtain output data.
The calculation unit 304 may be implemented by the dedicated calculation module 34 in FIG. 1.
A writing unit 305, configured to write the output data calculated by the calculation unit 304 into device memory.
An obtaining unit 306, configured to obtain the length of the output data.
The moving unit 303 is further configured to move the output data from device memory to the DMA control block according to the DMA control information and the length of the output data obtained by the obtaining unit 306.
Optionally, the DMA control information may include an offset address of the output data.
In that case, the moving unit 303 may be specifically configured to:
Move the output data from device memory to the DMA control block according to the offset address of the output data and the length of the output data.
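The write-back step can be sketched as below, with both memories modeled as `bytearray` objects. The function name and parameters are hypothetical, and the sketch assumes that the offset address is measured from the start of the DMA control block, which this document does not state explicitly.

```python
def move_output_to_control_block(system_memory, block_pointer, output_offset,
                                 device_memory, device_output_address,
                                 output_length):
    """Copy output data from device memory back into the DMA control block.

    Assumption: output_offset is relative to block_pointer, the start of
    the DMA control block in system memory.
    """
    source_end = device_output_address + output_length
    if source_end > len(device_memory):
        raise ValueError("output data exceeds device memory")

    destination = block_pointer + output_offset
    destination_end = destination + output_length
    if destination_end > len(system_memory):
        raise ValueError("output data exceeds system memory")

    # Same-length slice assignment overwrites bytes in place.
    system_memory[destination:destination_end] = (
        device_memory[device_output_address:source_end]
    )
```

No general-purpose CPU step appears between the computation and this copy: the destination is derived entirely from information that was already transferred together with the input data.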
The functions of the functional modules of the device in the above embodiment can be implemented through the steps of the method embodiment described above. The specific working process of the device provided by this embodiment is therefore not repeated here.
In the direct memory access device provided by an embodiment of this specification, the reading unit 301 reads a DMA control block pointer from the DMA control block pointer queue. The determining unit 302 determines the corresponding DMA control block in system memory according to the DMA control block pointer, and also determines the total length of the DMA control information and the input data. The moving unit 303 moves the DMA control information and the input data to device memory according to the DMA control block pointer and the total length. The calculation unit 304 performs the corresponding calculation on the input data to obtain output data. The writing unit 305 writes the output data into device memory. The obtaining unit 306 obtains the length of the output data. The moving unit 303 then moves the output data from device memory to the DMA control block according to the DMA control information and the length of the output data. In this way, the performance of heterogeneous computing is improved.
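The sequence of units 301 to 306 can be traced end to end in a small software model. This is a sketch under several assumptions that are not taken from this specification: the control information is a single 32-bit little-endian output offset (so its fixed length is 4 bytes), the offset is relative to the start of the control block, and the computation is passed in as an arbitrary callable `compute`.

```python
import struct
from collections import deque

# Hypothetical layout: the control information is one 32-bit output offset.
CONTROL_INFO_LENGTH = 4


def run_dma_job(pointer_queue, system_memory, input_length, compute):
    """Model one heterogeneous computing job; returns the output length."""
    # Reading unit 301: take the next control block pointer from the queue.
    block_pointer = pointer_queue.popleft()

    # Determining unit 302: control information plus input data.
    total_length = CONTROL_INFO_LENGTH + input_length

    # Moving unit 303: one transfer from system memory to device memory.
    device_memory = bytearray(
        system_memory[block_pointer:block_pointer + total_length]
    )
    (output_offset,) = struct.unpack_from("<I", device_memory, 0)
    input_data = bytes(device_memory[CONTROL_INFO_LENGTH:total_length])

    # Calculation unit 304 and writing unit 305: compute, store on device.
    output_data = compute(input_data)
    device_memory += output_data

    # Obtaining unit 306: length of the output data.
    output_length = len(output_data)

    # Moving unit 303 again: write back into the control block.
    start = block_pointer + output_offset
    if start + output_length > len(system_memory):
        raise ValueError("output data exceeds system memory")
    system_memory[start:start + output_length] = (
        device_memory[total_length:total_length + output_length]
    )
    return output_length
```

In this model the only values supplied from outside the control block are the queue entry and the input length, which mirrors the roles of the pointer queue and the DMA length register in the description.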
It should be noted that the direct memory access device provided in the embodiments of this specification may be a module or unit of the dedicated computing chip 30 in FIG. 1.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on a computer-readable medium, or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain the purpose, technical solutions, and beneficial effects of this specification. It should be understood that the above are merely specific implementations of this specification and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this specification shall fall within its scope of protection.

Claims (8)

  1. A direct memory access method, comprising:
    reading a DMA control block pointer from a direct memory access (DMA) control block pointer queue;
    determining, according to the DMA control block pointer, a corresponding DMA control block in system memory, wherein the content of the DMA control block includes DMA control information and input data, and the system memory is a storage space for storing data used by a general-purpose central processing unit (CPU);
    determining a total length of the DMA control information and the input data;
    moving the DMA control information and the input data to device memory according to the DMA control block pointer and the total length, wherein the device memory is a storage space for storing data of a dedicated computing chip;
    performing a corresponding calculation on the input data to obtain output data;
    writing the output data into the device memory;
    obtaining a length of the output data; and
    moving the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data.
  2. The method according to claim 1, wherein the DMA control information includes an offset address of the output data; and
    moving the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data comprises:
    moving the output data from the device memory to the DMA control block according to the offset address of the output data and the length of the output data.
  3. The method according to claim 1, wherein the DMA control information has a fixed length, and determining the total length of the DMA control information and the input data comprises:
    reading the length of the input data from a DMA length register, wherein the length of the input data is determined by the general-purpose CPU according to the heterogeneous computing method currently being executed; and
    determining the total length according to the fixed length and the length of the input data.
  4. A dedicated computing chip, comprising: a direct memory access (DMA) length register, a DMA control block pointer queue, a DMA data transmission module, and a dedicated calculation module, wherein:
    the DMA length register is configured to store a length of input data and a length of output data;
    the DMA control block pointer queue is configured to store a plurality of DMA control block pointers, each DMA control block pointer pointing to a DMA control block in system memory, and the content of the DMA control block includes DMA control information and the input data;
    the DMA data transmission module is configured to move the DMA control information and the input data from the system memory to device memory according to the length of the input data, the length of the DMA control information, and the DMA control block pointer, and is further configured to move the output data from the device memory to the system memory according to the DMA control information and the length of the output data; and
    the dedicated calculation module is configured to perform a calculation on the input data to obtain the output data.
  5. A heterogeneous computing system, comprising: a general-purpose central processing unit (CPU), system memory, the dedicated computing chip according to claim 4, and device memory, wherein:
    the general-purpose CPU is configured to invoke the dedicated computing chip to perform heterogeneous computing;
    the system memory is configured to store data used by the general-purpose CPU; and
    the device memory is configured to store data used by the dedicated computing chip.
  6. A direct memory access device, comprising:
    a reading unit, configured to read a DMA control block pointer from a direct memory access (DMA) control block pointer queue;
    a determining unit, configured to determine, according to the DMA control block pointer read by the reading unit, a corresponding DMA control block in system memory, wherein the content of the DMA control block includes DMA control information and input data, and the system memory is a storage space for storing data used by a general-purpose central processing unit (CPU);
    the determining unit being further configured to determine a total length of the DMA control information and the input data;
    a moving unit, configured to move the DMA control information and the input data to device memory according to the DMA control block pointer read by the reading unit and the total length determined by the determining unit, wherein the device memory is a storage space for storing data of a dedicated computing chip;
    a calculation unit, configured to perform a corresponding calculation on the input data to obtain output data;
    a writing unit, configured to write the output data calculated by the calculation unit into the device memory; and
    an obtaining unit, configured to obtain a length of the output data;
    the moving unit being further configured to move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data obtained by the obtaining unit.
  7. The device according to claim 6, wherein the DMA control information includes an offset address of the output data; and
    the moving unit is specifically configured to:
    move the output data from the device memory to the DMA control block according to the offset address of the output data and the length of the output data.
  8. The device according to claim 6, wherein the DMA control information has a fixed length, and the determining unit is specifically configured to:
    read the length of the input data from a DMA length register, wherein the length of the input data is determined by the general-purpose CPU according to the heterogeneous computing method currently being executed; and
    determine the total length according to the fixed length and the length of the input data.
PCT/CN2019/076252 2018-05-21 2019-02-27 Direct memory access method and device, dedicated computing chip and heterogeneous computing system WO2019223383A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810488487.0 2018-05-21
CN201810488487.0A CN110515872B (en) 2018-05-21 2018-05-21 Direct memory access method, device, special computing chip and heterogeneous computing system

Publications (1)

Publication Number Publication Date
WO2019223383A1 true WO2019223383A1 (en) 2019-11-28

Family

ID=68616539

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/076252 WO2019223383A1 (en) 2018-05-21 2019-02-27 Direct memory access method and device, dedicated computing chip and heterogeneous computing system

Country Status (3)

Country Link
CN (1) CN110515872B (en)
TW (1) TWI696949B (en)
WO (1) WO2019223383A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021052391A1 (en) 2019-09-18 2021-03-25 华为技术有限公司 Method for constructing intermediate representation, compiler and server
CN111190842B (en) * 2019-12-30 2021-07-20 Oppo广东移动通信有限公司 Direct memory access, processor, electronic device, and data transfer method
CN113342721B (en) * 2021-07-06 2022-09-23 无锡众星微***技术有限公司 DMA design method for memory controller

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1474568A (en) * 2002-08-06 2004-02-11 华为技术有限公司 Direct internal storage access system and method of multiple path data
CN1641613A (en) * 2003-12-05 2005-07-20 联发科技股份有限公司 Virtual first-in first-out direct storage accessing device
CN105512005A (en) * 2015-12-12 2016-04-20 中国航空工业集团公司西安航空计算技术研究所 Circuit and method for synchronous working of control/remote node and bus monitor node
CN106339338A (en) * 2016-08-31 2017-01-18 天津国芯科技有限公司 Data transmission method and device capable of improving system performance
CN106569736A (en) * 2015-10-10 2017-04-19 北京忆芯科技有限公司 Nvme protocol processor and processing method thereof

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5953538A (en) * 1996-11-12 1999-09-14 Digital Equipment Corporation Method and apparatus providing DMA transfers between devices coupled to different host bus bridges
GB2359906B (en) * 2000-02-29 2004-10-20 Virata Ltd Method and apparatus for DMA data transfer
US6904473B1 (en) * 2002-05-24 2005-06-07 Xyratex Technology Limited Direct memory access controller and method of filtering data during data transfer from a source memory to a destination memory
US7533198B2 (en) * 2005-10-07 2009-05-12 International Business Machines Corporation Memory controller and method for handling DMA operations during a page copy
CN100395737C (en) * 2006-06-08 2008-06-18 杭州华三通信技术有限公司 Method for transmitting data between internal memory and digital signal processor
US8250252B1 (en) * 2010-06-29 2012-08-21 Qlogic, Corporation System and methods for using a DMA module for a plurality of virtual machines
CN102467473B (en) * 2010-11-03 2015-02-11 Tcl集团股份有限公司 Method and device for transmitting data between user space and kernel
US9239796B2 (en) * 2011-05-24 2016-01-19 Ixia Methods, systems, and computer readable media for caching and using scatter list metadata to control direct memory access (DMA) receiving of network protocol data
CN103377170B (en) * 2012-04-26 2015-12-02 上海宝信软件股份有限公司 SPI high-speed bidirectional Peer Data Communication system between heterogeneous processor
CN103500149A (en) * 2013-09-29 2014-01-08 华为技术有限公司 Direct memory access controller and direct memory access control method
CN104317754B (en) * 2014-10-15 2017-03-15 中国人民解放军国防科学技术大学 The data transfer optimization method that strides towards heterogeneous computing system
CN105656805B (en) * 2016-01-20 2018-09-25 中国人民解放军国防科学技术大学 A kind of packet receiving method and device based on control block predistribution

Also Published As

Publication number Publication date
CN110515872B (en) 2020-07-31
TW202004494A (en) 2020-01-16
CN110515872A (en) 2019-11-29
TWI696949B (en) 2020-06-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19806451

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19806451

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/05/2021)
