WO2019223383A1 - Direct memory access method and device, dedicated computing chip and heterogeneous computing system - Google Patents
Direct memory access method and device, dedicated computing chip and heterogeneous computing system
- Publication number
- WO2019223383A1 (application PCT/CN2019/076252)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dma control
- length
- control block
- output data
- input data
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
Definitions
- One or more embodiments of the present specification relate to the field of computer technology, and in particular, to a direct memory access method, device, dedicated computing chip, and heterogeneous computing system.
- Heterogeneous computing refers to the control of the overall process of data processing by a general-purpose central processing unit (CPU).
- the general-purpose CPU calls a special-purpose computing chip for calculation.
- a general-purpose CPU needs to call a Direct Memory Access (DMA) method (a method of transferring memory data through a dedicated hardware module) to transfer the input data of the dedicated calculation from the system memory to the device memory. After the dedicated computing chip completes the calculation, the output data is transferred back to the system memory.
- the transmission process of the input data may be: 1) access the pointer of the queue where the DMA descriptor is located to read the DMA descriptor of the input data (used to describe the address and length of the input data); 2) access the DMA descriptor of the input data to read the address and length of the input data; 3) read the input data according to the address and length of the input data.
- the transmission process of the output data may be: 1) access the pointer of the queue where the DMA descriptor is located to read the DMA descriptor of the output data (used to describe the address and length of the output data); 2) access the DMA descriptor of the output data to read the address and length of the output data; 3) write the output data according to the address and length of the output data.
- a process of heterogeneous computing needs to perform six access operations.
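The six accesses of the descriptor-based flow can be tallied with a small model. This is only an illustrative sketch: the `CountingMemory` class, the dictionary-based memory, and the descriptor layout are assumptions made for the example and are not structures defined in this specification.

```python
class CountingMemory:
    """Toy memory that counts every read and write (hypothetical helper)."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.accesses = 0

    def read(self, key):
        self.accesses += 1
        return self.cells[key]

    def write(self, key, value):
        self.accesses += 1
        self.cells[key] = value


def descriptor_based_round_trip(mem, compute):
    # Input transfer: three accesses.
    in_desc = mem.read("input_queue_pointer")       # 1) queue pointer -> descriptor
    in_addr, in_len = mem.read(in_desc)             # 2) descriptor -> address and length
    data = mem.read(("data", in_addr, in_len))      # 3) read the input data
    result = compute(data)
    # Output transfer: three more accesses.
    out_desc = mem.read("output_queue_pointer")     # 4) queue pointer -> descriptor
    out_addr, out_len = mem.read(out_desc)          # 5) descriptor -> address and length
    mem.write(("data", out_addr, out_len), result)  # 6) write the output data
    return result


mem = CountingMemory({
    "input_queue_pointer": "in_desc",
    "in_desc": (0x1000, 4),
    ("data", 0x1000, 4): [1, 2, 3, 4],
    "output_queue_pointer": "out_desc",
    "out_desc": (0x2000, 4),
})
descriptor_based_round_trip(mem, lambda xs: [2 * x for x in xs])
print(mem.accesses)  # 6
```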
- One or more embodiments of the present specification describe a direct memory access method, a device, a dedicated computing chip, and a heterogeneous computing system, which can reduce the number of data accesses in a DMA transfer, thereby improving the performance of heterogeneous computing.
- a direct memory access method including:
- a corresponding DMA control block is determined in system memory, and the content of the DMA control block includes DMA control information and input data;
- the system memory refers to the storage space used to store data used by the general-purpose central processing unit (CPU);
- the device memory refers to a storage space for storing data of a dedicated computing chip
- a dedicated computing chip including: a direct memory access DMA length register, a DMA control block pointer queue, a DMA data transmission module, and a dedicated computing module;
- the DMA length register is used to store the length of the input data and the length of the output data
- the DMA control block pointer queue is used to store multiple DMA control block pointers; the DMA control block pointer points to a DMA control block in system memory; the content of the DMA control block includes DMA control information and input data;
- a DMA data transmission module configured to move the DMA control information and the input data from system memory to device memory according to the length of the input data, the length of the DMA control information, and the DMA control block pointer; and further configured to move the output data from the device memory to the system memory according to the DMA control information and the length of the output data;
- the dedicated calculation module is configured to calculate the input data and obtain the output data.
- a heterogeneous computing system including: a general-purpose central processing unit CPU, system memory, a dedicated computing chip and device memory as provided in the second aspect above;
- the general-purpose CPU is configured to call the dedicated computing chip for heterogeneous computing
- the system memory is used to store data used by the general-purpose CPU
- the device memory is configured to store data used by the dedicated computing chip.
- a direct memory access device including:
- a determining unit configured to determine a corresponding DMA control block in system memory according to the DMA control block pointer read by the reading unit, and the content of the DMA control block includes DMA control information and input data;
- the system memory refers to the storage space used to store data used by the general-purpose central processing unit (CPU);
- the determining unit is further configured to determine a total length of the DMA control information and the input data
- a moving unit configured to move the DMA control information and the input data to a device memory according to the DMA control block pointer read by the reading unit and the total length determined by the determining unit;
- Device memory refers to the storage space used to store data for dedicated computing chips;
- a calculation unit configured to perform corresponding calculation on the input data to obtain output data
- a writing unit configured to write the output data calculated by the calculation unit into the device memory
- An obtaining unit configured to obtain a length of the output data
- the moving unit is further configured to move the output data from the device memory to the DMA control block according to the DMA control information and a length of the output data obtained by the obtaining unit.
- the direct memory access method, device, dedicated computing chip, and heterogeneous computing system read a DMA control block pointer from a DMA control block pointer queue.
- the corresponding DMA control block is determined in the system memory according to the DMA control block pointer. The total length of the DMA control information and the input data in the DMA control block is determined.
- the DMA control information and input data are moved to the device memory. The corresponding calculation is performed on the input data to obtain the output data. The output data is written to the device memory, and the length of the output data is obtained.
- the output data is moved from the device memory to the DMA control block.
- the scheme provided in this specification performs the following two access operations during the input data transmission process: for the first time, the DMA control block pointer queue is accessed to read the DMA control block pointer. The second time, the DMA control block pointer is accessed to read the DMA control information and input data.
- the access to the DMA descriptor is reduced.
- the output data can be moved to the DMA control block directly according to the DMA control information. That is, the output data transmission process only needs one access, for the output data itself; accessing the pointer of the queue where the DMA descriptor is located and accessing the DMA descriptor of the output data are no longer necessary.
- across the two DMA transfers (input and output), three access operations are saved, reducing the total from six to three. This improves the DMA transmission efficiency of the data, which in turn can improve the performance of heterogeneous computing.
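The access counts stated above can be checked with a short tally. The dictionary names and step labels below are illustrative assumptions; only the counts come from the text.

```python
# Accesses per transfer in the descriptor-based flow described earlier.
DESCRIPTOR_FLOW = {
    "input": ["descriptor queue pointer", "input descriptor", "input data"],
    "output": ["descriptor queue pointer", "output descriptor", "output data"],
}

# Accesses per transfer in the control-block scheme of this specification.
CONTROL_BLOCK_FLOW = {
    "input": ["control block pointer queue", "control block (control info + input data)"],
    "output": ["output data"],
}


def total_accesses(flow):
    """Sum the number of access steps over the input and output transfers."""
    return sum(len(steps) for steps in flow.values())


before = total_accesses(DESCRIPTOR_FLOW)
after = total_accesses(CONTROL_BLOCK_FLOW)
saved = before - after
print(before, after, saved)  # 6 3 3
```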
- FIG. 1 is a schematic structural diagram of a heterogeneous computing system provided in this specification
- FIG. 2 is a flowchart of a direct memory access method according to an embodiment of the present specification
- FIG. 3 is a schematic diagram of a direct memory access device according to an embodiment of the present specification.
- the direct memory access method provided by an embodiment of this specification can be applied to a heterogeneous computing system as shown in FIG. 1.
- the heterogeneous computing system may include a general-purpose CPU 10, a system memory 20, a dedicated computing chip 30, and a device memory 40. The general-purpose CPU 10 and the dedicated computing chip 30 may also be referred to as two computing units in a heterogeneous computing system.
- the general-purpose CPU 10 is used to control the main data processing flow of heterogeneous computing. Specifically, this includes: a. preprocessing and preparing the input data for heterogeneous computing; b. calling the dedicated computing chip to perform heterogeneous computing; c. querying and returning the heterogeneous calculation results (also called output data); d. post-processing and outputting the output data of the heterogeneous calculation.
- the system memory 20 is used to store data used by the general-purpose CPU 10.
- it can store the used data in the form of a DMA control block (a data structure), which occupies a physically continuous address space in the system memory 20.
- the content of the DMA control block may include DMA control information, input data, and output data.
- the space occupied by the input data may also be referred to as an input data block.
- the space occupied by output data can also be referred to as an output data block.
- the general-purpose CPU 10 may determine the length of the input data and the length of the output data according to the current heterogeneous calculation method.
- the corresponding input data, the length of the input data, and the length of the output data are usually determined.
- the DMA control information can be constructed later, and the specific construction process will be described later.
- part of the content of the DMA control block (DMA control information and input data) is obtained.
- the general-purpose CPU 10 can write the part of the content into a physically continuous address space of the system memory 20. It can be understood that since the length of the output data is also determined, after the above part of the content, an address space of the above length is usually reserved continuously for writing output data.
- the above-mentioned DMA control block is constituted by an address space in which a part of content is written in the system memory 20 and a reserved address space.
- a DMA control block can be formed in the system memory 20.
- the above DMA control information may include: an offset address of input data, an offset address of output data, and a calculation completion flag.
- the offset address of the input data can occupy 32 bits (that is, 4 bytes), which can refer to the offset of the space occupied by the input data (or the input data block) relative to the start address of the DMA control block.
- the actual address of the input data in the system memory 20 may be determined according to the DMA control block start address and the offset address.
- the definition of the offset address of the output data is the same as the definition of the offset address of the input data.
- the calculation completion flag can occupy 1 bit (expanded to 4 bytes), which can be cleared by the general-purpose CPU 10 before heterogeneous calculation. After the heterogeneous calculation is completed, the dedicated calculation chip 30 rewrites the flag bit to 1.
- the CPU 10 polls the calculation completion flag to confirm whether the heterogeneous calculation is completed.
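The layout of a DMA control block described above can be sketched as follows. The field order, the little-endian encoding, and the helper names `build_control_block` and `actual_address` are assumptions for illustration; the text only fixes the 32-bit input offset, says the output offset is defined in the same way, and gives the flag as 1 bit expanded to 4 bytes.

```python
import struct

# Assumed layout: input offset, output offset, completion flag (4 bytes each).
CONTROL_INFO = struct.Struct("<III")


def build_control_block(input_data: bytes, output_len: int) -> bytes:
    """Return control info + input data + reserved output space as one contiguous block."""
    input_offset = CONTROL_INFO.size                # input block follows the control info
    output_offset = input_offset + len(input_data)  # output block follows the input block
    header = CONTROL_INFO.pack(input_offset, output_offset, 0)  # flag cleared by the CPU
    return header + input_data + bytes(output_len)


def actual_address(block_start: int, offset: int) -> int:
    """Actual address = start address of the DMA control block + offset address."""
    return block_start + offset


block = build_control_block(b"\x01\x02\x03\x04", output_len=8)
in_off, out_off, flag = CONTROL_INFO.unpack_from(block, 0)
print(in_off, out_off, flag, len(block))  # 12 16 0 24
print(hex(actual_address(0x4000, in_off)))  # 0x400c
```

Because offsets are relative to the block start, the same control information stays valid after the block is copied to a different base address in device memory.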
- the special-purpose calculation chip 30 is configured to cooperate with a general-purpose CPU to perform special-purpose calculation (for example, matrix multiplication and large number multiplication) functions.
- the dedicated computing chip 30 may be, for example, a Field Programmable Gate Array (FPGA) chip, an Application Specific Integrated Circuit (ASIC) chip, a Graphics Processing Unit (GPU) chip, or the like.
- for such dedicated calculations, the general-purpose CPU 10 has lower calculation efficiency, whereas using the dedicated computing chip 30 for the calculation offers a better cost-performance ratio.
- the device memory 40 is used to store data of the dedicated computing chip 30. Specifically, when the heterogeneous calculation is started, the dedicated computing chip 30 may read the input data from the device memory 40. When the heterogeneous calculation is completed, the output data may be written into the device memory 40.
- the dedicated computing chip 30 may specifically include: a DMA length register 31, a DMA control block pointer queue 32, a DMA data transmission module 33, and a dedicated computing module 34.
- the DMA length register 31 is used to store the length of the input data and the length of the output data. Usually for a specific heterogeneous calculation, the length of the input data and the length of the output data are fixed. That is, the general-purpose CPU 10 may determine the foregoing length according to the heterogeneous calculation method currently performed.
- the DMA control block pointer queue 32 is used to store a plurality of DMA control block pointers.
- the DMA control block pointers point to the DMA control blocks in the system memory 20, which can occupy 32 bits. Specifically, each time a DMA control block is formed in the system memory 20, the general-purpose CPU 10 can write a DMA control block pointer corresponding to the DMA control block to the DMA control block pointer queue 32. Since one DMA control block can be formed for one heterogeneous calculation, the DMA control block pointer also corresponds to one heterogeneous calculation.
- multiple DMA control block pointers in the DMA control block pointer queue 32 can be read at the same time, so that multiple heterogeneous calculations can be performed in parallel, which improves the efficiency of heterogeneous computing. It should be noted that the multiple heterogeneous calculations here belong to the same type; for example, they are all encryption calculations.
- the DMA data transmission module 33 is used to move the DMA control information and input data from the system memory 20 to the device memory 40 according to the length of the input data, the length of the DMA control information, and the DMA control block pointer; it is also used to move the output data from the device memory 40 to the system memory 20 according to the DMA control information and the length of the output data.
- the special-purpose calculation module 34 is used to implement a special-purpose calculation function. Specifically, it is used to calculate input data and obtain output data.
- the general-purpose CPU 10 in FIG. 1 can call a special-purpose computing chip 30 to perform heterogeneous calculations.
- this specification improves the direct memory access method. Before executing the direct memory access method for heterogeneous computing provided in this specification, the following steps can be performed first:
- the general-purpose CPU 10 determines the length of the input data and the length of the output data according to the current heterogeneous calculation method, and writes the length of the input data and the length of the output data into the DMA length register 31.
- the general-purpose CPU 10 prepares input data for heterogeneous calculation and constructs DMA control information.
- the DMA control information may include: an offset address of input data, an offset address of output data, and a calculation completion flag.
- the offset address of the input data can be determined according to the length of the DMA control information.
- an address space of the above length can be continuously reserved for writing the output data.
- the address space in which data is written in the system memory 20 and the reserved address space constitute a DMA control block.
- the general-purpose CPU 10 writes a DMA control block pointer into the DMA control block pointer queue 32, and the DMA control block pointer points to the start address of the formed DMA control block.
- step 1) can be performed only once, while steps 2) and 3) can be executed multiple times, depending on the number of heterogeneous calculations.
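The preparation performed by the general-purpose CPU can be sketched as three functions. The mapping of the function names to steps 1), 2) and 3) is an assumption, since the steps are not numbered in the text above. The `bytearray` system memory, the dictionary register, and the little-endian control information layout are likewise illustrative.

```python
import struct
from collections import deque

CONTROL_INFO = struct.Struct("<III")  # assumed: input offset, output offset, flag
system_memory = bytearray(256)        # toy stand-in for system memory 20
dma_length_register = {}              # toy stand-in for DMA length register 31
pointer_queue = deque()               # toy stand-in for pointer queue 32


def step1_set_lengths(input_len, output_len):
    """Performed once per heterogeneous calculation method."""
    dma_length_register["input"] = input_len
    dma_length_register["output"] = output_len


def step2_form_control_block(start, input_data):
    """Write control info and input data contiguously; the output space stays reserved."""
    if len(input_data) != dma_length_register["input"]:
        raise ValueError("input length does not match the DMA length register")
    in_off = CONTROL_INFO.size
    out_off = in_off + len(input_data)
    system_memory[start:start + in_off] = CONTROL_INFO.pack(in_off, out_off, 0)
    system_memory[start + in_off:start + out_off] = input_data
    return start  # the pointer is the start address of the control block


def step3_enqueue(pointer):
    pointer_queue.append(pointer)


step1_set_lengths(4, 4)
for start, payload in ((0, b"abcd"), (64, b"wxyz")):
    step3_enqueue(step2_form_control_block(start, payload))
print(list(pointer_queue))  # [0, 64]
```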
- FIG. 2 is a flowchart of a direct memory access method according to an embodiment of the present specification.
- the execution subject of the method may be a dedicated computing chip 30 in FIG. 1.
- the method may specifically include:
- Step 210 Read the DMA control block pointer from the DMA control block pointer queue.
- the DMA control block pointer here points to the start address of the DMA control block, so that the content of the DMA control block can be directly accessed according to the pointer, thereby reducing the number of accesses to the system memory 20, and thus reducing the DMA transfer delay.
- the dedicated computing chip 30 may poll to check whether the DMA control block pointer queue 32 is empty. If it is not empty, the DMA control block pointer can be read from the head of the queue. Since the DMA control block pointer queue 32 provided in this specification can store multiple DMA control block pointers at the same time, it can more conveniently support DMA asynchronous operations, more conveniently support multiple processes to perform DMA operations independently, and improve DMA transmission efficiency.
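A minimal sketch of the pointer queue, assuming a software FIFO stands in for the hardware queue: the CPU side appends 32-bit pointers and the chip side polls the head. The function names `cpu_enqueue` and `chip_poll` are hypothetical.

```python
from collections import deque

POINTER_MAX = 0xFFFFFFFF  # a DMA control block pointer occupies 32 bits

pointer_queue = deque()


def cpu_enqueue(pointer: int) -> None:
    """CPU side: a single small write per heterogeneous calculation."""
    if not 0 <= pointer <= POINTER_MAX:
        raise ValueError("pointer does not fit in 32 bits")
    pointer_queue.append(pointer)


def chip_poll():
    """Chip side: return the pointer at the head of the queue, or None if it is empty."""
    if not pointer_queue:
        return None
    return pointer_queue.popleft()


cpu_enqueue(0x1000)
cpu_enqueue(0x2000)
first = chip_poll()
second = chip_poll()
third = chip_poll()
print(hex(first), hex(second), third)  # 0x1000 0x2000 None
```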
- Step 220 Determine the corresponding DMA control block in the system memory according to the DMA control block pointer.
- for example, when the DMA control block pointer A is read, the DMA control block A can be determined; when the DMA control block pointer B is read, the DMA control block B can be determined. It can be understood that the content of the DMA control block read in this step only includes DMA control information and input data.
- Step 230 Determine the total length of the DMA control information and the input data.
- the DMA control information in this specification may include: an offset address of input data, an offset address of output data, and a calculation completion flag, and it has a fixed length.
- the fixed length is the sum of the lengths of the three.
- the above-mentioned determination process of the total length may be: reading the length of the input data from the DMA length register 31. Determine the total length based on the fixed length and the length of the input data.
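Step 230 amounts to one addition. In the sketch below the three field sizes are taken as 4 bytes each, which is an assumption for the output offset (the text states 32 bits only for the input offset and a 4-byte expanded flag); the register is modeled as a dictionary.

```python
# Sizes in bytes of the three control information fields (assumed 4 bytes each).
CONTROL_INFO_FIELD_SIZES = {
    "input_offset": 4,
    "output_offset": 4,
    "completion_flag": 4,
}
FIXED_LENGTH = sum(CONTROL_INFO_FIELD_SIZES.values())


def total_length(dma_length_register):
    """Total length = fixed control information length + input data length."""
    return FIXED_LENGTH + dma_length_register["input"]


register = {"input": 1024, "output": 256}
print(FIXED_LENGTH, total_length(register))  # 12 1036
```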
- Step 240 Move the DMA control information and input data to the device memory according to the DMA control block pointer and the total length.
- the DMA data transmission module 33 may move the DMA control information and input data to the device memory 40 according to the DMA control block pointer and the total length.
- a physically continuous address space may be divided in the device memory 40 first. After that, the DMA control information and input data can be read from the corresponding DMA control block according to the DMA control block pointer and the total length.
- the above-mentioned read operation can also be understood as a continuous read operation.
- the DMA control information and the input data are written into a preliminarily divided physically continuous address space in the device memory 40. It can be understood that, after the foregoing write operation is performed, the start addresses of the DMA control information and the input data in the device memory 40 are determined.
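Step 240 can be sketched as one contiguous copy between two byte arrays. The `move_to_device` name, the bounds check, and the use of `bytearray` objects as memories are assumptions for illustration.

```python
def move_to_device(system_memory, block_pointer, total_length,
                   device_memory, device_start):
    """Copy control info + input data with a single continuous read and write."""
    end = block_pointer + total_length
    if end > len(system_memory) or device_start + total_length > len(device_memory):
        raise ValueError("transfer exceeds memory bounds")
    chunk = system_memory[block_pointer:end]  # one continuous read
    device_memory[device_start:device_start + total_length] = chunk
    return device_start  # start address of the copy in device memory


system_memory = bytearray(range(64))
device_memory = bytearray(32)
start = move_to_device(system_memory, 16, 12 + 4, device_memory, 8)
print(start, bytes(device_memory[8:12]))
```

The returned start address is what later steps combine with the offsets in the control information.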
- Step 250 Perform corresponding calculations on the input data to obtain output data.
- the dedicated calculation module 34 may be called to perform corresponding calculations on the input data.
- the actual address of the input data in the device memory 40 may be determined according to the start address determined in step 240 and the offset address of the input data in the DMA control information. After that, the input data can be read from the device memory 40 according to the actual address, and the dedicated calculation module 34 is called to perform corresponding calculations on the input data.
- Step 260 Write the output data to the device memory.
- the actual address of the output data in the device memory 40 may be determined according to the start address determined in step 240 and the offset address of the output data in the DMA control information. After that, the output data can be written into the storage space corresponding to the actual address in the device memory 40.
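Steps 250 and 260 can be sketched together: both addresses are derived from the start address in device memory plus the offsets carried in the control information. The little-endian layout and the byte-reversal placeholder for the dedicated calculation are assumptions.

```python
import struct

CONTROL_INFO = struct.Struct("<III")  # assumed: input offset, output offset, flag


def compute_and_store(device_memory, start, input_len, compute):
    """Read input at start + input offset, write result at start + output offset."""
    in_off, out_off, _flag = CONTROL_INFO.unpack_from(device_memory, start)
    in_addr = start + in_off
    out_addr = start + out_off
    data = bytes(device_memory[in_addr:in_addr + input_len])
    result = compute(data)  # stands in for the dedicated calculation module
    device_memory[out_addr:out_addr + len(result)] = result
    return out_addr


device_memory = bytearray(64)
start = 8
CONTROL_INFO.pack_into(device_memory, start, 12, 16, 0)
device_memory[start + 12:start + 16] = b"\x01\x02\x03\x04"

out_addr = compute_and_store(device_memory, start, 4, lambda d: d[::-1])
print(out_addr, bytes(device_memory[out_addr:out_addr + 4]))
```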
- Step 270 Obtain the length of the output data.
- the length of the output data may be read from the DMA length register 31.
- Step 280 Move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data.
- the output data may be moved from the device memory 40 to the DMA control block according to the offset address of the output data and the length of the output data in the DMA control information.
- the moving process may specifically include: obtaining the start address of the DMA control information and the input data in the device memory 40.
- the first actual address of the output data in the device memory 40 is determined according to the offset address and the start address.
- the second actual address of the output data in the DMA control block is determined.
- the dedicated computing chip 30 may rewrite the calculation completion flag in the DMA control information, for example, the calculation completion flag may be rewritten to 1.
- the general-purpose CPU 10 may poll the calculation completion flag. When the calculation completion flag is 1, it indicates that the heterogeneous calculation is completed, and the output data in the system memory 20 may be used.
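The write-back and completion handshake can be sketched as below. Computing the second actual address as control block start plus output offset is an assumption consistent with the offset definition given earlier; the flag position and little-endian layout are also assumed.

```python
import struct

CONTROL_INFO = struct.Struct("<III")  # assumed: input offset, output offset, flag
FLAG_OFFSET = 8                       # third 32-bit field in the assumed layout


def move_output_back(device_memory, device_start, system_memory,
                     block_pointer, output_len):
    _in_off, out_off, _flag = CONTROL_INFO.unpack_from(device_memory, device_start)
    first = device_start + out_off    # address of the output data in device memory
    second = block_pointer + out_off  # address inside the DMA control block (assumed)
    system_memory[second:second + output_len] = device_memory[first:first + output_len]
    # The chip rewrites the calculation completion flag to 1.
    struct.pack_into("<I", system_memory, block_pointer + FLAG_OFFSET, 1)


def cpu_poll(system_memory, block_pointer):
    """CPU side: True once the completion flag has been set to 1."""
    (flag,) = struct.unpack_from("<I", system_memory, block_pointer + FLAG_OFFSET)
    return flag == 1


system_memory = bytearray(64)
device_memory = bytearray(64)
CONTROL_INFO.pack_into(system_memory, 16, 12, 16, 0)
CONTROL_INFO.pack_into(device_memory, 0, 12, 16, 0)
device_memory[16:20] = b"DONE"        # output produced by the calculation

done_before = cpu_poll(system_memory, 16)
move_output_back(device_memory, 0, system_memory, 16, output_len=4)
done_after = cpu_poll(system_memory, 16)
print(done_before, done_after, bytes(system_memory[32:36]))
```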
- the direct memory access method provided in the embodiment of the present specification avoids a separate system-memory access for the DMA transfer of the output data: the offset address of the output data is obtained together with the input data when the DMA control block is read from the system memory. After the heterogeneous calculation is completed, the output data is therefore moved directly according to that offset address, without further operations by the general-purpose CPU, which reduces the delay of the entire heterogeneous calculation.
- the DMA control block pointer queue provided in this specification only requires a single 32-bit DMA control block pointer to be written at a time. The amount of data is very small and corresponds directly to one atomic write operation of a general-purpose CPU, which improves the efficiency of concurrent operations by multiple processes.
- an embodiment of the present specification further provides a direct memory access device.
- the device may include:
- the reading unit 301 is configured to read a DMA control block pointer from a direct memory access DMA control block pointer queue.
- the determining unit 302 is configured to determine a corresponding DMA control block in the system memory according to the DMA control block pointer read by the reading unit 301, and the content of the DMA control block includes DMA control information and input data.
- the above system memory refers to a storage space for storing data used by a general-purpose central processing unit CPU.
- the determining unit 302 is further configured to determine a total length of the DMA control information and the input data.
- the DMA control information may have a fixed length.
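- the determining unit 302 may be specifically configured to: read the length of the input data from the DMA length register, and determine the total length according to the fixed length and the length of the input data.
- the length of the input data is determined by the general-purpose CPU according to the heterogeneous calculation method currently performed.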
- the moving unit 303 is configured to move the DMA control information and the input data to the device memory according to the DMA control block pointer read by the reading unit 301 and the total length determined by the determining unit 302.
- the device memory refers to the storage space used to store data for a dedicated computing chip.
- the moving unit 303 here may be implemented by the DMA data transmission module 33 in FIG. 1.
- the calculation unit 304 is configured to perform corresponding calculation on the input data to obtain output data.
- the calculation unit 304 here may be implemented by a dedicated calculation module 34 in FIG. 1.
- the writing unit 305 is configured to write output data calculated by the calculation unit 304 into a device memory.
- the obtaining unit 306 is configured to obtain a length of the output data.
- the moving unit 303 is further configured to move the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data obtained by the obtaining unit 306.
- the above DMA control information may include an offset address of output data.
- the moving unit 303 can be specifically used for:
- moving the output data from the device memory to the DMA control block according to the offset address of the output data and the length of the output data.
- the reading unit 301 reads a DMA control block pointer from a direct memory access DMA control block pointer queue.
- the determining unit 302 determines a corresponding DMA control block in the system memory according to the DMA control block pointer.
- the determining unit 302 is further configured to determine a total length of the DMA control information and the input data.
- the moving unit 303 moves the DMA control information and the input data to the device memory according to the DMA control block pointer and the total length.
- the calculation unit 304 performs corresponding calculations on the input data to obtain output data.
- the writing unit 305 writes the output data into the device memory.
- the obtaining unit 306 obtains the length of the output data.
- the moving unit 303 moves the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data. This improves the performance of heterogeneous computing.
- the direct memory access device provided in the embodiment of the present specification may be a module or a unit in the dedicated computing chip 30 in FIG. 1.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bus Control (AREA)
Claims (8)
- 一种直接内存存取方法,其特征在于,包括:A direct memory access method is characterized in that it includes:从直接内存存取DMA控制块指针队列中读取DMA控制块指针;Read the DMA control block pointer from the direct memory access DMA control block pointer queue;根据所述DMA控制块指针,在***内存中确定对应的DMA控制块,所述DMA控制块的内容包括DMA控制信息和输入数据;所述***内存是指用于存储通用中央处理器CPU使用的数据的存储空间;According to the DMA control block pointer, a corresponding DMA control block is determined in system memory, and the content of the DMA control block includes DMA control information and input data; the system memory is used to store the data used by the general-purpose central processing unit CPU Data storage space;确定所述DMA控制信息和所述输入数据的总长度;Determining a total length of the DMA control information and the input data;根据所述DMA控制块指针以及所述总长度,将所述DMA控制信息和所述输入数据搬移至设备内存;所述设备内存是指用于存储专用计算芯片的数据的存储空间;Moving the DMA control information and the input data to a device memory according to the DMA control block pointer and the total length; the device memory refers to a storage space for storing data of a dedicated computing chip;对所述输入数据进行相应的计算,得到输出数据;Performing corresponding calculation on the input data to obtain output data;将所述输出数据写入所述设备内存;Writing the output data into the device memory;获取所述输出数据的长度;Obtaining the length of the output data;根据所述DMA控制信息以及所述输出数据的长度,将所述输出数据从所述设备内存搬移至所述DMA控制块。Moving the output data from the device memory to the DMA control block according to the DMA control information and the length of the output data.
- 根据权利要求1所述的方法,其特征在于,所述DMA控制信息包括所述输出数据的偏移地址;The method according to claim 1, wherein the DMA control information includes an offset address of the output data;所述根据所述DMA控制信息以及所述输出数据的长度,将所述设备内存的所述输出数据搬移到所述DMA控制块,包括:Moving the output data of the device memory to the DMA control block according to the DMA control information and the length of the output data includes:根据所述输出数据的偏移地址以及所述输出数据的长度,将所述输出数据从所述设备内存搬移至所述DMA控制块。Moving the output data from the device memory to the DMA control block according to the offset address of the output data and the length of the output data.
- 根据权利要求1所述的方法,其特征在于,所述DMA控制信息具有固定长度;所述确定所述DMA控制信息和所述输入数据的总长度,包括:The method according to claim 1, wherein the DMA control information has a fixed length; and determining the total length of the DMA control information and the input data comprises:从DMA长度寄存器中读取所述输入数据的长度;所述输入数据的长度是由所述通用CPU根据当前所执行的异构计算方法确定的;Read the length of the input data from the DMA length register; the length of the input data is determined by the general-purpose CPU according to the heterogeneous calculation method currently performed;根据所述固定长度以及所述输入数据的长度,确定所述总长度。The total length is determined according to the fixed length and the length of the input data.
- 一种专用计算芯片,其特征在于,包括:直接内存存取DMA长度寄存器、DMA控制块指针队列、DMA数据传输模块以及专用计算模块;A special-purpose computing chip, comprising: a direct memory access DMA length register, a DMA control block pointer queue, a DMA data transmission module, and a special-purpose calculation module;所述DMA长度寄存器,用于存储输入数据的长度以及输出数据的长度;The DMA length register is used to store the length of the input data and the length of the output data;所述DMA控制块指针队列,用于存储多个DMA控制块指针;所述DMA控制块指针指向***内存中的DMA控制块;所述DMA控制块的内容包括DMA控制信息和输入数据;The DMA control block pointer queue is used to store multiple DMA control block pointers; the DMA control block pointer points to a DMA control block in system memory; the content of the DMA control block includes DMA control information and input data;DMA数据传输模块,用于根据所述输入数据的长度、所述DMA控制信息的长度以及所述DMA控制块指针,将所述DMA控制信息以及所述输入数据从***内存搬移至设备内存;还用于根据所述DMA控制信息以及所述输出数据的长度,将所述输出数据从所述设备内存搬移至所述***内存;A DMA data transmission module, configured to move the DMA control information and the input data from system memory to device memory according to the length of the input data, the length of the DMA control information, and the DMA control block pointer; Configured to move the output data from the device memory to the system memory according to the DMA control information and the length of the output data;所述专用计算模块,用于对所述输入数据进行计算,并得到所述输出数据。The dedicated calculation module is configured to calculate the input data and obtain the output data.
- 一种异构计算***,其特征在于,包括:通用中央处理器CPU、***内存、如权利要求4所述的专用计算芯片和设备内存;A heterogeneous computing system, comprising: a general-purpose central processing unit (CPU), a system memory, the dedicated computing chip according to claim 4, and a device memory;所述通用CPU,用于调用所述专用计算芯片进行异构计算;The general-purpose CPU is configured to call the dedicated computing chip for heterogeneous computing;所述***内存,用于存储所述通用CPU使用的数据;The system memory is used to store data used by the general-purpose CPU;所述设备内存,用于存储所述专用计算芯片所使用的数据。The device memory is configured to store data used by the dedicated computing chip.
- 一种直接内存存取装置,其特征在于,包括:A direct memory access device, comprising:读取单元,用于从直接内存存取DMA控制块指针队列中读取DMA控制块指针;A reading unit for reading a DMA control block pointer from a direct memory access DMA control block pointer queue;确定单元,用于根据所述读取单元读取的所述DMA控制块指针,在***内存中确定对应的DMA控制块,所述DMA控制块的内容包括DMA控制信息和输入数据;所述***内存是指用于存储通用中央处理器CPU使用的数据的存储空间;A determining unit, configured to determine a corresponding DMA control block in system memory according to the DMA control block pointer read by the reading unit, and the content of the DMA control block includes DMA control information and input data; the system Memory refers to the storage space used to store data used by the general purpose central processing unit CPU;所述确定单元,还用于确定所述DMA控制信息和所述输入数据的总长度;The determining unit is further configured to determine a total length of the DMA control information and the input data;搬移单元,用于根据所述读取单元读取的所述DMA控制块指针以及所述确定单元确定的所述总长度,将所述DMA控制信息和所述输入数据搬移至设备内存;所述设备内存是指用于存储专用计算芯片的数据的存储空间;A moving unit, configured to move the DMA control information and the input data to a device memory according to the DMA control block pointer read by the reading unit and the total length determined by the determining unit; Device memory refers to the storage space used to store data for dedicated computing chips;计算单元,用于对所述输入数据进行相应的计算,得到输出数据;A calculation unit, configured to perform corresponding calculation on the input data to obtain output data;写入单元,用于将所述计算单元计算的所述输出数据写入所述设备内存;A writing unit, configured to write the output data calculated by the calculation unit into the device memory;获取单元,用于获取所述输出数据的长度;An obtaining unit, configured to obtain a length of the output data;所述搬移单元,还用于根据所述DMA控制信息以及所述获取单元获取的所述输出数据的长度,将所述输出数据从所述设备内存搬移至所述DMA控制块。The moving unit is further configured to move the output data from the device memory to the DMA control block according to the DMA control information and a length of the output data obtained by the obtaining unit.
- The device according to claim 6, wherein the DMA control information includes an offset address of the output data;
  the moving unit is specifically configured to:
  move the output data from the device memory to the DMA control block according to the offset address of the output data and the length of the output data.
- The device according to claim 6, wherein the DMA control information has a fixed length; the determining unit is specifically configured to:
  read the length of the input data from a DMA length register, where the length of the input data is determined by the general-purpose CPU according to the heterogeneous computing method currently being executed;
  determine the total length according to the fixed length and the length of the input data.
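The data flow recited in the device claims above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented hardware: the control-information layout (`CTRL_FMT`), the dictionary standing in for system memory, and the names `build_control_block` and `dma_round_trip` are all hypothetical and do not come from the specification. It shows a single inbound transfer of control information plus input data, sized as fixed length plus the input length from the length register, followed by a write-back of the output at the offset carried in the control information.

```python
import struct
from collections import deque

# Hypothetical fixed-length control information: output offset + one reserved word.
CTRL_FMT = "<II"
CTRL_LEN = struct.calcsize(CTRL_FMT)


def build_control_block(input_data: bytes, out_capacity: int) -> bytearray:
    """Host side: place control info, input data and output space in one block."""
    out_offset = CTRL_LEN + len(input_data)
    block = bytearray(out_offset + out_capacity)
    struct.pack_into(CTRL_FMT, block, 0, out_offset, 0)
    block[CTRL_LEN:out_offset] = input_data
    return block


def dma_round_trip(system_memory, pointer_queue, length_register, compute):
    """Device side: one pass through the units named in the claim."""
    ptr = pointer_queue.popleft()                  # reading unit
    block = system_memory[ptr]                     # determining unit: locate block
    total_len = CTRL_LEN + length_register         # fixed length + input length
    device_memory = bytearray(block[:total_len])   # moving unit: single transfer in
    out_offset, _ = struct.unpack_from(CTRL_FMT, device_memory, 0)
    output = compute(bytes(device_memory[CTRL_LEN:total_len]))  # calculation unit
    device_memory += output                        # writing unit
    out_len = len(output)                          # obtaining unit
    if out_offset + out_len > len(block):
        raise ValueError("output does not fit in the control block")
    # moving unit, return path: device memory -> control block at the given offset
    block[out_offset:out_offset + out_len] = device_memory[total_len:total_len + out_len]
    return out_offset, out_len


# Usage: the "computation" here is simply upper-casing the input bytes.
system_memory = {0x1000: build_control_block(b"abc", out_capacity=8)}
queue = deque([0x1000])
off, n = dma_round_trip(system_memory, queue, length_register=3, compute=bytes.upper)
result = bytes(system_memory[0x1000][off:off + n])
```

Because the control information and the input data sit contiguously in one block, the sketch needs only the block pointer and one total length to bring both into device memory, which is the property the claims rely on.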
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810488487.0 | 2018-05-21 | | |
CN201810488487.0A CN110515872B (en) | 2018-05-21 | 2018-05-21 | Direct memory access method, device, special computing chip and heterogeneous computing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019223383A1 (en) | 2019-11-28 |
Family
ID=68616539
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/076252 WO2019223383A1 (en) | 2018-05-21 | 2019-02-27 | Direct memory access method and device, dedicated computing chip and heterogeneous computing system |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110515872B (en) |
TW (1) | TWI696949B (en) |
WO (1) | WO2019223383A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021052391A1 (en) | 2019-09-18 | 2021-03-25 | 华为技术有限公司 | Method for constructing intermediate representation, compiler and server |
CN111190842B (en) * | 2019-12-30 | 2021-07-20 | Oppo广东移动通信有限公司 | Direct memory access, processor, electronic device, and data transfer method |
CN113342721B (en) * | 2021-07-06 | 2022-09-23 | 无锡众星微系统技术有限公司 | DMA design method for memory controller |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1474568A (en) * | 2002-08-06 | 2004-02-11 | 华为技术有限公司 | Direct internal storage access system and method of multiple path data |
CN1641613A (en) * | 2003-12-05 | 2005-07-20 | 联发科技股份有限公司 | Virtual first-in first-out direct storage accessing device |
CN105512005A (en) * | 2015-12-12 | 2016-04-20 | 中国航空工业集团公司西安航空计算技术研究所 | Circuit and method for synchronous working of control/remote node and bus monitor node |
CN106339338A (en) * | 2016-08-31 | 2017-01-18 | 天津国芯科技有限公司 | Data transmission method and device capable of improving system performance |
CN106569736A (en) * | 2015-10-10 | 2017-04-19 | 北京忆芯科技有限公司 | Nvme protocol processor and processing method thereof |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5953538A (en) * | 1996-11-12 | 1999-09-14 | Digital Equipment Corporation | Method and apparatus providing DMA transfers between devices coupled to different host bus bridges |
GB2359906B (en) * | 2000-02-29 | 2004-10-20 | Virata Ltd | Method and apparatus for DMA data transfer |
US6904473B1 (en) * | 2002-05-24 | 2005-06-07 | Xyratex Technology Limited | Direct memory access controller and method of filtering data during data transfer from a source memory to a destination memory |
US7533198B2 (en) * | 2005-10-07 | 2009-05-12 | International Business Machines Corporation | Memory controller and method for handling DMA operations during a page copy |
CN100395737C (en) * | 2006-06-08 | 2008-06-18 | 杭州华三通信技术有限公司 | Method for transmitting data between internal memory and digital signal processor |
US8250252B1 (en) * | 2010-06-29 | 2012-08-21 | Qlogic, Corporation | System and methods for using a DMA module for a plurality of virtual machines |
CN102467473B (en) * | 2010-11-03 | 2015-02-11 | Tcl集团股份有限公司 | Method and device for transmitting data between user space and kernel |
US9239796B2 (en) * | 2011-05-24 | 2016-01-19 | Ixia | Methods, systems, and computer readable media for caching and using scatter list metadata to control direct memory access (DMA) receiving of network protocol data |
CN103377170B (en) * | 2012-04-26 | 2015-12-02 | 上海宝信软件股份有限公司 | SPI high-speed bidirectional Peer Data Communication system between heterogeneous processor |
CN103500149A (en) * | 2013-09-29 | 2014-01-08 | 华为技术有限公司 | Direct memory access controller and direct memory access control method |
CN104317754B (en) * | 2014-10-15 | 2017-03-15 | 中国人民解放军国防科学技术大学 | The data transfer optimization method that strides towards heterogeneous computing system |
CN105656805B (en) * | 2016-01-20 | 2018-09-25 | 中国人民解放军国防科学技术大学 | A kind of packet receiving method and device based on control block predistribution |
- 2018-05-21: CN, application CN201810488487.0A, patent CN110515872B (en), Active
- 2019-02-21: TW, application TW108105818A, patent TWI696949B (en), Active
- 2019-02-27: WO, application PCT/CN2019/076252, publication WO2019223383A1 (en), Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110515872B (en) | 2020-07-31 |
TW202004494A (en) | 2020-01-16 |
CN110515872A (en) | 2019-11-29 |
TWI696949B (en) | 2020-06-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20200159681A1 (en) | Information processor with tightly coupled smart memory unit | |
CN110647480A (en) | Data processing method, remote direct memory access network card and equipment | |
WO2019223383A1 (en) | Direct memory access method and device, dedicated computing chip and heterogeneous computing system | |
US9710191B1 (en) | Rapid memory buffer write storage system and method | |
US11308171B2 (en) | Apparatus and method for searching linked lists | |
CN112214158B (en) | Device and method for executing host output and input command and computer readable storage medium | |
KR102287677B1 (en) | Data accessing method, apparatus, device, and storage medium | |
US10049035B1 (en) | Stream memory management unit (SMMU) | |
US20210295607A1 (en) | Data reading/writing method and system in 3d image processing, storage medium and terminal | |
JP2021515318A (en) | NVMe-based data reading methods, equipment and systems | |
WO2015176664A1 (en) | Data operation method, device and system | |
CN112506823A (en) | FPGA data reading and writing method, device, equipment and readable storage medium | |
JP6679570B2 (en) | Data processing device | |
WO2022068328A1 (en) | Data migration method and apparatus, and processor and calculation device | |
US8200900B2 (en) | Method and apparatus for controlling cache memory | |
CN116627867B (en) | Data interaction system, method, large-scale operation processing method, equipment and medium | |
CN112035056B (en) | Parallel RAM access equipment and access method based on multiple computing units | |
CN107807888B (en) | Data prefetching system and method for SOC architecture | |
CN113742115A (en) | Method for processing page fault by processor | |
TWI786476B (en) | Processing and storage circuit | |
CN117312182B (en) | Vector data dispersion method and device based on note storage and computer equipment | |
TWI799317B (en) | Flash memory controller and method used in flash memory controller | |
US20230350797A1 (en) | Flash-based storage device and copy-back operation method thereof | |
CN110245096B (en) | Method for realizing direct connection of processor with expansion calculation module | |
US8296481B2 (en) | Device and method for improving transfer efficiency of odd number of data blocks |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19806451; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 19806451; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/05/2021) |
122 | Ep: PCT application non-entry in European phase | Ref document number: 19806451; Country of ref document: EP; Kind code of ref document: A1 |