WO2017173755A1 - On-chip data partitioning read/write method, system and device - Google Patents

On-chip data partitioning read/write method, system and device Download PDF

Info

Publication number
WO2017173755A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
chip
read
storage medium
splicing
Prior art date
Application number
PCT/CN2016/094168
Other languages
English (en)
French (fr)
Inventor
陈天石
杜子东
郭崎
陈云霁
Original Assignee
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所 (Institute of Computing Technology, Chinese Academy of Sciences)
Priority to US16/071,458 priority Critical patent/US10496597B2/en
Publication of WO2017173755A1 publication Critical patent/WO2017173755A1/zh

Links

Images

Classifications

    • G06F15/78 — Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/781 — On-chip cache; Off-chip memory
    • G06F15/7807 — System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7846 — On-chip cache and off-chip main memory
    • G06F3/06 — Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/061 — Improving I/O performance
    • G06F3/064 — Management of blocks
    • G06F3/0655 — Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0673 — Single storage device

Definitions

  • the present invention relates to the fields of information security and integrated circuits, in particular to hardware security design, and more particularly to an on-chip data partitioning read/write method, system and device thereof.
  • bandwidth is a major bottleneck limiting the performance of the accelerator.
  • a common solution is to balance the bandwidth imbalance by placing a cache on the chip. These common solutions do not optimize data reads and writes, and therefore cannot exploit the characteristics of the data well: the on-chip storage overhead and the data read/write overhead are both too large.
  • for common machine learning algorithms, the data is mostly reusable, that is, the same data is used multiple times, so the data contains identical parts, such as the weights in a neural network.
  • an object of the present invention is to provide an on-chip data partitioning and reading system and an implementation method thereof, aiming to read and write repeated data efficiently, thereby reducing the memory bandwidth requirement while providing good flexibility, thereby reducing the on-chip storage overhead.
  • the present invention provides an on-chip data partitioning and reading method, including:
  • the data dividing step stores on-chip data in different areas according to a data partitioning strategy, in the on-chip storage medium and the off-chip storage medium respectively;
  • the pre-operation step performs operation processing on the on-chip address index of the on-chip storage data in advance when performing data splicing;
  • the data splicing step splices the on-chip storage data and the off-chip input data according to a data splicing strategy to obtain the original data representation.
  • the on-chip data partitioning and reading method according to the present invention further includes:
  • a data storing step of carrying the on-chip storage data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;
  • in the storing step, the read and write ports are separated, and the reading and writing of data are independent of each other;
  • the pre-operation step further includes:
  • performing operation processing on the on-chip storage data;
  • performing operation processing on external input data;
  • the external input data includes the off-chip input data and the data directly read in through the read/write port.
  • the data storing step further includes:
  • indexing the on-chip storage data according to an on-chip address index;
  • outputting the indexed on-chip storage data;
  • writing the data to be stored into the corresponding storage location according to the write address.
  • the data dividing step further includes:
  • dividing the address space into an off-chip data space and an on-chip data space;
  • performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement policy, where the data replacement policy includes sequential replacement, reverse-order replacement, and random replacement;
  • the data partitioning strategy includes fixed-point number division and floating-point number division;
  • the data splicing step further includes:
  • converting the form of on-chip/off-chip data transmission from the original data representation to a full or partial data index, and obtaining the original data representation by splicing the results of all or part of the on-chip data indices;
  • the data splicing step is performed through an on-chip/off-chip data path or an on-chip data path; the on-chip/off-chip data path includes PCI, PCIe, and HT interconnect technologies, the on-chip data path includes FAT-TREE and H-TREE interconnect technologies, and the on-chip/off-chip data connection mode includes multi-chip interconnect structures;
  • the data in the on-chip storage medium or the off-chip storage medium can be read and written one or more times, and the data can be read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium can be read and written from the outside one or more times, and the on-chip storage medium can be read and written from the inside one or more times.
  • the invention provides an on-chip data division and reading system, comprising:
  • a data dividing module configured to divide on-chip storage data into different areas according to a data partitioning strategy, and store the same in an on-chip storage medium and an off-chip storage medium;
  • a pre-operation module configured to perform operation processing on the on-chip address index of the on-chip storage data in advance when performing data splicing;
  • a data splicing module configured to splice the on-chip storage data and the off-chip input data according to the data splicing strategy to obtain the original data representation.
  • the on-chip data partitioning and reading system further includes:
  • a storage module configured to carry the on-chip storage data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;
  • the storage module adopts separate read and write ports, and data reads and writes are independent of each other;
  • the pre-operation module further includes:
  • an on-chip processing sub-module configured to perform operation processing on the on-chip storage data;
  • an off-chip processing sub-module configured to perform operation processing on external input data;
  • the external input data includes the off-chip input data and the data directly read in through the read/write port.
  • the storage module further includes:
  • an address index interface configured to index the on-chip storage data according to an on-chip address index;
  • a data readout interface serving as the output port for the indexed on-chip storage data;
  • a data write interface configured to write the data to be stored into the corresponding storage location according to the write address.
  • the on-chip data partitioning and reading system further includes:
  • the data division module further includes:
  • an address division sub-module configured to divide the address space into an off-chip data space and an on-chip data space;
  • a data replacement submodule configured to perform data replacement between the on-chip storage medium and an off-chip storage medium according to a data replacement policy;
  • the data replacement policy includes sequential replacement, reverse-order replacement, and random replacement;
  • the data partitioning strategy includes fixed-point number division and floating-point number division; the data division module is implemented based on one or more on-chip computing units in the chip, and the on-chip computing unit initiates a read/write request and processes the raw data obtained by splicing;
  • the data splicing module further includes:
  • an index splicing sub-module configured to convert the form of on-chip/off-chip data transmission from the original data representation to a full or partial data index, and to obtain the original data representation by arranging the results of all or part of the on-chip data indices;
  • the data splicing module reads and writes through an on-chip/off-chip data path or an on-chip data path; the on-chip/off-chip data path includes PCI, PCIe, and HT interconnect technologies, the on-chip data path includes FAT-TREE and H-TREE interconnect technologies, and the on-chip/off-chip data connection mode includes multi-chip interconnect structures;
  • the data in the on-chip storage medium or the off-chip storage medium is read and written one or more times, and the data is read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium is read and written from the outside one or more times, and the on-chip storage medium is read and written from the inside one or more times.
  • the present invention provides an on-chip read/write device comprising the on-chip data partition read/write system according to any of the above.
  • the on-chip read/write device includes a static random access memory, a dynamic random access memory, an enhanced dynamic random access memory, a register file, a nonvolatile memory, or a 3D memory device.
  • FIG. 1 is a schematic structural diagram of an on-chip data division and reading system according to the present invention.
  • FIG. 2 is a schematic structural diagram of the on-chip data division and reading system according to a preferred embodiment of the present invention.
  • FIG. 3A is a schematic diagram of an implementation of an on-chip data partitioning strategy according to the present invention.
  • FIG. 3B is a second schematic diagram of the implementation of the on-chip data partitioning strategy of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of an on-chip data indexing of the on-chip data partitioning and reading system according to the present invention.
  • FIG. 5 is a schematic diagram of a physical framework of a method for dividing and reading data on-chip according to the present invention.
  • FIG. 6 is a physical design framework diagram of a data splicing operation of an embodiment of the on-chip data partitioning and reading method according to the present invention.
  • FIG. 7 is a schematic flow chart of the method for dividing and reading data on the chip in the present invention.
  • FIG. 8 is a schematic flow chart of a specific embodiment of the method for dividing and reading data on the chip in the present invention.
  • the data stored on-chip in the accelerator is very limited, and all data needs to be divided into data blocks that can be stored on-chip; the required data blocks are read in or written out through data exchange between the large off-chip storage medium and the small on-chip storage medium.
  • FIG. 1 shows an on-chip data partitioning and reading system 100, which includes:
  • the data dividing module 10 is configured to divide the on-chip storage data into different areas according to the data partitioning strategy, and store it in the on-chip storage medium and the off-chip storage medium respectively;
  • the pre-operation module 20 is configured to perform operation processing on the on-chip address index of the on-chip storage data in advance when performing data splicing;
  • the data splicing module 30 is configured to splice the on-chip storage data and the off-chip input data according to the data splicing strategy to obtain the original data representation.
  • the data stored on-chip in the accelerator is very limited, and all data needs to be divided into data blocks that can be stored on-chip.
  • through data exchange between the large off-chip storage medium and the small on-chip storage medium, the required data blocks are read in or written out.
  • the on-chip data address is provided to the on-chip computing unit as needed through the on-chip address index; the physical framework is shown in FIG. 5. The embodiments shown in FIG. 2 and FIG. 3 are only typical cases involved in the present invention; the present invention is not limited to a specific data division, and extreme cases, such as all data being on-chip or all data being divided off-chip, are also within the scope of the present invention.
  • the on-chip data partitioning and reading system 100 of the present invention further includes:
  • a storage module 40 configured to store the on-chip storage data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;
  • the storage module 40 is separated by a read/write port, and data read and write are independent of each other;
  • the pre-operation module 20 further includes:
  • the on-chip processing sub-module 21, configured to perform operation processing on the on-chip storage data;
  • the storage module 40 further includes:
  • an address index interface 41, configured to index the on-chip storage data according to an on-chip address index;
  • a data readout interface 42, serving as the output port for the indexed on-chip storage data;
  • a data write interface 43, configured to write the data to be stored into the corresponding storage location according to the write address.
  • in the on-chip data partitioning and reading system 100, the data dividing module 10 preferably further includes:
  • the address division sub-module 11 is configured to divide the address space into an off-chip data space and an on-chip data space;
  • a data replacement sub-module 12 configured to perform data replacement between the on-chip storage medium and an off-chip storage medium according to a data replacement policy;
  • the data replacement policy includes sequential replacement, reverse-order replacement, and random replacement;
  • the data partitioning strategy includes fixed-point number division and floating-point number division; as a typical example, FIG. 3A shows the data division of a fixed-point number embodiment, which divides the fixed-point data into an integer part and a fractional part, and FIG. 3B shows the data division of a floating-point number embodiment, which divides the floating-point number into an exponent part and a fraction part.
  • the divisions of the embodiments shown in FIG. 3A and FIG. 3B are only typical cases involved in the present invention; the present invention is not limited to a specific data division, and extreme cases, such as all data being on-chip or all data being divided off-chip, are also within the scope of the present invention.
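The floating-point division of FIG. 3B can be illustrated with a small sketch. The function names and the choice of field layout (IEEE 754 binary32: 1 sign bit, 8 exponent bits, 23 fraction bits) are assumptions for illustration only, not a layout specified by the patent:

```python
import struct

def split_float32(value):
    """Split a float32 into a sign/exponent part and a fraction part.

    Illustrative sketch of the floating-point partitioning strategy: one part
    (e.g. the exponent bits, which repeat often across neural-network weights)
    could be kept on-chip while the other part stays off-chip.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign_exp = bits >> 23          # sign bit + 8 exponent bits
    fraction = bits & 0x7FFFFF     # 23 fraction bits
    return sign_exp, fraction

def merge_float32(sign_exp, fraction):
    """Splice the two parts back into the original float32 representation."""
    bits = (sign_exp << 23) | fraction
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Round-trip: splitting and splicing recovers the original representation.
part_hi, part_lo = split_float32(1.5)
assert merge_float32(part_hi, part_lo) == 1.5
```

A fixed-point division would work analogously, splitting the word into integer and fractional bit fields.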
  • the on-chip cache structure includes a buffer for input data; the address division sub-module 11 maps the address space of the index into the off-chip data space and the on-chip data space, and when necessary the data replacement sub-module 12 performs the exchange, moving the data that requires accelerated processing onto the chip.
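The three replacement policies (sequential, reverse-order, random) could be sketched as a victim-selection helper for choosing which on-chip block to swap out; `choose_victim` and its parameters are hypothetical names, not the patent's actual implementation:

```python
import random

def choose_victim(num_blocks, policy, last=-1):
    """Pick the on-chip block to replace under a given replacement policy.

    Hypothetical sketch: 'sequential' cycles forward through the blocks,
    'reverse' cycles backward, and 'random' picks uniformly at random.
    """
    if policy == "sequential":
        return (last + 1) % num_blocks
    if policy == "reverse":
        return (last - 1) % num_blocks
    if policy == "random":
        return random.randrange(num_blocks)
    raise ValueError(f"unknown replacement policy: {policy}")

assert choose_victim(4, "sequential", last=3) == 0   # wraps around
assert choose_victim(4, "reverse", last=0) == 3      # wraps backward
assert 0 <= choose_victim(4, "random") < 4
```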
  • the data partitioning module 10 is implemented based on one or more on-chip computing units in the chip; the on-chip computing unit initiates a read/write request and processes the raw data obtained by splicing.
  • the data splicing module 30 further includes:
  • the index splicing sub-module 31 is configured to convert the form of on-chip/off-chip data transmission from the original data representation to a full or partial data index, and to obtain the original data representation by splicing the results of all or part of the on-chip data indices;
  • the data splicing module 30 reads and writes through an on-chip/off-chip data path or an on-chip data path;
  • the on-chip/off-chip data path includes PCI (Peripheral Component Interconnect), PCIe (Peripheral Component Interconnect Express), and HT (HyperTransport, a scalable, high-speed, high-performance end-to-end integrated circuit interconnect bus technology);
  • the on-chip data path includes FAT-TREE and H-TREE (hierarchical tree) interconnect technologies;
  • the on-chip/off-chip data connection mode includes multi-chip interconnect structures.
  • the on-chip/off-chip data connection shown in FIG. 1 is not limited to PCIe bus connections, and also includes multi-chip interconnect structures such as a network-on-chip.
  • the data path between the on-chip computing unit and the on-chip storage medium shown in FIG. 1 is not limited to H-TREE; interconnect technologies such as FAT-TREE may also be used. Off-chip addressing can be performed through the on-chip/off-chip data path, so that the on-chip data partitioning and reading system 100 can accurately restore the various data to be spliced into the original data, and can effectively support different data partitioning strategies, thereby reducing on-chip/off-chip data exchange.
  • the data in the on-chip storage medium or the off-chip storage medium is read and written one or more times, and the data is read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium is read and written from the outside one or more times, and the on-chip storage medium is read and written from the inside one or more times.
  • FIG. 7 is a flow chart of the on-chip data partitioning and reading method of the present invention, which can be implemented by the on-chip data partitioning and reading system 100 of the present invention shown in FIG. 1.
  • the method for dividing and reading data on the chip includes:
  • step S701, the data dividing step: storing the on-chip data in different areas according to the data partitioning strategy, in the on-chip storage medium and the off-chip storage medium respectively;
  • step S702, the pre-operation step: performing operation processing on the on-chip address index of the on-chip storage data in advance when performing data splicing;
  • step S703, the data splicing step: splicing the on-chip storage data and the off-chip input data according to the data splicing strategy to obtain the original data representation.
  • the data is divided, pre-operated on, and spliced by the data dividing module 10, the pre-operation module 20, and the data splicing module 30 respectively, and the original data is recovered losslessly on-chip.
  • the on-chip data partitioning and reading method of the present invention needs to implement storage management.
  • the splicing process requires the support of the storage module 40.
  • the data dividing and reading method further includes:
  • writing the data to be stored into the corresponding storage location according to the write address;
  • reading and writing are supported by the address index interface 41, the data readout interface 42, and the data write interface 43 respectively, cooperating with the on-chip/off-chip data path and the on-chip data path to implement data communication inside and outside the module; the independent read/write interfaces allow simultaneous reading and writing.
  • the on-chip data is indexed according to the on-chip address index; the on-chip address index may first undergo certain operations in the pre-operation module 20 (such as an address offset calculation), the on-chip storage medium is then retrieved to obtain the on-chip storage data, and this is spliced with the externally input data to obtain the final complete data.
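This read path can be sketched minimally as follows; the names (`pre_operate`, `read_spliced`), the toy memory contents, and the bit widths are all hypothetical, chosen only to make the three stages (pre-operation, indexing, splicing) explicit:

```python
# Stand-in for the on-chip storage medium: a tiny table of on-chip words.
ON_CHIP = [0x11, 0x22, 0x33, 0x44]

def pre_operate(index, offset=1):
    """Pre-operation on the on-chip address index (here: an address offset)."""
    return (index + offset) % len(ON_CHIP)

def read_spliced(index, off_chip_word):
    """Index the on-chip data after pre-operation, then splice it with the
    off-chip input word into one complete datum (on-chip part in high bits)."""
    on_chip_word = ON_CHIP[pre_operate(index)]
    return (on_chip_word << 8) | off_chip_word

# index 0 is offset to 1, yielding 0x22; spliced with 0xAA gives 0x22AA.
assert read_spliced(0, 0xAA) == 0x22AA
```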
  • the on-chip data partitioning method includes:
  • step S801: dividing the address space into an off-chip data space and an on-chip data space;
  • step S802: performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement policy, where the data replacement policy includes sequential replacement, reverse-order replacement, and random replacement, and the data partitioning strategy includes fixed-point number division and floating-point number division;
  • step S803: performing operation processing on the on-chip storage data;
  • step S804: performing operation processing on the external input data, where the external input data includes the off-chip input data and the data directly read in through the read/write port;
  • step S805: converting the form of on-chip/off-chip data transmission from the original data representation to a full or partial data index, and obtaining the original data representation by splicing the results of all or part of the on-chip data indices.
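The index conversion of step S805 can be sketched as a simple table lookup: instead of moving the raw data representation across the on-chip/off-chip boundary, repeated values (e.g. shared neural-network weights) are transmitted as indices into an on-chip value table, and the original representation is recovered by arranging the indexed results on-chip. The function and variable names here are illustrative, not from the patent:

```python
def to_indices(raw, table):
    """Convert raw data to indices into an on-chip value table."""
    return [table.index(v) for v in raw]

def from_indices(indices, table):
    """Recover the original data representation from the on-chip indices."""
    return [table[i] for i in indices]

weights_table = [0.5, -0.25, 1.0]     # distinct values kept on-chip
raw = [0.5, 0.5, 1.0, -0.25, 0.5]     # highly reusable data stream

idx = to_indices(raw, weights_table)  # what actually crosses the boundary
assert idx == [0, 0, 2, 1, 0]
assert from_indices(idx, weights_table) == raw   # lossless recovery
```

When the indices are narrower than the raw values and the data is highly repetitive, this reduces the bandwidth consumed by on-chip/off-chip transfers.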
  • the physical design framework of one embodiment, shown in FIGS. 4 to 6, is described below.
  • the data stored on-chip in the accelerator is very limited, and all data needs to be divided into data blocks that can be stored on-chip; through data exchange between the large off-chip storage medium (i.e., the off-chip storage medium) and the small on-chip storage medium (i.e., the on-chip storage medium), the required data blocks are read in or written out. The data blocks differ in size and are therefore divided and stored in different areas, with the off-chip storage medium added according to the capacity requirement.
  • the on-chip data address is provided to the on-chip computing unit as needed through the on-chip address index. As shown in FIG. 6, the index is received at the address index interface 41 and the data corresponding to the index is obtained.
  • in the illustrated indexing process, an 8-bit address indexes 256 storage locations to obtain 32-bit data; the design is not limited to the illustrated address index bit width and on-chip data storage bit width.
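The illustrated indexing (an 8-bit address selecting one of 256 on-chip 32-bit words) amounts to a plain table lookup; the memory contents and names below are made up for the sketch, and the widths are those of the illustration only:

```python
# 256 entries of 32-bit data standing in for the on-chip storage medium.
MEM = [(i * 0x01010101) & 0xFFFFFFFF for i in range(256)]

def index_on_chip(addr8):
    """Look up a 32-bit on-chip word by an 8-bit address index."""
    assert 0 <= addr8 <= 0xFF, "the address index is 8 bits wide"
    return MEM[addr8]

assert index_on_chip(0x00) == 0x00000000
assert index_on_chip(0x02) == 0x02020202
```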
  • the implementation of this process also relies on the cooperation of the on-chip storage medium, the off-chip storage medium, the on-chip/off-chip data path, and the on-chip data path.
  • the on-chip storage data, illustrated with a 32-bit width, is processed by the on-chip data processing sub-module 31; the result is likewise illustrated with a 32-bit width.
  • the on-chip data processing sub-module 31 is not limited to the addressing operation and may also include other operations, such as arithmetic calculation.
  • the off-chip input data, illustrated with a 32-bit width, is processed by the off-chip data processing sub-module 32; the result is likewise illustrated with a 32-bit width.
  • the processed on-chip storage data and off-chip input data are spliced together, illustrated with a 64-bit width, and sent to subsequent modules for processing, such as an on-chip computing unit.
  • the processed on-chip storage data and off-chip input data are not limited to the illustrated bit widths; the data blocks are not limited to a specific data bit width, and the data processing is not limited to a specific operation: it may involve complex operations, not only simple splicing but other operations as well.
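The splicing of the illustration (a processed 32-bit on-chip word and a processed 32-bit off-chip word concatenated into a 64-bit result) can be sketched as bit concatenation; the function name and the choice of putting the on-chip part in the high bits are assumptions for illustration:

```python
def splice64(on_chip32, off_chip32):
    """Concatenate two 32-bit words into one 64-bit word (on-chip part high).

    Illustrative sketch only: as the text notes, the real data processing is
    not limited to simple splicing and may involve more complex operations.
    """
    assert on_chip32 <= 0xFFFFFFFF and off_chip32 <= 0xFFFFFFFF
    return (on_chip32 << 32) | off_chip32

assert splice64(0xDEADBEEF, 0x12345678) == 0xDEADBEEF12345678
```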
  • the data splicing step is performed through an on-chip/off-chip data path or an on-chip data path;
  • the on-chip/off-chip data path includes PCI, PCIe, and HT interconnect technologies to implement data flow between on-chip and off-chip;
  • the on-chip data path includes FAT-TREE and H-TREE interconnect technologies;
  • the on-chip/off-chip data connection mode includes multi-chip interconnect structures, such as a network-on-chip.
  • the data in the on-chip storage medium or the off-chip storage medium can be read and written one or more times, and the data can be read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium can be read and written from the outside one or more times, and the on-chip storage medium can be read and written from the inside one or more times.
  • the invention provides an on-chip read/write device, comprising the on-chip data partitioning and reading system 100, wherein the on-chip read/write device includes an on-chip storage medium, an off-chip storage medium, an on-chip/off-chip data path, and an on-chip data path.
  • the on-chip read/write device further includes common storage media such as a static random access memory (SRAM), a dynamic random access memory (DRAM), an enhanced dynamic random access memory (eDRAM), and a register file (RF), and may also be a new type of storage device, such as a non-volatile memory (NVM) or a 3D storage device.
  • the invention converts the data representation into an index, can efficiently perform repeated addressing in the on-chip address space, and can also perform off-chip addressing; the device for on-chip repeated addressing in a heterogeneous environment and its usage strategy differ from directly accelerating the data cache itself, and the required hardware support includes the on-chip storage medium, the off-chip storage medium, the address indexing device, the on-chip/off-chip data path, and the on-chip data path.
  • the present invention is directed to strategies, apparatus, and methods for different data partitionings: according to different partitioning strategies, data is divided into different parts, and the apparatus of the present invention supports different partitioning strategies.
  • the device of the present invention and the related methods of use can effectively exploit data reusability and flexible addressing requirements, effectively reduce the memory bandwidth requirement, and can be applied to different scenarios, not being limited to machine learning accelerators.
  • the invention can also reduce the on-chip cache overhead by reasonably scheduling data, thereby supporting more efficient accelerator designs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System (AREA)

Abstract

An on-chip data partitioning read/write method, comprising: a data dividing step of storing on-chip data in different areas according to a data partitioning strategy, in an on-chip storage medium and an off-chip storage medium respectively (S701); a pre-operation step of performing operation processing on the on-chip address index of the on-chip storage data in advance when performing data splicing (S702); and a data splicing step of splicing the on-chip storage data and the off-chip input data according to a data splicing strategy to obtain the original data representation (S703). A corresponding on-chip data partitioning read/write system (100) and a device thereof are also provided. Thereby, repeated data can be read and written efficiently, reducing the memory access bandwidth requirement while providing good flexibility, thereby reducing the on-chip storage overhead.

Description

On-chip data partitioning read/write method, system and device — Technical Field
The present invention relates to the fields of information security and integrated circuits, in particular to hardware security design, and more particularly to an on-chip data partitioning read/write method, system and device.
Background
With the widespread use of electronic devices in the era of big data, more and more devices need to perform increasingly complex processing of real-time input from the real world, such as industrial robots, autonomous driverless cars, and mobile devices. Most of these tasks fall into the field of machine learning, where most of the operations are vector or matrix operations with an extremely high degree of parallelism. Compared with traditional general-purpose GPU/CPU acceleration schemes, hardware ASIC accelerators are currently the most popular acceleration scheme: on the one hand, they provide extremely high parallelism and can achieve extremely high performance; on the other hand, they are extremely energy efficient.
However, bandwidth has become a major bottleneck limiting accelerator performance; a common solution is to balance the bandwidth imbalance with caches placed on-chip. These common solutions do not optimize data reads and writes, and thus cannot exploit the characteristics of the data well, so the on-chip storage overhead and the data read/write overhead are too large. For currently common machine learning algorithms, most of the data is reusable, that is, the same data is used multiple times, so the data has identical parts, such as the weights in a neural network.
In summary, the prior art clearly has inconveniences and deficiencies in practical use, so improvement is necessary.
Disclosure of the invention

In view of the above defects, the object of the present invention is to provide an on-chip data partitioning read-write system and an implementation method thereof, aiming at reading and writing repeated data efficiently, thereby reducing memory-access bandwidth requirements while providing good flexibility and thus lowering on-chip storage overhead.

To achieve the above object, the present invention provides an on-chip data partitioning read-write method, comprising:

a data partitioning step, storing on-chip data in different areas according to a data partitioning strategy, respectively in an on-chip storage medium and an off-chip storage medium;

a pre-operation step, performing operation processing on the on-chip address index of the on-chip stored data in advance when performing data splicing;

a data splicing step, splicing the on-chip stored data and off-chip input data according to a data splicing strategy to obtain the original data representation.
According to the on-chip data partitioning read-write method of the present invention, the method further comprises:

a data storage step, moving the on-chip stored data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;

in the storage step, the read and write ports are separated, so that reading and writing of data are independent of each other;

the pre-operation step further comprises:

performing operation processing on the on-chip stored data; and

performing operation processing on external input data;

the external input data comprises the off-chip input data and data read in directly through the read-write ports.
According to the on-chip data partitioning read-write method of the present invention, the data storage step further comprises:

indexing the on-chip stored data according to the on-chip address index;

outputting the indexed on-chip stored data through an output port; and

writing the data to be stored into the corresponding storage location according to the write address.
According to the on-chip data partitioning read-write method of the present invention, the data partitioning step further comprises:

partitioning the address space into an off-chip data space and an on-chip data space;

performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement strategy, the data replacement strategy comprising sequential replacement, reverse-order replacement, and random replacement;

the data partitioning strategy comprises fixed-point partitioning and floating-point partitioning;

the data splicing step further comprises:

converting the form of on-chip/off-chip data transfer from the original data representation into all or part of a data index, and splicing the results of all or part of the on-chip data index to obtain the original data representation;

the data splicing step is performed through an on-chip/off-chip data path or an on-chip data path; the on-chip/off-chip data path comprises PCI, PCIE, and HT interconnect technologies, the on-chip data path comprises FAT-TREE and H-TREE interconnect technologies, and the on-chip/off-chip data connection comprises multi-chip interconnect structures;

the data in the on-chip storage medium or the off-chip storage medium can be read and written one or more times, and the data can be read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium can be read and written from outside one or more times, and the medium can be read and written from inside one or more times.
The present invention provides an on-chip data partitioning read-write system, comprising:

a data partitioning module for partitioning on-chip stored data into different areas according to a data partitioning strategy, stored respectively in an on-chip storage medium and an off-chip storage medium;

a pre-operation module for performing operation processing on the on-chip address index of the on-chip stored data in advance when performing data splicing; and

a data splicing module for splicing the on-chip stored data and off-chip input data according to a data splicing strategy to obtain the original data representation.
According to the on-chip data partitioning read-write system of the present invention, the system further comprises:

a storage module for moving the on-chip stored data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;

the storage module uses separate read and write ports, so that reading and writing of data are independent of each other;

the pre-operation module further comprises:

an on-chip processing submodule for performing operation processing on the on-chip stored data; and

an off-chip processing submodule for performing operation processing on external input data;

the external input data comprises the off-chip input data and data read in directly through the read-write ports.
According to the on-chip data partitioning read-write system of the present invention, the storage module further comprises:

an address index interface for indexing the on-chip stored data according to the on-chip address index;

a data readout interface serving as the output port for the indexed on-chip stored data; and

a data write interface for writing the data to be stored into the corresponding storage location according to the write address.
According to the on-chip data partitioning read-write system of the present invention:

the data partitioning module further comprises:

an address partitioning submodule for partitioning the address space into an off-chip data space and an on-chip data space; and

a data replacement submodule for performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement strategy, the data replacement strategy comprising sequential replacement, reverse-order replacement, and random replacement;

the data partitioning strategy comprises fixed-point partitioning and floating-point partitioning; the data partitioning module is implemented on the basis of one or more on-chip computing units in the chip, and the on-chip computing units initiate read-write requests and process the spliced original data;

the data splicing module further comprises:

an index splicing submodule for converting the form of on-chip/off-chip data transfer from the original data representation into all or part of a data index, and splicing the results of all or part of the on-chip data index to obtain the original data representation;

the reads and writes of the data splicing module are performed through an on-chip/off-chip data path or an on-chip data path; the on-chip/off-chip data path comprises PCI, PCIE, and HT interconnect technologies, the on-chip data path comprises FAT-TREE and H-TREE interconnect technologies, and the on-chip/off-chip data connection comprises multi-chip interconnect structures;

the data in the on-chip storage medium or the off-chip storage medium is read and written one or more times, and the data is read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium is read and written from outside one or more times, and the on-chip storage medium is read and written from inside one or more times.
The present invention provides an on-chip read-write device comprising the on-chip data partitioning read-write system according to any of the above.

According to the on-chip read-write device of the present invention, the on-chip read-write device comprises static random-access memory, dynamic random-access memory, enhanced dynamic random-access memory, a register file, and non-volatile memory or a 3D memory device.
Brief description of the drawings

Fig. 1 is a schematic structural diagram of the on-chip data partitioning read-write system of the present invention;

Fig. 2 is a schematic structural diagram of the on-chip data partitioning read-write system of a preferred embodiment of the present invention;

Fig. 3A is a first schematic diagram of an implementation of the on-chip data partitioning strategy of the present invention;

Fig. 3B is a second schematic diagram of an implementation of the on-chip data partitioning strategy of the present invention;

Fig. 4 is a schematic diagram of an on-chip data indexing embodiment of the on-chip data partitioning read-write system of the present invention;

Fig. 5 is a schematic diagram of the physical framework of the on-chip data partitioning read-write method of the present invention;

Fig. 6 is a physical design framework diagram of the data splicing operation of one embodiment of the on-chip data partitioning read-write method of the present invention;

Fig. 7 is a schematic flowchart of the on-chip data partitioning read-write method of the present invention;

Fig. 8 is a schematic flowchart of a specific embodiment of the on-chip data partitioning read-write method of the present invention.
Best mode for carrying out the invention

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the present invention and are not intended to limit it.

On existing heterogeneous platforms, the data an accelerator can store on-chip is very limited, so all data must be partitioned into blocks of a size that can be stored on-chip, and the required blocks are read in or written out through data exchange between the large off-chip storage medium and the small on-chip storage medium.
To achieve the above object, Fig. 1 shows an on-chip data partitioning read-write system 100 provided by the present invention, comprising:

a data partitioning module 10 for partitioning on-chip stored data into different areas according to a data partitioning strategy, stored respectively in an on-chip storage medium and an off-chip storage medium;

a pre-operation module 20 for performing operation processing on the on-chip address index of the on-chip stored data in advance when performing data splicing; and

a data splicing module 30 for splicing the on-chip stored data and off-chip input data according to a data splicing strategy to obtain the original data representation.
For heterogeneous platforms, the data an accelerator can store on-chip is very limited, so all data must be partitioned into blocks of a size that can be stored on-chip, and the required blocks are read in or written out through data exchange between the large off-chip storage medium and the small on-chip storage medium. Meanwhile, on-chip data addresses are supplied on demand to the on-chip computing units through the on-chip address index; the physical framework is shown in Fig. 5. The partitioning in the embodiments shown in Fig. 2 and Fig. 3 is only a typical case covered by the present invention; the present invention is not limited to any particular data partitioning, and extreme cases, such as all data kept on-chip or all data partitioned off-chip, also fall within the scope of the present invention.
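The block partitioning described above can be sketched as follows. This is a minimal illustration only; the function name and capacity value are assumptions for the example, not the patent's implementation:

```python
def partition_into_blocks(data, on_chip_capacity):
    """Split a flat data array into blocks small enough to fit on-chip.

    The accelerator then streams the blocks one at a time, exchanging
    them between the large off-chip medium and the small on-chip medium.
    """
    return [data[i:i + on_chip_capacity]
            for i in range(0, len(data), on_chip_capacity)]

# Example: 10 words of data, an on-chip medium that holds 4 words.
blocks = partition_into_blocks(list(range(10)), on_chip_capacity=4)
```

Each resulting block fits on-chip, so the data exchange only ever moves one block-sized unit at a time.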
Further, the on-chip data partitioning read-write system 100 of the present invention further comprises:

a storage module 40 for storing and moving the on-chip stored data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;

the storage module 40 uses separate read and write ports, so that reading and writing of data are independent of each other;

the pre-operation module 20 further comprises:

an on-chip processing submodule 21 for performing operation processing on the on-chip stored data; and

an off-chip processing submodule 22 for performing operation processing on external input data, the external input data comprising the off-chip input data and data read in directly through the read-write ports.
Further, the storage module 40 further comprises:

an address index interface 41 for indexing the on-chip stored data according to the on-chip address index;

a data readout interface 42 serving as the output port for the indexed on-chip stored data; and

a data write interface 43 for writing the data to be stored into the corresponding storage location according to the write address.
In the on-chip data partitioning read-write system 100, the data partitioning module 10 preferably further comprises:

an address partitioning submodule 11 for partitioning the address space into an off-chip data space and an on-chip data space; and

a data replacement submodule 12 for performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement strategy, the data replacement strategy comprising sequential replacement, reverse-order replacement, and random replacement;

the data partitioning strategy comprises fixed-point partitioning and floating-point partitioning. As typical examples, Fig. 3A shows the data partitioning of a fixed-point embodiment, which splits fixed-point data into an integer part and a fractional part, and Fig. 3B shows the data partitioning of a floating-point embodiment, which splits a floating-point number into an exponent part and a fraction part. The partitioning in the embodiments of Figs. 3A and 3B is only a typical case covered by the present invention; the present invention is not limited to any particular data partitioning. Extreme cases, such as all data kept on-chip or all data partitioned off-chip, with the on-chip cache structure including a cache for the input data, also fall within the design scope of the present invention. The address partitioning submodule 11 maps the indexed address space onto the off-chip data space and the on-chip data space; when necessary, data is exchanged through the data replacement submodule 12, moving the data whose processing needs to be accelerated on-chip. The data partitioning module 10 is implemented on the basis of one or more on-chip computing units in the chip, and the on-chip computing units initiate read-write requests and process the spliced original data.
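As an illustration of the two partitioning strategies just described (Fig. 3A: integer part and fractional part of a fixed-point word; Fig. 3B: exponent field and fraction field of a floating-point number), a minimal sketch follows. The bit widths (Q16.16 fixed point, IEEE 754 single precision) are assumptions for the example, not prescribed by the invention:

```python
import struct

def split_fixed(word, frac_bits=16):
    """Split a fixed-point word into (integer part, fractional part)."""
    frac = word & ((1 << frac_bits) - 1)
    integer = word >> frac_bits
    return integer, frac

def join_fixed(integer, frac, frac_bits=16):
    """Recombine the two parts into the original fixed-point word."""
    return (integer << frac_bits) | frac

def split_float(x):
    """Split an IEEE 754 single into (sign, exponent, fraction) fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & ((1 << 23) - 1)
    return sign, exponent, fraction
```

Either split yields two parts that can be stored in different areas (one on-chip, one off-chip) and later spliced back into the original representation without loss.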
The data splicing module 30 further comprises:

an index splicing submodule 31 for converting the form of on-chip/off-chip data transfer from the original data representation into all or part of a data index, and splicing the results of all or part of the on-chip data index to obtain the original data representation;

the reads and writes of the data splicing module 30 are performed through an on-chip/off-chip data path or an on-chip data path. The on-chip/off-chip data path comprises PCI (Peripheral Component Interconnect), PCIE (Peripheral Component Interconnect Express, a bus and interface standard), and HT (HyperTransport, a new scalable, high-speed, high-performance end-to-end integrated-circuit interconnect bus technology); the on-chip data path comprises FAT-TREE and H-TREE (hierarchy tree) interconnect technologies; and the on-chip/off-chip data connection comprises multi-chip interconnect structures. The on-chip/off-chip data connection shown in Fig. 1 is not limited to a PCIE bus connection and also covers multi-chip interconnect structures such as a network-on-chip. The data path between the on-chip computing units and the on-chip storage medium shown in Fig. 1 is not limited to H-TREE, FAT-TREE, or similar interconnect technologies. Because off-chip addresses can be reached through the on-chip/off-chip data path, the on-chip data partitioning read-write system 100 can accurately restore the various data to be spliced into the original data and can effectively support different data partitioning strategies, thereby reducing on-chip/off-chip data exchange.

The data in the on-chip storage medium or the off-chip storage medium is read and written one or more times, and the data is read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium is read and written from outside one or more times, and the on-chip storage medium is read and written from inside one or more times.
Fig. 4 is a flowchart of a specific embodiment of the on-chip data partitioning read-write method of the present invention, which can be implemented by the on-chip data partitioning read-write system 100 of the present invention shown in Figs. 1 and 2. As shown in Fig. 7, the on-chip data partitioning read-write method comprises:

Step S701, a data partitioning step: storing on-chip data in different areas according to a data partitioning strategy, respectively in an on-chip storage medium and an off-chip storage medium;

Step S702, a pre-operation step: performing operation processing on the on-chip address index of the on-chip stored data in advance when performing data splicing;

Step S703, a data splicing step: splicing the on-chip stored data and the off-chip input data according to a data splicing strategy to obtain the original data representation.

These steps are implemented by the data partitioning module 10, the pre-operation module 20, and the data splicing module 30 respectively, restoring the original data on-chip without loss.
Preferably, the on-chip data partitioning read-write method of the present invention needs to manage storage, and implementing the splicing process requires the support of the storage module 40. The data partitioning read-write method further comprises:

a data storage step, storing and moving the on-chip stored data of the on-chip storage medium and the off-chip input data from the off-chip storage medium; in the storage step, the read and write ports are separated, and reading and writing of data are independent of each other. Specifically, the data storage step further comprises:

first, indexing the on-chip stored data according to the on-chip address index;

second, outputting the indexed data through an output port;

third, writing the data to be stored into the corresponding storage location according to the write address;

reads and writes are supported by the address index interface 41, the data readout interface 42, and the data write interface 43 respectively, which cooperate with the on-chip/off-chip data path and the on-chip data path to achieve data communication inside and outside the module; the independent read and write interfaces allow simultaneous reading and writing. On-chip data is retrieved from on-chip storage according to the on-chip address index, which may first undergo certain operations in the pre-operation module 20 (such as address-offset computation); combined with the data input to the chip from outside, the complete final data is obtained through the splicing operation.
In a specific embodiment, the flowchart of a preferred embodiment of the on-chip data partitioning read-write method of the present invention is shown in Fig. 8; the steps of the method comprise:

Step S801, partitioning the address space into an off-chip data space and an on-chip data space;

Step S802, performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement strategy, the data replacement strategy comprising sequential replacement, reverse-order replacement, and random replacement, and the data partitioning strategy comprising fixed-point partitioning and floating-point partitioning;

Step S803, performing operation processing on the on-chip stored data;

Step S804, performing operation processing on the external input data, the external input data comprising the off-chip input data and data read in directly through the read-write ports;

Step S805, converting the form of on-chip/off-chip data transfer from the original data representation into all or part of a data index, and splicing the results of all or part of the on-chip data index to obtain the original data representation.

Only after the processed on-chip stored data and off-chip input data have been spliced together can they be handed to subsequent modules for processing of the original data, realizing the accelerator's function.
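The three replacement strategies named in step S802 (sequential, reverse-order, and random) can be sketched as different orderings in which off-chip blocks are swapped into the on-chip medium. This is a hypothetical illustration; the patent does not fix an implementation, and the seeded RNG is an assumption made here so the sketch is reproducible:

```python
import random

def replacement_order(num_blocks, strategy, seed=0):
    """Return the order in which block indices replace on-chip contents."""
    order = list(range(num_blocks))
    if strategy == "sequential":
        return order               # 0, 1, 2, ... in ascending order
    if strategy == "reverse":
        return order[::-1]         # n-1, ..., 1, 0 in descending order
    if strategy == "random":
        rng = random.Random(seed)  # seeded for reproducibility
        rng.shuffle(order)
        return order               # a permutation of 0..n-1
    raise ValueError(f"unknown replacement strategy: {strategy}")
```

Whatever the ordering, every block index appears exactly once, so each off-chip block gets its turn on-chip.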
Further, for ease of understanding, the physical design framework diagrams of a specific embodiment shown in Figs. 4 to 6 are described below.

For heterogeneous platforms, the data an accelerator can store on-chip is very limited, so all data must be partitioned into blocks of a size that can be stored on-chip, and the required blocks are read in or written out through data exchange between the large off-chip storage medium (that is, the off-chip storage medium) and the small on-chip storage medium (that is, the on-chip storage medium). Since the block sizes differ, the data is partitioned and stored in different areas, and the off-chip storage medium is added according to varying capacity requirements. Meanwhile, on-chip data addresses are supplied on demand to the on-chip computing units through the on-chip address index; as in Fig. 6, the index, and the data corresponding to it, are obtained through the on-chip address index interface 41. Fig. 4 shows the on-chip data indexing process of one embodiment: the device indexes 256 storage locations with an 8-bit address and obtains 32-bit data, and it is not limited to the illustrated address-index width or on-chip data-storage width. In hardware, the flow further relies on mutual communication between the on-chip storage medium, the off-chip storage medium, the on-chip/off-chip data path, and the on-chip data path.
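The indexing embodiment of Fig. 4, an 8-bit address selecting one of 256 storage locations each holding a 32-bit word, amounts to a table lookup. A sketch follows, with the pre-operation shown as an optional address offset; the store's fill pattern and the offset pre-operation are assumptions for illustration:

```python
# On-chip store: 256 locations of 32-bit data, addressed by an 8-bit index.
# The fill pattern is arbitrary; real contents come from the partitioned data.
on_chip_store = [(i * 0x01010101) & 0xFFFFFFFF for i in range(256)]

def index_on_chip(addr_8bit, offset=0):
    """Pre-operate on the index (e.g. an address offset), then look up."""
    assert 0 <= addr_8bit < 256, "index must fit in 8 bits"
    effective = (addr_8bit + offset) % 256  # pre-operation on the index
    return on_chip_store[effective]         # the 32-bit word at that location
```

The pre-operation module would perform the offset computation before the lookup; the readout interface then delivers the 32-bit word.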
Fig. 6 shows the data splicing operation of one embodiment. The on-chip stored data, 32 bits wide in the figure, is processed by the on-chip data processing submodule 31 into data that is 32 bits wide in the figure. The on-chip data processing submodule 31 is not limited to addressing operations; it also covers other operations, such as arithmetic computation. The off-chip input data, 32 bits wide in the figure, is processed by the off-chip data processing submodule 32 into data that is 32 bits wide in the figure. The processed on-chip stored data and off-chip input data are spliced together, 64 bits wide in the figure, and delivered to subsequent modules, such as the on-chip computing units, for processing. The processed on-chip stored data and off-chip input data are not limited to the illustrated widths, the data blocks are not limited to particular data widths, and the data processing is not limited to particular operations; it may involve complex operations rather than simple splicing, including other operation processing.
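The splicing of Fig. 6, a processed 32-bit on-chip word combined with a processed 32-bit off-chip word into one 64-bit result, can be sketched as bit concatenation. Which half occupies the high bits is an assumption made here for illustration; the figure's arrangement may differ:

```python
def splice(on_chip_word, off_chip_word):
    """Concatenate two 32-bit words into one 64-bit original-data word."""
    assert 0 <= on_chip_word < 2**32 and 0 <= off_chip_word < 2**32
    return (on_chip_word << 32) | off_chip_word  # on-chip half in high bits

def unsplice(word64):
    """Recover the two 32-bit halves from the 64-bit representation."""
    return word64 >> 32, word64 & 0xFFFFFFFF
```

Because the two halves occupy disjoint bit fields, the splice is lossless: `unsplice` recovers exactly the words that went in, which is the "original data representation" property the splicing step requires.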
Specifically, the data splicing step is performed through an on-chip/off-chip data path or an on-chip data path. In particular, the on-chip/off-chip data path comprises PCI, PCIE, and HT interconnect technologies, carrying the data flow between on-chip and off-chip; the on-chip data path comprises FAT-TREE and H-TREE interconnect technologies; and the on-chip/off-chip data connection comprises multi-chip interconnect structures, such as a network-on-chip.

The data in the on-chip storage medium or the off-chip storage medium can be read and written one or more times, and the data can be read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium can be read and written from outside one or more times, and the medium can be read and written from inside one or more times.
The present invention provides an on-chip read-write device comprising the on-chip data partitioning read-write system 100. The on-chip read-write device comprises an on-chip storage medium, an off-chip storage medium, an on-chip/off-chip data path, and an on-chip data path. Preferably, the on-chip read-write device further comprises common storage media such as static random-access memory (SRAM), dynamic random-access memory (DRAM), enhanced dynamic random-access memory (eDRAM), and a register file (RF), and may also use novel memory devices such as non-volatile memory (NVM) or 3D memory devices.
The present invention converts the data representation into an index, which allows efficient repeated addressing within the on-chip address space as well as off-chip addressing. Unlike approaches that accelerate by directly caching the data itself, the device for on-chip repeated addressing in a heterogeneous environment and its usage strategy require hardware support comprising an on-chip storage medium, an off-chip storage medium, an address indexing device, an on-chip/off-chip data path, and an on-chip data path.

Finally, the present invention is intended for different data partitioning strategies, devices, and methods; according to different partitioning strategies, the data is partitioned into different parts, and the device of the present invention supports devices with different partitioning strategies.
In summary, the device of the present invention and its related usage methods can effectively satisfy the demands of data reuse and flexible addressing, effectively reduce memory-access bandwidth requirements, and can be applied to different scenarios, not being limited to machine-learning accelerators. At the same time, the present invention can reduce on-chip cache overhead by scheduling data reasonably, thereby supporting more efficient accelerator designs.

Of course, the present invention may have various other embodiments, and those skilled in the art may make various corresponding changes and modifications according to the present invention without departing from its spirit and essence; all such corresponding changes and modifications shall fall within the scope of protection of the claims appended to the present invention.
Industrial applicability

Through the device and its related usage methods, the present invention can effectively satisfy the demands of data reuse and flexible addressing, effectively reduce memory-access bandwidth requirements, and can be applied to different scenarios, not being limited to machine-learning accelerators. At the same time, the present invention can reduce on-chip cache overhead by scheduling data reasonably, thereby supporting more efficient accelerator designs.

Claims (10)

  1. An on-chip data partitioning read-write method, characterized by comprising:
    a data partitioning step, storing on-chip data in different areas according to a data partitioning strategy, respectively in an on-chip storage medium and an off-chip storage medium;
    a pre-operation step, performing operation processing on the on-chip address index of the on-chip stored data in advance when performing data splicing; and
    a data splicing step, splicing the on-chip stored data and off-chip input data according to a data splicing strategy to obtain the original data representation.
  2. The on-chip data partitioning read-write method according to claim 1, characterized by further comprising:
    a data storage step, storing and moving the on-chip stored data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;
    wherein in the storage step the read and write ports are separated, and reading and writing of data are independent of each other;
    the pre-operation step further comprises:
    performing operation processing on the on-chip stored data; and
    performing operation processing on external input data;
    the external input data comprising the off-chip input data and data read in directly through the read-write ports.
  3. The on-chip data partitioning read-write method according to claim 2, characterized in that the data storage step further comprises:
    indexing the on-chip stored data according to the on-chip address index;
    outputting the indexed data through an output port; and
    writing the data to be stored into the corresponding storage location according to the write address.
  4. The on-chip data partitioning read-write method according to claim 1, characterized in that the data partitioning step further comprises:
    partitioning the address space into an off-chip data space and an on-chip data space;
    performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement strategy, the data replacement strategy comprising sequential replacement, reverse-order replacement, and random replacement;
    the data partitioning strategy comprising fixed-point partitioning and floating-point partitioning;
    the data splicing step further comprises:
    converting the form of on-chip/off-chip data transfer from the original data representation into all or part of a data index, and splicing the results of all or part of the on-chip data index to obtain the original data representation;
    the data splicing step is performed through an on-chip/off-chip data path or an on-chip data path, the on-chip/off-chip data path comprising PCI, PCIE, and HT interconnect technologies, the on-chip data path comprising FAT-TREE and H-TREE interconnect technologies, and the on-chip/off-chip data connection comprising multi-chip interconnect structures;
    the data in the on-chip storage medium or the off-chip storage medium can be read and written one or more times, and the data can be read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium can be read and written from outside one or more times, and the medium can be read and written from inside one or more times.
  5. An on-chip data partitioning read-write system, characterized by comprising:
    a data partitioning module for partitioning on-chip stored data into different areas according to a data partitioning strategy, stored respectively in an on-chip storage medium and an off-chip storage medium;
    a pre-operation module for performing operation processing on the on-chip address index of the on-chip stored data in advance when performing data splicing; and
    a data splicing module for splicing the on-chip stored data and off-chip input data according to a data splicing strategy to obtain the original data representation.
  6. The on-chip data partitioning read-write system according to claim 5, characterized by further comprising:
    a storage module for storing and moving the on-chip stored data of the on-chip storage medium and the off-chip input data from the off-chip storage medium;
    the storage module using separate read and write ports, with reading and writing of data independent of each other;
    the pre-operation module further comprising:
    an on-chip processing submodule for performing operation processing on the on-chip stored data; and
    an off-chip processing submodule for performing operation processing on external input data;
    the external input data comprising the off-chip input data and data read in directly through the read-write ports.
  7. The on-chip data partitioning read-write system according to claim 6, characterized in that the storage module further comprises:
    an address index interface for indexing the on-chip stored data according to the on-chip address index;
    a data readout interface serving as the output port for the indexed on-chip stored data; and
    a data write interface for writing the data to be stored into the corresponding storage location according to the write address.
  8. The on-chip data partitioning read-write system according to claim 5, characterized in that the data partitioning module further comprises:
    an address partitioning submodule for partitioning the address space into an off-chip data space and an on-chip data space; and
    a data replacement submodule for performing data replacement between the on-chip storage medium and the off-chip storage medium according to a data replacement strategy, the data replacement strategy comprising sequential replacement, reverse-order replacement, and random replacement;
    the data partitioning strategy comprising fixed-point partitioning and floating-point partitioning; the data partitioning module being implemented on the basis of one or more on-chip computing units in the chip, the on-chip computing units initiating read-write requests and processing the spliced original data;
    the data splicing module further comprising:
    an index splicing submodule for converting the form of on-chip/off-chip data transfer from the original data representation into all or part of a data index, and splicing the results of all or part of the on-chip data index to obtain the original data representation;
    the reads and writes of the data splicing module being performed through an on-chip/off-chip data path or an on-chip data path, the on-chip/off-chip data path comprising PCI, PCIE, and HT interconnect technologies, the on-chip data path comprising FAT-TREE and H-TREE interconnect technologies, and the on-chip/off-chip data connection comprising multi-chip interconnect structures;
    the data in the on-chip storage medium or the off-chip storage medium being read and written one or more times, the data being read to one or more on-chip computing units; the on-chip storage medium or the off-chip storage medium being read and written from outside one or more times, and the on-chip storage medium being read and written from inside one or more times.
  9. An on-chip read-write device, characterized by comprising the on-chip data partitioning read-write system according to any one of claims 5 to 8.
  10. The on-chip read-write device according to claim 9, characterized in that the on-chip read-write device comprises static random-access memory, dynamic random-access memory, enhanced dynamic random-access memory, a register file, and non-volatile memory or a 3D memory device.
PCT/CN2016/094168 2016-04-06 2016-08-09 On-chip data partitioning read-write method, system, and device WO2017173755A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/071,458 US10496597B2 (en) 2016-04-06 2016-08-09 On-chip data partitioning read-write method, system, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610210082.1A CN105843775B (zh) 2016-04-06 2016-04-06 片上数据划分读写方法、***及其装置
CN201610210082.1 2016-04-06

Publications (1)

Publication Number Publication Date
WO2017173755A1 true WO2017173755A1 (zh) 2017-10-12

Family

ID=56596831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/094168 WO2017173755A1 (zh) 2016-04-06 2016-08-09 On-chip data partitioning read-write method, system, and device

Country Status (3)

Country Link
US (1) US10496597B2 (zh)
CN (1) CN105843775B (zh)
WO (1) WO2017173755A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979041A (zh) * 2022-05-18 2022-08-30 芯河半导体科技(无锡)有限公司 一种提升片上缓存利用效率的拼包方法

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11095556B2 (en) * 2017-06-30 2021-08-17 Intel Corporation Techniques to support multiple protocols between computer system interconnects
CN107807819B (zh) * 2017-07-20 2021-06-25 上海寒武纪信息科技有限公司 一种支持离散数据表示的用于执行人工神经网络正向运算的装置及方法
CN111091189B (zh) * 2017-12-14 2023-08-29 中科寒武纪科技股份有限公司 集成电路芯片装置及相关产品
CN109978155A (zh) * 2017-12-28 2019-07-05 北京中科寒武纪科技有限公司 集成电路芯片装置及相关产品
CN109992542B (zh) * 2017-12-29 2021-11-30 深圳云天励飞技术有限公司 一种数据搬运方法、相关产品及计算机存储介质
CN110018784B (zh) * 2018-01-09 2023-01-10 阿里巴巴集团控股有限公司 数据处理方法、装置及计算设备
CN110321296A (zh) * 2018-03-31 2019-10-11 深圳忆联信息***有限公司 数据写入方法及固态硬盘
CN111258653B (zh) * 2018-11-30 2022-05-24 上海寒武纪信息科技有限公司 原子访存方法、存储介质、计算机设备、装置和***
US11436165B2 (en) * 2019-05-01 2022-09-06 Samsung Electronics Co., Ltd. High bandwidth memory system
CN112446497B (zh) * 2019-09-02 2024-02-27 中科寒武纪科技股份有限公司 数据块拼接方法、相关设备及计算机可读介质
CN112540936A (zh) * 2019-09-23 2021-03-23 无锡江南计算技术研究所 面向异构众核架构的离散访存读写方法
KR20210072524A (ko) 2019-12-09 2021-06-17 삼성전자주식회사 뉴럴 네트워크 장치 및 그 동작 방법
CN111045963A (zh) * 2019-12-15 2020-04-21 苏州浪潮智能科技有限公司 一种高位宽总线读写的方法及装置
US11442643B2 (en) * 2020-02-13 2022-09-13 Samsung Electronics Co., Ltd. System and method for efficiently converting low-locality data into high-locality data
CN114996205B (zh) * 2022-07-21 2022-12-06 之江实验室 辅助3d架构近存计算***的片内数据调度控制器及方法
CN115130675B (zh) * 2022-09-02 2023-01-24 之江实验室 一种量子随机电路的多振幅模拟方法和装置
CN117369733B (zh) * 2023-12-07 2024-02-23 上海励驰半导体有限公司 集成电路、数据处理***和车辆
CN118070865B (zh) * 2024-04-25 2024-07-23 北京壁仞科技开发有限公司 人工智能模型的优化方法及装置、电子设备与存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101034384A (zh) * 2007-04-26 2007-09-12 北京中星微电子有限公司 一种能同时进行读写操作的dma控制器及传输方法
CN101452422A (zh) * 2007-11-29 2009-06-10 大唐移动通信设备有限公司 一种芯片的数据读写方法、相应装置和***
US20100064110A1 (en) * 2006-12-14 2010-03-11 Joern Boettcher Method for reading out data from a storage medium
CN102025634A (zh) * 2010-12-16 2011-04-20 中兴通讯股份有限公司 数据包缓存管理方法和设备
CN104035903A (zh) * 2014-07-02 2014-09-10 东南大学 一种基于可重构技术的二维数据访问动态自适应方法
CN104699630A (zh) * 2015-03-16 2015-06-10 清华大学 共享片上缓存划分装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8595437B1 (en) * 2008-11-21 2013-11-26 Nvidia Corporation Compression status bit cache with deterministic isochronous latency
CN104346285B (zh) * 2013-08-06 2018-05-11 华为技术有限公司 内存访问处理方法、装置及***
US20160218739A1 (en) * 2014-02-17 2016-07-28 Mediatek Inc. Data access methods and data access devices utilizing the same

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100064110A1 (en) * 2006-12-14 2010-03-11 Joern Boettcher Method for reading out data from a storage medium
CN101034384A (zh) * 2007-04-26 2007-09-12 北京中星微电子有限公司 一种能同时进行读写操作的dma控制器及传输方法
CN101452422A (zh) * 2007-11-29 2009-06-10 大唐移动通信设备有限公司 一种芯片的数据读写方法、相应装置和***
CN102025634A (zh) * 2010-12-16 2011-04-20 中兴通讯股份有限公司 数据包缓存管理方法和设备
CN104035903A (zh) * 2014-07-02 2014-09-10 东南大学 一种基于可重构技术的二维数据访问动态自适应方法
CN104699630A (zh) * 2015-03-16 2015-06-10 清华大学 共享片上缓存划分装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979041A (zh) * 2022-05-18 2022-08-30 芯河半导体科技(无锡)有限公司 一种提升片上缓存利用效率的拼包方法
CN114979041B (zh) * 2022-05-18 2024-03-08 芯河半导体科技(无锡)有限公司 一种提升片上缓存利用效率的拼包方法

Also Published As

Publication number Publication date
CN105843775A (zh) 2016-08-10
US10496597B2 (en) 2019-12-03
CN105843775B (zh) 2018-12-04
US20190026246A1 (en) 2019-01-24

Similar Documents

Publication Publication Date Title
WO2017173755A1 (zh) 片上数据划分读写方法、***及其装置
CN110688157B (zh) 一种计算装置及计算方法
CN107203807B (zh) 神经网络加速器的片上缓存带宽均衡方法、***及其装置
US10990650B1 (en) Reducing computations for data including padding
US11580367B2 (en) Method and system for processing neural network
US20190026626A1 (en) Neural network accelerator and operation method thereof
US11294599B1 (en) Registers for restricted memory
CN112054963A (zh) 用于异构计算环境中的数据传输的网络接口
US20130219148A1 (en) Network on chip processor with multiple cores and routing method thereof
US10684946B2 (en) Method and device for on-chip repetitive addressing
WO2020073801A1 (zh) 一种3d图像处理中数据读写方法及***、存储介质及终端
TWI537980B (zh) 用於寫入經遮罩資料至緩衝器之裝置及方法
US11467973B1 (en) Fine-grained access memory controller
US20200293452A1 (en) Memory device and method including circular instruction memory queue
US20200356844A1 (en) Neural network processor for compressing featuremap data and computing system including the same
WO2020029181A1 (zh) 三维卷积神经网络计算装置及相关产品
US11868873B2 (en) Convolution operator system to perform concurrent convolution operations
CN116415100A (zh) 业务处理方法、装置、处理器及计算设备
Jain et al. Merge network for a non-von Neumann accumulate accelerator in a 3D chip
CN113077042A (zh) 卷积神经网络的数据重用与高效处理方法
CN111126586A (zh) 数据通信电路、电子设备和数据通信方法
TWI793676B (zh) 應用於類神經網路之填充架構
US20210117762A1 (en) Arithmetic processing device
US20230266879A1 (en) System, method and computer-readable storage medium for direct memory accesses
US20240257844A1 (en) Memory device including a filtering circuit and memory system including the memory device and filtering circuit

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16897688

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16897688

Country of ref document: EP

Kind code of ref document: A1