CN112799975A - Data caching device and method and memory - Google Patents

Data caching device and method and memory

Info

Publication number
CN112799975A
Authority
CN
China
Prior art keywords
data
cache
level cache
level
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911106091.6A
Other languages
Chinese (zh)
Inventor
鲁国宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanechips Technology Co Ltd
Original Assignee
Sanechips Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanechips Technology Co Ltd filed Critical Sanechips Technology Co Ltd
Priority to CN201911106091.6A
Publication of CN112799975A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0897Caches characterised by their organisation or structure with two or more cache hierarchy levels

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present disclosure provides a data caching method, which includes: writing feature map data and weight data read from a dynamic random access memory (DRAM) into a second-level cache; and, when storage space in a first-level cache is determined to be free, writing the feature map data and the weight data from the second-level cache into the first-level cache for an inference module to read. The disclosure also provides a data caching device and a memory.

Description

Data caching device and method and memory
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to a data caching device and method, and a memory.
Background
Artificial intelligence today is built on large-scale multi-layer neural networks, which at their core perform matrix multiplication and convolution operations. In the usual implementation, a cost function is defined first, the feature maps and weight data that have been read in are then fed to the computing network in batches, and the cost function is differentiated with respect to the parameters to update the whole neural network model; this implies a heavy data-read load. For example, in an artificial intelligence algorithm processing high-definition video, the feature map of each convolution layer alone requires reading roughly 20 megabytes of data, which further congests the already limited DRAM (dynamic random access memory) bandwidth, and moving that much data consumes a corresponding amount of energy. An urgent need for deep-learning hardware acceleration devices is therefore to speed up data movement: a dedicated data-read and cache-management module accelerates the transfer of data from the DRAM to the internal artificial intelligence operation core, so that the core can run at full utilization and deliver higher computing power.
Disclosure of Invention
The embodiment of the disclosure provides a data caching device and method and a memory.
In a first aspect, an embodiment of the present disclosure provides a data caching method, which includes:
writing feature map data and weight data read from a dynamic random access memory (DRAM) into a second-level cache;
and when storage space in a first-level cache is determined to be free, writing the feature map data and the weight data in the second-level cache into the first-level cache for an inference module to read.
In some embodiments, the first-level cache comprises a feature map data first-level cache and a weight data first-level cache; the step of writing the feature map data and the weight data in the second-level cache into the first-level cache when the storage space of the first-level cache is determined to be free comprises:
when the storage space of the feature map data first-level cache is determined to be free, sending a first request to the second-level cache and writing the feature map data in the second-level cache into the feature map data first-level cache; and when the storage space of the weight data first-level cache is free, sending a second request to the second-level cache and writing the weight data in the second-level cache into the weight data first-level cache.
In some embodiments, the second-level cache comprises a plurality of memory bank groups, each memory bank group including a plurality of memory banks; the step of writing the feature map data and the weight data read from the dynamic random access memory DRAM into the second-level cache includes:
storing the feature map data and the weight data read from the dynamic random access memory DRAM into the corresponding memory banks of the memory bank groups according to a preconfigured read-write order and preconfigured space addresses.
In some embodiments, the feature map data first-level cache comprises a plurality of cells, each cell comprising a plurality of rooms; the step of sending a first request to the second-level cache and writing the feature map data in the second-level cache into the first-level cache when the storage space of the feature map data first-level cache is determined to be free includes:
when an idle cell is determined to exist in the feature map data first-level cache, sending a first request to the second-level cache and writing the feature map data in the second-level cache into a room of the idle cell of the first-level cache.
In some embodiments, the weight data first-level cache comprises a plurality of ring caches, the number of ring caches being the same as the number of memory banks of the second-level cache; the step of sending a second request to the second-level cache and writing the weight data in the second-level cache into the first-level cache when the storage space of the weight data first-level cache is determined to be free includes:
when a ring cache of the weight data first-level cache is determined to be idle, sending a second request to the second-level cache and writing the weight data in the corresponding memory bank of the second-level cache into that ring cache.
In some embodiments, the step of writing the feature map data and the weight data read from the dynamic random access memory DRAM into the second-level cache includes:
writing the feature map data and the weight data read from the dynamic random access memory DRAM into the second-level cache according to preconfigured access arbitration logic.
In a second aspect, an embodiment of the present disclosure provides a data caching apparatus, including:
a data read control module, configured to read the feature map data and the weight data in the DRAM and write the feature map data and the weight data into a second-level cache;
the second-level cache, configured to write the feature map data and the weight data into a first-level cache when storage space of the first-level cache is free;
and the first-level cache, configured to cache the feature map data and the weight data for an inference module to read.
In some embodiments, the second-level cache comprises a plurality of memory bank groups, each memory bank group including a plurality of memory banks; wherein
each memory bank group is configured to store the feature map data and the weight data in its corresponding memory banks according to a preconfigured read-write order and preconfigured space addresses.
In some embodiments, the first-level cache comprises:
a feature map data first-level cache, configured to store the feature map data written by the second-level cache;
and a weight data first-level cache, configured to store the weight data written by the second-level cache.
In some embodiments, the feature map data first-level cache includes a register unit.
In some embodiments, the feature map data first-level cache comprises a plurality of cells, each of the cells comprising a plurality of rooms; each of the rooms is configured to store the feature map data.
In some embodiments, the weight data first-level cache comprises a plurality of ring caches, each ring cache corresponding to a memory bank in the second-level cache.
In a third aspect, an embodiment of the present disclosure provides a memory, which includes the data caching apparatus described above, and a dynamic random access memory DRAM.
The embodiments of the present disclosure have the following beneficial effects:
the data caching device provided by the embodiments of the present disclosure comprises a data read control module, a first-level cache and a second-level cache, and is particularly suited to artificial intelligence core computation, that is, to storing feature map data and weight data. By reading the feature map data and weight data in the DRAM through the first-level and second-level caches, it effectively reduces the DRAM read bandwidth requirement and supplies the data needed by the artificial intelligence operation core in time, thereby saving DRAM access bandwidth and energy consumption and improving data supply speed and data utilization.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
FIG. 1 is a schematic diagram illustrating how a feature map is stored in a DRAM according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a data caching apparatus according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a second-level cache according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of another second-level cache according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a feature map data first-level cache according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a data caching method according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a memory according to an embodiment of the present disclosure.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present disclosure, the data caching apparatus, data caching method, and memory provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings; they may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments described herein may be described with reference to plan and/or cross-sectional views in light of idealized schematic illustrations of the disclosure. Accordingly, the example illustrations can be modified in accordance with manufacturing techniques and/or tolerances. Accordingly, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of configurations formed based on a manufacturing process. Thus, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be noted that the data caching apparatus 1, the memory, and the data caching method provided in the embodiments of the present disclosure are particularly suitable for modeling a neural network in artificial intelligence; that is, the method is particularly suitable for storing feature map data and weight data required by an artificial intelligence operation core.
The inference module in the embodiments of the present disclosure refers to an artificial intelligence operation core, and the inference module is also described as an artificial intelligence operation core in the following description of the embodiments.
The feature map data is stored in a DRAM (dynamic random access memory) before being written into the data caching device 1. To clarify how the feature map data is laid out in the DRAM, the DRAM memory space plan is described below.
Specifically, artificial intelligence workloads are dominated by matrix operations, so to improve read efficiency the feature map data required for matrix operations should not simply be stored linearly according to the spatial position of each element in the feature map. As shown in fig. 1, in the embodiment of the present disclosure the feature map is decomposed into small matrix blocks (hereinafter blk) that match the specification of the multiply-accumulate matrix in the artificial intelligence operation core, and the blocks are stored at consecutive DRAM addresses in input-channel (hereinafter ich) order. In the example of fig. 1, the feature maps are parsed into 32 feature map data items numbered 0-31, each feature map data item is decomposed into 8 blks numbered 0-7, and every ich of each blk is written to the corresponding position in the DRAM through the corresponding line (line0, line1, ...). The read bus is therefore more efficient when the burst transfer mode is used.
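As an illustration of this layout, the following sketch computes the linear DRAM address of one (feature map, blk, ich) block under the ordering described above. The block size, channel count, and base address are assumptions for the example only; the patent does not fix these values.

```python
# Minimal sketch of the blk/ich-ordered DRAM layout described above.
# BLK_BYTES, NUM_ICH and the base address are illustrative assumptions.

BLK_BYTES = 256   # bytes occupied by one blk of one input channel (assumed)
NUM_ICH   = 64    # input channels per feature map (assumed)
NUM_BLK   = 8     # blks per feature map, matching the 0-7 example in the text

def dram_addr(fmap_idx: int, blk_idx: int, ich_idx: int, base: int = 0) -> int:
    """Linear DRAM address of one (feature map, blk, ich) block.

    All ichs of a given blk occupy consecutive addresses, so reading a whole
    blk-ai (one blk, all ichs) is a single long burst on the read bus.
    """
    blk_ai_bytes = NUM_ICH * BLK_BYTES       # one blk, all input channels
    fmap_bytes   = NUM_BLK * blk_ai_bytes    # one whole feature map
    return base + fmap_idx * fmap_bytes + blk_idx * blk_ai_bytes + ich_idx * BLK_BYTES

# The addresses of blk 3 of feature map 0 are contiguous over all ichs,
# which is exactly what makes the DRAM burst transfer mode efficient.
burst = [dram_addr(0, 3, ich) for ich in range(NUM_ICH)]
assert burst == list(range(burst[0], burst[0] + NUM_ICH * BLK_BYTES, BLK_BYTES))
```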
The data caching device 1, the memory, and the data caching method in the embodiments of the present disclosure are described in detail below.
In a first aspect, as shown in fig. 2, an embodiment of the present disclosure provides a data caching apparatus 1, including: a data read control module 11, a first-level cache 13 and a second-level cache 12. The data read control module 11 is configured to read the feature map data and the weight data in the DRAM and write them into the second-level cache 12; the second-level cache 12 is configured to write the feature map data and the weight data into the first-level cache 13 when storage space of the first-level cache 13 is free; the first-level cache 13 is configured to cache the feature map data and the weight data for the artificial intelligence operation core, so that the inference module can read them.
It should be noted that the storage space of the first-level cache 13 is much smaller than that of the second-level cache 12.
The data caching device 1 provided by the embodiment of the present disclosure includes a data read control module 11, a first-level cache 13, and a second-level cache 12, and is particularly suited to artificial intelligence core computation, that is, to storing feature map data and weight data. By reading the feature map data and weight data in the DRAM through the first-level cache 13 and the second-level cache 12, it effectively reduces the DRAM read bandwidth requirement and supplies the data required by the artificial intelligence operation core in time, thereby saving DRAM access bandwidth and energy consumption and improving data supply speed and data utilization.
To make the functions of the data read control module 11, the first-level cache 13 and the second-level cache 12 in the embodiment of the present disclosure clearer, the three are described in detail below.
The data read control module 11 includes DRAM read bus control logic and read bus arbitration logic. It is configured to read the feature map data and the weight data stored in the DRAM according to predetermined rules and transmit them to the second-level cache 12.
As shown in fig. 2, the second-level cache 12 has a data prefetch function. The second-level cache 12 is built from SRAM (static random access memory); its main function is to read the required data from the DRAM into the second-level cache 12 in advance, according to the data requirements of the artificial intelligence operation core, so that the core can obtain the data in time to start computing, thereby hiding DRAM latency.
Regarding the division of the second-level cache into memory banks: in the embodiment of the present disclosure, the second-level cache 12 is composed of a plurality of memory banks (hereinafter banks), each bank being an independent SRAM; the banks may be further grouped into memory bank groups (hereinafter bank-groups), so the second-level cache is managed as a plurality of bank-groups (bank-group0, bank-group1, bank-group2, and so on). Each bank-group stores the contents of all ichs of one blk of the current feature map data (hereinafter blk-ai, i.e. blk all ich).
Specifically, as shown in fig. 3, the second-level cache 12 is composed of 32 banks, each bank being an independent SRAM; every 8 banks form a bank-group, so the cache is divided into 4 bank-groups (bank-group0, bank-group1, bank-group2 and bank-group3) for management.
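The bank and bank-group split can be pictured with the small model below. The 32-bank, 8-per-group figures come from the example above; the per-bank depth is an assumption.

```python
# Illustrative model of the bank / bank-group organisation of the second-level
# cache 12 (32 banks, 8 banks per group -> 4 bank-groups). BANK_DEPTH is assumed.

NUM_BANKS       = 32
BANKS_PER_GROUP = 8
NUM_GROUPS      = NUM_BANKS // BANKS_PER_GROUP   # = 4
BANK_DEPTH      = 1024                           # SRAM words per bank (assumed)

def bank_group_of(bank_idx: int) -> int:
    """Bank-group that a physical bank belongs to."""
    return bank_idx // BANKS_PER_GROUP

# One blk-ai (all ichs of one blk) is spread over the 8 banks of a group, so
# each bank-group holds one blk-ai that is being prefetched or read.
l2_banks = [bytearray(BANK_DEPTH) for _ in range(NUM_BANKS)]
```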
The banks are grouped in this way because all convolution kernels in common use today are larger than 1 × 1, so convolving across a blk boundary requires data from adjacent blks. For example, when the blk boundary in bank-group1 is convolved, i.e. bank-group1 holds the main data, the left and right neighbor data of bank-group1, located in bank-group0 and bank-group2, are also required. A bank-group can release its space once it has been read both as main data and as neighbor data; after the space is released, the data prefetch logic can prefetch subsequent blk-ai data from the DRAM into it, ensuring continuity of data transfer and keeping the artificial intelligence operation core running at full load.
Likewise, when the convolution kernel is exactly 1 × 1 there is no need to read the neighbor bank-groups, and only the data of a single bank-group is read directly; this reading mode still fits within the control logic described above without any extra cost.
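A hedged sketch of this release-and-prefetch behavior is given below. The BankGroup bookkeeping and the pending_blk_ai queue are illustrative names, not the patent's actual control logic.

```python
# Sketch of the release-and-prefetch policy for bank-groups described above.
from collections import deque

class BankGroup:
    def __init__(self, gid: int):
        self.gid = gid
        self.blk_ai = None             # blk-ai currently resident in this group
        self.read_as_main = False
        self.read_as_neighbor = False

    def releasable(self, kernel_size: int) -> bool:
        # With a 1x1 kernel no neighbor read ever happens, so only the main
        # read must complete before the space can be recycled.
        if kernel_size == 1:
            return self.read_as_main
        return self.read_as_main and self.read_as_neighbor

def prefetch_released_groups(groups, pending_blk_ai: deque, kernel_size: int):
    """Refill every bank-group whose current blk-ai is no longer needed."""
    for g in groups:
        if g.blk_ai is not None and g.releasable(kernel_size) and pending_blk_ai:
            g.blk_ai = pending_blk_ai.popleft()   # prefetch the next blk-ai from DRAM
            g.read_as_main = g.read_as_neighbor = False
```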
Regarding the sharing of the second-level cache 12: the feature map data and the weight data are both stored across the banks of the second-level cache 12, and the storage space allocation resembles a conventional data space management mechanism, i.e. one part of the address space is allocated to the feature map data and another part to the weight data, with the start and end of each address range dynamically configurable by software. For example, as shown in fig. 3, the feature map data and the weight data are stored in the 32 banks of the second-level cache 12, with addresses 0 to 511 allocated to the feature map data and addresses 512 to 1023 allocated to the weight data; the start and end of each address range can be dynamically configured by software. Since the feature map data and the weight data share one second-level cache 12, each bank write port and read port of the second-level cache 12 has weight-configurable access arbitration logic to ensure that neither the feature map data nor the weight data monopolizes the second-level cache 12 for long periods. The weight ratio of each bank write port and read port can be dynamically configured by software to allocate more access time to a particular data stream, or a data stream may even be configured as absolute high priority.
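The weighted port arbitration can be sketched as a small credit-based arbiter between the two streams. The credit scheme and the class below are assumptions for illustration; the patent only states that the access weights (or an absolute priority) are software-configurable.

```python
# Illustrative weighted arbitration between the feature map data stream and the
# weight data stream at one bank port of the second-level cache.
from typing import Optional

class PortArbiter:
    def __init__(self, fmap_weight: int = 3, wgt_weight: int = 1,
                 absolute_priority: Optional[str] = None):
        self.weights = {"fmap": fmap_weight, "wgt": wgt_weight}
        self.credits = dict(self.weights)
        self.absolute_priority = absolute_priority    # e.g. "fmap", "wgt" or None

    def grant(self, fmap_req: bool, wgt_req: bool) -> Optional[str]:
        requests = {"fmap": fmap_req, "wgt": wgt_req}
        if self.absolute_priority and requests.get(self.absolute_priority):
            return self.absolute_priority
        # Serve a requesting stream that still has credits in this round.
        for stream in ("fmap", "wgt"):
            if requests[stream] and self.credits[stream] > 0:
                self.credits[stream] -= 1
                return stream
        # All requesting streams are out of credits: refill and grant once more.
        if any(requests.values()):
            self.credits = dict(self.weights)
            return self.grant(fmap_req, wgt_req)
        return None
```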
Regarding the reconfiguration of the second-level cache 12: the cache depth of the second-level cache 12 can be adjusted by dynamic software configuration, so that it can adapt to the feature map and weight data volumes of different artificial intelligence neural networks. In addition, the second-level cache 12 supports an equal-division space management strategy. This ensures that as much feature map and weight data as possible is read into the cache when the second-level cache 12 has sufficient space, and the division can be adjusted dynamically according to the number of ichs: generally, when there are many ichs the blk-ai data volume is large, so the number of equal divisions of the second-level cache 12 is reduced; otherwise the number of equal divisions is increased. Each partitioned blk-ai data space has dedicated tag management logic that marks it as releasable once the configured number of reads has completed, after which new blk-ai data can overwrite that portion of the contents.
Specifically, the second-level cache 12 in the embodiment of the present disclosure may adopt a management policy that divides the cache space into powers of 2. For example, the feature map data occupies the 512-deep region of the second-level cache 12 from address 0 to 511; since there are 4 bank-groups, the blk-ai data of 4 feature maps can be stored. If the blk-ai data is small, one bank-group can hold two blk-ai, and the 512-deep space can be configured to be split in two, which is equivalent to 8 bank-groups, so 8 blk-ai can be stored. In the embodiment of the present disclosure the second-level cache 12 supports up to a 128-way equal division, so in theory up to 128 × 4 = 512 feature map blk-ai can be stored. As shown in fig. 4, taking a four-way division of the second-level cache 12 space as an example, the second-level cache 12 can then store the blk-ai data of 4 × 4 = 16 feature maps. Likewise, the weight data space supports the same equal-division management strategy, which is not described again here.
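The bookkeeping behind this power-of-2 division and the read-count tags can be sketched as follows. The 512-depth and 4-bank-group figures follow the example above; the tag fields are illustrative.

```python
# Sketch of the power-of-2 equal-division bookkeeping for the feature map region
# of the second-level cache, plus the per-partition release tag.

FMAP_DEPTH = 512
NUM_GROUPS = 4

def blk_ai_slots(split: int) -> int:
    """Number of blk-ai slots for a given power-of-2 split of the region."""
    assert split & (split - 1) == 0 and 1 <= split <= 128, "split must be a power of 2 up to 128"
    return NUM_GROUPS * split            # split=1 -> 4 slots, split=128 -> 512 slots

class BlkAiSlot:
    """Per-partition tag logic: releasable once the configured read count is met."""
    def __init__(self, reads_expected: int):
        self.reads_expected = reads_expected
        self.reads_done = 0

    def note_read(self) -> None:
        self.reads_done += 1

    @property
    def releasable(self) -> bool:
        return self.reads_done >= self.reads_expected

print(blk_ai_slots(1), blk_ai_slots(4), blk_ai_slots(128))   # 4 16 512
```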
Regarding the data read-write optimization of the second-level cache 12: the second-level cache 12 in the embodiment of the present disclosure may use single-port or dual-port SRAM, with the control rhythm optimized as a whole around the characteristics of SRAM and DRAM. First, the read-write order of the 32 banks of the second-level cache 12 can be planned to keep reads and writes balanced overall, so that DRAM reads are not blocked and the artificial intelligence operation core never runs short of data. Then, if a read-write conflict does occur, the SRAM read-write arbitration logic mentioned above arbitrates the conflict to ensure normal SRAM access. Meanwhile, given the read characteristics of the DRAM, data is not output continuously, so SRAM writes occur at intervals, which leaves access time for read data when a collision occurs; in addition, together with the space reserved in the first-level cache 13, the continuity of data transfer to the artificial intelligence operation core can essentially be guaranteed.
Specifically, taking the feature map data as an example and assuming the second-level cache 12 does not enable the power-of-2 division, there are 4 bank-groups; read-write exclusion control logic then guarantees that a bank-group being read is not written and a bank-group being written is not read, so that, viewed macroscopically, the second-level cache 12 can be read and written simultaneously. If the power-of-2 division of the second-level cache 12 space is enabled, a richer set of partitions can be managed, but because bank-groups may overlap there is, in the worst case, a one-in-thirty-two probability of a read-write conflict among the 32 banks; such conflicts are arbitrated by the SRAM read-write arbitration logic described above to ensure normal SRAM access. Considering the read characteristics of the DRAM, data is not output continuously, so SRAM writes also have gaps that leave access time for read data when a collision occurs.
In addition, together with the space reserved in the first-level cache 13, the continuity of data transfer to the artificial intelligence operation core can essentially be guaranteed.
In some embodiments of the present disclosure, the first-level cache 13 includes a feature map data first-level cache 131 and a weight data first-level cache 132. The feature map data first-level cache 131 is used to store the feature map data written by the second-level cache 12; the weight data first-level cache 132 is used to store the weight data written by the second-level cache 12.
The storage space of the feature map data first-level cache 131 is much smaller than that of the second-level cache 12, and its storage may be built from hardware shift register units (hereinafter reg). Specifically, the feature map data first-level cache 131 in the embodiment of the present disclosure may include a plurality of cells (hereinafter cells), each cell may include a plurality of rooms (hereinafter rooms) as required, and each room holds a number of blks. If the artificial intelligence operation core only needs to run one feature map processing pipeline, only one cell is enabled; if the core needs to run four feature map processing pipelines in parallel, four cells are enabled to provide parallel feature map data output.
For example, each cell contains 3 rooms, namely a left room (hereinafter lft-room), a middle room (hereinafter cnr-room) and a right room (hereinafter rgt-room); specifically, as shown in fig. 5, cell 0 contains a left room, a middle room and a right room. When the convolution kernel size is 1x1, the artificial intelligence operation core only needs to read the cnr-room data; when the kernel size is larger than 1x1, neighbor data is also needed, so the data in lft-room and rgt-room is read at the same time. To save reg resources, the lft-room and rgt-room may be smaller than the cnr-room, because they only need to hold the blk boundary data required by the convolution kernel and thus only need to be at least half the kernel width. Meanwhile, to ensure the buffering capability of the feature map data first-level cache 131, each room may store 4 blks for each ich.
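The cell and room layout can be sketched as a small data structure. The room capacities below are assumptions except for the "4 blks per room" figure mentioned above, and the 3-wide kernel is only an example.

```python
# Illustrative sketch of one cell of the feature map data first-level cache 131.
from dataclasses import dataclass, field

KERNEL = 3                        # convolution kernel width (example)
EDGE_BLKS = (KERNEL + 1) // 2     # lft/rgt rooms only need >= half the kernel width (assumed unit)

@dataclass
class Room:
    capacity_blks: int
    blks: list = field(default_factory=list)

    @property
    def free(self) -> bool:
        return len(self.blks) < self.capacity_blks

@dataclass
class Cell:
    lft: Room = field(default_factory=lambda: Room(EDGE_BLKS))
    cnr: Room = field(default_factory=lambda: Room(4))        # 4 blks, per the text
    rgt: Room = field(default_factory=lambda: Room(EDGE_BLKS))

    def rooms_needed(self, kernel: int):
        # A 1x1 kernel touches only the middle room; larger kernels also need
        # the boundary data kept in the left and right rooms.
        return (self.cnr,) if kernel == 1 else (self.lft, self.cnr, self.rgt)
```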
The storage space of the weight data first-level cache 132 is likewise much smaller than that of the second-level cache 12. Its storage consists of a plurality of ring caches, each ring cache corresponding to one bank of the second-level cache 12; that is, since the second-level cache contains 32 banks, there are 32 ring caches. Because the data caching device 1 in the embodiment of the present disclosure is compatible with multiple convolution kernel specifications, the bit width the artificial intelligence operation core requires for a weight data packet varies with the kernel, while the bit width of a second-level cache 12 bank is fixed; the ring caches therefore adapt the bit width. The smallest granule of the ring cache is a byte, and its working mechanism is consistent with common industry practice, so it is not described in detail here.
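The byte-granular ring cache can be sketched as below. The buffer size is an assumption; only the byte granularity and the one-ring-per-bank correspondence come from the text.

```python
# Byte-granular ring buffer sketch for one lane of the weight data first-level
# cache 132 (one ring cache per second-level-cache bank, 32 in total).

class RingCache:
    def __init__(self, size: int = 256):
        self.buf = bytearray(size)
        self.head = 0            # next byte to hand to the operation core
        self.tail = 0            # next byte position to fill from the L2 bank
        self.count = 0

    def free_bytes(self) -> int:
        return len(self.buf) - self.count

    def write(self, data: bytes) -> None:
        """Accept a fixed-width word from the second-level cache bank."""
        assert len(data) <= self.free_bytes()
        for b in data:
            self.buf[self.tail] = b
            self.tail = (self.tail + 1) % len(self.buf)
        self.count += len(data)

    def read(self, nbytes: int) -> bytes:
        """Deliver a weight packet of whatever width the current kernel needs."""
        assert nbytes <= self.count
        out = bytes(self.buf[(self.head + i) % len(self.buf)] for i in range(nbytes))
        self.head = (self.head + nbytes) % len(self.buf)
        self.count -= nbytes
        return out

rings = [RingCache() for _ in range(32)]   # one ring cache per L2 bank
```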
Because the data caching device 1 in the embodiment of the present disclosure includes the data read control module 11, the second-level cache 12, the feature map data first-level cache 131 and the weight data first-level cache 132, with the functions described above, it improves the efficient reading of feature map and weight data, effectively reduces the DRAM read bandwidth requirement, supplies the operational data required by the artificial intelligence operation core in time, saves DRAM access bandwidth and energy consumption, and improves the data supply speed and data utilization.
In a second aspect, an embodiment of the present disclosure further provides a data caching method, which may be implemented by using the data caching apparatus 1. As shown in fig. 6, the method in the embodiment of the present disclosure specifically includes the following steps:
s1, the feature map data and the weight data read from the dynamic random access memory DRAM are written into the second level cache 12.
Specifically, in this step the feature map data and the weight data in the DRAM may be read by the data read control module 11 of the data caching device 1, and the feature map data and the weight data are written into the second-level cache 12.
And S2, when the storage space of the first-level cache 13 is determined to be free, writing the feature map data and the weight data in the second-level cache 12 into the first-level cache 13 for the artificial intelligence operation core to read.
Specifically, in this step the first-level cache 13 may decide whether to send a request to the second-level cache 12 according to the current state of its own storage space, so as to obtain the feature map data and the weight data in the second-level cache 12.
In some embodiments, the first-level cache 13 includes a feature map data first-level cache 131 and a weight data first-level cache 132; step S2 may specifically include the following steps:
when the feature map data first-level cache 131 determines that its storage space is free, it sends a first request to the second-level cache 12, so that the second-level cache 12 writes the feature map data therein into the feature map data first-level cache 131; when the weight data first-level cache 132 determines that its storage space is free, it sends a second request to the second-level cache 12, so that the second-level cache 12 writes the weight data therein into the weight data first-level cache 132.
It should be noted that the feature map data written into the feature map data first-level cache 131 is placed at its port position, which makes it convenient for the artificial intelligence operation core to read; likewise, the weight data written into the weight data first-level cache 132 is also placed at its port position for the artificial intelligence operation core to read.
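A minimal sketch of this request-driven refill is shown below. The object and method names (l1_fmap, l2, pop_feature_map_blk, and so on) are illustrative stand-ins for the first-level caches and the second-level cache, not APIs defined by the patent.

```python
# Sketch of one iteration of the step-S2 refill: each first-level cache checks
# its own free space and, if any, requests the next data from the second-level
# cache. All names here are hypothetical.

def refill_step(l1_fmap, l1_wgt, l2):
    # First request: feature map data, filled one free cell at a time.
    if l1_fmap.has_free_cell():
        blk = l2.pop_feature_map_blk()
        if blk is not None:
            l1_fmap.write_to_free_cell(blk)       # lands at the read port position

    # Second request: weight data, one ring cache per second-level-cache bank.
    for bank_idx, ring in enumerate(l1_wgt.rings):
        if ring.free_bytes() > 0:
            word = l2.pop_weight_word(bank_idx)
            if word is not None:
                ring.write(word)
```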
In some embodiments, the step S1 may specifically include:
the profile data and weight data read from the dynamic random access memory DRAM are written into the secondary cache 12 according to the preconfigured access arbitration logic.
Since the ports of the data read control module of the data caching device 1 are preconfigured with the read-write arbitration logic and the access weight ratios for the feature map data and the weight data, as described above and not repeated here, the data read control module 11 can write the feature map data and the weight data read from the dynamic random access memory DRAM into the second-level cache 12 according to this preconfigured access arbitration logic.
In some embodiments, the level two cache 12 includes a plurality of bank groups; each bank group including a plurality of banks; the step S1 includes:
and storing the characteristic diagram data and the weight data read from the dynamic random access memory DRAM into a corresponding memory bank of the memory bank group according to a preset read-write sequence and a preset space address.
Specifically, since the ports of the data read control module of the data caching device 1 are preconfigured with the read-write arbitration logic and the access weight ratios for the feature map data and the weight data, as described above and not repeated here, in this step the data read control module can store the feature map data and the weight data in the DRAM into the corresponding memory banks of the memory bank groups according to the preconfigured read-write order and space addresses.
In some embodiments, the feature map data first-level cache 131 includes a plurality of cells, each cell including a plurality of rooms; the step of sending a first request to the second-level cache 12 and writing the feature map data in the second-level cache 12 into the first-level cache 13 when the storage space of the feature map data first-level cache 131 is determined to be free includes: when an idle cell is determined to exist in the feature map data first-level cache 131, sending a first request to the second-level cache 12 and writing the feature map data in the second-level cache 12 into a room of the idle cell of the first-level cache 13.
In some embodiments, the weight data first-level cache 132 includes a plurality of ring caches, the number of ring caches being the same as the number of banks of the second-level cache 12; the step of sending a second request to the second-level cache 12 and writing the weight data in the second-level cache 12 into the first-level cache 13 when the storage space of the weight data first-level cache 132 is determined to be free includes: when a ring cache of the weight data first-level cache 132 is determined to be idle, sending a second request to the second-level cache 12 and writing the weight data in the corresponding memory bank of the second-level cache 12 into that ring cache.
In a third aspect, as shown in fig. 7, an embodiment of the present disclosure provides a memory, which includes the data caching device 1 described above and a DRAM. The data caching device 1 is used to buffer the feature map data and the weight data in the DRAM so that the artificial intelligence operation core can read them.
Since the feature map data in the DRAM is stored at consecutive addresses in the embodiment of the present disclosure, reading by the data read control module 11 in the data caching device 1 is made easier; and since the data caching device 1 includes the data read control module 11, the first-level cache 13 and the second-level cache 12, it is particularly suited to artificial intelligence core computation, that is, to storing feature map data and weight data. By reading the feature map data and weight data in the DRAM through the first-level cache 13 and the second-level cache 12, it effectively reduces the DRAM read bandwidth requirement and supplies the data required by the artificial intelligence operation core in time, thereby saving DRAM access bandwidth and energy consumption and improving data supply speed and data utilization.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (13)

1. A method of caching data, comprising:
writing feature map data and weight data read from a dynamic random access memory DRAM into a second-level cache;
and when storage space of a first-level cache is determined to be free, writing the feature map data and the weight data in the second-level cache into the first-level cache for an inference module to read.
2. The method of claim 1, wherein the first-level cache comprises a feature map data first-level cache and a weight data first-level cache; the step of writing the feature map data and the weight data in the second-level cache into the first-level cache when the storage space of the first-level cache is determined to be free comprises:
when the storage space of the feature map data first-level cache is determined to be free, sending a first request to the second-level cache and writing the feature map data in the second-level cache into the feature map data first-level cache; and when the storage space of the weight data first-level cache is free, sending a second request to the second-level cache and writing the weight data in the second-level cache into the weight data first-level cache.
3. The method of claim 2, wherein the second-level cache comprises a plurality of memory bank groups, each memory bank group including a plurality of memory banks; the step of writing the feature map data and the weight data read from the dynamic random access memory DRAM into the second-level cache includes:
storing the feature map data and the weight data read from the dynamic random access memory DRAM into the corresponding memory banks of the memory bank groups according to a preconfigured read-write order and preconfigured space addresses.
4. The method of claim 3, wherein the feature map data first-level cache comprises a plurality of cells, each cell comprising a plurality of rooms; the step of sending a first request to the second-level cache and writing the feature map data in the second-level cache into the first-level cache when the storage space of the feature map data first-level cache is determined to be free includes:
when an idle cell is determined to exist in the feature map data first-level cache, sending a first request to the second-level cache and writing the feature map data in the second-level cache into a room of the idle cell of the first-level cache.
5. The method of claim 3, wherein the weight data first-level cache comprises a plurality of ring caches, the number of ring caches being the same as the number of memory banks of the second-level cache; the step of sending a second request to the second-level cache and writing the weight data in the second-level cache into the first-level cache when the storage space of the weight data first-level cache is determined to be free includes:
when a ring cache of the weight data first-level cache is determined to be idle, sending a second request to the second-level cache and writing the weight data in the corresponding memory bank of the second-level cache into that ring cache.
6. The method of claim 1, wherein the step of writing the feature map data and weight data read from the dynamic random access memory DRAM into the second-level cache comprises:
writing the feature map data and the weight data read from the dynamic random access memory DRAM into the second-level cache according to preconfigured access arbitration logic.
7. A data caching apparatus, comprising:
a data read control module, configured to read feature map data and weight data in a DRAM and write the feature map data and the weight data into a second-level cache;
the second-level cache, configured to write the feature map data and the weight data into a first-level cache when storage space of the first-level cache is free;
and the first-level cache, configured to cache the feature map data and the weight data for an inference module to read.
8. The apparatus of claim 7, wherein the second-level cache comprises a plurality of memory bank groups, each memory bank group including a plurality of memory banks; wherein
each memory bank group is configured to store the feature map data and the weight data in its corresponding memory banks according to a preconfigured read-write order and preconfigured space addresses.
9. The apparatus of claim 8, wherein the first-level cache comprises:
a feature map data first-level cache, configured to store the feature map data written by the second-level cache;
and a weight data first-level cache, configured to store the weight data written by the second-level cache.
10. The apparatus of claim 9, wherein the feature map data first-level cache includes a register unit.
11. The apparatus of claim 9, wherein the feature map data first-level cache comprises a plurality of cells, each of the cells comprising a plurality of rooms; each of the rooms is configured to store the feature map data.
12. The apparatus of claim 9, wherein the weight data first-level cache comprises a plurality of ring caches, each ring cache corresponding to a memory bank in the second-level cache.
13. A memory comprising the data caching device of any one of claims 7 to 12 and a dynamic random access memory, DRAM.
CN201911106091.6A 2019-11-13 2019-11-13 Data caching device and method and memory Pending CN112799975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106091.6A CN112799975A (en) 2019-11-13 2019-11-13 Data caching device and method and memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911106091.6A CN112799975A (en) 2019-11-13 2019-11-13 Data caching device and method and memory

Publications (1)

Publication Number Publication Date
CN112799975A true CN112799975A (en) 2021-05-14

Family

ID=75803122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106091.6A Pending CN112799975A (en) 2019-11-13 2019-11-13 Data caching device and method and memory

Country Status (1)

Country Link
CN (1) CN112799975A (en)

Similar Documents

Publication Publication Date Title
EP3149595B1 (en) Systems and methods for segmenting data structures in a memory system
US5835941A (en) Internally cached static random access memory architecture
CN103136120B (en) Row buffering operating strategy defining method and device, bank division methods and device
US8244972B2 (en) Optimizing EDRAM refresh rates in a high performance cache architecture
CN101271435B (en) Method for access to external memory
US4138720A (en) Time-shared, multi-phase memory accessing system
CN111881068A (en) Multi-entry fully associative cache memory and data management method
CN105487988B (en) The method for improving the effective access rate of SDRAM bus is multiplexed based on memory space
US10061513B2 (en) Packet processing system, method and device utilizing memory sharing
US10067868B2 (en) Memory architecture determining the number of replicas stored in memory banks or devices according to a packet size
US20230269205A1 (en) Switch for transmitting packet, network on chip having the same, and operating method thereof
US9658951B1 (en) Scalable high bandwidth memory in a network device
CN112799975A (en) Data caching device and method and memory
CN115052042B (en) Method for realizing high-performance multi-channel shared cache
WO2013184855A1 (en) Memory with bank-conflict-resolution (bcr) module including cache
CN109992528A (en) From the multilevel system memory configuration of very fast storage level operation higher-priority subscriber
CN114742214A (en) Caching method, system, device and storage medium of neural network
US5701431A (en) Method and system for randomly selecting a cache set for cache fill operations
US10067690B1 (en) System and methods for flexible data access containers
Vudarapu et al. Optimization of SDRAM memory controller for high-speed operation
CN113778335B (en) Control method of multi-port low-delay access SRAM group in SSD master control
US9612950B2 (en) Control path subsystem, method and device utilizing memory sharing
WO2024001414A1 (en) Message buffering method and apparatus, electronic device and storage medium
US9582215B2 (en) Packet processing system, method and device utilizing memory sharing
CN116860185B (en) Data access apparatus, system, method, device, chip and medium for SRAM array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination