CN113051194A - Buffer memory, GPU (graphic processing unit), processing system and cache access method - Google Patents


Info

Publication number
CN113051194A
Authority
CN
China
Prior art keywords
access
data
address
block
storage unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110228263.8A
Other languages
Chinese (zh)
Other versions
CN113051194B (en)
Inventor
龙斌 (Long Bin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Original Assignee
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Jingmei Integrated Circuit Design Co ltd, Changsha Jingjia Microelectronics Co ltd filed Critical Changsha Jingmei Integrated Circuit Design Co ltd
Priority to CN202110228263.8A priority Critical patent/CN113051194B/en
Priority to PCT/CN2021/087350 priority patent/WO2022183571A1/en
Publication of CN113051194A publication Critical patent/CN113051194A/en
Application granted granted Critical
Publication of CN113051194B publication Critical patent/CN113051194B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 - Handling requests for interconnection or transfer
    • G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1668 - Details of memory controller
    • G06F 13/1673 - Details of memory controller using buffers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Embodiments of the present application provide a buffer memory, a GPU, a processing system, and a cache access method. The buffer memory comprises: a plurality of access address comparison units, which receive access addresses sent by a plurality of access sources and compare them with the block addresses stored in an address storage unit to generate comparison results, the access address comparison units being independent of one another; a data access management unit, which receives the comparison results from the access address comparison units and, when a comparison result is a hit, gates the corresponding data channel in the data channel management unit so that the corresponding access source can access the buffered data block in the data storage unit through that channel; and a data channel management unit comprising a plurality of mutually independent data channels. The buffer memory, GPU, processing system, and cache access method can improve access efficiency.

Description

Buffer memory, GPU (graphic processing unit), processing system and cache access method
Technical Field
The present application relates to memory access technologies, and in particular, to a buffer memory, a GPU, a processing system, and a cache access method.
Background
A Graphics Processing Unit (GPU) is a microprocessor dedicated to processing images and graphics. It is used in the display system of an electronic terminal to relieve the Central Processing Unit (CPU) of image- and graphics-processing load.
A GPU contains a Cache and a plurality of arithmetic units, each of which can access the Cache as an access source. When several access sources access the Cache at once, their requests must be arbitrated, ordered, and then served sequentially. This access scheme is inefficient: arithmetic units are frequently left waiting, or stall at the bottleneck, which lowers the processing efficiency of the GPU. Moreover, when a Cache access misses, all subsequent read and write accesses are suspended while a data block in the Cache is replaced, further lengthening the arithmetic units' waiting time.
Disclosure of Invention
Embodiments of the present application provide a buffer memory, a GPU, a processing system, and a cache access method, to address the low efficiency with which access sources access the buffer memory in conventional schemes.
An embodiment of the first aspect of the present application provides a buffer memory, including:
a plurality of access address comparison units, configured to receive access addresses sent by a plurality of access sources and compare them with block addresses stored in an address storage unit to generate comparison results, wherein the access address comparison units are independent of one another;
a data access management unit, configured to receive the comparison results sent by the access address comparison units and, when a comparison result is a hit, gate the corresponding data channel in the data channel management unit, so that the corresponding access source can access the buffered data block in the data storage unit through that data channel;
a data storage unit, configured to store a plurality of buffered data blocks;
an address storage unit, configured to store the block address corresponding to each buffered data block; and
a data channel management unit comprising a plurality of data channels, the data channels being independent of one another.
An embodiment of the second aspect of the present application provides a GPU, including: a plurality of arithmetic units and the buffer memory described above.
An embodiment of the third aspect of the present application provides a processing system, including: the graphics processor GPU described above.
An embodiment of the fifth aspect of the present application provides a cache access method applying the above buffer memory, including:
the plurality of access address comparison units receive access addresses sent by a plurality of access sources and compare them with the block addresses stored in the address storage unit to generate comparison results, the access address comparison units being independent of one another;
the data access management unit receives the comparison result sent by each access address comparison unit and, when a comparison result is a hit, gates the corresponding data channel in the data channel management unit, so that the corresponding access source accesses the buffered data block in the data storage unit through that data channel;
the data storage unit stores a plurality of buffered data blocks; the address storage unit stores the block address corresponding to each buffered data block; and the data channel management unit comprises a plurality of data channels, the data channels being independent of one another.
In the technical solution provided by the embodiments of the present application, a plurality of access address comparison units receive the access addresses sent by a plurality of access sources and compare them with the block addresses stored in the address storage unit to generate comparison results; the comparison units are independent of one another. The data access management unit receives the comparison results and, when a result is a hit, gates the corresponding data channel in the data channel management unit so that the corresponding access source accesses the buffered data block in the data storage unit through that channel. The data storage unit stores the buffered data blocks; the address storage unit stores the block address of each buffered data block; and the data channel management unit contains a plurality of mutually independent data channels. Because the access address comparison units work independently, and the data access management unit gates a separate data channel for every hit, each access source accesses its buffered data block in the data storage unit independently, which improves access efficiency.
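The parallel hit path described above can be sketched in software as follows. This is an illustrative model only, not the patent's hardware: the dictionary-based tag and data stores, and the port-indexed `access` method, are assumptions chosen to show how per-source comparators and channels let hits proceed without arbitration.

```python
# Illustrative sketch (not the patent's hardware): a buffer memory that gives
# each access source its own address comparator and data channel, so hits
# from different sources proceed in parallel without arbitration.

class MultiPortCache:
    def __init__(self, num_ports):
        self.num_ports = num_ports   # independent comparator/channel pairs
        self.tags = {}               # address storage unit: block address -> block index
        self.blocks = {}             # data storage unit: block index -> block data

    def access(self, port, block_addr):
        """One comparator compares the access address with the stored block
        addresses; on a hit, the port's data channel is gated to the block."""
        assert 0 <= port < self.num_ports
        if block_addr in self.tags:              # comparison result: hit
            return ('hit', self.blocks[self.tags[block_addr]])
        return ('miss', None)                    # handled by the external-memory path
```

In this model, `access(0, a)` and `access(7, b)` for distinct blocks `a` and `b` would, in hardware, complete in the same cycle; the serialization visible in software stands in for independent parallel channels.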
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and constitute a part of it, illustrate embodiments of the application and, together with the description, serve to explain the application; they do not limit the application. In the drawings:
FIG. 1 is a block diagram of a buffer memory according to an embodiment of the present application;
FIG. 2 is a block diagram of a GPU according to the fifth embodiment of the present application;
FIG. 3 is a block diagram of a processing system according to the fifth embodiment of the present application;
FIG. 4 is a flowchart of a cache access method according to the sixth embodiment of the present application.
Detailed Description
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not an exhaustive list. In the absence of conflict, the embodiments in the present application and the features in the embodiments may be combined with each other.
Example one
This embodiment provides a buffer memory that can be applied in a processor; the processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or another processor. Besides the buffer memory, the processor contains a plurality of arithmetic units, each of which can access the buffer memory as an access source. The buffer memory of this embodiment exposes an interface to a plurality of access sources so that they can access it simultaneously.
The processor can be applied in a processing system that also includes a Read-Only Memory (ROM), a Random Access Memory (RAM), and the like. The buffer memory resides in the processor; from the buffer memory's point of view, the RAM can also be regarded as an external memory.
Fig. 1 is a block diagram of a buffer memory according to an embodiment of the present application. As shown in fig. 1, the buffer memory provided in this embodiment includes: an address storage unit 1, an access address comparison unit 2, a data access management unit 3, a data channel management unit 4, and a data storage unit 5.
Data fetched from the external memory is divided into a plurality of buffered data blocks and stored in the data storage unit 5. An access source may read a buffered data block from the data storage unit 5 or modify its contents. The block address corresponding to each buffered data block is stored in the address storage unit 1.
The data channel management unit 4 contains a plurality of data channels that are independent of one another and operate in parallel. At least one data channel can be gated to the data storage unit 5, so that at least one access source can read and write buffered data blocks in the data storage unit 5 through its corresponding data channel.
There are multiple access address comparison units 2, which receive the access addresses sent by multiple access sources and compare them with the block addresses in the address storage unit 1. Specifically, each access address comparison unit 2 receives an access address from one access source, compares it with the block addresses stored in the address storage unit 1, and generates a comparison result. The access address comparison units 2 are independent of one another and can work in parallel. If the access address matches some block address stored in the address storage unit 1, the access address comparison unit 2 generates a hit result; if it matches none of them, the unit generates a miss result, which may also be called a failed result.
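The comparison just described can be sketched as follows. This is illustrative only; in particular, the block-offset width (how many low-order bits are dropped before comparing) is an assumption, since the patent does not specify field sizes.

```python
# Hypothetical sketch of one access-address comparison unit: the high-order
# part of the access address is compared against every stored block address;
# equality with any entry yields a hit together with the entry's number.

BLOCK_OFFSET_BITS = 6  # assumed 64-byte blocks; not specified in the patent

def compare_address(access_addr, stored_block_addrs):
    tag = access_addr >> BLOCK_OFFSET_BITS        # high-order part of the address
    for number, block_addr in enumerate(stored_block_addrs):
        if tag == block_addr:
            return ('hit', number)                # hit information + hit block number
    return ('miss', None)                         # also called a "failed" result
```

In hardware the loop would be a bank of parallel equality comparators, one per stored block address, all evaluating in the same cycle.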
The data access management unit 3 receives the comparison results sent by the access address comparison units 2. When a comparison result is a hit, the data access management unit 3 gates the corresponding data channel so that the corresponding access source can access the buffered data block in the data storage unit 5 through that channel.
When a single access source accesses the buffer memory, one access address comparison unit 2 receives its access address, compares it with the block addresses stored in the address storage unit 1, generates a comparison result, and sends it to the data access management unit 3. On a hit, the data access management unit 3 gates one data channel in the data channel management unit 4 so that the access source accesses the corresponding buffered data block in the data storage unit 5 through that channel.
When two or more access sources access the buffer memory, an equal number of access address comparison units 2 each receive the access address of one source, compare it with the block addresses stored in the address storage unit 1, generate comparison results, and send them to the data access management unit 3. For the hits among the results, the data access management unit 3 gates a matching number of data channels so that each hitting access source accesses its buffered data block in the data storage unit 5 through one data channel.
By way of example: when three access sources A, B, and C access the buffer memory, each sends its access address to the buffer memory. Three access address comparison units 2 each receive one of these access addresses, compare it with the block addresses stored in the address storage unit 1, generate a comparison result, and send it to the data access management unit 3.
Assuming all three comparison results are hits, the data access management unit 3 gates three data channels; the three access sources A, B, and C each access their corresponding buffered data block in the data storage unit 5 through one data channel, in parallel and without mutual interference.
Assuming only the two comparison results for access sources A and B are hits, the data access management unit 3 gates two data channels; access sources A and B each access their corresponding buffered data block in the data storage unit 5 through one data channel, in parallel and without mutual interference.
In the technical solution of this embodiment, the access address comparison units work independently: they receive the access addresses sent by the access sources and compare them with the block addresses of the buffered data blocks. On a hit, the data access management unit gates the corresponding data channel, so each access source independently accesses its buffered data block in the data storage unit through its own data channel, improving access efficiency.
Example two
In this embodiment, the buffer memory of the above embodiments is further optimized:
As shown in fig. 1, the buffer memory further includes an external memory access unit 6. When a comparison result generated by an access address comparison unit 2 is a miss, the data access management unit 3 directs the external memory access unit 6 to read the data block corresponding to the access source's access address from the external memory and write it into the data storage unit 5, and updates the address storage unit 1 with the address of the block just read. The access address comparison unit 2 then compares the access address with the block addresses again; on the resulting hit, the data channel is gated and the buffered data block in the data storage unit 5 is accessed.
Specifically, the external memory access unit 6 may include: an external memory access module, a load data cache module, and a write-back data cache module. The external memory access module first stages a data block read from the external memory in the load data cache module, and then writes it into the data storage unit 5.
When multiple access sources access the buffer memory, only the source whose address comparison fails is suspended while the data is read from the external memory through the external memory access unit 6. The sources whose addresses hit continue to access their buffered data blocks in parallel through the data channels. Suspending one source does not disturb the normal accesses of the sources that hit; this raises the utilization of the data storage unit 5 and removes the conventional scheme's problem of all other sources having to wait whenever one source's access is suspended.
In the above process, a data block to be replaced must be selected to make room for the data read from the external memory. The block to be replaced may be the least-used one, for example as determined by a least-recently-used algorithm.
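The least-recently-used selection mentioned above can be sketched as follows. This is illustrative; the patent does not fix a particular recency-tracking mechanism, so simple access counters stand in for the age bits hardware would typically use.

```python
# Sketch of choosing a block to replace with a least-recently-used policy.
# last_used maps each block index to the counter value of its last access;
# the block with the smallest value is the least recently used.

def choose_victim(last_used):
    """Return the index of the least recently used block."""
    return min(last_used, key=last_used.get)
```

For example, with `{0: 5, 1: 2, 2: 9}` (block 1 last touched at time 2), block 1 is selected for replacement.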
If the selected block to be replaced has not been modified during operation, its contents are identical to those in the external memory, and it can simply be overwritten with the data read from the external memory.
If, however, the selected block to be replaced has been modified during operation, its contents differ from those in the external memory; it must first be written back to the external memory and only then replaced with the data read from the external memory. Specifically, in the external memory access unit 6, when the external memory access module receives the control instruction sent by the data access management unit 3 after a miss, it first stages the selected block to be replaced in the write-back data cache module and writes it back to the external memory. It then stages the data block read from the external memory in the load data cache module and writes it into the data storage unit 5, replacing the block to be replaced. The data access management unit 3 then sends the address of the replaced block to the address storage unit 1 so that the stored block address is updated accordingly.
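The write-back-then-fill sequence described above can be sketched as follows. This is an illustrative software model, not the patent's hardware; the dictionary layout (`tag`, `data`, `dirty` lists indexed by block number) is an assumption chosen for clarity.

```python
# Sketch of the miss-handling sequence: if the victim block is dirty its
# contents differ from external memory, so it is written back first; the
# new block is then filled in and the tag (address storage unit) updated.

def handle_miss(cache, ext_mem, victim, new_addr):
    """cache: dict with 'tag', 'data', 'dirty' lists indexed by block number.
    ext_mem: dict mapping block address -> data."""
    if cache['dirty'][victim]:                    # modified since it was loaded:
        ext_mem[cache['tag'][victim]] = cache['data'][victim]  # write back first
    cache['data'][victim] = ext_mem[new_addr]     # then fill from external memory
    cache['tag'][victim] = new_addr               # update the address storage unit
    cache['dirty'][victim] = False                # fresh copy matches external memory
```

In the hardware described, the two staging steps go through the write-back data cache module and the load data cache module respectively; here they are collapsed into direct assignments.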
Example three
In this embodiment, the buffer memory of the above embodiments is further optimized, in particular the address storage unit 1 and the access address comparison unit 2:
The address storage unit 1 may store the block addresses of all buffered data blocks, in which case an access address comparison unit 2 compares the access address received from an access source against all block addresses in the address storage unit 1.
Alternatively, the block addresses may be grouped, with the block addresses of the buffered data blocks associated with each access source forming one group. An access address comparison unit 2 then compares an access source's address only against the group associated with that source, reducing the number of comparisons, shortening comparison time, and helping to improve access efficiency.
One specific implementation: the address storage unit 1 uses a set-associative scheme; each entry of the address storage unit 1 stores the addresses of several buffered data blocks, and all block addresses within one entry are associated with one access source.
Further, the address storage unit 1 consists of a plurality of register sets, each storing several block addresses, specifically the block addresses of the buffered data blocks corresponding to one access source. Register sets provide multiple read-address ports, which makes it easy for the access address comparison units 2 to obtain block addresses and supports simultaneous addressing and address comparison for multiple access sources. In practice, the organization of the register sets can be chosen according to the capacity and area of the address storage unit 1.
In one specific implementation, the access address comparison unit 2 includes an address acquisition module and an address comparison module. The address acquisition module obtains the access address sent by an access source. The address comparison module extracts the high-order part of the access address, compares it with the block addresses stored in the address storage unit 1, and outputs the comparison result. On a hit, the comparison result contains hit information and the number of the hit block address.
Further, when the address storage unit 1 is built from multiple register sets, the access address comparison unit 2 also determines a target register set from the access address sent by the access source and fetches the block addresses in that set, so that the address comparison module compares the high-order part of the access address only with the block addresses in the target register set. Comparing only against the register set associated with the source reduces the number of comparisons, shortens comparison time, and improves access efficiency.
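The register-set lookup can be sketched as follows. This is illustrative; the set-index and block-offset widths are assumptions, since the patent does not specify field sizes.

```python
# Hypothetical address decomposition for the register-set scheme: part of
# the access address selects a target register set, and only the block
# addresses (tags) stored in that set are compared.

SET_BITS = 3      # assumed: 8 register sets
OFFSET_BITS = 6   # assumed: 64-byte blocks

def lookup(access_addr, register_sets):
    """register_sets: list of lists of stored tags, one list per set."""
    set_index = (access_addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = access_addr >> (OFFSET_BITS + SET_BITS)   # high-order part
    for way, stored_tag in enumerate(register_sets[set_index]):
        if stored_tag == tag:
            return ('hit', set_index, way)
    return ('miss', set_index, None)
```

Because each lookup touches only one register set, a comparator only needs as many equality checks as there are ways in a set, rather than one per block in the whole cache.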
Example four
Building on the above embodiments, this embodiment provides one implementation of the buffer memory:
as shown in fig. 1, the buffer memory provided in the present embodiment includes: the device comprises an address storage unit 1, an access address comparison unit 2, a data access management unit 3, a data channel management unit 4 and a data storage unit 5.
There is a single address storage unit 1, referred to as the Cache Tag. Data read from the external memory is divided into data blocks, each called a Cache Line. The data blocks are stored in the data storage unit 5, and their block addresses are stored in the address storage unit 1.
The address storage unit 1 uses a set-associative scheme and is implemented with register sets, so it can be addressed and read by multiple access sources simultaneously, avoiding the conflicts that arise when several sources access it.
With eight access address comparison units 2, the access addresses of eight access sources can be compared at the same time. The comparison units 2 may be assigned in the order in which sources initiate access: the first unit 2 to the first source that initiates an access, the second unit 2 to the second, and so on. Alternatively, the eight units 2 may correspond one-to-one to the access sources, i.e., each access address comparison unit 2 receives the access address from its fixed access source and compares it with the block addresses.
Specifically, as shown in fig. 1, the eight access address comparison units 2 are numbered 0 through 7. Unit 0 receives and compares the access address sent by access source 0, unit 1 that of access source 1, and so on.
Each access address comparison unit 2 compares the high-order part of its access address with the block addresses output by the address storage unit 1, and on a hit outputs hit information together with the number of the hit block address. The access address comparison units 2 are independent of one another and can operate simultaneously; all eight send their comparison results to the data access management unit 3.
The data storage unit 5 stores the buffered data blocks. In this embodiment it is spliced together from several random-access-memory (RAM) banks, each storing several buffered data blocks, the banks being independent of one another. Each RAM bank provides its own access interface, so several access sources can access different banks at the same time, and the data storage unit 5 can serve their demands simultaneously. While a hit RAM bank performs read/write operations, the bank holding the data block selected for replacement can carry out the replacement concurrently, without affecting the hit bank. The splicing structure of the RAM banks can be chosen according to the area, power consumption, and efficiency targets of the buffer memory.
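Bank-level parallelism of the spliced RAM banks can be illustrated with the following sketch. The block-to-bank mapping (simple modulo) and the bank count are assumptions for illustration, not taken from the patent.

```python
# Sketch of bank-level parallelism: accesses that map to different RAM
# banks can proceed in the same cycle; two accesses that map to the same
# bank must take turns (by access order or priority).

NUM_BANKS = 4  # assumed bank count

def schedule_accesses(block_indices):
    """block_indices[i] is the block that access source i wants.
    Returns (accesses grouped by bank, banks with more than one requester)."""
    by_bank = {}
    for src, block in enumerate(block_indices):
        by_bank.setdefault(block % NUM_BANKS, []).append(src)
    conflicts = {bank: srcs for bank, srcs in by_bank.items() if len(srcs) > 1}
    return by_bank, conflicts
```

Sources whose requests land in conflict-free banks correspond to the accesses the data storage unit 5 can answer simultaneously; conflicting sources are the ones that would be serialized.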
When several access sources hit the same buffered data block, they access it in turn, either in access order or by priority.
The number of data channels in the data channel management unit 4 may equal the number of access sources, and channels may be assigned in access order: the first channel to the first source to access, the second channel to the second, and so on.
Alternatively, the data channels may correspond one-to-one to the access sources, each source accessing the data storage unit 5 through its dedicated channel. As shown in fig. 1, eight data channels are used, numbered 0 through 7; channel 0 corresponds to access source 0, channel 1 to access source 1, and so on. The data channel management unit 4 also contains a data gating module that can gate two or more data channels to the data storage unit at the same time; it receives control from the data access management unit 3 and gates the corresponding data channel whenever an address comparison result is a hit.
Each data channel comprises a read-data path and a write-data temporary buffer. The channels are independent of one another; on an address hit, each channel completes its access operation independently, reading data from or writing data into the RAM banks.
On an address-comparison miss, the external memory access unit 6 issues read/write commands to the external memory according to the external protocol. It fetches from the RAM banks any buffered data block that must be written back and writes it to the external memory, temporarily receives and stores the data block the access source needs from the external memory, updates the corresponding block to be replaced when the data storage unit 5 permits, and updates the address storage unit 1 through the data access management unit 3. For example: if the block to be replaced holds data X, X is first written back to the external memory; data Y is then read from the external memory, and the block to be replaced is overwritten with Y when the data storage unit 5 permits.
Here, "when the data storage unit 5 allows" may mean that the data block to be replaced is allowed to be updated once it is determined that the storage area corresponding to that block is not currently being hit by another access source.
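The miss-handling sequence described above (write the victim block X back, fetch the required block Y, replace the victim when allowed, then update the address store) can be sketched as follows. All names are illustrative; external memory, the data storage unit, and the address storage unit are modeled as a dict, a dict, and a set respectively.

```python
def handle_miss(access_addr, victim_addr, data_store, address_store, external_memory):
    """Write the victim block back, fetch the needed block, then replace it."""
    # Step 1: write the victim block (data X) back to external memory.
    external_memory[victim_addr] = data_store[victim_addr]
    # Step 2: read the required block (data Y) from external memory.
    new_block = external_memory[access_addr]
    # Step 3: when the data storage unit allows (the victim is not being
    # hit by another access source), replace the victim with the new block.
    del data_store[victim_addr]
    data_store[access_addr] = new_block
    # Step 4: update the address storage unit so future comparisons hit.
    address_store.discard(victim_addr)
    address_store.add(access_addr)
    return new_block
```

Step 4 corresponds to updating the address storage unit 1 through the data access management unit 3, after which the suspended access source can retry and hit.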
The data access management unit 3 serves as the global management module of the buffer memory and globally coordinates the cooperative operation of the above units.
The number 8 in this embodiment is merely an example; in practice, the number of access address comparison units 2 and the number of data channels are not limited to 8 and may be smaller or larger.
In addition, when the address comparisons of several access sources fail, only those access sources need to be suspended, and the external memory access unit 6 performs the buffer data block replacement operations in access order or by priority. Access sources whose comparisons hit can still execute their access operations normally, which improves the utilization of the data storage unit 5.
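The ordering of pending replacement operations "in access order or by priority" can be sketched with a small scheduler. This is only one possible interpretation; the request tuple layout and function name are our assumptions.

```python
import heapq

def schedule_replacements(miss_requests):
    """Order pending replacement operations by priority, then arrival order.

    miss_requests: list of (priority, arrival_index, access_source_id),
    where a lower priority value means more urgent.
    """
    heap = list(miss_requests)
    heapq.heapify(heap)
    # Tuples compare element-by-element, so (priority, arrival_index)
    # pops in priority order first, then access (arrival) order.
    return [source for _, _, source in
            (heapq.heappop(heap) for _ in range(len(heap)))]
```

With equal priorities this degenerates to pure access order, matching the simpler of the two policies mentioned above.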
The address storage unit 1, access address comparison unit 2, data access management unit 3, data channel management unit 4, and data storage unit 5 in the above embodiments may each be implemented in hardware; this embodiment does not limit the specific hardware structure of each unit, as long as its function can be realized.
EXAMPLE five
Fig. 2 is a block diagram of a GPU provided in the fifth embodiment of the present application. As shown in fig. 2, this embodiment provides a GPU including a plurality of arithmetic units 21 and a buffer memory 22; the arithmetic units 21 can access the buffer memory 22 as access sources. The buffer memory 22 may adopt any of the embodiments described above.
Fig. 3 is a block diagram of a processing system according to the fifth embodiment of the present application. As shown in fig. 3, this embodiment provides a processing system including a GPU 31 configured to perform graphics processing tasks. The processing system may also include a central processing unit (CPU) 32 and a random access memory (RAM) 33. The CPU 32 may issue graphics processing tasks to the GPU 31, which executes them; both the CPU 32 and the GPU 31 may access the random access memory 33 during operation.
The GPU, the processing system, and the electronic terminal provided by these embodiments achieve the same technical effects as the buffer memory described above.
EXAMPLE six
On the basis of the above embodiments, this embodiment provides a cache access method, which can be executed by the buffer memory described above.
Fig. 4 is a flowchart of a cache access method according to a sixth embodiment of the present application. As shown in fig. 4, the cache access method provided in this embodiment includes:
Step 401: the multiple access address comparison units receive the access addresses sent by multiple access sources and compare them with the block addresses stored in the address storage unit to generate comparison results.
There are multiple access address comparison units 2, each of which receives the access address sent by its corresponding access source. The access address comparison units 2 are independent of one another and can work in parallel.
A plurality of buffer data blocks are stored in the data storage unit, and block addresses corresponding to the buffer data blocks are stored in the address storage unit.
After receiving the access address from the access source, each access address comparing unit 2 compares the access address with the block address stored in the address storage unit 1. The block address stored in the address storage unit 1 is the address of each buffered data block in the data storage unit 5.
When the access address is the same as one of the block addresses, the comparison result is a hit; when the access address differs from every block address, the comparison result is a miss.
Step 402: the data access management unit receives the comparison result sent by each access address comparison unit and, when a comparison result is a hit, gates the corresponding data channel in the data channel management unit so that the corresponding access source accesses the buffer data block in the data storage unit through that channel.
There are multiple data channels, independent of one another, and each can be gated to the data storage unit. The data access management unit 3 receives the comparison result sent by an access address comparison unit 2 and, when the result is a hit, gates the corresponding data channel to the data storage unit 5 so that the access source can access the buffer data block in the data storage unit 5 through that channel.
In the above steps, the access address comparison units 2 work independently of one another, each obtaining and comparing its own access address. The data channels are likewise independent, and their gating does not interfere with one another, so the device can be accessed by multiple access sources simultaneously.
In the technical scheme provided by this embodiment, a plurality of mutually independent access address comparison units receive the access addresses sent by a plurality of access sources and compare them with the block addresses stored in the address storage unit to generate comparison results; the data access management unit receives the comparison results and, on a hit, gates the corresponding data channel in the data channel management unit so that the corresponding access source accesses the buffer data block in the data storage unit through that channel; the data storage unit stores a plurality of buffer data blocks; the address storage unit stores the block address corresponding to each buffer data block; and the data channel management unit comprises a plurality of mutually independent data channels. Because the comparison units work independently and the data access management unit gates a dedicated channel for each hit, the access sources access the buffer data blocks in the data storage unit independently through their corresponding channels, which improves access efficiency.
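The parallel hit path of this scheme can be modeled with a short sketch: one comparison unit per access source, each comparing independently against the shared block addresses, with a channel gated for every hit. All function and variable names are ours, not the patent's.

```python
def compare_all(access_addrs, block_addrs):
    """Per-source comparison results: the matching block address, or None on a miss."""
    # Each comparison unit works independently; modeled here as one
    # lookup per access source against the shared address storage unit.
    return [addr if addr in block_addrs else None for addr in access_addrs]

def service_hits(access_addrs, block_addrs, data_store):
    """Gate a dedicated data channel for every source whose comparison hit."""
    out = {}
    for source_id, result in enumerate(compare_all(access_addrs, block_addrs)):
        if result is not None:
            # Hit: channel `source_id` is gated and the block is accessed.
            out[source_id] = data_store[result]
        # Miss: the source is suspended until the block is fetched
        # from external memory (not modeled here).
    return out
```

Note that a miss by one source does not block the others: in the example below, source 1 misses while sources 0 and 2 complete their accesses.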
On the basis of the above technical scheme, when a comparison result is a miss, the data access management unit 3 controls the external memory access unit 6 to read the data block corresponding to the access address sent by the access source from the external memory, write it into the data storage unit, and then update the block address stored in the address storage unit 1.
One implementation is as follows: a buffer data block is selected in advance in the data storage unit as the data block to be replaced, and the data block read from the external memory replaces it. The data block to be replaced may be selected by a least-recently-used (LRU) algorithm.
If the selected data block to be replaced has not been modified during operation, i.e. its content is identical to that in the external memory, it can be directly overwritten by the data read from the external memory.
If the selected data block to be replaced has been modified during operation, i.e. its content differs from that in the external memory, it must first be written back to the external memory, and only then can the data block read from the external memory replace it.
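The clean-versus-dirty distinction in the two cases above can be sketched as follows; the `dirty` flag and all names are illustrative, and storage is again modeled with dicts.

```python
def replace_block(victim_addr, dirty, new_addr, data_store, external_memory):
    """Replace the victim block, writing it back first only if it was modified."""
    if dirty:
        # Modified (dirty) victim: its content differs from external
        # memory, so it must be written back before being discarded.
        external_memory[victim_addr] = data_store.pop(victim_addr)
    else:
        # Unmodified (clean) victim: its content already matches
        # external memory, so it can simply be dropped.
        data_store.pop(victim_addr)
    # In either case, the block read from external memory takes its place.
    data_store[new_addr] = external_memory[new_addr]
```

Skipping the write-back for clean blocks saves one external memory transaction per replacement, which is the point of distinguishing the two cases.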
After the external memory access unit 6 has fetched the data block from the external memory and replaced the selected data block in the data storage unit 5, the address storage unit 1 is updated through the data access management unit 3 so that the access address comparison unit 2 can perform the address comparison again.
In addition, the access address comparison unit 2 may compare only the high-order part of the access address with the block addresses, which reduces the number of address bits to be compared, shortens comparison time, and improves access efficiency.
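The high-order comparison can be sketched concretely. Assuming, purely for illustration, 256-byte buffer data blocks (so the low 8 bits of the access address are the in-block offset, a parameter the patent does not specify), the comparison reduces to:

```python
OFFSET_BITS = 8  # illustrative: 256-byte buffer data blocks

def block_address(access_addr):
    """High-order part of the access address (the block portion)."""
    return access_addr >> OFFSET_BITS

def compare(access_addr, stored_block_addrs):
    # Only the high-order bits are compared, so the comparator is
    # OFFSET_BITS narrower than a full-address comparator would be.
    return block_address(access_addr) in stored_block_addrs
```

A narrower comparator means fewer bits toggling per comparison, which is the source of the shortened comparison time noted above.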
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (16)

1. A buffer memory, comprising:
the access address comparison units are used for receiving access addresses sent by a plurality of access sources and comparing the access addresses with block addresses stored in the address storage unit to generate comparison results; wherein, the plurality of access address comparison units are mutually independent;
the data access management unit is used for receiving the comparison result sent by the access address comparison unit and controlling the gating of the corresponding data channel in the data channel management unit when the comparison result is hit so as to enable the corresponding access source to access the buffer data block in the data storage unit through the data channel;
a data storage unit for storing a plurality of buffered data blocks;
the address storage unit is used for storing the block address corresponding to each buffer data block;
and the data channel management unit comprises a plurality of data channels, and the data channels are mutually independent.
2. The buffer memory of claim 1, further comprising:
and the external memory access unit is used for reading the data block corresponding to the access address sent by the access source from the external memory and writing the data block into the data storage unit when the comparison result received by the data access management unit is a miss.
3. The buffer memory according to claim 2, wherein the external memory access unit comprises: the device comprises an external memory access module and a loading data cache module;
and the external memory access module is used for reading the data block corresponding to the access address sent by the access source from the external memory when the comparison result received by the data access management unit is a miss, the data block being cached by the loading data caching module and then written into the data storage unit.
4. The buffer memory according to claim 3, wherein the external memory access unit further comprises: write back data cache module;
before the external memory access module reads the data block corresponding to the access address sent by the access source from the external memory, the method further includes: the external memory access module writes the selected data block to be replaced into an external memory after the data block to be replaced is cached by the write-back data caching module;
after the external memory access module reads the data block corresponding to the access address sent by the access source from the external memory, the method further includes: and the external memory access module replaces the data block to be replaced with the data block read from the external memory.
5. The buffer memory according to any one of claims 1 to 4, wherein the address storage unit comprises: a plurality of register sets, each register set for storing a block address of a buffered data block corresponding to one access source.
6. The buffer memory of claim 5, wherein the access address comparison unit comprises:
the access address acquisition module is used for acquiring an access address sent by an access source;
and the address comparison module is used for comparing the high-order part of the access address with the block address stored in the address storage unit and generating a comparison result.
7. The buffer memory of claim 6, wherein the access address comparison unit further comprises:
the block address acquisition module is used for determining a target register group according to the access address sent by an access source and acquiring the block address in the target register group, for the address comparison module to compare with the high-order part of the access address.
8. The buffer memory of claim 1, wherein the access address comparison units are in one-to-one correspondence with the access sources, and each access address comparison unit is configured to receive an access address from its corresponding access source and compare it with the block addresses.
9. The buffer memory of claim 1, wherein the data storage unit is formed by a plurality of random access memory banks, each random access memory bank providing an access interface.
10. The buffer memory according to claim 1, wherein the data channels in the data channel management unit are in one-to-one correspondence with access sources, and the access sources access the data storage unit through the data channels corresponding to the access sources.
11. A graphics processing unit (GPU), comprising: a plurality of arithmetic units and a buffer memory according to any of claims 1-10.
12. A processing system, comprising: a graphics processor GPU according to claim 11.
13. A cache access method for applying the cache memory according to any one of claims 1 to 10, comprising:
the multiple access address comparison units receive access addresses sent by multiple access sources and compare the access addresses with block addresses stored in the address storage unit to generate comparison results; the multiple access address comparison units are mutually independent;
the data access management unit receives the comparison result sent by each access address comparison unit, and controls the corresponding data channel in the data channel management unit to gate when the comparison result is hit, so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel;
the data storage unit is used for storing a plurality of buffer data blocks; the address storage unit is used for storing block addresses corresponding to the buffer data blocks; the data channel management unit comprises a plurality of data channels, and the data channels are independent from each other.
14. The cache access method of claim 13, further comprising:
and when the comparison result is failure, the data access management unit controls the external memory access unit to read the data block corresponding to the access address sent by the access source from the external memory and write the data block into the data storage unit.
15. The cache access method according to claim 14, before the data access management unit controls the external memory access unit to read the data block corresponding to the access address sent by the access source from the external memory, further comprising:
and the data access management unit controls the external memory access unit to write the selected data block to be replaced back to the external memory.
16. The cache access method according to claim 13, wherein the access address comparison unit compares the access address with the block addresses stored in the address storage unit, specifically:
the access address comparison unit compares the high-order part in the access address with the block address stored in the address storage unit.
CN202110228263.8A 2021-03-02 2021-03-02 Buffer memory, GPU, processing system and buffer access method Active CN113051194B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110228263.8A CN113051194B (en) 2021-03-02 2021-03-02 Buffer memory, GPU, processing system and buffer access method
PCT/CN2021/087350 WO2022183571A1 (en) 2021-03-02 2021-04-15 Buffer memory, gpu, processing system and cache access method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110228263.8A CN113051194B (en) 2021-03-02 2021-03-02 Buffer memory, GPU, processing system and buffer access method

Publications (2)

Publication Number Publication Date
CN113051194A true CN113051194A (en) 2021-06-29
CN113051194B CN113051194B (en) 2023-06-09

Family

ID=76509714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110228263.8A Active CN113051194B (en) 2021-03-02 2021-03-02 Buffer memory, GPU, processing system and buffer access method

Country Status (2)

Country Link
CN (1) CN113051194B (en)
WO (1) WO2022183571A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013407A (en) * 2007-02-05 2007-08-08 北京中星微电子有限公司 System and method for implementing memory mediation of supporting multi-bus multi-type memory device
CN102298561A (en) * 2011-08-10 2011-12-28 北京百度网讯科技有限公司 Method for conducting multi-channel data processing to storage device and system and device
JP2013174997A (en) * 2012-02-24 2013-09-05 Mitsubishi Electric Corp Cache control device and cache control method
US20140156947A1 (en) * 2012-07-30 2014-06-05 Soft Machines, Inc. Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
CN111209232A (en) * 2018-11-21 2020-05-29 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for accessing static random access memory
CN111881068A (en) * 2020-06-30 2020-11-03 北京思朗科技有限责任公司 Multi-entry fully associative cache memory and data management method
CN112214427A (en) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Cache structure, workload proving operation chip circuit and data calling method thereof
CN112231254A (en) * 2020-09-22 2021-01-15 深圳云天励飞技术股份有限公司 Memory arbitration method and memory controller

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4895262B2 (en) * 2005-12-09 2012-03-14 株式会社メガチップス Information processing apparatus, controller, and file reading method
TWI366094B (en) * 2007-12-28 2012-06-11 Asmedia Technology Inc Method and system of integrating data assessing commands and data accessing device thereof
CN102147757B (en) * 2010-02-08 2013-07-31 安凯(广州)微电子技术有限公司 Test device and method
CN102012872B (en) * 2010-11-24 2012-05-02 烽火通信科技股份有限公司 Level two cache control method and device for embedded system
CN106569727B (en) * 2015-10-08 2019-04-16 福州瑞芯微电子股份有限公司 Multi-memory shares parallel data read-write equipment and its write-in, read method between a kind of multi-controller


Also Published As

Publication number Publication date
WO2022183571A1 (en) 2022-09-09
CN113051194B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
US10515011B2 (en) Compression status bit cache and backing store
US8271763B2 (en) Unified addressing and instructions for accessing parallel memory spaces
US10037228B2 (en) Efficient memory virtualization in multi-threaded processing units
US10169091B2 (en) Efficient memory virtualization in multi-threaded processing units
US10310973B2 (en) Efficient memory virtualization in multi-threaded processing units
US7415575B1 (en) Shared cache with client-specific replacement policy
US9262174B2 (en) Dynamic bank mode addressing for memory access
CN108733415B (en) Method and device for supporting vector random access
US20120075319A1 (en) Hierarchical Memory Addressing
US9798543B2 (en) Fast mapping table register file allocation algorithm for SIMT processors
US9280464B2 (en) System and method for simultaneously storing and reading data from a memory system
US20120089792A1 (en) Efficient implementation of arrays of structures on simt and simd architectures
US20090300293A1 (en) Dynamically Partitionable Cache
US9934145B2 (en) Organizing memory to optimize memory accesses of compressed data
JP2010086496A (en) Vector computer system with cache memory, and operation method therefor
US10402323B2 (en) Organizing memory to optimize memory accesses of compressed data
US7069384B2 (en) System and method for cache external writing and write shadowing
US8139073B1 (en) Early compression tag lookup for memory accesses
CN113051194B (en) Buffer memory, GPU, processing system and buffer access method
US11934311B2 (en) Hybrid allocation of data lines in a streaming cache memory
US11321241B2 (en) Techniques to improve translation lookaside buffer reach by leveraging idle resources
US8127082B2 (en) Method and apparatus for allowing uninterrupted address translations while performing address translation cache invalidates and other cache operations
EP0611462B1 (en) Memory unit including a multiple write cache
US6996675B2 (en) Retrieval of all tag entries of cache locations for memory address and determining ECC based on same
JP2002041358A (en) Processor system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant