GB2215099A - Distributed cache architecture - Google Patents

Distributed cache architecture

Info

Publication number
GB2215099A
GB2215099A GB8822580A
Authority
GB
United Kingdom
Prior art keywords
data
memory
address
cache
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB8822580A
Other versions
GB8822580D0 (en)
Inventor
Scott I Griffith
Steven E Golson
Joseph Murphy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems Inc filed Critical Sun Microsystems Inc
Publication of GB8822580D0 publication Critical patent/GB8822580D0/en
Publication of GB2215099A publication Critical patent/GB2215099A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0864Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Description

DISTRIBUTED CACHE ARCHITECTURE

BACKGROUND OF THE INVENTION
1. FIELD OF THE INVENTION:
The present invention relates to cache memory systems for computers, and more particularly, to an improved cache memory for increasing data access speed and efficiency.
2. ART BACKGROUND:
In many data processing systems, it is common to utilize a high speed buffer memory, referred to as a "cache", coupled to a central processing unit (CPU) to improve the average memory access time for the processor. The use of a cache is based upon the premise that over time a data processing system will access certain localized areas of memory with high frequency, the data accesses exhibiting spatial and temporal locality within the memory. The cache typically contains a subset of the complete data set disposed in the memory, and can be accessed very quickly by the CPU without the necessity of reading the data locations in the main memory.
Most modern memory systems utilize dynamic random access memories (DRAMs) which generally have 200 nanosecond cycle times ("cycle" time being the time from the initiation of a memory access until initiation of the next memory access). Most cache based systems employ a static RAM in the cache memory to provide faster data access speeds than would be available from the main DRAM memory. In addition, most cache based systems couple the cache memory between the CPU and main bus, rather than to the DRAM memory, in order to increase access speed for data stored within the cache. Accordingly, the usefulness of the cache memory is limited since it is coupled directly to the CPU and typically must contain representative samples of frequently accessed data which is disposed in a plurality of DRAM memories coupled to a common bus.
In addition, most cache systems employ data processing devices which may act as bus masters in addition to the CPU. These devices, such as by way of example a direct memory access (DMA) device, may modify data stored in a DRAM without updating any corresponding data in the cache, thereby presenting a cache coherence problem.
As will be described, the present invention provides an improved cache memory which utilizes a distributed architecture. Each DRAM memory coupled to a common bus is provided with its own separate cache memory disposed between the DRAM memory and the bus. Each cache contains a representative subset of the data stored within its respective DRAM memory, such that the processor may efficiently obtain data from each memory "node" by providing an address which includes a base and tag address unique to the particular memory node. Moreover, the present invention avoids any cache coherence problems since each bank of DRAMs has a single master, namely, the cache controller.
SUMMARY OF THE INVENTION
The present invention provides an improved memory architecture including a processor (CPU) coupled over a bus to a plurality of dynamic random access memory (DRAM) banks. A cache memory is coupled between each of the DRAM banks and the bus to provide data cache capabilities for each of the banks of DRAMs. Each cache memory includes a static memory for storing data which is a subset of the data stored in its respective DRAM. An address provided by the CPU is composed of a base address and tag address. A bus interface controller (BIC) within each cache compares the base address and determines if it falls within the range of addresses stored within its DRAM. If the base address falls outside the range, the BIC ignores the address. If the base address falls within the range, and the processor is executing a READ operation, a tag address control compares the tag address to a plurality of previously stored tag addresses representing data stored in the cache memory.
If a match occurs, then the cache control logic initiates a READ memory cycle to the cache's memory and provides the data to the CPU. In the event no match is found, the cache initiates a DRAM access cycle to read the data location associated with the address provided by the processor. One of the previously stored tag addresses within the cache is then chosen, and the data read from the DRAM is stored in the cache memory in place of the data corresponding to the chosen tag address. The data corresponding to the chosen tag is stored in the DRAM if a modify bit indicates that it represents updated data not previously stored in the DRAM. The data read from the DRAM, now also stored in the cache memory, is provided to the processor. Similar operations are provided in the case of a WRITE processor command, whereby data to be written is stored in the cache memory, a previously stored tag address is selected and deleted, and the DRAM is updated if required.
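The lookup flow summarized above can be sketched in software terms. The following C fragment is a minimal model, not the patented hardware: the node_t type, the constants, and node_lookup() are illustrative assumptions, with the bit positions (a 24-bit address carrying a 4-bit base, an 18-bit tag and two LSBs) inferred from Figure 4.

```c
/* Minimal C model of the lookup flow described above; all names and
 * bit positions are illustrative assumptions, not the actual design. */
#include <stdint.h>
#include <stdbool.h>

#define ENTRIES 128

typedef struct {
    uint32_t tag;      /* 18-bit tag address              */
    bool     valid;    /* V bit: tag and data entry valid */
    bool     modified; /* M bit: entry is newer than DRAM */
} tag_entry_t;

typedef struct {
    uint32_t    base;          /* 4-bit base address of this node */
    tag_entry_t cam[ENTRIES];  /* fully associative tag store     */
} node_t;

#define NOT_OUR_NODE (-2)      /* BIC ignores the address */
#define CACHE_MISS   (-1)      /* go to this node's DRAM  */

/* Returns a matching entry index, CACHE_MISS, or NOT_OUR_NODE. */
int node_lookup(const node_t *n, uint32_t addr)
{
    uint32_t base = (addr >> 20) & 0xFu;     /* top 4 bits   */
    uint32_t tag  = (addr >> 2)  & 0x3FFFFu; /* next 18 bits */

    if (base != n->base)
        return NOT_OUR_NODE;
    for (int i = 0; i < ENTRIES; i++)
        if (n->cam[i].valid && n->cam[i].tag == tag)
            return i;                        /* cache hit    */
    return CACHE_MISS;
}
```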
BRIEF DESCRIPTION OF THE DRAWINGS
FIGURE 1 is a block diagram conceptually illustrating a typical prior art cache based system.

FIGURE 2 is a block diagram illustrating the basic architecture of the present invention.

FIGURE 3 is a detailed block diagram of the cache memory utilized by the present invention.

FIGURE 4 conceptually illustrates the address bit allocation used by the processor of the present invention.

FIGURE 5 is a diagram illustrating the address interleave logic of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
An improved memory architecture is disclosed having particular application for use in computer systems which employ a central processing unit (CPU) coupled to a plurality of DRAM memory banks (nodes) over a common bus. In the following description, for purposes of explanation, specific memory devices, data rates, architectures, and components are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practised without these specific details. In other instances, well known circuits are shown in block diagram form in order not to obscure the present invention unnecessarily.
With reference to Figure 1, a prior art data processing system is disclosed which employs a processor 10 coupled to a cache 12. As is illustrated, cache 12 is coupled to a bus 14 to which DRAM memories 16 and 18 and I/O devices 20 and 22 are also coupled. Cache 12 stores frequently accessed data which is also stored in the DRAM memories 16 and/or 18. As is common in prior art systems, a memory address to be accessed by processor 10 is provided first to cache 12. In the event that the address provided to cache 12 corresponds to data stored within the cache, cache 12 provides the desired data to processor 10, thereby avoiding the necessity of accessing the DRAM in which the original data was stored. If the desired data is not located within cache 12, processor 10's request is transmitted to the appropriate DRAM over bus 14 and a full memory access cycle is completed by the appropriate DRAM, the data being returned to the processor 10 over bus 14. In many systems, a "miss" by cache 12 results in the accessed data stored in the DRAM (for example DRAM 16) being used to update cache 12 in the event of a subsequent request for the same data. It will be appreciated that, in the system illustrated in Figure 1, cache 12 stores representative samples of data in both DRAM 16 as well as DRAM 18. In addition, a direct memory access (DMA) device 23 is coupled to bus 14. DMA 23 may act as a bus master and access data stored in DRAM memories 16 and 18. It will be appreciated that any update or modification of data in either DRAM 16 or DRAM 18 by DMA 23 will not be reflected in data stored in cache 12, thereby creating possible cache coherence problems.
Referring now to Figure 2, the architecture of the present invention is illustrated in which a processor 30 is coupled directly to a bus 33 for communication with other data processing resources. DRAM memories 32 and 34 are coupled to cache memories 36 and 38, respectively. Cache memories 36 and 38 are in turn coupled to bus 33 for communication to processor 30 as well as I/O devices 40 and 42 and DMA 43. As will be described more fully below, in the event the processor 30 attempts to access (for either a read or write operation) a memory location, the processor transmits an address which includes a base address, tag address and least significant bits (see Figure 4) over bus 33. Each cache memory coupled to bus 33 determines if the base address (which in the presently preferred embodiment comprises a four bit word) falls within the range of its respective DRAM memory. If the base address does not fall within the range of the cache's DRAM memory (for example DRAM memory 32) then the cache (in the present example cache 36) takes no action on the processor's address. If, however, the base address does fall within the cache's DRAM memory address range, then the cache examines the tag address (in the presently preferred embodiment comprising 18 bits). If the cache has stored this memory location in its static RAM, it provides this data to processor 30 over bus 33 without initiating a memory access cycle of DRAM 32. If, however, cache 36 has not stored the desired data, then a full memory access of DRAM 32 is completed and the data is provided over bus 33 to the processor 30. It will be appreciated by one skilled in the art that the architecture of the present invention utilizes a plurality of caches, each of which is responsible for a DRAM memory. This distributed architecture improves system efficiency and provides processor 30 (or other memory accessing devices such as I/O 40 or 42) with faster and more comprehensive memory access than prior art systems.
Moreover, since all data accesses to either DRAM 32 or DRAM 34 are controlled by their respective caches, the use of DMA 43 does not create the cache coherence problems associated with the prior art system of Figure 1.
Referring now to Figure 3, the present invention's cache memory will be described in detail. For purposes of this description, the structure and operation of cache 36 and DRAM memory 32 will be described; however, it will be appreciated that the operation of cache 36 and DRAM 32 is indicative of the structure and operation of cache 38 in conjunction with DRAM 34, as well as other caches utilized in the present invention. For purposes of this Specification, the combination of a cache and DRAM (such as cache 36 and DRAM 32) is referred to as a memory "node".
Bus 33 includes a system data bus 46, a system address bus 50 and a control bus 48. Cache memory 60 is coupled to the system data bus 46 and is organized as a fully associative physical address cache which in the present embodiment contains 128 sixteen (16) byte, four word, entries; thus the total size of the cache memory is 2K bytes. As will be described, transfers of data between DRAM 32 and cache memory 60 are performed using a four cycle burst (nibble mode) access. In the presently preferred embodiment, DRAMs manufactured by Toshiba are used having part number TC511001P-10, and operate using a 40 ns nibble cycle time. For each 16 byte cache entry in cache memory 60, there is a corresponding cache tag field stored in content addressable memory (CAM) 63 of tag address control 62. Tag address control 62 is, as is illustrated, coupled to the cache memory 60 over bus 81 as well as the bus interface control 64 over tag address bus 84. The CAM 63 includes, in the present embodiment, 128 eighteen (18) bit address tags which each correspond to a memory location within the cache memory 60. Each cache address tag includes a "valid" (V) bit, the state of which denotes whether or not the tag is valid, as well as a "modify" (M) bit, the state of which denotes whether or not the data which corresponds to the tag has been modified, the function and operation of which will be described below. It will be appreciated that due to the fully associative architecture of cache 36, any entry can represent any 16 byte block of memory present in the DRAM 32 associated with the cache.
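As a rough software picture of this organization, the following C sketch lays out the data store and its shadowing tags; the type and field names are invented for illustration, and the bit-field packing is only a model of the 18-bit tag plus V and M bits.

```c
/* Sketch of the storage just described: 128 fully associative entries
 * of four 32-bit words (16 bytes each) give a 2K byte data store, each
 * entry shadowed by an 18-bit tag with valid (V) and modify (M) bits. */
#include <stdint.h>
#include <assert.h>

enum { ENTRIES = 128, WORDS_PER_ENTRY = 4 };

typedef struct {
    unsigned tag      : 18;  /* tag address                      */
    unsigned valid    : 1;   /* V: tag and entry are valid       */
    unsigned modified : 1;   /* M: entry modified since the fill */
} cam_tag_t;

typedef struct {
    cam_tag_t tags[ENTRIES];                  /* CAM 63          */
    uint32_t  data[ENTRIES][WORDS_PER_ENTRY]; /* cache memory 60 */
} cache_t;

int main(void)
{
    cache_t c;
    (void)c;
    /* 128 entries x 4 words x 4 bytes = 2K bytes of cached data. */
    static_assert(sizeof c.data == 2048, "2K byte data store");
    return 0;
}
```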
Bus interface controller (BIC) 64 is coupled to both the control bus 48 and address bus 50. The BIC 64 is responsible for generating and maintaining all timing signals associated with cache 36 and DRAM 32, and initiates the various operations of other modules within cache 36, as will be described below with respect to the operation of the cache. BIC 64 is coupled to a cache control 76, which is in turn coupled to a DRAM control 80. As is illustrated, a temporary data holding register (TDHR) 72 is coupled to cache memory 60 as well as the data port of DRAM 32 over DRAM data bus 70. A temporary address holding register (TAHR) 71 is coupled to CAM 63 over bus 82 and the address port of DRAM 32 over DRAM address bus 74. Cache control 76 is responsible for generating the control and timing signals that govern all transactions within cache 36 and DRAM 32. Cache control 76 issues control signals to DRAM control 80, which generates row address strobe (RAS) and column address strobe (CAS) signals for the DRAM 32, and provides these signals to the DRAM over lines 73. As is illustrated, system clock 86 signals are provided to a clock buffer 90 within cache 36. Clock signals are coupled to the bus interface control 64, cache control 76, DRAM control 80 and other systems, as required, in order to synchronize operations of cache 36 to the system clock 86. Moreover, it will be appreciated by one skilled in the art that additional control lines, data paths, and other functions of cache 36 are not illustrated in Figure 3 in order to avoid obscuring the present invention unnecessarily. However, the implementation of the present invention in specific system architectures may require one skilled in the art to utilize such control lines, etc. as may be necessary in order to incorporate the present invention in a particular data processing system. In addition, further structural elements of cache 36 will be described in conjunction with the operation of the present invention as described below.
OPERATION

In the event processor 30 desires to read a memory location of DRAM 32, the processor applies an address corresponding to the memory location in the DRAM to bus 33. In the presently preferred embodiment, the address, as illustrated in Figure 4, includes a four (4) bit base address, an eighteen (18) bit tag address, and two (2) least significant bits (LSB). This address is transmitted over bus 33, and in particular system address bus 50, and is intercepted by the bus interface control 64 of each cache coupled to bus 33. BIC 64 compares the four (4) bit base address to the range of addresses used by DRAM memory 32. Similarly, other caches (for example cache 38) coupled to bus 33 compare the base address to the range of addresses for their respective DRAM memories. If the base address does not correspond to the range of addresses for a cache's DRAM memory, then the address is ignored by the cache.
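In software terms, the decomposition of Figure 4 might look as follows; the exact bit positions (base in bits 23-20, tag in bits 19-2, LSBs in bits 1-0) are inferred from the interleave discussion later in the text and should be read as an assumption.

```c
/* Splitting a 24-bit bus address into the three fields of Figure 4.
 * Bit positions are an assumption inferred from Figure 5's line
 * numbering (lines 102-125 taken as bits 0-23). */
#include <stdint.h>
#include <stdio.h>

#define BASE_OF(a) (((a) >> 20) & 0xFu)     /* four base bits       */
#define TAG_OF(a)  (((a) >> 2)  & 0x3FFFFu) /* eighteen tag bits    */
#define LSB_OF(a)  ((a) & 0x3u)             /* two word-select bits */

int main(void)
{
    uint32_t addr = 0x12345Au;  /* arbitrary 24-bit example address */
    printf("base=0x%X tag=0x%05X lsb=%u\n",
           (unsigned)BASE_OF(addr), (unsigned)TAG_OF(addr),
           (unsigned)LSB_OF(addr));
    return 0;
}
```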
If, after comparison of the base address, the bus interface controller 64 determines that the address provided by the processor 30 falls within the range of addresses for DRAM memory 32, then BIC 64 applies the eighteen (18) bit tag address to tag address control 62 of cache 36 over tag address bus 84. As previously described, tag address control 62 includes a content addressable memory (CAM) 63 which stores a plurality of eighteen (18) bit address tags. In the presently preferred embodiment, the content addressable memory 63 of tag address control 62 is fabricated out of cells which comprise one bit random access memory (RAM) cells that can generate a "match" signal when data applied to the CAM cell matches the data contained in the cell. Accordingly, a tag entry comprised of a number of content addressable memory cells with a common match line will assert the match line only when all bits applied to the entry match all bits contained in the entry. Cache 36 uses CAM 63 to select one entry whose tag is valid and matches the tag address of the current address applied by processor 30.
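The match-line behaviour can be modelled as a bitwise comparison. This is a sketch of the semantics only, not of the CAM circuit; the names are invented.

```c
/* Model of a CAM entry's common match line: every one-bit cell must
 * agree with the applied bit, and the entry must be valid, for the
 * match line to be asserted. */
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t tag; bool valid; } cam_entry_t;

static bool match_line(cam_entry_t e, uint32_t applied_tag)
{
    /* XOR exposes any disagreeing cell; the AND across all cells is
     * modelled by comparing the masked result with zero. */
    return e.valid && (((e.tag ^ applied_tag) & 0x3FFFFu) == 0);
}

int main(void)
{
    cam_entry_t e = { 0x00ABCu, true };
    return match_line(e, 0x00ABCu) ? 0 : 1; /* match line asserted */
}
```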
Cache control 76 then uses the match information to access cache memory 60 or initiate a memory access cycle of DRAM 32.
Assume for sake of example that data corresponding to the address provided by processor 30 is stored within cache memory 60. Once BIC 64 compares the base address and applies the tag address bits to tag address control 62 over tag address bus 84, the applied eighteen (18) bit tag is compared to each of a plurality (in the presently preferred embodiment 128) of tag addresses stored within the CAM 63 of tag address control 62. Each of the eighteen (18) bit tags includes a "valid" bit which, when set to 1, indicates that the tag and its associated entry in the cache memory 60 are both valid. In addition, each tag address includes a "modified" bit, such that in the presently preferred embodiment when the modified bit is set to 1, this indicates that the associated entry in the cache memory 60 has been modified by one or more previous write cycles, and must therefore be stored in the main DRAM 32 if the cache entry is to be replaced. As will be described, the generation of cache tags, cache storage allocation and entry replacement are performed by cache control 76.
Once the tag address is compared to the tags previously stored in the content addressable memory 63 and a match occurs (a cache "hit"), the data stored in the cache memory 60 associated with the tag is read. A match line within the CAM 63 is coupled over match line 81 (shown symbolically) to select one of the 128 RAM lines of cache memory 60. As previously described, cache memory 60 stores four (4) consecutive thirty-two (32) bit wide words for each tag address. The two LSBs provided as part of the full address by processor 30 determine which of the four words associated with that particular tag are read from cache memory 60. The desired word is read from cache memory 60 and coupled to system data bus 46 for return to processor 30. It will be noted that in the previous example the data to be read was stored within cache memory 60, and it was therefore unnecessary to read the respective address location in DRAM 32, such that cache 36 performed the read operation in a much quicker and more efficient manner than would be possible had the data been stored within the DRAM memory 32 itself.
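A short sketch of the hit path, reusing invented types as above; the matched line index stands in for the asserted match line.

```c
/* Read-hit sketch: the asserted match line selects one of the 128 RAM
 * lines, and the two LSBs select one of its four 32-bit words. */
#include <stdint.h>

enum { ENTRIES = 128, WORDS = 4 };

typedef struct { uint32_t data[ENTRIES][WORDS]; } cache_mem_t;

uint32_t read_hit(const cache_mem_t *m, int matched_line, uint32_t addr)
{
    unsigned word = addr & 0x3u;         /* two LSBs pick the word  */
    return m->data[matched_line][word];  /* driven onto data bus 46 */
}
```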
Assume now that the application of the tag address to the content addressable memory 63 resulted in no match (a "miss") with the plurality of tags representing data stored within the cache memory 60. In such event, it is necessary for the data to be read directly from DRAM memory 32. Cache control 76 initiates a DRAM memory access cycle of the data stored at the address defined by the tag address provided by processor 30. DRAM control 80 generates appropriate RAS/CAS signals for DRAM 32 to access the data locations specified by the address and provides these control signals over lines 73. In addition, cache control 76 updates the content of cache memory 60 to reflect data currently sought by processor 30 from the DRAM 32. In the presently preferred embodiment, cache control 76 randomly (or pseudo-randomly) picks a tag address stored within the content addressable memory 63 of tag address control 62. Cache control 76 then determines whether the tag address and data are valid (by determining if the valid (V) bit is set) and if the modified (M) bit for the chosen tag address is set. In the event that both the valid and modified bits have been set, this state indicates that the cache data has been modified and is therefore more recent than the data stored in the DRAM memory 32 at that address.
If cache control 76 determines that the randomly chosen tag is valid and not modified (or not valid), then cache control 76 deletes the tag and replaces it with the new tag address which has been provided by processor 30.
The corresponding valid bit is set and the modified bit is cleared. BIC 64 applies the tag address to tag address control 62 over tag address bus 84, and then to temporary address holding register 71 over bus 82, and then to DRAM 32 over DRAM address bus 74. Cache control 76 further initiates and completes a DRAM memory access cycle to read the data in the DRAM associated with the new tag address. This data is provided to processor 30 over DRAM data bus 70 through cache memory 60, which also stores the data in the cache memory, and then over system data bus 46. Using the present invention's fetch feature, a memory access of DRAM 32 results in four (4) thirty-two (32) bit words being transferred to cache memory 60 [see, Toshiba MOS Memory, Toshiba America Inc. (1987)], wherein only one of the words corresponds to the address read by the processor. The two LSBs of the address provided by the processor 30 are presented to DRAM 32 to indicate that the particular word desired by the processor 30 is to be accessed first, followed by the three remaining words. It has been found that updating cache memory 60 with recently accessed data in DRAM 32 increases the probability of subsequent "hits" by the cache 36 in future processor requests.
Assume that cache control 76 randomly chooses a tag address (to avoid ambiguity referred to as the "first tag address" herein) in content addressable memory 63, in which the tag is valid and modified, thereby indicating that the data within the cache memory 60 at the chosen tag location is newer than the data stored at the same address in DRAM 32. Cache control 76 then transfers the data at the chosen tag location within cache memory 60 into temporary data holding register (TDHR) 72. Cache control 76 further stores the first tag address in temporary address holding register (TAHR) 71. Cache control 76 presents the new tag address provided by the processor 30 to DRAM 32, and, upon receiving the requested data from DRAM 32, provides the data over DRAM data bus 70 to the system data bus 46 for receipt by processor 30. Cache control 76 further writes the data obtained from DRAM 32 (along with three additional words as previously described) to cache memory 60 and replaces the first tag address in CAM 63 with the new tag address provided by processor 30. The corresponding valid bit is set and the modified bit is cleared. Further, cache control 76 stores the data which has been stored in the temporary data holding register (TDHR) 72 in DRAM 32 at the first tag address which has been temporarily stored in TAHR 71. This updates the data disposed within DRAM 32 at the first tag address. It will be noted that typically an update to cache data is not reflected in the corresponding DRAM location until the cache must be updated with new data and a tag is randomly selected for replacement, as described above. As in the previous case, the first word of the four consecutive words read from DRAM 32 is provided to processor 30.
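The read-miss sequence, covering both the clean-victim and modified-victim cases just described, can be sketched as follows. The flat DRAM array, the burst helper and all names are invented stand-ins for the RAS/CAS sequencing of DRAM control 80, not the actual interface; note that 2^18 tags of 16 bytes each conveniently model the node's 4 Mbyte DRAM.

```c
/* Read-miss sketch: pick a random victim, stage a valid+modified line
 * through TDHR/TAHR, fetch the new line requested-word-first, install
 * it with V=1 and M=0, write the old line back, return the word. */
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { ENTRIES = 128, WORDS = 4, TAGS = 1 << 18 };

typedef struct { uint32_t tag; bool valid, modified; } cam_tag_t;
typedef struct {
    cam_tag_t tags[ENTRIES];
    uint32_t  data[ENTRIES][WORDS];
} cache_t;

static uint32_t dram[TAGS][WORDS];  /* flat model of the node's DRAM */

/* Nibble-mode burst: all four words of a line move, the word named by
 * `first` arriving first (ordering affects latency, not content). */
static void dram_read_burst(uint32_t tag, unsigned first,
                            uint32_t out[WORDS])
{
    for (unsigned i = 0; i < WORDS; i++) {
        unsigned w = (first + i) % WORDS;
        out[w] = dram[tag][w];
    }
}

static uint32_t read_miss(cache_t *c, uint32_t tag, unsigned lsb)
{
    int        victim = rand() % ENTRIES;  /* random replacement */
    cam_tag_t *t      = &c->tags[victim];
    uint32_t   tdhr[WORDS];                /* TDHR 72 stand-in   */
    uint32_t   tahr   = 0;                 /* TAHR 71 stand-in   */
    bool writeback = t->valid && t->modified;

    if (writeback) {                       /* victim newer than DRAM */
        memcpy(tdhr, c->data[victim], sizeof tdhr);
        tahr = t->tag;
    }

    dram_read_burst(tag, lsb, c->data[victim]); /* requested word first */
    t->tag      = tag;
    t->valid    = true;                    /* V set     */
    t->modified = false;                   /* M cleared */

    if (writeback)                         /* old line back to DRAM */
        memcpy(dram[tahr], tdhr, sizeof tdhr);

    return c->data[victim][lsb];           /* word for processor 30 */
}

int main(void)
{
    static cache_t c;                      /* zeroed: all invalid */
    dram[42][1] = 0xDEADBEEFu;
    return read_miss(&c, 42, 1) == 0xDEADBEEFu ? 0 : 1;
}
```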
Assume for sake of example that processor 30 desires to write new data into DRAM memory 32. As previously described, processor 30 provides an address, as illustrated in Figure 4, over system address bus 50 to the respective cache (in the present example cache 36). BIC 64 compares the base address with the range of addresses stored within DRAM memory 32, and if the base address falls within the range of addresses utilized by DRAM memory 32, then BIC 64 applies the tag address to tag address control 62 over tag address bus 84. If no match occurs (a "miss") then cache control 76 initiates a DRAM memory access cycle of the data stored at the address defined by the tag address provided by processor 30. Cache control 76 randomly (or pseudo-randomly) picks a tag address stored within the CAM 63 and examines the valid and modified bits.
If cache control 76 determines that the randomly chosen tag is valid and not modified (or not valid), then cache control 76 deletes the tag and replaces it with the new tag address provided by the processor 30. Cache control 76 further initiates and completes a DRAM memory access cycle to read the data in the DRAM associated with the new tag address. This data is provided to cache memory 60 over DRAM data bus 70. As described above, a total of four words are transferred from DRAM 32 to cache memory 60. Cache control 76 then writes the new data provided by processor 30 into cache memory 60 at the new tag address. The two LSBs of the address provided by the processor 30 indicate which of the four data words associated with the new tag address is to be written to. Since the three remaining words have been fetched from the DRAM 32, the result is that all four words contain valid data. Cache control 76 further sets the valid and modified bits in the tag address stored in CAM 63 to indicate that the data at the tag address in cache memory 60 is more recent than the corresponding data at that address in DRAM 32.
If cache control 76 determines that the randomly chosen tag is valid and modified, then cache control 76 reads cache memory 60 at the randomly selected tag location (the "first tag address") and writes the corresponding tag address in TAHR 71 and data into TDHR 72. Cache control 76 then presents the new tag address provided by processor 30 to DRAM 32, and writes the data obtained from DRAM 32 into cache memory 60. Further, cache control 76 replaces the first tag address in CAM 63 with the new tag address provided by processor 30. The corresponding valid and modified bits are set. Cache control 76 then writes the new data provided by processor 30 into cache memory 60 at the new tag address. Old data stored in TDHR 72 is then written to DRAM 32 at the first tag address which has been temporarily stored in TAHR 71. Accordingly, although the data previously stored in TDHR 72 is no longer represented in cache memory 60, the data continues to be stored in DRAM 32, and thus both the cache memory and DRAM have been updated with no data lost.
If the processor 30 provides an address over system address bus 50 to write new data, and the address corresponds to a tag address currently in the cache 36, then cache control 76 simply writes into the cache memory 60 the new data at the tag address provided by the processor. Cache control 76 further sets the modify bit in the tag address stored in CAM 63 to indicate that the data at the tag address in cache memory 60 is more recent than the corresponding data at that address in the DRAM 32.
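Both write cases can be sketched the same way, again with invented types and a flat DRAM model as in the read-miss sketch; this is a model of the described behaviour, not the hardware.

```c
/* Write-path sketch: a write miss allocates a line (fetching the other
 * three words so the whole entry is valid) and sets V and M; a write
 * hit stores the word in place and sets only M. */
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { ENTRIES = 128, WORDS = 4, TAGS = 1 << 18 };

typedef struct { uint32_t tag; bool valid, modified; } cam_tag_t;
typedef struct {
    cam_tag_t tags[ENTRIES];
    uint32_t  data[ENTRIES][WORDS];
} cache_t;

static uint32_t dram[TAGS][WORDS];  /* flat model of the node's DRAM */

static void write_miss(cache_t *c, uint32_t tag, unsigned lsb,
                       uint32_t new_word)
{
    int        victim = rand() % ENTRIES;
    cam_tag_t *t      = &c->tags[victim];

    if (t->valid && t->modified)   /* old line staged via TDHR/TAHR */
        memcpy(dram[t->tag], c->data[victim], sizeof c->data[victim]);

    /* Fetch all four words so the three unwritten ones are valid... */
    memcpy(c->data[victim], dram[tag], sizeof c->data[victim]);
    c->data[victim][lsb] = new_word;  /* ...then merge the write */

    t->tag      = tag;
    t->valid    = true;
    t->modified = true;   /* line is now newer than the DRAM copy */
}

static void write_hit(cache_t *c, int line, unsigned lsb,
                      uint32_t new_word)
{
    c->data[line][lsb] = new_word;
    c->tags[line].modified = true;    /* DRAM copy is now stale */
}

int main(void)
{
    static cache_t c;
    write_miss(&c, 7, 2, 0x1234u);  /* miss: allocate and merge   */
    write_hit(&c, 0, 0, 0x5678u);   /* hit on a matched line (demo) */
    return 0;
}
```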
As is illustrated in Figure 5, bus interface control 64 includes address interleave logic 52 which is coupled to system address bus 50 and to tag address bus 84. In the presently preferred embodiment, address interleave logic 52 permits more efficient use of the cache memories coupled to each DRAM by interleaving each cache into the entire address space for the system. For example, without interleaving, each DRAM memory of Figure 2 stores data at sequential memory addresses. If DRAM 32 stores addresses from 0 to 4 megabytes, then DRAM 34 would store addresses from 4-8 megabytes, and so forth. It will be appreciated that due to the distributed architecture of the present invention, cache 36 would contain only a subset of the data stored in DRAM 32, and thus a program stored only in DRAM 32 (within the 0-4 Mbytes space) would have access only to cache 36 (presently 2K bytes of memory). Although a total of 4K bytes of cache memory exists in the system (2K each in cache memories 36 and 38), a program that is stored in a small range of addresses (for example from 0 to 4 Mbytes) only gets the benefit of 2K of the cache. The additional cache memory provided by cache 38 is not used. As will be appreciated, the use of address interleave logic 52 permits the use of both cache memory 36 as well as 38 by a program that is stored in a small range of addresses.
With reference again to Figure 5, the circuitry of the address interleave logic 52 is shown in conceptual form. It will be noted that although discrete lines and switch devices are illustrated in Figure 5, in the actual implementation the circuitry of address interleave logic 52 comprises semiconductor devices, such as transistors and the like. In Figure 5, the address provided by processor 30 over system address bus 50 is comprised of twenty-four (24) individual bit lines 102-125. The tag address bus 84 connecting to tag address control 62 is comprised of eighteen (18) individual bit lines 304-307 and 108-121. The base address provided to base address logic 53 is comprised of four (4) individual bit lines 322-325. The two least significant bits (LSBs) of the address are provided by bit lines 102-103, and, as described above, determine which words of data in the cache memory are coupled to the processor.
Assume that switches 200a, 200b, 201a, 201b, 202a, 202b, 203a and 203b are positioned such that lines 122-125 are coupled to lines 322-325 respectively, and thus lines 104-107 are coupled to lines 304-307 respectively. Then we can symbolically illustrate the address presented on system address bus 50 as follows:

BBBBTTTTTTTTTTTTTTTTTTLL

where lines 102-125 appear sequentially from right to left with line 102 (the least significant bit) on the extreme right and line 125 (the most significant bit) on the extreme left. A "B" signifies use as a base address bit, a "T" signifies use as a tag address bit, and "L" signifies the two LSBs.
Further, assume that base address logic 53 for cache 36 is set to respond to addresses of "0000", i.e., when lines 322-325 are all low. Then DRAM 32 will contain all system addresses of the form 0000AAAAAAAAAAAAAAAAAAAA where "A" represents either a 0 or 1. Thus addresses from 000000000000000000000000 to 000011111111111111111111 will be contained in DRAM 32. This includes addresses from 0 up to 4 Mbytes, as discussed above. Also, assume that base address logic 53 for cache 38 is set to respond to addresses of "0001", i.e., when line 322 is high and lines 323-325 are all low. Thus addresses from 000100000000000000000000 to 000111111111111111111111 will be contained in DRAM 34. This includes addresses from 4 Mbytes up to 8 Mbytes.
Consider a program which resides from addresses 000000000000000000000000 to 000011111111111111111111.
It will be contained entirely within DRAM 32, and will have access to only cache 36.
Now, assume that switches 200a, 200b, 201a, 201b, 202a, 202b, 203a and 203b are positioned such that lines 123-125 are coupled to lines 323-325 respectively, and thus lines 105-107 are coupled to lines 305-307 respectively as above, but line 122 is coupled to line 304, and line 104 is coupled to line 322. This is the position of the switches as shown in Figure 5.
Then we can describe the address presented on system address bus 50 as follows:

BBBTTTTTTTTTTTTTTTTTTBLL

By changing the position of switches 200a and 200b, the bits on two of the address lines have been swapped.
Further, assume that base address logic 53 for cache 36 is set to respond to addresses of "0000", i.e., when lines 322-325 are all low. Then DRAM 32 will contain all system addresses of the form 000AAAAAAAAAAAAAAAAAA0AA where "A" represents either a 0 or 1. Thus, addresses from

000000000000000000000000 to 000000000000000000000011
000000000000000000001000 to 000000000000000000001011
000000000000000000010000 to 000000000000000000010011
...
000111111111111111111000 to 000111111111111111111011

will be contained in DRAM 32. Also, assume that base address logic 53 for cache 38 is set to respond to addresses of "0001", i.e., when line 322 is high and lines 323-325 are all low. Then DRAM 34 will contain all system addresses of the form 000AAAAAAAAAAAAAAAAAA1AA where "A" represents either a 0 or 1. Thus, addresses from

000000000000000000000100 to 000000000000000000000111
000000000000000000001100 to 000000000000000000001111
000000000000000000010100 to 000000000000000000010111
...
000111111111111111111100 to 000111111111111111111111

will be contained in DRAM 34.
Thus DRAM 32 and DRAM 34 will collectively store addresses from 0 to 8 Mbytes, with DRAM 32 storing the first 16 bytes starting at address 0, DRAM 34 storing the next 16 bytes, etc. The memory is interleaved, with alternate groups of 16 contiguous bytes being stored in alternate DRAMs.
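The effect of this one-bit swap can be checked with a small model; the bit positions (line 104 = bit 2, lines 122-125 = bits 20-23) follow the reading of Figure 5 assumed earlier, and the function name is invented.

```c
/* Two-way interleave sketch for the switch setting just described:
 * base bit 0 now comes from address line 104 (bit 2) rather than line
 * 122 (bit 20), so consecutive four-word (16 byte) groups alternate
 * between the two memory nodes. */
#include <stdint.h>
#include <stdio.h>

static unsigned node_select(uint32_t addr)
{
    unsigned high3 = (addr >> 21) & 0x7u; /* lines 123-125, unswapped */
    unsigned low1  = (addr >> 2)  & 0x1u; /* line 104, swapped in     */
    return (high3 << 1) | low1;  /* "0000" = DRAM 32, "0001" = DRAM 34 */
}

int main(void)
{
    /* Word addresses 0-3 select node 0 (DRAM 32), 4-7 select node 1
     * (DRAM 34), 8-11 node 0 again, and so on. */
    for (uint32_t addr = 0; addr < 16; addr++)
        printf("word address %2u -> node %u\n",
               (unsigned)addr, node_select(addr));
    return 0;
}
```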
Now consider the program discussed above which resides from addresses 000000000000000000000000 to 000011111111111111111111. Due to the interleaving, this program will now be stored half in DRAM 32 and half in DRAM 34, and will thus have access to both caches 36 and 38.
Although the program remains the same size, the amount of cache available to the program has doubled from 2K to 4K bytes, allowing maximum utilization of the cache memories.
Further, if four memory nodes are attached to bus 33, the switches in the address interleave logic 52 may be set to swap two bits instead of one, such that a four-way interleave is achieved. Indeed, either two, four, eight, or sixteen-way interleave may be accomplished by proper setting of the switches in the address interleave logic 52.
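A generalisation sketch: swapping the k least significant tag lines with k base lines gives a 2^k-way interleave on 16-byte boundaries, so k = 1, 2, 3, 4 yields the two-, four-, eight- and sixteen-way cases mentioned above. Bit positions follow the same assumed Figure 5 numbering.

```c
/* k-way interleave model: k swapped bits select among 2^k nodes. */
#include <stdint.h>
#include <stdio.h>

static unsigned node_select(uint32_t addr, unsigned k)
{
    unsigned high = (addr >> (20 + k)) & (0xFu >> k); /* unswapped base bits */
    unsigned low  = (addr >> 2) & ((1u << k) - 1u);   /* swapped-in tag bits */
    return (high << k) | low;
}

int main(void)
{
    /* Four-way interleave (k = 2): word addresses cycle through the
     * four nodes in groups of four words (16 bytes). */
    for (uint32_t addr = 0; addr < 32; addr += 4)
        printf("word address %2u -> node %u\n",
               (unsigned)addr, node_select(addr, 2));
    return 0;
}
```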
Those skilled in the art will note that many useful interleave combinations are possible. For example, a system with six memory nodes can have four-way interleaving of four of the nodes, and two-way interleaving of the other two nodes.
In the presently preferred embodiment, the switch positions and base address ranges are set for each cache depending on the total number of cache nodes in the system, and are not reset unless the number of nodes is changed. The number of nodes is not fixed, so additional memory nodes may be added and the interleave switches adjusted accordingly. The use of the present invention's address interleave logic 52 permits maximum utilization of the cache memories coupled to bus 33, while avoiding many of the problems associated with prior art cache systems.
Accordingly, a cache architecture has been disclosed which utilizes a distributed architecture for improved system performance. Although the present invention has been described with reference to Figures 1-5, it will be appreciated by one skilled in the art that the Figures are for illustration only, and do not serve as limitations on the invention.

Claims (1)

  CLAIMS:-
    1. A data processing system including a processor coupled to a plurality of data processing resources over a bus, said data processing system comprising:
    memory means coupled to said bus for storing data, said memory means including a plurality of data storage locations, each of said storage locations defined by an address; cache memory means coupled between said memory means and said bus for storing data at a plurality of data storage locations having addresses corresponding to storage locations in said memory means, said data including a subset of said data stored in said memory means, said cache memory means including:
    controller means for receiving an address transmitted by said processor over said bus and determining if said addressed data is stored in said cache memory means, said controller means then accessing said data in said cache memory means thereby avoiding the need to initiate a data access cycle in said memory means; in the event said addressed data is stored only in said memory means, said controller means accessing said data in said memory means; whereby data addressed by said processor which is stored in said cache memory means is accessed without initiating a data access cycle in said memory means.
    2. The data processing system as defined by claim 1, wherein said controller means includes bus interface controller means coupled to said bus for comparing a first portion of said address provided by said processor to a range of memory addresses representing the range of memory addresses stored in said memory means coupled to said cache memory means.
    3. The data processing system as defined by claim 2, wherein said cache memory means includes a cache memory for storing said cache data.
    4. The data processing system as defined by claim 2, further including tag address control means coupled to said cache memory and said bus for comparing a tag address comprising a second portion of said address provided by said processor to a plurality of tag addresses stored in a tag memory coupled to said tag address control means.
    5. The data processing system as defined by claim 4, wherein said tag memory is a content addressable memory (CAM).
    6. The data processing system as defined by claim 4, wherein each of said tag addresses in said tag memory corresponds to data stored in said cache memory.
    7. The data processing system as defined by claim 5, further including memory control means coupled to said controller means for generating memory control signals and providing said signals to said memory means upon the receipt of a command signal from said controller means.
    8. The data processing system as defined by claim 6, wherein said signals generated by said memory control means include row address strobe (RAS) and column address strobe (CAS) signals.
    9. The data processing system as defined by claim 7, wherein if said processor issues a memory access command over said bus and said tag address provided by said processor does not correspond to one of said tag addresses stored in said tag memory, said controller initiates a memory access cycle to read the storage location in said memory means identified by said address provided by said processor, said memory control means providing said RAS and CAS signals to said memory means.
    10. The data processing system as defined by claim 8, wherein if said memory access command is a READ command, said controller means provides said read data from said memory means to said processor over said bus.
    11. The data processing system as defined by claim 8, wherein said tag address comprises multiple bits, at least one of said bits being a modify bit, the state of said bit indicating whether the data stored at said address location in said cache memory is different than the data stored at the corresponding address in said memory means.
    12. The data processing system as defined by claim 10, wherein said controller means further includes tag selection means for selecting a tag address stored in said tag memory in accordance with a predetermined method.
    13. The data processing system as defined by claim 11, wherein once said controller means selects a tag address, said controller means determines if said modify bit is set to indicate the data in said cache memory is different than the data in said memory means.
    14. The data processing system as defined by claim 12, further including a temporary data holding register (TDHR) coupled to said cache memory and said memory means, wherein if said modify bit is set, data stored in said cache memory at said selected tag address is stored in said TDHR.
    15. The data processing system as defined by claim 13, further including clock means coupled to said cache memory means for providing clock signals to permit the operation of said cache memory means to be synchronized to said clock signals.
    16. The data processing system as defined by claim 3, wherein said cache memory comprises static random access memory (RAM) cells.

    17. The data processing system as defined by claim 14, wherein said address further includes a third portion comprising least significant bits (LSB) of said address.

    18. The data processing system as defined by claim 16, wherein data read from said memory means is read in multiple words.

    19. The data processing system as defined by claim 17, wherein if said memory access command is a read command, said least significant bits of said address identify which of said multiple words read are provided to said processor in response to said memory access command.

    20. The data processing system as defined by claim 16, wherein said cache memory comprises a static RAM bank of 128 sixteen byte word entries.

    21. The data processing system as defined by claim 14, further including address interleave means coupled to said bus interface controller means and said tag address control means for selectively swapping predetermined bits of said first portion of said address and said tag address.

    22. The data processing system as defined by claim 20, further including second memory means coupled to second cache memory means on said bus.
    23. The data processing system as defined by claim 22, wherein said address interleave means selectively swaps at least one of said bits comprising the four least significant bits of said tag address with one of said bits of said first portion of said address.
    24. The data processing system as defined by claim 23, wherein said swapping of said bits results in alternate groups of contiguous bytes being stored in said first cache memory means and said second cache memory means.
    25. In a data processing system employing a processor coupled to a plurality of data processing devices over a bus, an improved method for accessing data, comprising the steps of: storing data in memory means coupled to said bus, said memory means including a plurality of data storage locations, each of said storage locations defined by an address; providing cache memory means coupled between said memory means and said bus for storing data in a cache memory at a plurality of data storage locations having addresses corresponding to storage locations in said memory means, said data including a subset of said data stored in said memory means, said cache memory means including: controller means for receiving an address transmitted by said processor over said bus and determining if said addressed data is stored in said cache memory means, said controller means then accessing said data in said cache memory means thereby avoiding the need to initiate a data access cycle in said memory means; in the event said addressed data is stored only in said memory means, said controller means accessing said data in said memory means; whereby data addressed by said processor which is stored in said cache memory means is accessed without initiating a data access cycle in said memory means.
    26. The method as defined by claim 25, wherein said controller means includes bus interface controller means coupled to said bus for comparing a first portion of said address provided by said processor to a range of memory addresses representing the range of memory addresses stored in said memory means coupled to said cache memory means.
    27. The method as defined by claim 25, further including tag address control means coupled to said cache memory and said bus for comparing a tag address comprising a second portion of said address provided by said processor to a plurality of tag addresses stored in a tag memory coupled to said tag address control means, each of said tag addresses in said tag memory corresponding to data stored in said cache memory.
    28. The method as defined by claim 27, wherein said tag memory is a content addressable memory (CAM).
    29. The method as defined by claim 27, further including memory access signal generation means coupled to said controller means for generating row and column address signals and providing said signals to said memory means upon the receipt of a command signal from said controller means.
    30. The method as defined by claim 29, wherein said signals generated by said memory control means include row address strobe (RAS) and column address strobe (CAS) signals.
    31. The method as defined by claim 29, further including second memory means coupled to second cache memory means on said bus.
    32. The method as defined by claim 31, further including the step of selectively swapping at least one of said bits comprising the four least significant bits of said tag address with one of said bits of said first portion of said address.
    33. The method as defined by claim 32, wherein said swapping of said bits results in alternate groups of contiguous bytes being stored in said first cache memory means and said second cache memory means.
    34. A data processing system including a processor coupled to a plurality of data processing resources over a bus substantially as hereinbefore described with reference to the accompanying drawings.
    35. An improved method for accessing data in a data processing system employing a processor coupled to a plurality of data processing devices over a bus substantially as hereinbefore described with reference to the accompanying drawings.
    Published 1989 at The Patent Office, State House, 66/71 High Holborn, London WC1R 4TP. Further copies may be obtained from The Patent Office, Sales Branch, St Mary Cray, Orpington, Kent BR5 3RD. Printed by Multiplex techniques ltd, St Mary Cray, Kent, Con. 1/87
GB8822580A 1988-02-16 1988-09-26 Distributed cache architecture Withdrawn GB2215099A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15738988A 1988-02-16 1988-02-16

Publications (2)

Publication Number Publication Date
GB8822580D0 GB8822580D0 (en) 1988-11-02
GB2215099A true GB2215099A (en) 1989-09-13

Family

ID=22563521

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8822580A Withdrawn GB2215099A (en) 1988-02-16 1988-09-26 Distributed cache architecture

Country Status (5)

Country Link
JP (1) JPH01229345A (en)
AU (1) AU2278688A (en)
DE (1) DE3903066A1 (en)
FR (1) FR2627298A1 (en)
GB (1) GB2215099A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2270781A (en) * 1992-09-16 1994-03-23 Hewlett Packard Co Improved cache system for reducing memory latency times

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2938511B2 (en) * 1990-03-30 1999-08-23 三菱電機株式会社 Semiconductor storage device
US6578110B1 (en) 1999-01-21 2003-06-10 Sony Computer Entertainment, Inc. High-speed processor system and cache memories with processing capabilities
JP4656565B2 (en) * 1999-01-21 2011-03-23 株式会社ソニー・コンピュータエンタテインメント High speed processor system, method and recording medium using the same
DE10151733A1 (en) * 2001-10-19 2003-04-30 Infineon Technologies Ag Processor Memory System

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1532278A (en) * 1975-04-25 1978-11-15 Data General Corp Data processing system and memory module therefor
EP0019358A1 (en) * 1979-05-09 1980-11-26 International Computers Limited Hierarchical data storage system
EP0175080A2 (en) * 1984-09-18 1986-03-26 International Business Machines Corporation Microcomputer memory and method for its operation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1168377A (en) * 1980-04-25 1984-05-29 Michael L. Ziegler Data processing system having a memory system which utilizes a cache memory and unique pipelining techniques for providing access thereto
US4646237A (en) * 1983-12-05 1987-02-24 Ncr Corporation Data handling system for handling data transfers between a cache memory and a main memory

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1532278A (en) * 1975-04-25 1978-11-15 Data General Corp Data processing system and memory module therefor
EP0019358A1 (en) * 1979-05-09 1980-11-26 International Computers Limited Hierarchical data storage system
EP0175080A2 (en) * 1984-09-18 1986-03-26 International Business Machines Corporation Microcomputer memory and method for its operation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2270781A (en) * 1992-09-16 1994-03-23 Hewlett Packard Co Improved cache system for reducing memory latency times
US5404484A (en) * 1992-09-16 1995-04-04 Hewlett-Packard Company Cache system for reducing memory latency times
GB2270781B (en) * 1992-09-16 1996-07-10 Hewlett Packard Co Improved cache system for reducing memory latency times

Also Published As

Publication number Publication date
FR2627298A1 (en) 1989-08-18
GB8822580D0 (en) 1988-11-02
DE3903066A1 (en) 1989-08-24
JPH01229345A (en) 1989-09-13
AU2278688A (en) 1989-08-17

Similar Documents

Publication Publication Date Title
EP0407119B1 (en) Apparatus and method for reading, writing and refreshing memory with direct virtual or physical access
US5640534A (en) Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US4577293A (en) Distributed, on-chip cache
US6327642B1 (en) Parallel access virtual channel memory system
JP3169155B2 (en) Circuit for caching information
KR920005280B1 (en) High speed cache system
US5666494A (en) Queue management mechanism which allows entries to be processed in any order
US4685082A (en) Simplified cache with automatic update
US6493812B1 (en) Apparatus and method for virtual address aliasing and multiple page size support in a computer system having a prevalidated cache
US6708254B2 (en) Parallel access virtual channel memory system
EP0838057B1 (en) Memory controller which executes write commands out of order
JPS624745B2 (en)
US5278967A (en) System for providing gapless data transfer from page-mode dynamic random access memories
US6219765B1 (en) Memory paging control apparatus
EP0708404A2 (en) Interleaved data cache array having multiple content addressable fields per cache line
EP0706131A2 (en) Method and system for efficient miss sequence cache line allocation
WO2002025447A2 (en) Cache dynamically configured for simultaneous accesses by multiple computing engines
EP0706132A2 (en) Method and system for miss sequence handling in a data cache array having multiple content addressable fields per cache line
US5813030A (en) Cache memory system with simultaneous access of cache and main memories
GB2215099A (en) Distributed cache architecture
EP0535701A1 (en) Architecture and method for combining static cache memory and dynamic main memory on the same chip (CDRAM)
US5434990A (en) Method for serially or concurrently addressing n individually addressable memories each having an address latch and data latch
US5577228A (en) Digital circuit for performing multicycle addressing in a digital memory
JPH01163852A (en) Subsystem in data processing system and its operation method
JP3967921B2 (en) Data processing apparatus and data processing system

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)