GB2037466A - Computer with cache memory - Google Patents

Publication number
GB2037466A
Authority
GB
United Kingdom
Prior art keywords
cache
directory
store
block
levels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB7941909A
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bull HN Information Systems Italia SpA
Bull HN Information Systems Inc
Original Assignee
Honeywell Information Systems Italia SpA
Honeywell Information Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honeywell Information Systems Italia SpA, Honeywell Information Systems Inc

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0817Cache consistency protocols using directory methods
    • G06F12/0822Copy directories

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A processor has a cache store 26, of 512 locations each including 4 levels each of which stores 1 block, each level of each location having a respective full/empty flip-flop associated with it in control logic 38. A primary directory 32 has 512 columns each of 4 levels to store the section addresses of the blocks in the cache store 26 - each section of main memory contains 512 blocks. An absolute address of section address + block address within the section has its block address used to look up the 4 section addresses in the corresponding column of the directory; a match with the absolute section address indicates the required block is in cache and identifies its level. A duplicate directory 52 duplicates the contents of primary directory 32. Writes from other sources have their addresses checked against the duplicate directory. If there is a match, the corresponding full/empty flip-flop associated with the primary directory and cache is cleared, so that the block which has been changed by the write is effectively removed from the cache system.

Description

SPECIFICATION
Computer with cache memory
This invention relates generally to electronic digital data processing systems and, more particularly, to processors which incorporate a cache memory store having a selective block clearing apparatus.
A cache memory store is a small, easily accessible memory store located within a processor of a data processing system. The cache store is generally a fraction of the size of a main memory but permits quicker access to data stored in the cache than to data stored in the main memory store. Operands and instructions, hereafter generically referred to as data words, are fetched from the main memory and stored in the cache store. The processor then accesses the cache store first for the required data words. Data words, prior to storing in the cache, are obtained from a first access of the processor to the main memory. These accessed data words are used by the processor and stored at the same time in the cache.
The efficiency of the cache store depends upon the use, by the processor, of the same data words multiple times during the same program. There are times, such as when an entire program is completed, when the best usage of the cache store requires clearing the cache store completely.
A clearing technique often employed for a cache store is to clear the entire cache on all external interrupt operations. These interrupt operations indicate the possibility that data in the main memory store has been changed by the operations of the input/output controller and, therefore, that the data in the cache store may be incorrect. This type of clearing is very mechanical and results in a considerable amount of unnecessary clearing of data information from the cache store. This is true because only a fraction of I/O operations are input oriented, and so would change the backing store or main memory store, and only a small portion of the time would this data be resident in the cache store. Since the clearing action clears the entire cache, much of the data information is lost and must be retrieved again on a subsequent access of the main memory store.
Some data processing systems employ segmentation and paging for accessing the memory store. A segment is divided into smaller sections generally referred to as pages which commonly contain about a thousand data words. Since the number of data words in a page is generally smaller than the size of the cache store, a selective clearing operation has been developed wherein the page alone may be cleared corresponding to data information which is no longer needed. This is accomplished by addressing each level of an associated tag directory to the cache store.
The columns of each level are compared to the page address and if a comparison is signalled, that column of the addressed level is cleared by clearing the flag indicating a full status of the column in the addressed level.
While this operation represents an improvement over that which requires clearing of the entire cache, the degree of selectivity in the clearing operation is only slightly improved.
The object of the present invention is therefore to provide an improved method of clearing a cache store.
Accordingly the present invention provides a multi-processor data processing system wherein each processor includes a cache system comprising: a cache store of a plurality of locations each divided into a plurality of levels, each level of each location being capable of storing a block of information; primary and duplicate directories each of a plurality of columns corresponding to the locations of the cache store and divided into a plurality of levels corresponding to the levels of the cache store; a plurality of full/empty flip-flops corresponding to the levels and locations of the cache store; means for searching the primary directory to determine whether a block of information to be accessed by the processor is stored in the cache store; means for searching the duplicate directory to determine whether a write notification from outside the processor corresponds to a block of information stored in the cache store; and means operative on a match in the duplicate directory to clear the corresponding full/empty flip-flop.
A multi-processor data processing system embodying the invention will now be described, by way of example, with reference to the accompanying drawings, in which: Figure 1 is a block diagram of a multiprocessor data processing system; Figure 2 is a block diagram illustrating the cache section and duplicate directory section in more detail; Figure 3 illustrates the organization of the directory; and Figure 4 illustrates various address formats.
INTRODUCTORY SUMMARY
The present system includes a plurality of processors, write command collection logic in each processor, a duplicate image directory of a primary directory in each processor and means to filter each write command through each duplicate directory in order to determine if accessed data is stored in the cache store thus resulting in a match. If a match is detected, the primary directory is notified to clear its entry for that write command. In this multi-processor system, each processor will perform all writes and clears through its associated cache store to memory. Since all data changes resulting from write or clear commands will be recorded in cache on the way by, the block clearing for the host processor is minimized. The cache of each processor consists of 512 locations each divided into 4 levels, and the associated directory has 512 columns each divided into 4 levels and containing the section addresses of the blocks in the corresponding levels of the corresponding locations in the cache. Thus each cache can store 4 × 512 blocks (of 4 words each).
A full/empty flip-flop is associated with each level of each block, and a duplicate directory contains an image of the primary directory. The duplicate directory checks all write notifications from each on line system controller for a match with its contents. If a match occurs, the write address will be passed to the primary directory to reset the corresponding full/empty flip-flop and thus in effect clear that block from the cache.
DETAILED DESCRIPTION
Fig. 1 is a block diagram of a multi-processor data processing system employing the present cache clearing apparatus. The system shown includes two processors 2 and 4 and a system controller 6 which controls access to a main memory store 8 and controls communication with a set of peripherals through I/O controller 10. The system controller is connected to all of the processors in the system to enable communications among the processors, the peripherals, and the main memory store. Each processor has access to the main store 8 and via a controlling gate 12 to an operating system module 14 and communication table 16 in main store 8. Gate 12 controls the access of the communicating devices such as the processors into the operating system module 14 and communication table 16 in known manner.
Each processor of the multi-processor system, for example processor 2, includes an operations unit 18 performing arithmetic and logic functions on operands fetched from the main memory store 8 in accordance with instructions likewise fetched from the main memory 8. The interface functions of processor 2, including preparation of absolute data addresses, are performed by communication control unit 20. Each processor includes a cache store and associated control logic, shown in processor 2 as cache section 22. Associated with each cache section is a duplicate directory section 24.
Fig. 2 is a block diagram of cache section 22, duplicate directory section 24 and a portion of the communication control unit 20.
The cache store 26 provides fast access to blocks of data previously retrieved from the main store. The cache is operated parallel with other processor functions. The arrangement shown in Fig. 2 includes a cache store 26, an input memory bus ZM switch 28, an output memory bus ZD switch 30, a primary directory 32 and associated compare network 34, an interrupt generator 36, control logic 38, control switch 40, a cache address latch 42, a write notification buffer 44 and associated stack 46, a write buffer 48 and associated stack 50, a duplicate directory 52 and associated compare network 54, and a clear cache stack 56.
During main memory store fetch cycles, the data information is distributed for usage by the processor while at the same time the ZM switch 28 is enabled to allow storage into cache store 26. On subsequent processor cycles, the cache store 26 is checked at the same time that a fetch from the main store 8 is being readied. If the data needed is already in the cache store 26, the fetch from the main memory store is aborted. A cache read cycle is enabled by disabling ZM switch 28 and enabling ZD switch 30 to transfer the data information from the cache store 26 directly to the processor.
The primary directory 32 identifies the data stored in the cache store 26. Tag words are stored in the directory 32 to reflect the absolute address of each data block. The relationships and operation of the main store, the directory, and the cache store are illustrated by Figs. 3 and 4.
The cache store 26 consists of 512 locations. Each location is divided into 4 levels and each level in each location can store a single block. A block consists of 4 words, but this fact is not relevant to the operation of the cache system, and likewise the fact that the number of words per block is the same as the number of levels per location is mere coincidence.
The directory 32 consists of 512 columns (shown horizontally in Fig. 3), corresponding to the 512 locations of the cache store 26, each divided into 4 levels corresponding to the 4 levels of each location in the cache store 26.
The main memory has its contents organized into sections (or pages), each of which contains 512 blocks. A main memory or absolute address is received through control switch 40 and consists of 24 bits (Fig. 4); the first 13 bits 0 to 12 identify the section number, the next 9 bits 13 to 21 identify the block number within the section, and the last 2 bits 22 and 23 identify the word number within the block.
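The address format just described can be illustrated with a short sketch. It assumes, as the wording "first 13 bits 0 to 12" suggests, that bit 0 is the most significant bit of the 24-bit address; the function and constant names are hypothetical and do not come from the specification.

```python
# Field widths taken from the description: 13-bit section, 9-bit block, 2-bit word.
SECTION_BITS, BLOCK_BITS, WORD_BITS = 13, 9, 2
ADDRESS_BITS = SECTION_BITS + BLOCK_BITS + WORD_BITS  # 24

def split_address(addr):
    """Split a 24-bit absolute address into (section, block, word).

    Assumption: bit 0 is the most significant bit, so the section number
    occupies the top 13 bits and the word number the bottom 2 bits.
    """
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("absolute address must fit in 24 bits")
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    section = addr >> (WORD_BITS + BLOCK_BITS)
    return section, block, word
```

For example, an address built from section 5, block 300 and word 3 splits back into those three fields.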
Considering a single section, say section 0, any particular block of the 512 blocks within this section can be stored in the corresponding location of the 512 locations within the cache store. Typically, several of the 512 blocks within the section will be stored in the cache store at any given time. The same is true of other sections; blocks from them will also be stored in the corresponding locations in the cache store. Each location in the cache store has 4 levels, and can therefore store up to 4 blocks from different sections.
The columns and levels of the directory are used to store the section numbers of the blocks in the corresponding locations and levels of the cache store. Thus the sections of main memory are mapped into the cache store and the directory, as shown in Fig. 3.
When a word is to be accessed, the block portion of its absolute address is used as an address (BLOCK TAGS) applied to the directory. This reads the 4 section numbers (as four sets of tag store contents, TAG STORE, in parallel). These are compared with the section number (TAG) of the absolute address by the compare circuitry 34. A failure to match means that the word is not in cache store and must be fetched from main memory. A successful match between the TAG from the absolute address and one of the four TAG STORE outputs from the directory generates a match signal to interrupt generator 36, indicating that a retrieval of data from the main memory is not required, together with a 2-bit level signal indicating the level of the matching TAG STORE output. This level signal LEVEL is combined with the block address portion BLOCK and the word address portion WORD from the absolute address to form a cache store address, which is used to access the required block from the cache store and then select the required word from that block.
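The lookup may be sketched in software as follows. This is an illustration only: the hardware compares the four levels in parallel, whereas the loop below is sequential. Qualifying a match by the full/empty flag, and placing LEVEL above BLOCK and WORD in the cache store address, are assumptions; the specification does not state the bit order. All names are hypothetical.

```python
LEVELS = 4
COLUMNS = 512

def lookup(directory, full, section, block):
    """Return the level whose stored tag matches `section`, or None on a miss.

    directory[block][level] holds a section number (the TAG STORE contents);
    full[block][level] is the full/empty flag for that entry. An empty entry
    never matches (assumption about how the flag qualifies the comparison).
    """
    for level in range(LEVELS):
        if full[block][level] and directory[block][level] == section:
            return level
    return None

def cache_address(level, block, word):
    """Combine LEVEL, BLOCK and WORD into one cache store address.

    Assumed layout: 2 level bits, then 9 block bits, then 2 word bits.
    """
    return (level << 11) | (block << 2) | word
```

A miss (a `None` result) corresponds to the case in which the word must be fetched from main memory.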
The control logic 38 includes a full/empty flip-flop for each level of each location/column of the cache, and a round robin counter for each of the 512 locations/columns to determine which level a new block should be written into. The loading algorithm is as follows: If an entire location/column is empty, then the levels in that location/column will be loaded sequentially. If the levels in that location/column are randomly empty, they will be loaded by selecting the empty levels sequentially. Finally, if all of the levels in a particular location/column are full, they will be loaded with new information in accordance with a sequential round robin count.
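The loading algorithm can be sketched for a single location/column as below. The sketch assumes that "sequentially" means lowest-numbered empty level first, and that the round robin counter advances only when a full column forces a replacement; neither detail is stated in the specification. The function name is hypothetical.

```python
def choose_level(full_flags, rr_count):
    """Pick the level to load in one location/column.

    full_flags: list of 4 booleans, True where the level is full.
    rr_count:   current round robin count for this column (0..3).
    Returns (level_to_load, new_rr_count).
    """
    # Any empty level is used first, taking empty levels in ascending order.
    for level, is_full in enumerate(full_flags):
        if not is_full:
            return level, rr_count
    # All levels full: replace the level named by the round robin counter
    # and step the counter on to the next level.
    return rr_count, (rr_count + 1) % len(full_flags)
```

Under these assumptions a column that is entirely empty fills levels 0, 1, 2, 3 in turn, after which replacements cycle through the levels.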
A system of this type is described in greater detail in our copending Application No. 78.47553, Specification No. 2 009 982 A.
All cache cycles start with a generation of a strobe address register (SAR) signal and a strobe interrupt (SIN) signal generated by interrupt generator 36. Both of these signals are applied to control logic 38 along with bits 11, 12, 22 and 23 of the absolute address from control switch 40. A first output from control logic 38 is applied back to control switch 40 in order to pass the absolute address from the central processor through in the absence of a clear. A second output of control logic 38 is a level select signal which is applied to primary directory 32 and to control switch 40 for subsequent delivery to duplicate directory 52.
The output of interrupt generator 36 is likewise applied to cache address latch 42 to indicate that information is to be received from the cache store 26 as a result of a match being detected. If the duplicate directory indicates that the cache memory is current for one of the four associations made from the primary directory, then bits 13-23 and additional level indicating bits are applied to the cache store 26 to access it from the cache address latch 42.
Duplicate directory 52 contains an image of primary directory 32. The function of this duplicate directory is to check all write notifications from each "on-line" system controller for a match against its contents. If a match occurs, the full/empty bit will be reset and the write address passed to the primary directory to mark the corresponding entry empty.
The address passed to the primary directory is put in a 4-deep register 56 the output of which is applied to an input of control switch 40.
All write notifications are first passed through buffer 44 into a write notification stack 46. Bits 13-22 of the write notification will access the duplicate directory 52. As was the case with primary directory 32, bits 0-12 of the accessed column (4 levels) are applied to comparator 54 along with bits 0-12 of the write notification. If a match is detected, a signal is applied from comparator 54 into stack 56 and bits 13-21 are stored therein.
In a write mode, bits 0-21 are passed from control switch 40 into buffer 48 and thereafter write stack 50. As was the case previously, bits 0-12 will be stored in the duplicate directory at a location defined by bits 13-22.
If a match in the duplicate directory is detected as a result of a write notification, a P~ busy signal is applied to control logic 38 which causes control switch 40 to pass the contents of stack 56 to control switch 40, thus updating the primary directory 32.
It is necessary to clear the cache store 26 when certain conditions exist which may not leave the cache store 26 with the correct current image of backing store. Cache may be cleared in one of two ways. First, all of the full/empty bits may be reset. Second, only the full/empty bit associated with a particular block may be reset. A block in the cache store 26 will be cleared if a write notification is received and matches the cache entry. This is accomplished, as is described above, by a search of the duplicate directory 52.
While only one buffer 44 and stack 46 are shown for receiving a write notification, there are four such buffer and stack arrangements in actuality. A write notification is received in one of the four buffers and an input counter is strobed up by one count. This will generate a difference with the contents of an output counter and this difference is sent to a priority network. This priority network will start a duplicate directory cycle.
Write notifications are received and placed on a duplicate directory input bus for a lookup search in the duplicate directory. If a match occurs, the write notification is passed to stack 56 and the associated full/empty bit is reset. If no match is obtained, the write notification is discarded. Thus, the same information is applied to both the duplicate directory and to the primary directory.
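The write notification path may be modelled roughly as follows. This is a sketch under stated assumptions, not the circuit itself: the address split assumes bit 0 is the most significant bit, the clear stack is modelled as an unbounded queue rather than a 4-deep register, and each queued entry carries the level as well as the block number. Class, method and function names are hypothetical.

```python
from collections import deque

class DuplicateDirectory:
    """Software model of the duplicate directory and the clear stack."""

    def __init__(self, columns=512, levels=4):
        # None marks an entry whose full/empty bit is reset.
        self.tags = [[None] * levels for _ in range(columns)]
        self.clear_stack = deque()

    def record_load(self, section, block, level):
        """Mirror a block load made in the primary directory."""
        self.tags[block][level] = section

    def write_notification(self, address):
        """Check an external write; queue a clear request on a match."""
        section = address >> 11          # bits 0-12
        block = (address >> 2) & 0x1FF   # bits 13-21
        for level, tag in enumerate(self.tags[block]):
            if tag == section:
                self.tags[block][level] = None
                self.clear_stack.append((block, level))
                return True
        return False  # no match: the notification is discarded

def drain(clear_stack, full):
    """Apply queued clears to the primary full/empty flags."""
    while clear_stack:
        block, level = clear_stack.popleft()
        full[block][level] = False
```

Only the matching entry is marked empty; the other levels of the same column, and every other column, are left untouched, which is the selectivity the description aims for.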
The output buffer 56 is a source of write notification data for primary directory 32 and supplies a signal indicating that a clear cycle is required. The SAR signal to the port control will sample a not-empty line and intercept the next interrupt being generated. The SIN signal will be delayed long enough to empty the buffer and then released to generate the needed store request.

Claims (6)

1. A multiprocessor data processing system wherein each processor includes a cache system comprising: a cache store of a plurality of locations each divided into a plurality of levels, each level of each location being capable of storing a block of information; primary and duplicate directories each of a plurality of columns corresponding to the locations of the cache store and divided into a plurality of levels corresponding to the levels of the cache store; a plurality of full/empty flip-flops corresponding to the levels and locations of the cache store; means for searching the primary directory to determine whether a block of information to be accessed by the processor is stored in the cache store; means for searching the duplicate directory to determine whether a write notification from outside the processor corresponds to a block of information stored in the cache store; and means operative on a match in the duplicate directory to clear the corresponding full/empty flip-flop.
2. A system according to Claim 1, wherein the full address of each block of information is divided into high order and low order portions, the low order portion being used to access a column in a directory and a location in the cache store, and the high order portion being used for storage in and matching with the contents of a directory.
3. A system according to Claim 2, wherein each directory has comparing means associated with it for comparing the high order portion of the full address of a block of information to be accessed by the processor or notified as a write notification with the contents of each of the levels of an accessed column.
4. A system according to any previous claim including means for storing a plurality of write notifications.
5. A system according to any previous claim including means for storing a plurality of clear signals from the duplicate directory searching means for the full/empty flip-flops.
6. A data processing system substantially as herein described and illustrated.
GB7941909A 1978-12-11 1979-12-05 Computer with cache memory Pending GB2037466A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US96822378A 1978-12-11 1978-12-11

Publications (1)

Publication Number Publication Date
GB2037466A true GB2037466A (en) 1980-07-09

Family

ID=25513931

Family Applications (1)

Application Number Title Priority Date Filing Date
GB7941909A Pending GB2037466A (en) 1978-12-11 1979-12-05 Computer with cache memory

Country Status (5)

Country Link
JP (1) JPS5580875A (en)
AU (1) AU538678B2 (en)
DE (1) DE2947115A1 (en)
FR (1) FR2444299A1 (en)
GB (1) GB2037466A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1159153A (en) * 1979-12-14 1983-12-20 Charles P. Ryan Apparatus for cache clearing
US4322795A (en) * 1980-01-24 1982-03-30 Honeywell Information Systems Inc. Cache memory utilizing selective clearing and least recently used updating
US4399506A (en) * 1980-10-06 1983-08-16 International Business Machines Corporation Store-in-cache processor means for clearing main storage
EP0212678B1 (en) * 1980-11-10 1990-05-16 International Business Machines Corporation Cache storage synonym detection and handling means
DE3138972A1 (en) * 1981-09-30 1983-04-14 Siemens AG, 1000 Berlin und 8000 München ONCHIP MICROPROCESSORCHACHE MEMORY SYSTEM AND METHOD FOR ITS OPERATION

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3723976A (en) * 1972-01-20 1973-03-27 Ibm Memory system with logical and real addressing
JPS5440182B2 (en) * 1974-02-26 1979-12-01
US3979726A (en) * 1974-04-10 1976-09-07 Honeywell Information Systems, Inc. Apparatus for selectively clearing a cache store in a processor having segmentation and paging

Also Published As

Publication number Publication date
FR2444299A1 (en) 1980-07-11
JPS5580875A (en) 1980-06-18
AU5325179A (en) 1980-06-19
DE2947115A1 (en) 1980-06-26
AU538678B2 (en) 1984-08-23

Similar Documents

Publication Publication Date Title
US4493026A (en) Set associative sector cache
US4831520A (en) Bus interface circuit for digital data processor
KR100204741B1 (en) Method to increase performance in a multi-level cache system by the use of forced cache misses
CA1300280C (en) Central processor unit for digital data processing system including write buffer management mechanism
EP0090575A2 (en) Memory system
EP0303648B1 (en) Central processor unit for digital data processing system including cache management mechanism
US6571316B1 (en) Cache memory array for multiple address spaces
US6332179B1 (en) Allocation for back-to-back misses in a directory based cache
US5091845A (en) System for controlling the storage of information in a cache memory
US5119484A (en) Selections between alternate control word and current instruction generated control word for alu in respond to alu output and current instruction
US5293622A (en) Computer system with input/output cache
US5479629A (en) Method and apparatus for translation request buffer and requestor table for minimizing the number of accesses to the same address
US4445191A (en) Data word handling enhancement in a page oriented named-data hierarchical memory system
JP4047281B2 (en) How to synchronize cache memory with main memory
EP0173909A2 (en) Look-aside buffer least recently used marker controller
GB2037466A (en) Computer with cache memory
US4737908A (en) Buffer memory control system
US4424564A (en) Data processing system providing dual storage of reference bits
US5276892A (en) Destination control logic for arithmetic and logic unit for digital data processor
JPS59112479A (en) High speed access system of cache memory
EP0302926B1 (en) Control signal generation circuit for arithmetic and logic unit for digital processor
JP3061818B2 (en) Access monitor device for microprocessor
EP0418220B1 (en) Destination control logic for arithmetic and logic unit for digital data processor
EP0303664A1 (en) Central processor unit for digital data processing system including virtual to physical address translation circuit
JPH0548498B2 (en)