CA1283222C - Microprocessor having separate instruction and data interfaces - Google Patents

Microprocessor having separate instruction and data interfaces

Info

Publication number
CA1283222C
CA1283222C (application CA000612701A)
Authority
CA
Canada
Prior art keywords
cache
data
instruction
memory
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA000612701A
Other languages
French (fr)
Inventor
Howard Gene Sachs
Walter H. Hollingsworth
James Y. Cho
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intergraph Corp
Original Assignee
Intergraph Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intergraph Corp filed Critical Intergraph Corp
Priority to CA000612701A priority Critical patent/CA1283222C/en
Application granted granted Critical
Publication of CA1283222C publication Critical patent/CA1283222C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A microprocessor architecture is disclosed having separate very high speed instruction and data interface circuitry for coupling via respective separate very high speed instruction and data interface buses to respective external instruction cache and data cache circuitry. The microprocessor is comprised of an instruction interface, a data interface, and an execution unit. The instruction interface controls communications with the external instruction cache and couples the instructions from the instruction cache to the microprocessor at very high speed. The data interface controls communications with the external data cache and communicates data bi-directionally at very high speed between the data cache and the microprocessor. The execution unit selectively processes the data received via the data interface from the data cache responsive to the execution unit decoding and executing a respective one of the instructions received via the instruction interface from the instruction cache. In one embodiment, the external instruction cache is comprised of a program counter and addressable memory for outputting stored instructions responsive to its program counter and to an instruction cache advance signal output from the instruction interface. An address generator in the instruction interface selectively outputs an initial instruction address for storage in the instruction cache program counter responsive to a context switch or branch, such that the instruction interface repetitively couples a plurality of instructions from the instruction cache to the microprocessor responsive to the cache advance signal, independent of and without the need for any intermediate or further address output from the instruction interface to the instruction cache except upon the occurrence of another context switch or branch.

Description

MICROPROCESSOR HAVING SEPARATE INSTRUCTION AND DATA INTERFACES

This invention relates to computer systems and more particularly to a microprocessor architecture having separate instruction interface and data interface circuitry for coupling via separate instruction and data interface buses to respective external instruction cache and data cache circuitry.
Prior microprocessor designs have primarily followed a Von Neumann architecture, or some derivative thereof. Recently, microprocessor designs have evolved which have the capability for interfacing with a single external cache memory with controller. This single cache is coupled via a single interface bus to provide both instructions and data to the microprocessor. The memory cycle for that system consisted of an address output from the microprocessor to the cache controller, which determined whether or not the requested address is present in the cache. If present, the cache provided an output of that word. If the requested address did not correspond to data present in the cache, the cache then indicated a miss, and circuitry somewhere in the system provided for an access to main memory to provide the necessary data for loading either directly to the processor or for loading the data into the cache. Thus, the microprocessor maintained its own program counter which controls access requests to main memory and to the cache memory which responded to each address to provide the requested data if present.
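By way of illustration only, the single shared cache cycle described above can be modelled in a few lines of Python; the class and names below are hypothetical and are not taken from any particular prior design.

    # Conventional single shared cache in front of main memory: on a hit the
    # cached word is returned, on a miss the word is fetched from main memory
    # and loaded into the cache before being returned.
    class SingleSharedCache:
        def __init__(self, main_memory):
            self.main_memory = main_memory   # dict: address -> word
            self.lines = {}                  # cached copies, address -> word

        def read(self, address):
            if address in self.lines:        # hit: cache supplies the word
                return self.lines[address]
            word = self.main_memory[address] # miss: access main memory
            self.lines[address] = word       # load the word into the cache
            return word

    memory = {0x100: 0xAAAA, 0x104: 0xBBBB}
    cache = SingleSharedCache(memory)
    print(cache.read(0x100))                 # miss, fetched from main memory
    print(cache.read(0x100))                 # hit, supplied by the cache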
SUMMARY
In accordance with the present invention, a microprocessor is provided which has mutually exclusive and independently operable separate data and instruction cache interfaces. This provides for very high-speed instruction transfer from a dedicated instruction cache to the processor via a special dedicated instruction bus, and capability for simultaneous high-speed transfer from the data cache to the microprocessor via a special dedicated high-speed data bus. The data cache and instruction cache each have a separate dedicated system bus interface for coupling to a system bus of moderate speed relative to the data and instruction buses, which system bus also couples to the main memory, and to other peripheral devices coupled to the system bus.
The microprocessor is comprised of an instruction interface, a data interface and an execution unit.
The instruction interface controls communications with the external instruction cache and couples the instructions from the instruction cache to the microprocessor.
The data interface controls communications with the external data cache and communicates data bi-directionally between the data cache and the microprocessor. The execution unit selectively processes the data received via the data interface from the data cache responsive to the execution unit decoding and executing a respective instruction received via the instruction interface from the instruction cache.
In a preferred embodiment, the external instruction cache is comprised of a program counter and an addressable memory for outputting stored instructions responsive to its program counter. The instruction interface of the microprocessor is further comprised of an address generator for selectively outputting an initial instruction address for storage in the instruction cache program counter responsive to a context switch, branch or program initialization activity. The instruction interface further includes means for outputting a cache advance signal for selectively enabling incrementing of the instruction cache program counter, except during such context switch or branch. Thus, the instruction interface repetitively couples a plurality of instructions from the instruction cache to the microprocessor responsive to the cache advance signal, independent of and without the need for any intermediate or further address output from the instruction interface to the instruction cache, except upon the occurrence of another context switch.
In accordance with the present invention there is provided a method for communicating instructions between an instruction cache and a processor comprising the steps of:
(a) storing a first address value in a counter;
(b) addressing the instruction cache with the value stored in the counter;
(c) communicating said instruction stored in the addressed location of the instruction cache to a multistage instruction buffer;
(d) serially communicating the instructions stored in the instruction buffer to the processor;
(e) generating a cache advance signal when a stage in the instruction buffer is empty;
(f) independently incrementing the counter in response to the cache advance signal;
(g) repeating steps (b) through (f) until the occurrence of either one of a context switch or a branch.
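Purely as an illustrative sketch (the buffer depth and helper names below are assumptions, and the code is not part of the claimed method or apparatus), steps (a) through (g) can be modelled in Python as follows:

    from collections import deque

    BUFFER_STAGES = 3                                  # assumed buffer depth

    def stream_instructions(instruction_cache, start_address, is_branch_or_context_switch):
        executed = []
        counter = start_address                        # step (a): store first address
        buffer = deque()                               # multistage instruction buffer
        while True:
            # steps (b), (c), (e), (f): while a stage is empty the cache advance
            # signal is asserted, the cache is addressed by the counter and the
            # counter is incremented independently of the processor
            while len(buffer) < BUFFER_STAGES and counter in instruction_cache:
                buffer.append(instruction_cache[counter])
                counter += 1
            if not buffer:
                break
            instruction = buffer.popleft()             # step (d): serial delivery
            executed.append(instruction)
            if is_branch_or_context_switch(instruction):
                break                                  # step (g): stop on branch/switch
        return executed

    cache = {addr: f"insn_{addr}" for addr in range(8)}
    print(stream_instructions(cache, 0, lambda insn: insn == "insn_5"))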
In accordance with the present invention there is further provided a microprocessor comprising: an addressable cache memory; execution means for processing digital information received from the cache memory; and interface means, coupled to the cache memory and to the execution means, for retrieving digital information from the cache memory and for communicating the retrieved digital information to the execution means, the interface means comprising: a counter coupled to the memory, a value stored in the counter being used for addressing the cache memory; address communicating means coupled to the counter for communicating an address from the execution means to the counter; address storing means, coupled to the counter and to the address communicating means, for storing the address in the counter; and incrementing means coupled to the counter for selectively incrementing the initial address stored in the counter; wherein the address communicating means operates independently of the incrementing means.
Brief Description of the Drawings

These and other features and advantages of the present invention will become apparent from the following detailed description of the drawings, wherein:
Figure 1 illustrates a block diagram of a microprocessor-based dual cache/dual bus system architecture in accordance with the present invention;
Figure 2 is a block diagram of the instruction interface of Figure 1;
Figure 3 is a more detailed block diagram of the instruction decoder 120 of the instruction interface 100 of Figure 2;
Figure 4 is an electrical diagram illustrating the instruction cache/processor bus, the data cache/processor bus, and the system bus of the dual bus/dual cache system of Figure 1;
Figure 5 illustrates the system bus to cache interface of Figure 4 in greater detail;
Figure 6 is an electrical diagram illustrating the drivers/receivers between the instruction cache-MMU and the system bus;
Figures 7A-C illustrate the virtual memory, real memory, and virtual address concepts as utilized with the present invention;
Figure 8 illustrates an electrical block diagram of a cache memory management unit;


FIG. 9 is a detailed block diagram of the cache-memory management unit of FIG. 8;
FIGS. 10A-B illustrate the storage structure within the cache memory subsystem 320;
FIGS. 11A-B illustrate the TLB memory subsystem 350 storage structure in greater detail;
FIG. 12 illustrates the cache memory quad word boundary organization;
FIG. 13 illustrates the hardwired virtual to real translations provided by the TLB subsystem;
FIG. 14 illustrates the cache memory subsystem and affiliated cache-MMU architecture which support the quad word boundary utilizing line registers and line boundary registers;
FIG. 15 illustrates the load timing for the cache-MMU systems 120 and 130 of FIG. 1;
FIG. 16 illustrates the store operation for the cache-MMU systems 120 and 130 of FIG. 1, for storage from the CPU to the cache-MMU in copyback mode, and for storage from the CPU to the cache-MMU and the main memory for the write-through mode of operation;
FIGS. 17A-B illustrate the data flow of operations between the CPU and the cache-MMU and the main memory;
FIG. 18 illustrates the data flow and state flow interaction of the CPU, cache memory subsystem, and TLB memory subsystem;
FIG. 19 illustrates the data flow and operation of the DAT and TLB subsystems in performing address translation and data store and load operations;
FIG. 20 illustrates a block diagram of the cache-MMU system, including bus interface structures internal to the cache-MMU;
FIG. 21 is a more detailed electrical block diagram of FIG. 20; and
FIG. 22 is a detailed electrical block diagram of the control logic microengine 650 of FIG. 21.

Referring to FIG. 1, a system embodiment of the present invention is illustrated. A central processing unit 110 is coupled via separate and independent very high-speed cache/processor buses, an instruction bus 121 and a data bus 131, coupling to an instruction cache-memory management unit 120 and a data cache-memory management unit 130, respectively. Additionally, a system status bus 115 is coupled from the CPU 110 to each of the instruction cache-memory management unit 120 and data cache-memory management unit 130. Each of the instruction cache-memory management unit 120 and data cache-memory management unit 130 has a separate interface for coupling to a system bus 141. A main memory 140 contains the primary core storage for the system, and may be comprised of dynamic RAM, static RAM, or other medium to high-speed read-write memory. The caches 120 and 130 each couple to the main memory 140 via the system bus 141.
Additionally, other system elements can be coupled to the system bus 141, such as an I/O processing unit, IOP 150, which couples the system bus 141 to the I/O bus 151 for the respective IOP 150. The I/O bus 151 can either be a standard bus interface, such as Ethernet, Unibus, VMEbus, Multibus, or the I/O bus 151 can couple to the secondary storage or other peripheral devices, such as hard disks, floppy disks, printers, etc. Multiple IOPs can be coupled to the system bus 141. The IOP 150 can communicate with the main memory 140 via the system bus 141.
The CPU 110 is also coupled via interrupt lines 111 to an interrupt controller 170. Each of the units contending for interrupt priority to the CPU has separate interrupt lines coupled into the interrupt controller 170. As illustrated in FIG. 1, the main memory 140 has an interrupt output I1, 145, and the IOP 150 has an interrupt output 155 labelled I2. These interrupts I1, 145, and I2, 155, are coupled to the interrupt controller 170 which prioritizes and arbitrates priority of interrupt requests to the CPU 110. The CPU 110 can be comprised of multiple parallel CPUs, or may be a single CPU. In the event of multiple CPUs, prioritization and resolution of interrupt requests is handled by the interrupt controller 170 in conjunction with the signal control lines 111 from the CPU 110 to the interrupt controller 170.
A system clock 160 provides a master clock MCLK to the CPU 110, instruction cache-memory management unit 120 and data cache-memory management unit 130 for synchronizing internal operations therein and operations therebetween. In addition, a bus clock BCLK output from the system clock 160 provides bus synchronization signals for transfers via the system bus 141, and is coupled to all system elements coupled to the system bus 141. This includes the instruction cache-MMU 120, the data cache-MMU 130, the main memory 140, the IOP 150, and any other system elements which couple to the system bus 141. Where multiple devices request access to the system bus 141 at the same time, a bus arbitration unit 180 is coupled to the devices which are coupled to the system bus 141. The bus arbiter has separate couplings to each of the potential bus masters which couple to the system bus 141. The bus arbiter 180 utilizes a handshake scheme, and prioritizes access to the system bus 141. The bus arbitration unit 180 controls and avoids collisions on the system bus 141, and generally arbitrates use of the system bus 141.
The processor 110 includes cache interfaces providing mutually exclusive and independently operable dual-cache interface systems comprising an instruction interface coupled to bus 121 and a data interface coupled to bus 131. The instruction interface controls communications with the external instruction cache-MMU 120 and provides for the coupling of instructions from the instruction cache-MMU 120 to the processor 110. The data interface provides control of communications with the external data cache-MMU 130 and controls bi-directional communication of data between the processor 110 and the data cache-MMU 130. The execution unit of the processor is coupled to the instruction interface and the data interface of the processor. The execution unit provides for the selective processing of data received from the data cache-MMU responsive to decoding and executing a respective one or more of the instructions received from the instruction cache-MMU 120. The instruction interface couples to the instruction cache-MMU 120 via a very high-speed instruction cache-MMU bus 121. The data interface couples to the data cache-MMU 130 via a very high-speed data bus 131. The instruction interface and data interface provide the capability for very high speed transfer of instructions from the instruction cache-MMU 120 to the processor 110, and for simultaneous independent transfer of data between the data cache-MMU 130 and the processor 110.
The data cache-MMU 130 and instruction cache-MMU 120 each have a respective second bus interface for coupling to a main system bus 141 for coupling therefrom to a main memory 140, which is a very large, relatively slow memory. The system bus 141 is of moderately high speed, but is slow relative to the data bus 131 or instruction bus 121. The system bus 141 also provides means for coupling of other circuits and peripheral devices into the microprocessor system architecture.
The instruction and data interfaces of the processor 110 provide necessary control, timing, and buffering logic to completely control the interface and data transfer process between the processor 110 and the respective caches 120 and 130. Similarly, the instruction cache-MMU 120 and data cache-MMU 130 have necessary control and buffering circuitry to allow for interface to the processor 110 via the respective instruction interface and data interface. The instruction cache-MMU 120 and data cache-MMU 130 also each have necessary control and buffering circuitry to provide for interface with and memory management of the main memory 140 via the system bus 141. Functionally, the instruction cache-MMU 120 and instruction interface provide a separate and independent subsystem from the data cache-MMU 130 and data interface. The instruction cache-MMU 120 accesses main memory 140 directly and independently from the data cache-MMU 130 operations, and vice versa.
Referring to FIG. 2, the processor 110 of FIG. 1 is shown in further detail. As illustrated in FIG. 2, the processor 110 is further comprised of an instruction register 112, an instruction decoder 113 and an execution unit 114. The instruction register 112 provides means for storing and outputting instructions received from the instruction cache-MMU 120 via the instruction bus 121 and to the instruction interface of the processor 110. The output from the instruction register 112 is coupled to the instruction decoder 113. The instruction decoder 113 provides means for outputting operation selection signals responsive to decoding the instruction output received from the instruction register 112. The output operation selection signals from the instruction decoder 113 are coupled to the execution unit 114. The execution unit 114 provides means for processing selected data received from the data cache-MMU 130 via the data interface of the processor 110 and the data bus 131, responsive to the operation selection signals received from the instruction decoder 113.
In a preferred embodiment, the processor 110 provides for pipelined operation. As illustrated in FIG. 2, there are five stages of pipelined operations: the instruction register 112, stage C in the instruction decoder 113, and stages D, E, F, respectively, in the execution unit 114. Thus, multiple operations can be performed responsive to multiple instructions, concurrently.
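A minimal software sketch of this five-stage pipelining (the stage labels and the simulation itself are illustrative assumptions, not the processor's logic) is:

    # Each clock, every instruction advances one stage, so up to five
    # instructions are in flight concurrently.
    STAGES = ["IR(112)", "C(decode 113)", "D", "E", "F"]

    def simulate(instructions, cycles):
        pipeline = [None] * len(STAGES)                # one slot per stage
        stream = iter(instructions)
        for clock in range(cycles):
            pipeline = [next(stream, None)] + pipeline[:-1]   # shift one stage
            occupancy = {s: i for s, i in zip(STAGES, pipeline) if i is not None}
            print(f"clock {clock}: {occupancy}")

    simulate(["i0", "i1", "i2", "i3", "i4", "i5"], cycles=7)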
In the illustrated embodiment of FIG. 2, the execution unit 114 is further comprised of an interface 115 which provides means for coupling the output resulting from the processing of the selected data to the data interface of the processor 110 for output of the resultant data therefrom to the data cache-MMU 130. The interface 115 provides for bi-directional coupling of data between the execution unit 114 and the data interface of the processor 110 and therefrom via the data bus 131 to the data cache-MMU 130.
Referring to FIG. 3, the instruction decoder 113 of FIG. 2 is shown in greater detail illustrating one embodiment of an implementation of the instruction decoder 113. As illustrated in FIG. 3, the instruction decoder 113 is comprised of a sequential state machine 116 which decodes instructions received from the instruction register 112 and provides operation code signals responsive to the instruction output of the instruction register 112. The operational code signals from the sequential state machine 116 are coupled to a timing and control circuit 117 which provides means for outputting the operation selection signals to control the sequencing of instruction execution, for coupling to the execution unit 114, responsive to the operation code signals output from the sequential state machine 116.
In a preferred embodiment, each microprocessor is a single chip integrated circuit. However, multiple chip embodiments can also be utilized depending on design constraints.
The instruction interface of the processor 110 is further comprised of a multi-stage instruction buffer which provides means for storing, in seriatim, a plurality of instructions, one instruction per stage, and which further provides means for selectively outputting the stored instructions to the execution means 100. The cache advance signal is driven by the instruction interface as it has free space. The Cache ADVance signal controls the I-Cache-MMU accesses. Thus, the instruction interface provides a multi-stage instruction buffer for coupling and storing a plurality of instruction words as output in a serial stream from the instruction cache-MMU 120 via the instruction bus 121. This multi-stage instruction buffer provides for increasing instruction throughput rate, and can be utilized for pipelined operation of the processor 110. An external system clock 160 provides clock signals for synchronizing operations within and with the processor 110.
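An illustrative model of this buffer-driven cache advance signal follows; the buffer depth and method names are assumptions made only for the sketch.

    class InstructionBuffer:
        def __init__(self, depth=4):                   # assumed number of stages
            self.depth = depth
            self.stages = []

        def cache_advance(self):                       # ISEND-like signal
            return len(self.stages) < self.depth       # asserted while a stage is free

        def load(self, instruction):                   # driven by the instruction cache-MMU
            assert self.cache_advance()
            self.stages.append(instruction)

        def issue(self):                               # consumed by the execution unit
            return self.stages.pop(0) if self.stages else None

    buf = InstructionBuffer()
    for n in range(4):
        buf.load(f"insn_{n}")
    print(buf.cache_advance())   # False: buffer full, advance signal negated
    buf.issue()
    print(buf.cache_advance())   # True: a stage freed, streaming resumes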
The instruction interface of the processor 110 is further comprised of an address generator for selectively outputting an initial instruction address for storage in an instruction cache-MMU 120 program counter responsive to the occurrence of a context switch or branch in the operation of the microprocessor system. A context switch can include a trap, an interrupt, or any initialization of programs requiring initialization of the instruction cache 120 program counter to indicate a new starting point for a stream of instructions. The instruction interface provides a cache advance signal output which provides for selectively incrementing the instruction cache-MMU program counter, except during a context switch or branch. Upon the occurrence of a context switch or branch, the instruction cache-MMU 120 program counter is loaded with a new value from the address generator of the instruction interface of the processor 110. A system clock 160 provides clock signals to the instruction interface of the microprocessor 110.
Upon initialization of the system, or during a context switch or branch, the instruction interface address generator of the processor 110 causes the loading of the instruction cache 120 program counter. Thereafter, when enabled by the cache advance signal, the instruction cache-MMU 120 causes a plurality of instructions (e.g. a quad word) to be output for coupling to the instruction interface of the processor 110. Instructions are sequentially output thereafter responsive to the output of the instruction cache-MMU 120 program counter, independent and exclusive of any further address output from the instruction interface of the processor 110 to the instruction cache-MMU 120.
As illustrated, the data interface of the processor 110 is further comprised of an address generator and interface which outputs a data address for coupling to the address register 505 of the external data cache-MMU 503. The MCLK of the system clock 160 is coupled to the data cache-MMU 130 for synchronizing transfer of data between the data cache-MMU 130 and the data interface of the processor 110. In a preferred embodiment, means are provided for coupling a defined number of data words between the data cache-MMU 503 and data interface 302 of the microprocessor 12 for each address output from the data interface 302, independent and exclusive of any intermediate address output from the address interface 324.
The instruction interface of the processor 110 and instruction cache-MMU 120 provide for continuous output of an instruction stream of non-predefined length from the instruction cache-MMU 120 to the instruction interface of the processor 110, responsive to only a single initial address output from the address generator of the instruction interface and an active cache advance signal output, continuing until a branch or context switch occurs.
The operation of the processor 110 data interface and the data cache-MMU 130 provides for transfer of one or more defined number of words of data therebetween for each address output from the processor 110 to the data cache-MMU. The first of such defined plurality of words is output responsive to the address from processor 110. The remaining words are transferred as soon as the system is ready. Upon completion of transfer of this defined number of words, a new address must be loaded into the address register of the data cache-MMU 130 from the processor 110. Every transfer of data between the data cache-MMU 130 and the data interface of the processor 110 requires the loading of a new address from the processor 110 data interface into the address register of the data cache-MMU 130. Although this transfer can be of one or multiple words, the number of words is fixed and defined at the start of the transfer, and each transfer requires a separate new address be loaded.
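As a sketch only (the quad-word size and function name are assumptions for illustration), one address load followed by a fixed number of words might look like:

    WORDS_PER_TRANSFER = 4                             # e.g. a quad-word transfer

    def data_transfer(data_cache, start_address, count=WORDS_PER_TRANSFER):
        # One load of the address register, then `count` consecutive words with
        # no intermediate address outputs on the data bus.
        return [data_cache[start_address + 4 * i] for i in range(count)]

    cache = {0x200 + 4 * i: 0x1000 + i for i in range(8)}
    print(data_transfer(cache, 0x200))                 # first transfer
    print(data_transfer(cache, 0x210))                 # a new address is required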
The main, or primary, memory 140 is coupled to a system bus 141 to which is also coupled the data cache-MMU 130 and instruction cache-MMU 120. The main memory 140 selectively stores and outputs digital information from an addressable read-write memory.
The instruction cache-MMU 120, coupled to the main memory 140 via the system bus 141, manages the selective access to the main memory 140 and provides for transfer of data from the read-write memory of the main memory 140 to the instruction cache-MMU 120 for storage in the very high-speed memory of the instruction cache-MMU 120. Additionally, the instruction cache-MMU 120 provides means for selectively providing the stored data from the addressable very high-speed instruction cache-MMU read-write memory for output to the processor 110.
The data cache-MMU 130 is coupled to the main memory 140 via the system bus 141, and manages the selective access to the main memory 140 for storage and retrieval of data between the main memory 140 and the data cache-MMU 130. The data cache-MMU 130 is further comprised of means for selectively storing and outputting data, from and to the processor 110 via the very high-speed data bus 131, or from and to the main memory 140 via the system bus 141. The data cache-MMU 130 provides selective storage and output of the data from its addressable very high-speed read-write memory.
The processor 110 is independently coupled to the instruction cache-MMU 120 via instruction bus 121 and to the data cache-MMU 130 via the data bus 131. The processor 110 processes data received from the data cache-MMU 130 responsive to decoding and executing respective ones of the instructions received from the instruction cache-MMU 120. Processing can be arithmetic, logical, relationally-based, etc.
As discussed above, the program counter of the instruction cache-MMU 120 is loaded with an address only during branches and context switches. Otherwise, the instruction cache-MMU operates in a continuous stream output mode. Thus, once the program counter of the instruction cache-MMU 120 is loaded with a starting address and the cache advance signal is activated, the respective addressed location's data is output from the instruction cache-MMU 120 memory to the processor 110, and subsequent instructions are transferred to the processor 110 in a stream, serially one instruction at a time. Each subsequent instruction word or group of instruction words is transferred without the need for any additional address transfer from the processor 110 to the instruction cache-MMU 120 program counter, except when a context switch or branch is required.
The MCLK is the clock for the entire main (e.g. 33 MHz) logic. BCLK is the system bus clock, preferably at either 1/2 or 1/4 of the MCLK. For the system bus 141 synchronization, BCLK is delivered to all the units on the system bus, i.e., CPU, IOPs, bus arbiter, caches, interrupt controllers, Mp and so forth. All signals must be generated onto the bus and be sampled on the rising edge of BCLK. The propagation delay of the signals must be within the one cycle of BCLK in order to guarantee the synchronous mode of bus operation. The phase relations between BCLK and MCLK are strictly specified. In one embodiment, BCLK is a 50% duty-cycle clock of twice or four times the cycle time of MCLK, which depends upon the physical size and loads of the system bus 141.
As illustrated, the transfer of instructions is from the instruction cache-MMU 120 to the processor 110. The transfer of data is bi-directional between the data cache-MMU 130 and processor 110. Interface between the instruction cache-MMU 120 and main memory 140 is of instructions from the main memory 140 to the instruction cache-MMU 120, responsive to the memory management unit of the instruction cache-MMU 120. This occurs whenever an instruction is required which is not resident in the cache memory of ICACHE-MMU 120. The transfer of data between the data cache-MMU 130 and main memory 140 is bi-directional. The memory management units of the instruction cache-MMU 120 and data cache-MMU 130 perform all memory management, protection, and virtual to physical address translation.
As illustrated, the processor 110 provides virtual address outputs which have an associatively mapped relationship to a corresponding physical address in main memory. The memory management units of the instruction and data cache-MMUs 120 and 130, respectively, are responsive to the respective virtual address outputs from the instruction and data interfaces of the processor 110, such that the memory management units selectively provide respective output of the associated mapped digital information for the respective virtually addressed location. When the requested information for the addressed location is not stored (i.e. a cache miss) in the respective cache-MMU memories 120 and 130, the respective memory management unit of the cache-MMUs provides a translated physical address for output to the main memory 140. The corresponding information is thereafter coupled from the main memory 140 to the respective instruction cache-MMU 120, or to or from the data cache-MMU 130, and as needed to the processor 110.
As discussed herein, the system of FIG. 1 is comprised of a central processing unit 110, a single chip microprocessor in the preferred embodiment, which has separate instruction cache-MMU and data cache-MMU bus interfaces contained therein. The CPU 110 couples via a separate instruction bus 121 to instruction cache-MMU 120. The instruction bus 121 is a very high-speed bus which, as discussed above, provides streams of instructions without processor intervention except during branches and context switches. The instruction bus 121 provides for very high-speed instruction communications, and provides means for communicating instructions at very high speed from the instruction cache-MMU 120 to the processor 110. The processor 110 is also coupled via a separate and independent high-speed data bus 131 to a data cache-MMU. The data bus 131 provides for very high-speed bi-directional communication of data between the processor 110 and the data cache-MMU 130.
The two separate cache interface buses, the instruction bus 121 and the data bus 131, are each comprised of multiple signals. As illustrated in FIGS. 4 and 5, for one embodiment, the signals on the data cache bus 131 and the instruction cache bus 121 are as follows:

*** DATA CACHE BUS ***

ADF<31:0> : address/data bus
These lines are bi-directional and provide an address/data multiplexed bus. The CPU puts an address on these lines for memory references for one clock cycle. On store operations, the address is followed by the data. On load or TAS operations, these bus lines become idle (floating) after the address cycle, so that these lines are ready to receive data from the Data Cache-MMU. The Data Cache then puts the addressed data on the lines for a load or TAS operation.

MPUO : SSW30, supervisor mode
MPK : SSW29, protection key
MPUOU : SSW28, selecting a user's data space on supervisor mode
MPKU : SSW27, protection key of a user's data space on supervisor mode
MPM : SSW26, virtual mapped
These signals represent the System Status Word (SSW<30:26>) in the CPU and are provided to both the D-cache and I-cache.

FC<3:0> : function code / trap code
The CPU puts "the type of data transfer" on the FC<3:0> lines for one clock cycle at the address cycle. The D-CACHE, or I-CACHE, sends back "the type of trap" on abnormal operations along with TSTB.

Transfer type (on ASF active)
FC < 3 2 1 0 >
     0 0 0 0   load single-word mode
     0 0 0 1   load double-word mode
     0 0 1 0   load byte
     0 0 1 1   load half-word
     0 1 0 0   test and set
     1 X 0 0   store single-word
     1 X 0 1   store double-word
     1 X 1 0   store byte
     1 X 1 1   store half-word

The D-cache puts the TRAP code on FC to respond to the CPU.
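A small decoder equivalent to the transfer-type table above (the Python form is illustrative only; "X" means bit 2 is ignored for stores):

    def decode_fc(fc):                                 # fc is the 4-bit FC<3:0> value
        if fc & 0b1000:                                # stores: bit 3 set, bit 2 is X
            return {0b00: "store single-word", 0b01: "store double-word",
                    0b10: "store byte", 0b11: "store half-word"}[fc & 0b11]
        return {0b0000: "load single-word", 0b0001: "load double-word",
                0b0010: "load byte", 0b0011: "load half-word",
                0b0100: "test and set"}[fc]

    print(decode_fc(0b0001))   # load double-word
    print(decode_fc(0b1110))   # store byte (bit 2 ignored)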

Trap Code (on TSTB active)
FC < 3 2 1 0 >
     X 0 0 0
     X 0 0 1   memory error (MSBE)
     X 0 1 0   memory error (MDBE)
     X 1 0 1   page fault
     X 1 1 0   protection fault (READ)
     X 1 1 1   protection fault (WRITE)

ASF : address strobe
ASF is activated by the CPU, indicating that the 'address' and 'type of data transfer' are valid on the ADF<31:0> and FC<3:0> lines, respectively. ASF is active half a clock cycle earlier than the address is on the ADF bus.

RSP : response signal
On load operations, the RSP signal is activated by the D-cache indicating that data is ready on the ADF bus. RSP is at the same timing as the data on the ADF bus. The D-cache sends data to the CPU on a load operation, and accepts data from the CPU on a store operation.
On store operations, RSP is activated when the data cache-MMU becomes ready to accept the next operation. On load-double, RSP is sent back along with each data parcel transfer. On store-double, only one RSP is sent back after the second data parcel is accepted.

TSTB : TRAP strobe
TSTB, along with the trap code on FC<2:0>, is sent out by the D-cache indicating that an operation is abnormally terminated, and that the TRAP code is available on the FC<2:0> lines. On an already-corrected error (MSBE), TSTB is followed by RSP after two clock intervals, whereas on any FAULTs or on a non-correctable ERROR (MDBE), only TSTB is sent out.

nDATA : D-cache
Low on this line indicates that the data cache-MMU chip is connected to the DATA cache bus.

*** INST BUS ***

IADF<31:0> : address/instruction bus
These lines are bi-directional, and form an address/instruction multiplexed bus. The CPU sends out a virtual or real address on these lines when it changes the flow of the program, such as Branch, RETURN, Supervisor Call, etc., or when it changes the SSW<30:26> value. The instruction cache-MMU returns instructions on these lines.

MPUO, MPK, MPUOU, MPM : (refer to the DATA cache bus description of these lines).

IFC<3:0> : function code/response code
The I-cache puts the TRAP code on the FC lines to respond to the CPU.

IFC (at ITSTB active)
    < 3 2 1 0 >
     X 0 0 0
     X 0 0 1   memory error (MSBE)
     X 0 1 0   memory error (MDBE)
     X 1 0 1   page fault
     X 1 1 0   protection fault (execution)

IASF : address strobe
IASF is activated by the CPU, indicating that the address is valid on the IADF<31:0> lines. IASF is active half a clock cycle earlier than the address is on the IADF bus.

ISEND : send instruction (i.e. cache advance signal)
ISEND is activated by the CPU, indicating that the CPU is ready to accept the next instruction (e.g. the instruction buffer in the CPU is not full).
At the trailing edge of RSP, ISEND must be off if the instruction buffer is full; otherwise the next instructions will be sent from the instruction cache-MMU. When a new address is generated, on Branch for example, ISEND must be off at least one clock cycle earlier than IASF becomes active.

IRSP : response signal
IRSP is activated by the I-cache, indicating an instruction is ready on the IADF<31:0> lines. IRSP is at the same timing as the data on the bus.

ITSTB : TRAP strobe
This is activated by the I-cache, indicating that the cache has abnormally terminated its operation, and that a TRAP code is available on the IFC<3:0> lines. On an already-corrected error (MSBE), TSTB is followed by RSP after two clock intervals, whereas on FAULTs or a non-correctable ERROR (MDBE), only TSTB is sent out and becomes active.

INST : I-cache
A high on this line indicates that the cache is connected to the INST cache bus.
Each of the instruction cache-MMU 120 and data cache-MMU 130 has a second bus interface for coupling to the system bus 141. The system bus 141 communicates information between all elements coupled thereto. The bus clock signal BCLK of the system clock 160 provides for synchronization of transfers between the elements coupled to the system bus 141.
As shown in FIG. 6, the system bus outputs from the instruction cache-MMU 120 and data cache-MMU 130 are coupled to a common intermediate bus 133 which couples to TTL driver/buffer circuitry 135 for buffering and driving the interface to and from the system bus 141. This is particularly useful where each of the instruction cache-MMU 120 and data cache-MMU 130 is a monolithic single chip integrated circuit, and where it is desirable to isolate the bus drivers/receivers from the monolithic integrated circuits to protect the monolithic integrated circuits from bus interface hazards. The following bus signals coordinate bus driver/receiver activity:

DIRout : direction of the AD bus is outward
This signal is used to control off-chip drivers-receivers of the AD lines. The master cache activates this signal on generating the ADDRESS, and on sending out DATA on the write mode. The slave cache activates this signal on sending out the DATA on the read mode.

ICA/ : I-cache access
nICA is used only by the data and instruction caches and the CPU. This signal is sent from the D-cache to the paired I-cache for accessing the IO space in the I-cache. Upon the arrival of a memory-mapped IO access from the system bus, the I-cache accepts it as an IO command only when nICA is active. Thus, the caches accept IO commands only from the paired CPU.
Synchronous operation of the system bus 141 is made possible in the above described system environment so long as no signal change occurs at the moment it is sampled. Two timings are fundamental to realize this operation: one is for generating signals on the bus and the other is for sampling to detect signals. These two timings must be generated from the Bus Clock BCLK, which has a certain phase relationship with the Master Clock MCLK, to maintain the certain relationship with internal logic operation. These timings must have a small skew from one unit to the other on the bus to satisfy the following equation:

Tg-s > Tpro + Tsk

where Tg-s is the time period from the signal generating timing to the signal sampling timing, Tpro is the maximum propagation delay time of signals, and Tsk is the skew of the bus clock.
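A worked check of this condition, with purely illustrative numbers:

    def bus_is_synchronous(t_gs_ns, t_pro_ns, t_sk_ns):
        # generate-to-sample period must exceed the worst-case propagation
        # delay plus the bus clock skew
        return t_gs_ns > t_pro_ns + t_sk_ns

    print(bus_is_synchronous(t_gs_ns=120, t_pro_ns=90, t_sk_ns=10))    # True
    print(bus_is_synchronous(t_gs_ns=120, t_pro_ns=115, t_sk_ns=10))   # False: synchronizer needed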
If the physical requirements of the system bus do not satisfy the above equation, the signals will arrive asynchronously with respect to the sampling timing. In this case, a synchronizer is required in the bus interface to synchronize the external asynchronous signals. Although the asynchronous operation does not restrict the physical size of the bus or any kinds of timing delay, a serious drawback exists in that it is extremely difficult to eliminate the possibility of a "synchronize fault". Another disadvantage of the asynchronous scheme is a speed limitation due to the handshake protocol which is mandatory in asynchronous schemes. This is especially inefficient in a multi-data transfer mode. Although a handshake scheme is a useful method of inter-communication between one source and one or more destinations, and although this is a safe way for data transfer operations, the timing protocol restricts the speed and is sometimes unsatisfactory in very fast bus operations. Additionally, an asynchronous bus is also sensitive to noise.
In the preferred embodiment, the system bus 141 has one clock: BCLK. The MCLK is used for internal logic operation of the CPU 110 and caches 120 and 130, and BCLK is used to generate the synchronous timings of bus operation as described above.
The system bus can provide compatible combinations of handshake and non-handshake schemes.
In a preferred embodiment, the system bus 141 is a high speed, synchronous bus with multiple master capability. Each potential master can have separate interrupt lines coupled to an interrupt controller 170 coupled via control lines 111 to the processor 110. The system bus 141 has a multiplexed data/address path and allows single or multiple word block transfers. The bus is optimized to allow efficient CPU-cache operation. It has no explicit read/modify/write cycle but implements this by doing a read then write cycle without releasing the bus.
As an illustration of an exemplary embodiment of FIG. 1, the system includes a single CPU 110, an eight input fixed priority bus arbiter 180 and an interrupt controller 170. All signals are generated and sampled on a clock edge and should be stable for at least a set up time before the next clock edge and be held constant for at least a hold time after the clock edge to avoid indeterminate circuit operation. This means that there should be limitations placed on bus delays, which will in turn limit bus length and loading.
The system bus 141 is comprised of a plurality of signals. For example, as illustrated in FIG. 5, for one embodiment, the system bus 141 can be comprised of the following signals, where "/" indicates a low true signal.

AD<31:0> : address/data bus
This is the multiplexed address/data bus. During a valid bus cycle, the bus master with the right of the bus puts an address on the bus. Then that bus master either puts data on the bus for a write, or three-states (floats) its AD bus outputs to a high impedance state to prepare to receive data during a read.

CT<3:0> : CycleType
CT<3:2> indicates the type of master on the bus and whether a read or write cycle is occurring.

CT<3:2>
     0 0   CPU write (write issued by a CPU type device)
     0 1   CPU read (read issued by a CPU type device)
     1 0   IO write (write issued by an IOP type device)
     1 1   IO read (read issued by an IOP type device)

CT<1:0> indicates the number of words to be transferred in the cycle.

CT<1:0>
     0 0   a single-word transfer
     0 1   a quad-word transfer
     1 0   a 16-word transfer
     1 1   Global CAMMU write
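The CT<3:0> encoding above can be summarised by a small illustrative decoder (the helper itself is an assumption, not part of the bus definition):

    MASTER = {0b00: "CPU write", 0b01: "CPU read",
              0b10: "IO write",  0b11: "IO read"}
    LENGTH = {0b00: "single word", 0b01: "quad word",
              0b10: "16 words",    0b11: "global CAMMU write"}

    def decode_ct(ct):                                 # ct is the 4-bit CT<3:0> value
        return MASTER[(ct >> 2) & 0b11], LENGTH[ct & 0b11]

    print(decode_ct(0b0101))   # ('CPU read', 'quad word')
    print(decode_ct(0b1010))   # ('IO write', '16 words')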
MS<4:0> : System Memory Space bits
The system MS bits specify the memory space to which the current access will occur and the code which indicates that the cache will perform an internal cycle. That cycle is required either to update a cache entry or to supply the data to the system bus if a cache has a more recent copy of the data.

MS < 4 3 2 >
     0 0 0   Main memory, private space. Cache-able, write-through.
     0 0 1   Main memory, shared space. Cache-able, write-through.
     0 1 0   Main memory, private space. Cache-able, copy back.
     0 1 1   Main memory, shared space. Not cache-able.
     1 X 0   Memory-mapped IO space. Not cache-able.
     1 X 1   Boot loader space. Not cache-able.

A transfer between a cache-MMU and a device in memory mapped space is by single or partial word only. If the transfer is to memory mapped IO space, it will be of the single cycle type, that is, CT<1:0> are (00); then the lower two MS bits indicate the size of the referenced data:

MS <1:0>
     0 X   whole word transfer
     1 0   byte transfer
     1 1   1/2 word transfer

The byte or halfword transferred must appear on the bus bits pointed to by the data's address. For example, during a byte access to address FF03 (HEX), the desired data must appear on bus signals AD<23:16>, the third byte of the word.
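A sketch of the MS<4:2> space decoding tabulated above (the dictionary form and function name are illustrative assumptions):

    MEMORY_SPACE = {                                   # keyed by MS<4:2>
        0b000: ("main memory, private space", "cacheable, write-through"),
        0b001: ("main memory, shared space",  "cacheable, write-through"),
        0b010: ("main memory, private space", "cacheable, copy back"),
        0b011: ("main memory, shared space",  "not cacheable"),
        0b100: ("memory-mapped IO space",     "not cacheable"),
        0b110: ("memory-mapped IO space",     "not cacheable"),   # bit 3 is don't-care
        0b101: ("boot loader space",          "not cacheable"),
        0b111: ("boot loader space",          "not cacheable"),   # bit 3 is don't-care
    }

    def decode_ms(ms):                                 # ms is the 5-bit MS<4:0> field
        return MEMORY_SPACE[(ms >> 2) & 0b111]

    print(decode_ms(0b01000))                          # private, copy-back main memory
    print(decode_ms(0b10010))                          # memory-mapped IO, not cacheable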

When a cache, 120 or 130, is accessed by a Shared-Write (i.e. a write into shared space in main memory 140) or an IO write from the system bus, the hit line in the appropriate caches must be invalidated. When a cache is accessed by an IO read from the system bus, the matched dirty data in the cache must be sent out.
Masters must only issue to the slave the type(s) of cycle(s) that the slave is capable of replying to, otherwise the bus will time out.
AC/ : ActiveCycle
This is asserted by the current bus master to indicate that a bus cycle is active.

RDY/ : ReaDY
RDY/ is issued by the addressed slave when it is ready to complete the required bus operation and has either taken the available data or has placed read data on the bus. RDY/ may not be asserted until CBSY/ becomes inactive. RDY/ may be negated between transfers on multiple word access cycles to allow for long access times. During multiple word read and write cycles, ReaDY/ must be asserted two clocks before the first word of the transfer is removed. If the next data is to be delayed, ReaDY/ must be negated on the clock after it is asserted. This signal is "wired-ORed" between devices that can behave as slaves.

CBSY/ : CacheBUSY
CBSY/ is issued by a cache when, due to a bus access, it is performing an internal cycle. The current controller of the bus and the addressed slave must not complete the cycle until CBSY/ has become false. This signal is "wire-ORed" between caches. The CBSY/ line is released only after the operation is over. On private-write mode, each slave cache keeps its CBSY/ signal in a high impedance state.

MSBE/ : MemorySingleBitError
This is issued by main memory 140 after it has detected and corrected a single bit memory error. This will only go true when the data in error is true on the bus (i.e. if the third word of a four word transfer has had a corrected read error in this cycle, then during the time the third word is active on the bus, MSBE will be true).

MMBE/ : MemoryMultipleBitError
This is issued by main memory when it detects a non-correctable memory error. This will only go true when the data in error is true on the bus (i.e. if the third word of a four word transfer has an uncorrectable read error in this cycle, then during the time the third word is active on the bus, MMBE will be true).

BERR/ : BusERRor
This is issued by the bus arbitration logic after it detects a bus time out condition or after a bus parity error has been detected. The time out period is the period of BusGrant.

P<3:0> : Parity bits 3 through 0
These are the four parity bits for the four bytes on the AD<31:0> bus. Both address and data have parity checked on all cycles.
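For illustration, per-byte parity over AD<31:0> can be computed as below; even parity and the byte-to-bit assignment are assumptions, since the text only states that there is one parity bit per byte.

    def parity_bits(ad_word):
        bits = []
        for byte_index in range(4):                    # P<3:0>, one bit per byte
            byte = (ad_word >> (8 * byte_index)) & 0xFF
            bits.append(bin(byte).count("1") & 1)      # even-parity bit for this byte
        return bits                                    # assumed order [P0, P1, P2, P3]

    print(parity_bits(0x010300FF))                     # -> [0, 0, 0, 1]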

PERR/ : Parity ERRor
This is an open collector signal driven by each device's parity checker circuitry. It is asserted when a parity error is detected in either address or data. It is latched by the bus arbitration logic 180, which then generates a bus error sequence.

BRX : BusRequest
This is the bus request signal from device x to the bus arbiter 180.

BGX : BusGrant
This is the bus grant signal from the bus arbiter 180 to the device x.

LOCK :
This is generated during a Read/Modify/Write cycle. It has the same timing as the CT and MS signals.

MCLK : master clock
The master clock MCLK is delivered to the CPU or CPUs 110 and caches 120 and 130.

BCLK : BusClock
This is the system's bus clock. All signals are generated and sensed on its rising edge.

RESET/
This is the system's master reset signal. It is asserted for a large number of bus clock cycles.
RATE : BCLK/MCLK rate
Low : BCLK has the frequency of 1/2 of the MCLK (e.g. 60 ns).
High : BCLK has the frequency of 1/4 of the MCLK (e.g. 120 ns).
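A worked check of the RATE selection, assuming the 33 MHz MCLK mentioned earlier (roughly a 30 ns period):

    def bclk_period_ns(mclk_period_ns, rate_high):
        divisor = 4 if rate_high else 2                # RATE high -> MCLK/4, low -> MCLK/2
        return mclk_period_ns * divisor

    print(bclk_period_ns(30, rate_high=False))         # 60 ns
    print(bclk_period_ns(30, rate_high=True))          # 120 ns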

In one embodiment, the system architecture includes multiple cache memories, multiple processors, and an IO processor. In this embodiment, there is a problem in keeping the same piece of data at the same value in every place it is stored and/or used. To alleviate this problem, the cache memories monitor the system bus, inspecting each cycle to see if it is of the type that could affect the consistency of data in the system. If it is, the cache performs an internal cycle to determine whether it has to purge its data or to supply the data to the system bus from the cache instead of from the addressed device on the bus. While the cache is deciding this, it asserts CacheBuSY/. When it has finished the cycle it negates CacheBuSY/. If it has the data, it places it on the bus and asserts ReaDY/.
The bus cycles that will cause the cache to do an internal cycle are:
1. An IO read (IOR) to private memory space. This allows the cache to supply data which may have been modified but has not yet been written into memory. The MemorySpace code is <010xx>. That is, memory space is main memory, and the data required is cached in copy back mode into a private memory area. If, due to a programming error, a 16 word cycle is declared cache-able and a cache hit occurs, the cache will supply the first four words correctly and then supply the value of the fourth word transferred to the remaining 12 words.
2. IO write cycles (IOW) of one, four or sixteen words. This allows the cache to invalidate any data that it (they) contain which is to be changed in memory. The MemorySpace codes are <000xx>, <001xx> and <010xx>. That is, purge any matching data that is cached.
3. Single and four word CPU writes to shared memory. This allows other caches to invalidate any data they contain that is being changed in memory. The MemorySpace code is <001xx>. That is, any matching data that is cache-able and in shared memory areas.
4. Global writes to the cache-memory management unit (CAMMU) control registers. In a multiple-CPU system, e.g. with multiple cache pairs, an additional device is required to monitor the CBSY line and issue the RDY signal when CBSY is off in the Global mode.
5. Accesses from the data cache-memory management unit (DCAMMU) to its companion instruction cache-memory management unit (ICAMMU).
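A simplified decision sketch of these five cases (string encodings and function name are illustrative assumptions, not the cache's actual logic):

    def snoop_action(cycle, ms_code):
        cacheable_main = ms_code.startswith(("000", "001", "010"))
        if cycle == "IO read" and ms_code.startswith("010"):
            return "supply data if a modified (dirty) copy is cached"     # case 1
        if cycle == "IO write" and cacheable_main:
            return "invalidate any matching cached data"                  # case 2
        if cycle in ("CPU write single", "CPU write quad") and ms_code.startswith("001"):
            return "invalidate any matching cached data"                  # case 3
        if cycle == "global CAMMU write":
            return "update the CAMMU control registers"                   # case 4
        if cycle == "DCAMMU to ICAMMU access":
            return "internal cycle in the companion ICAMMU"               # case 5
        return "no internal cycle required"

    print(snoop_action("IO read", "010xx"))
    print(snoop_action("CPU write single", "001xx"))
    print(snoop_action("CPU read", "000xx"))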
The following is an exemplary summary of bus transfer requirements which should be followed to successfully transfer data across the system bus. Other restrictions due to software conventions may also be necessary.
1. All activity occurs on the rising edge of BCLK.
2. All signals must meet all appropriate set up and hold times.
3. Masters must only issue those cycles to slaves that the slaves can perform. These are:
(i) MMIO and Boot accesses are single cycle only.
(ii) Sixteen word transfers to memory may only be issued as IO type cycles.
4. During cache-able cycles the bus slaves must not issue ReaDY/ until CacheBuSY/ has been negated. During not cache-able cycles, the addressed slave does not need to test for CacheBuSY/. If ReaDY/ is asserted when CacheBuSY/ is negated, the memory system must abort its cycle.
A typical system bus 141 cycle starts when a device requests bus mastership by asserting BusRequest to the bus arbiter 180. Some time later, the arbiter 180 returns BusGrant indicating that the requesting device may use the bus. On the next clock the device asserts ActiveCycle/, the bus address, the bus CycleType and the bus MemorySpace codes. The bus address is removed two BCLKs later. If the cycle is a write, then data is asserted on the AddressData lines. If it is a read cycle, the AddressData lines are three-stated in anticipation of data being placed on them. Then, one of the following will occur:
1. If the cycle involves a cache internal access, the cache (caches) will assert CacheBuSY/ until it (they) has (have) completed its (their) internal operations. CacheBuSY/ asserted inhibits the main memory from completing its cycle. There are now several possible sequences that may occur:
i. If the cycle is an IO read to private memory and a cache has the most current data, the cache will simultaneously place the data on the system bus 141, assert ReaDY/ and negate CacheBuSY/. ReaDY/ going true indicates to the memory 140 that it is to abort the current cycle.
ii. If the cycle is an IO write or a write to shared memory, the memory 140 waits for CacheBuSY/ to be negated and asserts ReaDY/.
iii. If the cycle is an IO read to private memory in main memory 140, and the cache doesn't have the data, CacheBuSY/ is eventually negated. This enables the memory 140 to assert the data on the bus 141 and assert ReaDY/.
2. If the cycle doesn't involve a cache access, CacheBuSY/ need not be monitored.
ReaDY/ going true signals the master that the data has been transferred successfully. If a single word access, it indicates that the cycle is to end. ReaDY/ stays true until one BCLK after ActiveCycle/ is dropped. If it is a read cycle, then data stays true for one BCLK longer than ActiveCycle/. For a write cycle, data is dropped with ActiveCycle/. BusRequest, MemorySpace and CycleType are also dropped with ActiveCycle/. BusRequest going false causes the bus arbiter 180 to drop BusGrant on the next BCLK, ending the cycle. ReaDY/ is dropped with BusGrant. If the cycle is a multi-word type then ReaDY/ going true indicates that a further transfer will take place. The last transfer of a multiple word cycle appears identical to that of the corresponding single word cycle.
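The sequence just described can be reduced to an ordered event list for illustration; the signal names follow the text, but the helper and its simplifications are assumptions.

    def bus_cycle(is_write, words=1, cache_internal_cycle=False):
        events = ["assert BusRequest", "receive BusGrant",
                  "assert ActiveCycle/, address, CycleType, MemorySpace"]
        if cache_internal_cycle:
            events.append("cache asserts CacheBuSY/ (slave must wait)")
            events.append("cache negates CacheBuSY/")
        for word in range(words):
            events.append(("master drives" if is_write else "slave drives")
                          + f" data word {word}, slave asserts ReaDY/")
        events += ["negate ActiveCycle/", "drop BusRequest",
                   "arbiter drops BusGrant, cycle ends"]
        return events

    for step in bus_cycle(is_write=False, words=4, cache_internal_cycle=True):
        print(step)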
The Read/Modify/Write cycle is a read cycle and a write cycle without the bus arbitration occurring between them. The read data must be removed no later than the BCLK edge upon which the next ActiveCycle/ is asserted.
A BusError, BERR, signal is provided to enable the system bus 141 to be orderly cleared up after some bus fault condition. Since the length of the longest cycle is known (e.g. a sixteen word read or write), it is only required to time out BusGrant to provide sufficient protection. If, when a master, a device sees BusERRor it will immediately abort the cycle, drop BusRequest and get off the bus. BusGrant is dropped to the current master when BusERRor is dropped. Bus drive logic is designed to handle this condition. The address presented at the beginning of the last cycle that caused a bus time-out is stored in a register in the bus controller.
BERR is also generated when Parity ERRor/ goes true. If both a time out and Parity ERRor go true at the same time, time out takes precedence.
The main memory 140, as illustrated, is comprised of a read-write memory array, error correction and drivers-receivers, and bus interface circuitry which provide for bus coupling and interface protocol handling for transfers between the main memory 140 and the system bus 141. The main memory 140 memory error correction unit provides error detection and correction when reading from the storage of main memory 140. The error correction unit is coupled to the memory array storage of the main memory 140 and via the system bus 141 to the data cache-MMU 130 and instruction cache-MMU 120. Data being read from the memory 140 is processed for error correction by the error detection and correction unit.

The processor 110 provides addresses, in a manner as described above, to the instruction cache-MMU 120 and data cache-MMU 130 so as to indicate the starting location of data to be transferred. In the preferred embodiment, this address information is provided in a virtual or logical address format which corresponds via an associative mapping to a real or physical address in the main memory 140. The main memory 140 provides for the reading and writing of data from addressable locations within the main memory 140 responsive to physical addresses as coupled via the system bus 141.
The very high-speed memories of the instruction cache-MMU 120 and data cache-MMU 130 provide for the selective storage and output of digital information in a mapped associative manner from their respective addressable very high-speed memory. The instruction cache-MMU 120 includes memory management means for managing the selective access to the primary main memory 140 and performs the virtual to physical address mapping and translation, providing, when necessary, the physical address output to the system bus 141 and therefrom to the main memory 140. The data cache-MMU 130 also has a very high-speed mapped addressable memory responsive to virtual addresses as output from the processor 110. In a manner similar to the instruction cache-MMU, the data cache-MMU 130 has memory management means for managing the selective access to the main memory 140, the memory management means including virtual to physical address mapping and translation for providing, when necessary, a physical address output to the system bus 141 and therefrom to the primary memory 140 responsive to the virtual address output from the processor 110. The system bus 141 provides for high-speed communications coupled to the main memory 140, the instruction cache-MMU 120, the data cache-MMU 130, and other elements coupled thereto, communicating digital information therebetween.


The CPU 110 can simultaneously access the two cache-MMU's 120 and 130 through two very high speed cache buses, instruction cache-processor bus 121 and the data cache-processor bus 131. Each cache-MMU accesses the system bus 141 when there is a "miss" on a CPU access to the cache-MMU. The cache-MMU's essentially eliminate the speed discrepancy between CPU 110 execution time and the Main Memory 140 access time.
The I/O Interface Processing Unit (IOP) 150 is comprised of an IO adapter 152, an IO processor unit 153 and a local memory MIO 154, as shown in FIG. 1. The I/O interface 152 interfaces the system bus 141 and an external I/O bus 151 to which external I/O devices are connected. Different versions of I/O adapters 152 can be designed, such as to interface with secondary storage such as disks and tapes, and with different standard I/O buses such as VMEbus and MULTIbus, as well as with custom buses. The I/O processor unit 153 can be any kind of existing standard micro-processor, or can be a custom microprocessor or random logic. IO programs, including disk control programs, can reside on the MIO 154.
Data transfer modes on the system bus 141 are defined by the CT code via a CT bus. In the preferred embodiment, data cache-MMU 130 to Main Memory 140 (i.e. Mp) data transfers can be either in a quad-word mode (i.e. one address followed by four consecutive data words) or a single-word mode.
On I/O read/write operations, initiated by an IO processor, IOP 150, the block mode can be declared in addition to the single and quad modes described above. The block mode allows a 16-word consecutive data transfer to increase data transfer rate on the system bus 141. This is usually utilized only to 'write thru' pages on IO read. On IO write, this can be declared to either 'write thru' or 'copy back' pages. When the IOP 150 initiates the data transfer from main memory 140 to the IOP 150, a cache may have to respond to the IOP's request, instead of the main memory 140 responding on a copy-back scheme, because it may not be the main memory 140 but the data cache 130 which has the most recently modified data. A special control signal is coupled to the caches 120, 130, and to main memory 140 (i.e. CBSY/ and RDY/ signals).
For a read-modify-write operation, the single-read operation is followed by a single-word write operation within one bus request cycle.
The main memory 140 can be comprised of multiple boards of memory connected to an intra-memory bus. The intra-memory bus is separated into a main memory address bus and a main memory data bus. All the data transfer modes as described above are supported.
Boot ROM is located in a special address space and can be connected directly to the system bus 141.
Referring again to FIG. 1, the processor 110 is also shown coupled to an interrupt controller 170 via interrupt vector and control lines 111. The interrupt controller 170 as shown is coupled to the main memory 140 via the interrupt lines 145, to the IOP 150 via the interrupt lines 155, and to the Array Processor 188 via interrupt lines 165. The interrupt controller 170 signals interrupts to the processor 110 via interrupt lines 111.
An interrupt controller 170 is coupled to the CPU 110 to respond to interrupt requests issued by bus master devices.
The CPU has a separate independent interrupt bus 111 which controls maskable interrupts and couples to the interrupt controller 170. Each level interrupt can be masked by the corresponding bit of an ISW (i.e. Interrupt Status Word) in the CPU. All the levels are vectored interrupts and have common request and acknowledge/enable lines.
The bus interrupt controller 170 enables several high level interrupt sources to interrupt the CPU 110. In one embodiment, the interrupt controller 170 is of the parallel, fixed priority type. Its protocol is similar to that of the system bus 141, and multiplexes the group and level over the same lines.
The interrupt controller 170 is coupled to each potential interrupting device by the following signals:

IREQX/ : InterruptREQuest from device x
This signal is issued to the interrupt controller 170 by the interrupting device as a request for service.

IENX/ : InterruptENable to device x
This is issued by the interrupt controller 170 to the interrupting device to indicate that it has been granted interrupt service.

IBUS<4:0> : InterruptBUS
These five lines carry the interrupt's group and level to the interrupt controller 170. This is a three state bus.

IREJ/ : InterruptREJect
This signal indicates to the interrupting device that the CPU 110 has refused to accept the interrupt in this group. This is connected to all interrupt devices.

The interrupt controller 170 is coupled to the CPU, or CPU's, 110 by the signal lines 111 as follows:

IR/ : CPU Interrupt Request
IR/ indicates the existence of a pending vectored interrupt, the level of which is available on the VCT<2:0> lines.

IAK/ : CPU Interrupt AcKnowledge
The CPU 110 sends out IAK/ to indicate that the interrupt is accepted, and at the same time reads the vector number through the VCT<4:0> lines. IAK/ and IR/ configure a handshake scheme.

MK : MasKed response
Each CPU which is masking out the current interrupt returns an MK signal instead of an IAK/ signal. The interrupt is not latched in the CPU in this case. MK can be used by the interrupt controller to release the masked interrupt and give way to a newly arrived higher level interrupt.

VCT<5:0> : level and vector code
VCT lines are multiplexed, and provide a level number and a vector number. A level number 0-7 is put on the VCT<2:0> lines when IR/ is active. When IAK/ is activated by the CPU, the VCT<4:0> lines have a vector number which identifies one of 32 interrupts of that level. The VCT lines couple outputs from the interrupt controller 170 to the CPU, or CPU's, 110.
The CPU 110 activates IAK/, and inputs the vector number, through IBUS<4:0> lines, that identifies one of 32 interrupts in each level. In a multi-processor environment, these levels can be used to let the system have a flexible interrupt scheme. As an example of the interrupt scheme in a multi-processor system, when all the IREQx/ lines are activated, the CPU's enable bits in the ISW distinguish whether or not the CPU should accept the interrupt. Each level of interrupt thus has 32 interrupts and the level can be dynamically allocatable to any one of the CPUs by controlling the enable bits in SSW (i.e. system status word).
MK (masked) signals are activated, instead of IAK/, by the CPUs which are masking out the current interrupt. The interrupt is ignored (i.e. not latched) by those CPUs. These signals allow the interrupt controller 170 to reserve the masked interrupt and let a higher interrupt be processed if it occurs.
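The level-masking decision described above can be summarized in a short illustrative sketch. The 8-bit enable mask, the function names, and the example values below are assumptions introduced only for illustration; the actual behaviour is defined by the ISW enable bits and the IREQx/, IAK/ and MK signals described in the text.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative model of the masking decision: each of the 8 interrupt
 * levels (0-7) is enabled or masked by one bit of a hypothetical ISW
 * enable mask; a masked CPU answers MK instead of IAK/ so the
 * controller can hold the interrupt for another CPU.                 */
typedef enum { RESPOND_IAK, RESPOND_MK } irq_response;

static irq_response cpu_interrupt_response(uint8_t isw_enable_bits, int level)
{
    if (isw_enable_bits & (1u << level))
        return RESPOND_IAK;   /* level enabled: accept, read vector on VCT */
    return RESPOND_MK;        /* level masked: do not latch the interrupt  */
}

int main(void)
{
    uint8_t isw = 0x0D;       /* example: levels 0, 2 and 3 enabled        */
    for (int level = 0; level < 8; level++)
        printf("level %d -> %s\n", level,
               cpu_interrupt_response(isw, level) == RESPOND_IAK ? "IAK/" : "MK");
    return 0;
}
```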


Beyond the elements as described above for FIG. 1, additional systems elements can be added to the architecture, and coupled via the system bus 141 into the system.
A bus arbiter 180 is coupled to the system bus 141 and to system elements coupled to the system bus 141, such as to the instruction cache-MMU 120 and data cache-MMU 130, for selectively resolving channel access conflicts between the multiple potential "master" elements coupled to the system bus 141. This maintains the integrity of communications on the system bus 141 and avoids collisions of data transfer thereupon. The bus arbiter 180 has bus request and bus grant inputs and outputs, respectively, coupled to each of the instruction cache-MMU 120, data cache-MMU 130, and to IOP 150. For example, if the instruction cache-MMU 120 requests a transfer of instruction data from the main memory 140 at the same time as the IOP 150 requests transfer of data to or from the IOP 150 relative to the main memory 140, the bus arbiter 180 is responsible for resolving the conflict so that the two events would happen in sequence, rather than allowing a conflict and collision to occur as a result of the simultaneous attempts.
The bus arbitration between bus masters is done by the bus arbiter 180. Each bus master activates its Bus Request BR line when it intends to access the system bus 141. The bus arbiter 180 returns a Bus Granted (BG) signal to the new master, which has always the highest priority at that time.
The bus master, having active BR and BG signals, is able to maintain the right of the bus by keeping its BR signal active until the data transfer is complete. Other masters will keep their BR signals active until their respective BG signals are activated in turn.
The system bus 141 is a shared resource, but only one unit can have the use of the bus at any one time. Since there are a number of potential "bus master" units coupled to the system bus 141, each of which could attempt to access the system bus 141 independently, the bus arbiter 180 is a necessary element to be coupled to the system bus 141.
There are, in general, two arbitration priority techniques: a fixed priority, and a rotating or scheduled priority. There are also two kinds of signal handling schemes: a serial (i.e. daisy-chained) and a parallel. The serial scheme when configured as a fixed priority system requires less circuitry than a parallel scheme, but is relatively slow in throughput speed. The combination of a serial scheme and a rotating priority can be provided by a high performance bus arbiter 180. The parallel scheme can be realized with either a fixed or a rotating priority, and is faster in speed than a serial or mixed scheme, but requires much more circuitry. The bus arbiter 180 of the present invention can utilize any of these schemes.
In an alternative embodiment, a rotating priority scheme can give every bus master an equal chance to use the system bus. However, where IOPs or one particular CPU should have higher priority, a fixed priority is usually preferable and simpler.
The bus arbiter 180 can also provide the function of checking for any long bus occupancy by any of the units on the system bus 141. This can be done by measuring the active time of a bus grant signal, BG. If the BG signal is too long in duration, a bus error signal, BERR, can be generated to the bus master currently occupying the system bus 141. BERR is also generated when Parity ERRor/ occurs.
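A minimal sketch of the parallel, fixed-priority arbitration and bus-occupancy check described above is given below. The number of masters, their ordering, and the time-out threshold are illustrative assumptions, not values taken from the embodiment.

```c
#include <stdio.h>

/* Sketch of a parallel fixed-priority arbiter: the lowest-numbered
 * requester wins BG.  A grant held longer than a time-out threshold
 * would raise BERR to the offending master (threshold is illustrative). */
#define NUM_MASTERS 3            /* e.g. I-cache-MMU, D-cache-MMU, IOP */
#define BG_TIMEOUT_CYCLES 64     /* assumed time-out, not from the text */

static int arbitrate(const int bus_request[NUM_MASTERS])
{
    for (int m = 0; m < NUM_MASTERS; m++)
        if (bus_request[m])
            return m;            /* highest (fixed) priority requester */
    return -1;                   /* bus idle */
}

int main(void)
{
    int br[NUM_MASTERS] = {0, 1, 1};   /* two masters request at once */
    int granted = arbitrate(br);
    printf("BG to master %d\n", granted);

    int grant_cycles = 80;             /* pretend this master held too long */
    if (grant_cycles > BG_TIMEOUT_CYCLES)
        printf("BERR to master %d (bus occupancy time-out)\n", granted);
    return 0;
}
```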
As further illustrated in FIG. 1, an array processor 188 can be coupled to the system bus 141. Complex computational problems compatible with the array processor's capabilities can be downloaded to provide for parallel processing of the downloaded data, with the resultant answers being passed back via the system bus 141 (e.g. back to main memory 140 or to the data cache-MMU 130 and therefrom to the CPU for action thereupon).
As discussed above, the I/O Processing Unit (IOP) 150 couples to the system bus 141 and has means for coupling to an I/O bus 151, such as to a secondary storage disk or tape unit. The IOP 150 can provide for direct transfer of data to and from the main memory 140 and from and to the secondary storage device coupled to the IOP 150, and can effectuate said transfer independently of the instruction cache-MMU 120 and data cache-MMU 130. The IOP 150 can also be coupled as a "bus master" to the bus arbiter 180 to resolve conflicts for access to the main memory 140 via access to the system bus 141. This provides for flexibility. For example, data transferred between main memory 140 via the system bus 141 to the IOP 150 and therefrom to a secondary storage device can be controlled to provide a 16-way interleave, whereas transfers between a cache 120, or 130, and the main memory 140 can be controlled to provide a 4-way interleave. This is possible since the control of the transfers between the caches, 120 or 130, and main memory 140 is separate from the control for transfers between the IOP 150 and main memory 140.
The IOP 150 can alternatively or additionally provide for protocol conversion. In this embodiment, the protocol IOP 150 is coupled to the system bus 141, and is also coupled to an external I/O bus 151. Preferably, the IOP 150 is also coupled to the bus arbiter 180. The protocol conversion IOP 150 manages the interface access and protocol conversion of digital information between any of the system elements coupled to the system bus 141 and provides for transfer of the digital information via the external communications I/O bus 151 to the external system. Thus, for example, the system bus 141 architecture and transfer protocol can be made to interface with non-compatible system and bus structures and protocols, such as interfacing to a Multibus system.
FIGS. 7A-C illustrate the virtual memory, real memory, and virtual address concepts, respectively.
Referring to FIG. 7A, the virtual memory as seen by the CPU 110 is illustrated. The virtual memory is illustrated as comprising a 2^32 word 32-bit memory array, binary addressable from 0 to FFFF FFFF (hexadecimal). This virtual memory can be visualized as comprising 1,024 (i.e. 2^10) segments, each segment having 1,024 (i.e. 2^10) pages, each page having 4,096 (i.e. 2^12) words or bytes. Thus, the CPU can address a 4 gigabyte virtual memory space. This virtual memory address space is independent of the actual real memory space available. For example, the real memory (i.e. main memory) can be comprised of 16 megabytes, or 2^12 pages.
As illustrated in FIG. 7B, real memory space is represented by a real address, RA, from 0 to FFF FFF (hexadecimal). The cache-memory management unit of the present invention provides very high speed virtual to real memory space address translation as needed. The cache-memory management unit provides a mapping for correlating the cache memory's contents and certain prestored information from virtual to real memory space addresses.
Referring to FIG. 7C, the 32-bit virtual address, VA, is comprised of a 10-bit segment address, bits 31 to 22 (i.e. VA<31:22>), a 10-bit page address, bits 21 to 12 (i.e. VA<21:12>), and a 12-bit displacement address, bits 11 to 0 (i.e. VA<11:0>). In a preferred embodiment, the cache memory management unit provides set associative mapping, such that the displacement address bits 0 to 11 of the virtual address correspond to bits 0 to 11 of the real address. This provides certain advantages, and speeds the translation and mapping process.
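The field boundaries just given can be expressed directly as shifts and masks. The following fragment merely restates the VA<31:22>, VA<21:12> and VA<11:0> partitioning; the helper names are illustrative.

```c
#include <stdio.h>
#include <stdint.h>

/* Decompose a 32-bit virtual address into the fields of FIG. 7C:
 * VA<31:22> segment, VA<21:12> page, VA<11:0> displacement.        */
static uint32_t va_segment(uint32_t va)      { return (va >> 22) & 0x3FF; }
static uint32_t va_page(uint32_t va)         { return (va >> 12) & 0x3FF; }
static uint32_t va_displacement(uint32_t va) { return va & 0xFFF; }

int main(void)
{
    uint32_t va = 0x12345678u;
    printf("segment=%03X page=%03X displacement=%03X\n",
           va_segment(va), va_page(va), va_displacement(va));
    /* Because mapping is on 4K pages, the displacement bits pass through
     * unchanged: RA<11:0> always equals VA<11:0>.                        */
    return 0;
}
```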
Referring to FIG. 8, a block diagram of the cache-memory management unit is illustrated. In a preferred embodiment, a single cache-memory management unit architecture can be utilized for either instruction or data cache purposes, selected by programming at time of manufacture or by strapping or initialization procedures at the time of system configuration or initialization.
The cache-memory management unit has a CPU interface coupling to the processor cache bus 121 or 131, and a system bus interface coupling to the system bus 141. The CPU interface is comprised of an address input register 210, a cache output register 230, and a cache input register 240. The system bus interface is comprised of a system bus input register 260 and a system bus output register 250. The address input register 210 couples the virtual address via bus 211 to a cache-memory system 220, a translation logic block (i.e. TLB) 270, and a direct address translation logic (i.e. DAT) unit 280. The DAT 280 and its operation are described in greater detail with reference to FIG. 12, hereafter. The data output from the cache memory system 220 is coupled via bus 231 to the cache output register 230. The cache memory system receives real address inputs via bus 261 from the system input register 260 and additionally receives a real address input from the TLB 270. Data input to the cache memory system 220 is via the cache data bus (i.e. DT) 241, which couples to each of the cache input register 240, the system bus input register 260, the system bus output register 250, cache output register 230, translation logic block 270, and DAT 280, for providing real address and data pass-through capabilities. The TLB 270 and DAT 280 are bi-directionally coupled to the DT bus 241 for coupling of real address and address translation data between the DT bus 241 and the TLB 270 and the DAT 280. The system bus interface can communicate with the DAT 280 and TLB 270 as well as with the cache memory system 220 via the DT bus 241.

Referring to FIG. 9, a detailed block diagram of the cache-MMU is shown, illustrating the data flow operations internal to the cache-MMU.
The virtual address is taken from the fast cache bus, 121 or 131, via the cache input register 240, and is stored in an accumulator/register 310 of the cache-MMU. This address is then split into three parts. The high order bits (<31:>) are sent to the TLB 350 and DAT 370. Bits <10:4> are sent to the cache memory 320 buffer selection logic to select a line therein. Bits <3:2> are sent to the multiplexer 341 which selects one of the four output words of the quad-word line registers 333 and 335. Bits <0:1> are used only on store byte/store halfword operations, as described below.
The TLB 350 uses the low order 6 bits <17:12> of the virtual page address to access a two way set associative array 352 and 354 which has as its output the real address of the page corresponding to the virtual address presented. Bit <11> is passed through without translation. Since the page size is 4K, bit <11> is part of the specification of the byte within the page. Therefore, if a match is found, the real address is gated out and into the comparators 332 and 334 for comparison to the cache real address tag outputs 322 and 326.
If no match is found in the TLB 350, then the DAT (dynamic address translator) 370 is invoked. The DAT, by use of the segment and page tables for the active process, translates the virtual address presented to a real address. The real address is loaded into the TLB 350, replacing an earlier entry. The TLB 350 then sends the real address to the cache 320.
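The two-way set-associative TLB lookup described above (set selected by virtual address bits <17:12>, 14-bit tag compared against bits <31:18>, with the displacement passing through) can be sketched as follows. The structure layout and function names are assumptions for illustration; only the bit partitioning and geometry are taken from the text.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative two-way set-associative TLB: 64 sets selected by
 * VA<17:12>, 14-bit tag VA<31:18>.  Field and function names are
 * assumptions, not the signal names of the embodiment.            */
typedef struct {
    uint16_t va_tag;   /* VA<31:18>                         */
    uint32_t ra_page;  /* 20-bit real page number RA<31:12> */
    bool     valid;
} tlb_entry;

static tlb_entry tlb[64][2];   /* W and X halves */

/* Returns true on a hit and fills *ra; a miss would invoke the DAT. */
static bool tlb_lookup(uint32_t va, uint32_t *ra)
{
    unsigned set = (va >> 12) & 0x3F;
    uint16_t tag = (uint16_t)(va >> 18);
    for (int w = 0; w < 2; w++) {
        if (tlb[set][w].valid && tlb[set][w].va_tag == tag) {
            *ra = (tlb[set][w].ra_page << 12) | (va & 0xFFF);
            return true;       /* displacement bits pass through */
        }
    }
    return false;              /* TLB miss: DAT walks the tables */
}

int main(void)
{
    uint32_t va = 0x00084ABCu;
    tlb[(va >> 12) & 0x3F][0] =
        (tlb_entry){ .va_tag = (uint16_t)(va >> 18),
                     .ra_page = 0x1F000, .valid = true };
    uint32_t ra;
    if (tlb_lookup(va, &ra))
        printf("hit: RA = %08X\n", ra);
    return 0;
}
```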
The cache data buffer 321 and 323 is a set associative memory, organized as 128 sets of two lines of 16 bytes each. Bits <10:4> of the virtual address select a set in the cache data buffer. The 16 bytes of data for each of the two lines in the set are gated out into the two quad word registers in the cache logic.
The comparators 332 and 334 compare the real address (from the TLB) with both of the real address tags, 322 and 326, from the cache data buffer. If there is a match, then the appropriate word from the line matched is gated out to the COR 230. Bits <3:2> are used to select the appropriate word via multiplexer 341. If the valid bit for a line is off, there is no match.
For byte or half word loads, the cache-MMU provides the entire word, and the CPU 110 selects the byte or halfword. For byte or half word stores, there is a more complex sequence of operations. The byte or half word from the CPU 110 is placed in the CIR 240; simultaneously, the cache reads out the word into which the byte(s) is being stored into the COR 230. The contents of the CIR 240 and COR 230 are then merged and are placed on the processor/cache bus.
If there is a miss (i.e. no match), then the real address is sent over the system bus 141 to main memory 140 and a 16 byte line is received in return. That 16 byte line and its associated tags replace a line in the cache data buffer 321 and 323. The specific word requested is then read from the cache-MMU.
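The hit/miss decision just described (set selected by bits <10:4>, the TLB-supplied real address compared against both tags, word selected by bits <3:2>) is sketched below for the stated 128-set, two-way, four-word-line geometry. The data-structure and function names are illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative model of the two-way cache data buffer: 128 sets (W and X
 * halves), four 32-bit words per line, tag = real address bits <31:11>. */
typedef struct {
    uint32_t tag;       /* RA<31:11>          */
    uint32_t word[4];   /* quad-word line     */
    bool     line_valid;
} cache_line;

static cache_line cache[128][2];

/* Look up one word; 'ra' is the translated real address from the TLB. */
static bool cache_read(uint32_t va, uint32_t ra, uint32_t *data)
{
    unsigned set  = (va >> 4) & 0x7F;   /* VA<10:4> selects the set      */
    unsigned word = (va >> 2) & 0x3;    /* VA<3:2> selects word in line  */
    uint32_t tag  = ra >> 11;           /* RA<31:11> compared to the tag */

    for (int half = 0; half < 2; half++) {      /* W = 0, X = 1 */
        cache_line *l = &cache[set][half];
        if (l->line_valid && l->tag == tag) {
            *data = l->word[word];
            return true;                /* hit */
        }
    }
    return false;   /* miss: fetch the 16-byte line over the system bus */
}

int main(void)
{
    uint32_t va = 0x00001238u, ra = 0x7F001238u;
    cache[(va >> 4) & 0x7F][1] =
        (cache_line){ .tag = ra >> 11, .word = {1, 2, 3, 4}, .line_valid = true };
    uint32_t d;
    if (cache_read(va, ra, &d))
        printf("hit, word=%u\n", d);
    else
        printf("miss\n");
    return 0;
}
```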
The accumulator register 310 functions as the address register in the data cache-MMU and as the program counter in the instruction cache-MMU. The function as either an instruction cache-MMU or a data cache-MMU is determined by initialization of the system or by hardwired strapping. For a monolithic integrated circuit cache-MMU embodiment, this decision can be made at time of final packaging (e.g. such as by strapping a particular pin to a voltage or to ground or by laser or ion implant procedures). Alternatively, it can be programmed as part of the initialization of the chip by the system (e.g. by loading values per an initialization protocol onto the chip). The register-accumulator 310 stores the address output from the CPU 110. As described before, this address is 32 bits in length, bits 0 to 31.
The cache memory sub-system 320 is divided into two equal halves labelled "W, 321" and "X, 323". Each half is identical and stores multiple words of data, the real address for that data, and certain control information in flag bits. The internal structure of the cache is described in greater detail with reference to FIG. 10. Each half of the cache, W and X, provides address outputs and multiple words of data output therefrom, via lines 322 and 324 for address and data output from the W cache half 321, and address and data outputs 326 and 328 from the X cache half 323.
In the preferred embodiment, the data output is in the form of quad-words output simultaneously in parallel. This is complementary to the storage structure of four words in each half, W and X, of the cache for each line in the cache half, as illustrated in FIG. 10. The quad-word outputs from the two halves, W and X, of the cache, respectively, are coupled to quad-word line registers 333 and 335, respectively. The number of words in the line registers corresponds to the number of words stored per line in each half of the cache. The address outputs from each half of the cache, W and X, 321 and 323, respectively, are coupled to one input each of comparators 332 and 334, respectively. The other input of each comparator 332 and 334 is coupled to the output of a multiplexer 347 which provides a real address, bits 31 to 11, output. The real address, bits 31 to 11, are compared via the comparators 332 and 334, respectively, to the outputs of the address interface from each of the cache halves W, 321, and X, 323, respectively, to determine whether or not the requested address corresponds to the addresses present in the cache 320.
The accumulator 310 provides an output of bits 10 to 4 to the cache memory subsystem, so as to select one line therein. The real address stored in that line for each half, W and X, of the cache memory 320 is output from the respective half via its respective address output line, 322 and 326, to its respective comparator, 332 and 334.
The outputs from each of the line registers 333 and 335 are coupled to the multiplexer 341. The accumulator-register 310 provides output of bits 3 and 2 to select one of four consecutive words from the quad-word storage line registers 333 and 335.
The selected word from each of the line registers is output from multiplexer 341 to multiplexer 343. The selection of which line register, i.e. 333 or 335, output is to be output from multiplexer 343 is determined responsive to the match/no match outputs of comparators 332 and 334. The multiplexer 343 couples the data out bits 31 to 0 to the processor cache bus, via the cache output register 230 of FIG. 4. The match/no-match signals output from comparators 332 and 334 indicate a cache hit [i.e. that the requested real address was present in the cache and that the data was valid] or a cache miss [i.e. requested data not present in the cache] for the respective corresponding half of the cache, W (321) or X (323). The real address bits 31 to 11, which are coupled to the comparators 332 and 334 from the multiplexer 347, are constructed by a concatenation process illustrated at 348. The register accumulator 310 output bit 11, corresponding in the set associative mapping to the real address bit 11, is concatenated with the real address output bits 31 to 12 from the multiplexer 345 of the TLB 270.
The TLB 270 of FIG. 8 is shown in greater detail in FIG. 9, as comprising a translation logic block storage memory 350 comprising a W half 352 and an identical X half 354, each having multiple lines of storage, each line comprising a virtual address, flag status bits, and a real address. Each half provides a virtual address output and a real address output. The virtual address output from the W half of the TLB 352 is coupled to comparator 362. The virtual address output of the X half 354 is coupled to comparator 364. The other input to the comparators 362 and 364 is coupled in common to the register accumulator 310 output bits 31 to 18. A line is selected in the TLB responsive to the register accumulator 310's output bits 17 to 12, which select one of the lines in the TLB as the active selected line. The virtual address output from the TLB W and X halves, 352 and 354 respectively, corresponds to the selected line. The "match" output lines from comparators 362 and 364 are each coupled to select inputs of a multiplexer 345 which provides a real address output of bits 31 to 12 to the concatenation logic 348 for selective passage to the multiplexer 347, etc. The real address outputs for the selected line (i.e. for both halves) of the TLB 350 are coupled to the multiplexer 345. On a TLB hit, where there is a match on one of the halves, W or X, of the TLB, the corresponding comparator provides a match signal to the multiplexer 345 to select the real address for the half of the TLB having the match of the virtual addresses, to provide its real address output from the multiplexer 345 to the concatenation logic 348. In the event of a TLB miss, a TLB miss signal 372 is coupled to the direct address translation unit 370. The DAT 370 provides page table access as illustrated at 374, and provides replacement of TLB lines as illustrated at 375. The operation of the DAT will be described in greater detail later herein. On a cache miss, the requested addressed data is replaced within the cache as indicated via line 325.
Referring to FIG. 10A, the organization of the cache memory system is illustrated. The cache memory system 320 is comprised of three fields, a Used bit field, and two identical high speed read-write memory fields, W and X. The first field 329 is comprised of a Used "U" bit memory, indicating whether the W or X half was the most recently used half for the addressed line of cache memory 320. The W and X memories each contain multiple lines (e.g. 128 lines). The U-memory field 329 has the same number of lines (e.g. 128 lines). The storage arrays W and X of cache memory subsystem 320 can be expanded to multiple planes (i.e. more than two equal blocks), with the size of the U-memory word correspondingly changed.
Each line in each cache memory subsystem half, W and X respectively, contains multiple fields, as shown in FIG. 10B. Each line in the W or X subsystem memory contains an enable bit "E", a line valid bit "LV", a line dirty bit "LD", a real address field "RA", and multiple data words "DT". The enable bit set indicates that the respective associated line is functional. A reset enable bit indicates that the respective associated line is not operational. A reset enable bit results in a cache miss for attempted accesses to that line. For monolithic integrated circuit cache-MMU's, the enable bit can be laser set after final test as part of the manufacturing process. The line valid bit LV indicates whether or not to invalidate the entire current line on a cold start, I/O Write, or under processor command. The line dirty bit LD indicates whether the respective associated current line of the cache memory subsystem has been altered by the processor (i.e. main memory is not current). The real address field, illustrated as 21 bits, comprises the most significant 20 bits for the real address in main memory of the first stored data word which follows. The multiple data words, illustrated as four words DT0 to DT3, are accessed by the processor instead of main memory. Each data word contains multiple bits, e.g. 32 bits.
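The per-line fields just enumerated (E, LV, LD, RA and four data words) map naturally onto a record, as sketched below. The field widths follow the text; the C layout itself is only an illustrative model, since the actual storage is a custom memory array rather than a software structure.

```c
#include <stdint.h>

/* Illustrative layout of one cache line in either the W or X half,
 * following FIG. 10B: enable, line-valid and line-dirty flags, a
 * real-address tag, and four 32-bit data words.                    */
typedef struct {
    unsigned e  : 1;      /* E  - line is functional (laser-settable)   */
    unsigned lv : 1;      /* LV - line valid                            */
    unsigned ld : 1;      /* LD - line dirty (main memory not current)  */
    unsigned ra : 21;     /* real-address tag field                     */
    uint32_t dt[4];       /* DT0..DT3 - quad word of data               */
} cache_line_fields;

/* One half (W or X) of the cache data buffer holds 128 such lines; a
 * separate 128-entry "U" memory records which half was most recently
 * used for each set, for the replacement decision.                    */
typedef struct {
    cache_line_fields line[128];
} cache_half;
```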
As illustrated in FIG. 11A, the TLB subsystem 350 is comprised of three fields, a Used "U" field 359, and dual high speed read-write memory fields, W and X memory subsystem. The W and X memory subsystems are equivalent, forming two halves of the cache memory storage. As illustrated, each half contains 64 lines of addressable storage having 47-bit wide words, and supports the virtual to real address translation. The Used field of each line performs in a manner similar to that which is described with reference to FIG. 10A.
As illustrated in FIG. 11B, each storage line in W and X is comprised of a 14 bit virtual address "VA" field, a 20 bit real address "RA" field, a supervisor valid bit "SV" field, a user valid bit "UV" field, a dirty bit "D" field, a referenced bit "R", a protection level word "PL" field, illustrated as four bits, and a system tag "ST" field, illustrated as five bits.
The TLB is a type of content addressable memory which can be read within one MCLK cycle. It is organized as a set associative buffer and consists of 64 sets of two elements each. The low order 6 bits of the virtual page address are used to select a set, i.e. a line of storage. Then, the upper 14 bits of the virtual address are compared (i.e. 362 and 364) to the key field VA output of both elements 352 and 354 of the set. On a TLB hit, the real address field (20 bits) RA of the TLB line entry which matches is output via multiplexer 345, along with the associated system tags and access protection bits. A TLB translation search is provided responsive to 14 bits of virtual address, supervisor valid and user valid.
As illustrated in FIG. 12, the cache memory is organized on a quad word boundary. Four addressable words of real address memory are stored in each line for each half (i.e. W and X) of the cache memory system 320. The cache memory subsystem provides quad-word output on the quad-word boundaries to further accelerate cache access time. For example, on a load operation, when the current address is within the quad boundary of the previous address, then the cache access time is minimal (e.g. two clock cycles). When the current address is beyond the quad boundary of the previous address, the cache access time is longer (e.g. four clock cycles).
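The quad-boundary test that selects between the fast and slow access paths reduces to comparing the two addresses above the word-select bits, as in the following minimal sketch; the function name is illustrative.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Two successive accesses fall in the same quad-word (16-byte) line
 * when their addresses agree above bit 3; only then can the word be
 * taken directly from the line register instead of the cache array. */
static bool same_quad_boundary(uint32_t prev_addr, uint32_t curr_addr)
{
    return (prev_addr & ~0xFu) == (curr_addr & ~0xFu);
}

int main(void)
{
    printf("%d\n", same_quad_boundary(0x1004, 0x100C));  /* 1: fast path */
    printf("%d\n", same_quad_boundary(0x100C, 0x1010));  /* 0: new quad  */
    return 0;
}
```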
As discussed elsewhere herein in greater detail, the TLB is reserved for providing hardwired translation logic for critical functions. This provides a very high speed guaranteed main memory virtual to real mapping and translation capability. The hardwired translation logic block functions are illustrated in FIG. 13. Each line contains information as indicated in FIG. 11B. The translation and system information is provided for critical functions such as boot ROM, memory management, I/O, vectors, operating system and reserved locations, and applications reserved locations, as discussed above in greater detail with reference to FIGS. 11A-B.
In addition to the read-write TLB, there are eight hardwired virtual to real translations, as discussed with reference to FIG. 13. Some of these translations are mapped to real pages 0-3. Page 0 in virtual space, the first page in the low end of real memory, is used for trap and interrupt vectors. Pages 1-3 are used as a shared area for initialization of the system. Pages 6 and 7 are used for bootstrap system ROM and Pages 4 and 5 are used for memory mapped I/O. These eight page translations will only be used when in supervisor mode. As a result of these being hardwired in the TLB, a miss or page fault will never occur to the first eight virtual pages of system space.
The PL bits indicate the protection level of the page. The function code which accompanies the VA (virtual address) from the CPU contains the mode of memory reference. These modes are compared with the PL bits and if a violation is detected, a CPU Trap is generated.
The cache-MMU provides memory access protection by examining the four protection bits (PL) in the TLB entry or page table entry. This is accomplished by comparing the supervisor/user bit and K bit in the supervisor status word (SSW) with the access code, and, if there is a violation, access is denied and a trap is generated to the CPU.
The virtual address which caused the trap is saved in a register and can be read with an I/O command.
There are three unique traps generated:
1. Instruction Fetch Access Violation - Instruction cache only.

2. Read Access Violation - Data cache only.

3. Write Access Violation - Data cache only.

Access Code    PSW S,K Bits
0000           RW - - -

1111           - - - -

where: RW = read/write, E = instruction execution, - = no access, S = supervisor/user, and K = protect.
The (D) dirty bit in the data cache line indicates that the line has been modified since reading it from main memory.
The dirty bit in the TLB indicates that one or more words in that page have been modified.
When a word is to be written in the cache, the dirty bit in the line is set. If the dirty bit in the TLB is not set, it is then set and the line in the TLB is written back in the page table. If the dirty bit in the TLB is already set, then the page table is not updated. This mechanism will automatically update the page table dirty bit the first time the page is modified.
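The two-level dirty-bit protocol just described (set the line's dirty bit on every cached write, but update the page-table dirty bit only on the first modification of a page) can be sketched as follows; the structure and function names are illustrative.

```c
#include <stdio.h>
#include <stdbool.h>

/* Illustrative model of the dirty-bit protocol: the cache line's LD bit
 * is set on every write hit, while the TLB/page-table dirty bit is only
 * written back to memory the first time the page is modified.          */
typedef struct { bool ld; } cache_line_state;
typedef struct { bool dirty; } tlb_entry_state;

static void page_table_set_dirty(void)
{
    /* stands in for the write of the updated TLB line to the page table */
    printf("page table dirty bit updated\n");
}

static void write_hit(cache_line_state *line, tlb_entry_state *tlb)
{
    line->ld = true;             /* line now differs from main memory   */
    if (!tlb->dirty) {           /* first modification of this page     */
        tlb->dirty = true;
        page_table_set_dirty();  /* later writes skip this update       */
    }
}

int main(void)
{
    cache_line_state line = { false };
    tlb_entry_state  tlb  = { false };
    write_hit(&line, &tlb);      /* updates the page table              */
    write_hit(&line, &tlb);      /* page table already marked: no write */
    return 0;
}
```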
The referenced bit (R) in the TLB is used to indicate that the page has been referenced by a read or write at least once. The same approach that is used for the D bit will be used for updating the R bit in the page table entry.
The valid bits (SV, UV) are used to invalidate the line. On a cold start, both SV and UV are set to zero. On a context switch from one user to another, UV is set to zero. UV is not reset when going from User to Supervisor or back to the same user.
A 20 bit Real Address (RA) is also stored at each line location. When the virtual address has a match, the real address is sent to the cache for comparison or to the SOR.
When the system is running in the non-mapped mode (i.e. no virtual addressing), the TLB is not active and the protection circuits are disabled.
The TLB responds to the following Memory Mapped I/O commands:

o Reset TLB Supervisor Valid Bits - All SV bits in the TLB are reset.

o Reset TLB User Valid Bits - All UV bits in the TLB are reset.

o Reset D Bit - Set all dirty (D) bits to zero in the TLB.

o Reset R Bit - Set all referenced (R) bits to zero in the TLB.

o Read TLB Upper - Most significant part of addressed TLB location is read to CPU.

o Read TLB Lower - Least significant part of addressed TLB location is read to CPU.

o Write TLB Upper - Most significant part of addressed TLB location is written from CPU.

o Write TLB Lower - Least significant part of addressed TLB location is written from CPU.

Memory mapped I/O to the cache-MMU goes through virtual page 4.
The system tags are used by the system to change the cache-MMU strategy for writing (i.e. copy back or write through), enabling the cache-MMU and handling I/O. The system tags are located in the page tables and the TLB.

System Tags
0 0 0 T1 T0   Private, write through
0 1 0 T1 T0   Private, copy back
0 1 1 T1 T0   Non cacheable
0 0 1 T1 T0   Common, write through
1 X 0 T1 T0   Noncacheable, memory-mapped I/O area
1 X 1 T1 T0   Noncacheable, bootstrap
R = referenced bit, D = dirty bit

Five of the system tags are brought outside the cache-MMU for decoding by the system. Tag T2 is used to differentiate between bootstrap and I/O space. Tag T4 is used to differentiate between memory space, and boot or I/O space. The UNIX operating system (e.g. UNIX) can use tags T0 and T1. Therefore, T0 and T1 cannot be used by the system designer unless the operating system is known to not use them. These four tags are only valid when the cache-MMU has acquired the system bus. These signals are bussed together with tags from other cache-MMU's.
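The write-strategy selection encoded by tag bits T4, T3 and T2 in the table above can be decoded with a simple switch, as sketched below; the enum and function names are illustrative, and T1/T0 are left to the operating system as stated.

```c
#include <stdio.h>

/* Decode of the upper three system-tag bits (T4 T3 T2) per the table
 * above; T1 and T0 are left for operating-system use.                */
typedef enum {
    PRIVATE_WRITE_THROUGH,   /* 0 0 0 */
    COMMON_WRITE_THROUGH,    /* 0 0 1 */
    PRIVATE_COPY_BACK,       /* 0 1 0 */
    NON_CACHEABLE,           /* 0 1 1 */
    NONCACHEABLE_MMIO,       /* 1 x 0 */
    NONCACHEABLE_BOOTSTRAP   /* 1 x 1 */
} write_strategy;

static write_strategy decode_system_tags(unsigned t4, unsigned t3, unsigned t2)
{
    if (t4)                          /* T4 = 1: boot or memory-mapped I/O */
        return t2 ? NONCACHEABLE_BOOTSTRAP : NONCACHEABLE_MMIO;
    if (t3)                          /* T4 = 0, T3 = 1                    */
        return t2 ? NON_CACHEABLE : PRIVATE_COPY_BACK;
    return t2 ? COMMON_WRITE_THROUGH : PRIVATE_WRITE_THROUGH;
}

int main(void)
{
    printf("%d\n", decode_system_tags(0, 1, 0));  /* private, copy back */
    return 0;
}
```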

ST(0 0 1 x x x x) : Common, Write Through
When virtual page 0 is detected in the TLB in supervisor mode, page 0 of real memory is assigned. This first page of real memory can be RAM or ROM and contains vectors for traps and interrupts. This hardwired translation only occurs in Supervisor state. The most significant 20 bits of the real address are zero.

ST (1,X,1,X,X,X,X)
When pages 6 and 7 in virtual memory are addressed, the system tags are output from the hardwired TLB. This translation occurs only in supervisor state. Pages 6 and 7 of virtual memory map into pages 0 and 1 of boot memory.
The most significant 19 bits of the real address are zero and bit 12 is a 1 for page 1 of boot memory, and a 0 for page 0.
The boot memory real space is not in the real memory space.

ST (1,X,0,X,X,X,X) Memory Mapped I/O
Pages 4 and 5 in the virtual space, when in supervisor mode, have a hardwired translation in the TLB. The most significant 19 bits of the translated real address are zero. The I/O system must decode system tags T2 and T4, which indicate memory mapped I/O. Further decoding of the most significant 20 bits of the real address can be used for additional pages of I/O commands. Each real page has 1024 commands, performed by read (word) and store (word) to the corresponding location.
When this hardwired page is detected in the TLB or page table entry, the read or write command is acted upon as if it were a noncacheable read or write.
The use and allocation of the I/O space is as follows:
I/O in Supervisor Mode, mapped or unmapped, pages 4 and 5
Pages 4 and 5 of the virtual address space are mapped respectively into pages 0 and 1 of the I/O address space by the hardwired TLB entries. Page 4 is used for commands to the cache chip.
I/O in Supervisor Mode, mapped, additional pages.
I/O space can also be defined in the page table. The I/O command is identified by the appropriate tag bits. Any virtual address, except pages 0-7, can be mapped to an I/O page, not including 0 and 1.

I/O Space in the Cache
An I/O address directed to the cache chip should be interpreted as follows:

Cache I/O Space
Page 4: cache I/O space
Addresses 00004000 - 00004BFF - D-cache
Addresses 00004C00 - 00004FFF - I-cache

Page 5: system I/O space
Addresses 00005000 - 00005FFF

Cache I/O Commands

VA<31:12> = 0 0 0 0 4 Hex
VA<11:0> = RA<11:0>

Bit 11 = 0: specifies D-cache I/O space
  Bit 0: 0 = data part; 1 = address part
  Bit 1: 0 = W; 1 = X (compartment)
  Bits 2-3: word position
  Bits 4-10: line number

Bit 11 = 1, Bits 8-9 = 0: specifies TLB
  Bit 10: 0 = D-cache; 1 = I-cache
  Bit 0: 0 = lower, 1 = upper
  Bit 1: 0 = W; 1 = X
  Bits 2-8: line number

Other (Bit 10 = 1: I-cache, Bit 10 = 0: D-cache):
1 x 0 1 ---- 0 0 -- supervisor STO
1 x 0 1 ---- 0 1 -- user STO
1 x 0 1 ---- 1 0 -- F Reg. (virtual address of fault)
1 x 0 1 ---- 1 1 -- E Reg. (physical cache location of error)
1 x 1 1 0 0 0 0 0 1 - reset cache LV all
1 x 1 1 0 0 0 0 1 0 - reset TLB SV all
1 x 1 1 0 0 0 1 0 0 - reset TLB UV all
1 x 1 1 0 0 1 0 0 0 - reset TLB D all
1 x 1 1 0 1 0 0 0 0 - reset TLB R all

Store Word

ST(0,1,0,X,X,D,R) - Private, Copy Back
A. LV is 1, and HIT: Write word in line and set line and page dirty bit.

B. Miss - Line to be replaced Not Dirty: Read quadword from memory and store in line. Write word in new line and set line and page dirty.

C. Miss - Line to be replaced Dirty: Write dirty line back to memory. Read new quadword into line. Write word in new line and set line and page dirty.

ST(0,0,0,X,X,D,R) - Private, Write Through
A. LV is 1, and HIT: Write data word in line and to memory. Set page dirty bit.

B. Miss: Write word in memory and set page dirty bit.

ST(0,0,1,X,X,D,R) - Common, Write Through
A. LV is 1 and HIT: Write data word in line and to memory. Set page dirty bit.

B. Miss: Write word in memory and set page dirty bit.

ST(0,1,1,X,X,D,R) - Noncacheable
A. Write word in main memory. If a hit, then purge.

Store Byte/Halfword

ST(0,1,0,X,X,D,R) - Private, Copy Back
A. LV is 1, and HIT: Write byte or halfword in line and set line and page dirty bit.

B. Miss - Line to be replaced is Not Dirty: Read quadword from memory and store in line. Write byte or halfword in new line and set line and page dirty.

C. Miss and Line to be replaced is Dirty: Write line back to memory. Read new quadword into line. Write byte or halfword in new line and set line and page dirty.

ST(0,0,0,X,X,D,R) - Private, Write Through
A. HIT: Write byte or halfword in line. Copy modified word from cache line to memory.

B. MISS: Read word. Modify byte or halfword. Write modified word from cache line to memory. (Read/modify/write cycle.) (No write allocate.)

ST(0,0,1,X,X,D,R) - Common, Write Through
A. LV is 1, and HIT: Write byte or halfword in line. Write modified word from cache line to memory.

B. MISS: Read word. Write byte or halfword in line. Write modified word from cache line to memory. (Read/modify/write cycle; no write allocate.)


ST(0,1,1,X,X,D,R) - Non-Cacheable
A. Read word into cache chip. Update appropriate byte/halfword and write modified word back to main memory.

Test and Set

ST(0,1,1,X,X,D,R) - Non-Cacheable
Read main memory location, test and modify word and store back at same location. Return original word to CPU.

Memory bus is dedicated to cache until this operation is complete.

If the following system tag occurs while executing this instruction, an error condition will occur.

1 X X X X X X (m/m I/O space or boot space)

Read Word/Byte/Halfword

ST(0,1,0,X,X,D,R) - Private, Copy Back
A. LV is 1, and HIT: Read word from cache to CPU.

B. Miss - Line to be replaced Not Dirty: Read new quadword from memory into cache. Read word to CPU.

C. Miss - Line to be replaced is Dirty: Write line back to memory. Read new quadword from memory into cache. Read word to CPU.


ST(0,0,0,X,X,D,R) or ST(0,0,1,X,X,D,R) - Write Through
A. LV is 1, and HIT: Read word from cache to CPU.

B. Miss: Read new quadword into line. Read word into CPU.

ST(0,1,1,X,X,D,R) - Non-Cacheable
A. Read word from main memory to CPU.

Common Write From Cache To Memory

ST(0,0,1,X,X,D,R) - Common, Write Through
All caches examine the bus and if there is a hit, invalidate the line in cache. If there is not a hit, ignore the bus.
When an I/O system is reading data from the cache or main memory, the real address is examined by the cache and the following action takes place. The TLB is not accessed.

A. LV is 1 and HIT, and LD is 1: Read a word or a line from Cache to I/O.

B. MISS: Read a word, quadword, or 16 words from memory to I/O.

When an I/O is taking place to main memory, the real address is examined by the cache and the following action taken. The TLB is not accessed and therefore the Dirty Bit is not changed in the page table or TLB.


A. LV is 1 and HIT: Write a word, quadword or 16 words from I/O to memory. Invalidate line or lines in cache.

B. MISS: Write a word, quadword, or 16 words from I/O to memory.

Virtual address to real address mapping system information is uniquely stored in each line for each of the W and X halves of the cache memory subsystem. This provides for extremely high-speed translation of virtual to real addresses to accelerate mapping of the virtual to real address space, so as to facilitate necessary in/out swapping procedures with secondary storage systems, such as through the I/O processor 150 of FIG. 1. The system information in each line of storage in the TLB memory subsystem 350 provides all necessary protection and rewrite information. The used bit for each subsystem line provides indication for rewriting into the least recently used half of the memory subsystem. Other replacement strategies could be implemented.
Where a high-speed communications structure is provided, such as in a monolithic integrated cache-MMU, this cache-MMU system architecture enhances very high-speed cache system operation and provides for great applications versatility.
As illustrated in FIG. 14, the quad word boundary can be utilized to advantage in a line register architecture. The memory array of the cache memory 320 of FIG. 9 is coupled to a line register 400 which contains four words of word storage within a line boundary. The cache memory system 320 outputs four words at a time per cache hit to the line registers 400 which selectively store and forward the quad word output from the cache memory subsystem 320 to the cache output register, such as COR 230 of FIG. 8. This transfer clears when the "quad boundary equals zero" comparator output occurs. The output of the cache output register of the system interface of the cache-MMU system is thereafter coupled to the address data function code (i.e. ADF) bus of the processor/cache bus (i.e. buses 121 or 131, and bus 115 of FIG. 1).
The accumulator register (i.e. 310 of FIG. 9) is also coupled to the processor/cache interface bus to receive address information therefrom. If the cache memory management unit is configured as a data cache, the accumulator register stores the address from the processor/cache bus for use by the cache memory subsystem. If configured as an instruction cache, the accumulator register 310 is configured as a program counter, to both receive address information from the processor/cache interface bus, and to increment itself until a new authorized address is received from the processor/cache bus.
The output from the accumulator register 310 is coupled to a quad line boundary register 410, quad boundary comparator 420, and state control logic 430. The quad-word line boundary register 410 stores the starting address of the quad-word line boundary for the words stored in the line register 400.
The output of the quad-word line boundary register 410 is coupled to the quad-word line boundary comparator 420. The comparator 420 compares the register 410 output to the virtual address output of the address register (i.e. accumulator-register 310) to determine whether the requested word is within the current quad-word boundary for the line register 400. The state control logic 430 then determines the selection of either the line register 400 output or the access to the cache memory subsystem 320. The control logic 430 then selectively multiplexes to select the appropriate word from the line registers.

FIG. 15 illustrates the load timing for the cache-MMU systems 120 and 130 of FIG. 1. In the preferred embodiment, this is of data within quad word or 16-word boundaries. Alternatively, this can be for any size block of data. FIG. 15 illustrates the operation of the data cache 130 loading from the CPU 110, or alternatively of the instruction cache 120 loading on a branch operation. The master clock MCLK signal output of the system clock 160 of FIG. 1 is shown at the top of FIG. 15 with a time chart indicating 0, 30, 60, 90 and 120 nanosecond (i.e. ns) points from the start of the load cycle.
At the beginning of this cycle, a valid address is loaded from the CPU to the accumulator register of the respective cache-MMU system, and a function code is provided to indicate the type of transfer, as discussed in greater detail elsewhere herein. The 0 ns point occurs when the ASF signal is valid, indicating an address strobe in process. If the data requested is on a quad line boundary for a new access, the data is available at the halfway point between the 90 and 120 nanosecond points of MCLK. However, where the access is for a request within a quad word boundary, the data access timing is much faster (e.g. at the 60 ns point), as shown with the phantom lines on the ADF signal waveform, indicating data transfer within a quad line boundary.
Referring to FIG. 16, the store operation for the cache-MMU systems 120 and 130 of FIG. 1 is illustrated for storage from the CPU to the cache in a copyback mode, and additionally to main memory 140 for the write-through mode. The master clock, MCLK, is output from the system clock 160, as illustrated in FIG. 15, as a reference line. At time T1, the address strobe signal is activated indicating a valid address follows. At time T2, approximately one quarter MCLK clock cycle later, valid address and function code outputs are received on the appropriate lines of the processor/cache interface bus, ADF and FC, respectively. At time T3, the address lines are tri-stated (floated) and data is written to the cache memory and/or to the main memory, as appropriate. Multiple data words can be transferred. Single, quad or 16-word mode is determined by the function code on the FC lines. At time T4, the response code is output indicating that the transfer is complete, ending the cycle.
Both Copy Back and Write Through main memory 140 update strategies are available in the cache-MMU and can be intermixed on a page basis. Control bits located in the page tables are loaded into the TLB to determine which strategy is used.
Copy back will generally yield higher performance. Data is written back to main memory only when it is removed from the cache-MMU. Those writes can be largely overlapped with fetches of blocks into the cache. Thus, copy back will in general cut bus traffic, and will minimize delays due to queueing on successive writes.
Write through has two advantages. First, main memory is always up to date, so system reliability is improved, since a cache chip or processor failure will not cause the loss of main memory contents. Second, in a multiprocessor system, write through facilitates the maintenance of consistency between main memory shared among the processors.
The operating system can make these tags which determine write through vs. copy back available to the users, so that they can make the appropriate choice.
FIGS. 17A-B illustrate the data flow of operations between the CPU 410, the cache-MMU 412, and the main memory 414. Referring to FIG. 17A, the data flow for a copy-back fast write operation is illustrated. The CPU 410 outputs data for storage in the cache-memory management unit 412. This dirties the contents of the cache memory for that location. On purge, the cache-memory management unit 412 rewrites the dirty data to the respective private page in main memory 414. The processor 410 can simultaneously write new data into the cache-MMU 412 storage locations which are being purged. This provides the advantage of fast overall operations on write.
Referring to FIG. 17B, the write-through mode of operation is illustrated. This mode maintains data consistency, at some sacrifice in overall write speed. The CPU 410 writes simultaneously to the cache memory of the cache-memory management unit 412, and to the shared page in the main memory 414. This insures that the data stored at a particular location in a shared page is the most current value, as updated by other programs.
Referring to FIG. 18, the data flow and state flow interaction of the CPU 510, cache memory subsystem 512, and TLB/memory subsystem 514 are illustrated. Also illustrated is the interaction of the cache-MMU and CPU with the main memory 516, illustrating the DAT operation for copyback and write-through modes, and the temporal relationship of events.
The CPU 510 outputs a virtual address, at step one, to the TLB/memory subsystem 514 which outputs a real address to the cache memory subsystem 512, at step two. If a write-through operation is occurring or on a cache miss, the real address is also sent to the main memory 516. On a DAT operation, a portion of the virtual address plus the segment Table Origin address are sent to main memory at step two.
At step three, for the store mode, data is written out from the CPU 510 for storage in the cache memory subsystem 512 for both copyback and write-through modes, and additionally for storage in the main memory 516 for the write-through mode. For the load mode of operation, step three consists of data being loaded from the cache memory subsystem 512 to the CPU 510. On a cache miss, data is loaded from the main memory 516 to the cache memory subsystem 512 and to the CPU 510 during step three. On a cache miss in copyback mode, when dirty data is present in the cache memory (i.e. the dirty bit is set), the cache memory subsystem 512 outputs the dirty data back to the main memory 516.
Referring to FIG. 19, the data flow and operation of the DAT and TLB address translation process are illustrated. When a virtual address requires translation to a real address, and there are no translation values corresponding to the requested translation stored in the cache memory management unit system, the operation as illustrated in FIG. 19 occurs.
The requested virtual address, as stored in the virtual address register-accumulator (i.e. 310 of FIG. 9), provides a virtual address "VA" (e.g. 32 bits) which requires translation. As discussed with reference to FIG. 7C, the virtual address is comprised of 10 bits of segment address, VA<31:22>, 10 bits of page address, VA<21:12>, and 12 bits of displacement address, VA<11:0>.
The DAT logic performs the dynamic address translation when there is a miss in the TLB. The DAT logic waits for the write register to be empty and then performs two read accesses to main memory. The first read adds the segment number to a segment table origin (STO), and obtains the address of the page table. The second read adds the page number to the page table origin, and gets the real address of the page, as well as other useful information such as protection bits, copy back/write through status, dirty bits, etc. For each new user or process a new segment table origin can be used.
The STO register in the DAT is loaded under CPU control. There are two STO registers, one for user mode, and the other for supervisor mode. The hardware automatically selects the proper register depending on the mode in the processor status word (PSW).

The access protection bits in the page tables are checked by the DAT logic for protect violations. If they occur, a CPU trap is generated. If a parity error occurs during a DAT operation while reading main memory, such that the data is not corrected and hence suspect, a CPU trap is generated.
A PF bit in the page table or segment table is the page fault indicator. The bit is set or reset by the software.
The system can be in a non mapped mode, with no virtual addressing. In this mode, the DAT facility is inactive and the protection bits are not used. However, this mode should be used only rarely, due to the vulnerability of the system to bugs and malicious damage.
After the DAT logic has completed a translation, the Virtual Address, Real Address and System Tags are sent to the TLB, where they are stored for future use until replaced.
The DAT will respond to the following Memory Mapped I/O Commands:

o Load Supervisor STO Register (privileged)
o Read Supervisor STO Register
o Load User STO Register (privileged)
o Read User STO Register
o Read Virtual Address that caused page or protection fault.

This i~ discussed ln greater detail wi~h re~erence to FIG. 2~.
As discussed hereinafter with reference to FIG.
21, the cache memory management unit system includes a register stack. This resl~ter ~tack contains a segment table origin (i.e. ST0) regi~ter therein for each of the ~upervisor and user segment table origins for the then current supervi~or and user, Por the re~pectlvs cache-me~ory management unLt. The segment table origin reglster contalns a 32-bit ~alu~, the mo~t slgni~icant 20 bits o~ which repre~ent the segment table origin Yalue.
As illiustrated in FIG. 19, thi~ STO value is concatinated as the most ~ignif1cant portion of a word in an STO Entry Address Accumulator, with the 10-bit ~egment addre~ from the virtual addres~ reglYter 310 concatinated a the next mo~t ~ignificant portion of the word in the STO Entry Address Accumulator. The re3ultant 30 bit address forms a pointer to a segment table in the main memory.
The Segment Table Xntry Addres~ (i.e. STOEA) accumulator, within ths cache memory management unit, accumulate~ and concat~nates the address to be output to the main memory so a~ to address the qegment table in main memory. A 32-bit addres~ i9 con~tructed by utilizing the ~egment table origin 20 bits as addre3s bits STOEA<31:12>, utilizing the virtual addreq~ ~egment bits CVA31:22] a~ the next ten bits, STOEA<11:2>, of the qegment table addre~, and concatinating zeros for bit pos1tlons STQEA<1:0> of the ~egment table addre~s which is output to main memory from the STOEA accumulator. The ~egment table entry address output from the ~egment table entry addre~ accumulator of the cache-MMU is output via the ~y~tem bu~ to main memory. Thi~ provides access to the re~pective page table entry (i.e PTE) within the segment table in main memory corresponding to the segment table entry address output from the cache MMU system.
The most significant 20 data bits, 31:12, of the addressed main memory location are output from the main memory back to the cache-MMU for storage in the Page Table Entry Address (i.e. PTEA) accumulator in the DAT of the cache-MMU system. These 20 bits of the page table entry address are concatenated in the PTEA accumulator as the most significant 20 bits of a 32-bit word. The next most significant 10 bits are concatenated with the output from the virtual address register 310, bits VA<21:12>, representing the page selection bits. The least two significant bits of the page table entry address accumulator output are zeros. The page table entry address accumulator of the cache-MMU outputs a 32-bit address to the main memory via the system bus.
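The same packing applies to the page table entry address; the following C sketch mirrors the bit positions given above, with the function name again an illustrative assumption.

    #include <stdint.h>

    /* Page table entry address (PTEA) construction:
     *   PTEA<31:12> = bits 31:12 of the segment table entry read from memory
     *   PTEA<11:2>  = virtual address page bits VA<21:12>
     *   PTEA<1:0>   = 00                                                   */
    static uint32_t make_ptea(uint32_t seg_entry, uint32_t va)
    {
        uint32_t pte_hi = seg_entry & 0xFFFFF000u; /* 20 bits from main memory */
        uint32_t page   = (va >> 12) & 0x3FFu;     /* VA<21:12>, 10 bits       */
        return pte_hi | (page << 2);               /* bits 1:0 stay zero       */
    }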
The page table entry address selects the entry point to a line in the page table in main memory. Each line in the page table is comprised of multiple fields, comprising the translated real address, system tags, protection, dirty, referenced, and page fault values for the corresponding virtual address. The selected line from the page table contains, as illustrated, 20 bits of real address "RA", five bits of system tag information ST, four bits of protection level information PL, one bit of dirty information D, one bit of referenced information R, and page fault information PF. These fields are discussed in greater detail with reference to FIGS. 11A-B.
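The field widths recited above can be collected in a C structure for reference; the bit-field declaration below only summarizes the information content and is not asserted to match the in-memory format used by the hardware.

    #include <stdint.h>

    /* One page table line, per the widths given above (20+5+4+1+1+1 = 32). */
    typedef struct {
        uint32_t ra : 20;  /* translated real address                */
        uint32_t st : 5;   /* system tags                            */
        uint32_t pl : 4;   /* protection level                       */
        uint32_t d  : 1;   /* dirty                                  */
        uint32_t r  : 1;   /* referenced                             */
        uint32_t pf : 1;   /* page fault indicator, set by software  */
    } page_table_entry_t;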
The selected line from the page table is transferred from the main memory back to the TLB in the cache-MMU for storage in the memory array of the TLB.
Next, the 20 bits of real address from the TLB, for the just referenced line in the page table, are output and coupled to the most significant 20 bits of the Real Address accumulator in the cache-MMU. These 20 bits are concatenated in the Real Address accumulator as the most significant 20 bits, with the least significant 12 bits of the virtual address register 310, VA<11:0>, providing a 32-bit real address output from the Real Address Accumulator. This output from the Real Address accumulator is then output, via the system bus, to main memory to select the desired real address location.
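The final concatenation can likewise be written out in C; the function name is an illustrative assumption.

    #include <stdint.h>

    /* Real address construction:
     *   RA<31:12> = translated real address bits held in the TLB
     *   RA<11:0>  = displacement bits VA<11:0> from virtual address register 310 */
    static uint32_t make_real_address(uint32_t tlb_ra, uint32_t va)
    {
        return (tlb_ra & 0xFFFFF000u) | (va & 0x00000FFFu);
    }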
Responsive to this Real Address output, a block of words is transferred back to the cache memory subsystem for storage therein. The cache-MMU then transfers the initially requested word or words of information to the CPU. The procedure illustrated in FIG. 19 is only needed when the virtual address contained in the register accumulator 310 does not have corresponding translation values stored in the TLB of the cache-MMU. Thus, for any addressable locations presently stored in the cache-MMU, translation data is already present. This would include all cases of write-back to main memory from the cache.
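Taken together, the FIG. 19 sequence on a TLB miss amounts to two table reads followed by the final concatenation. The C sketch below restates this sequence; read_main_memory() stands in for a system-bus read, cpu_trap() for the CPU trap, and the page-fault bit position is an assumption for illustration, not the hardware's actual control flow.

    #include <stdint.h>

    extern uint32_t read_main_memory(uint32_t real_address);  /* assumed helper */
    extern void     cpu_trap(const char *reason);              /* assumed helper */

    #define PF_BIT 0x1u  /* page-fault bit position: assumed for illustration */

    /* Sketch of the miss path only; on a TLB hit none of this is needed. */
    static uint32_t translate_on_tlb_miss(uint32_t sto_reg, uint32_t va)
    {
        /* Segment table entry address: STO<31:12> | VA[31:22] << 2 */
        uint32_t stoea     = (sto_reg & 0xFFFFF000u) | (((va >> 22) & 0x3FFu) << 2);
        uint32_t seg_entry = read_main_memory(stoea);

        /* Page table entry address: seg_entry<31:12> | VA<21:12> << 2 */
        uint32_t ptea = (seg_entry & 0xFFFFF000u) | (((va >> 12) & 0x3FFu) << 2);
        uint32_t pte  = read_main_memory(ptea);

        if (pte & PF_BIT)
            cpu_trap("page fault");
        /* Protection bits would also be checked here, and the entry
         * loaded into the TLB, before the real address is used.      */

        return (pte & 0xFFFFF000u) | (va & 0xFFFu);  /* real address */
    }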
Referring to FIG. 20, a block diagram of the cache-MMU is illustrated. The processor to cache bus, 121 or 131 of FIG. 1, couples to the CPU interface 600. The cache memory subsystem 610, TLB subsystem 620, register stack 630, system interface 640, and microprogrammed control and DAT logic 650 are all coupled to the CPU interface 600. A virtual address bus (i.e. VA) is coupled from the CPU interface 600 to each of the cache subsystem 610, TLB subsystem 620, and register stack subsystem 630. A data output bus (i.e. DO) from the cache subsystem 610 to the CPU interface 600 couples the data output from the memory subsystem of the cache memory subsystem 610, illustrated as DO[31:00].
A bi-directional data bus, designated nDT[31:00], provides selective coupling of data, virtual address, real address, or function code, depending upon the operation being performed by the cache-MMU. The nDT bus couples to cache-MMU system elements 600, 610, 620, 630, 640, and 650. The system interface 640 couples to the system bus on one side and couples to the nDT bus and the SYRA bus on the internal cache-MMU side. The SYRA bus provides a real address from the system bus via the system interface 640 to the TLB 620 and cache subsystem 610. As illustrated, the least significant 12 bits, representing the displacement portion of the address, are coupled to the cache memory subsystem 610. The most significant 20 bits, SYRA<31:12>, are coupled from the SYRA bus to the TLB subsystem 620. The control and DAT logic 650 coordinates system bus interface after a TLB 620 miss or cache subsystem 610 miss, and controls DAT operations.


Referring to FIG. 21, a more detailed block diagram of FIG. 20 is illustrated. The cache output register 601, cache input register 603, and address input register 605 of the CPU interface 600 are described in greater detail with reference to FIG. 8. FIG. 21 further illustrates the multiplexer 602, read/write logic 604 for performing read-modify-write operations, function code register 606 and trap encoder 607.
The read/modify/write logic 604 coordinates multiplexing of the cache memory subsystem output, via multiplexer 614 from the cache memory 611 of the cache memory subsystem 610, and via multiplexer 602 of CPU interface 600, for selective interconnection to the cache output register 601 and therefrom to the processor/cache bus. Alternatively, the multiplexer 602 can receive data from the system bus interface 640 via the nDT bus internal to the cache-MMU system, or from the read/modify/write logic 604. The RMW logic 604 has as inputs thereto the cache output register 601 output and the cache input register 603 output. The function code register 606 and trap code encoder 607 are coupled to the processor. The function code register 606 is responsive to function codes received from the processor for providing signals to other portions of the cache-MMU system. The trap logic 607 responds to error faults from within the cache-MMU system and provides outputs to the processor responsive to the trap logic for the given error fault.
The cache memory subsystem 610 is comprised of a cache memory array 611 having two 64-line cache stores, as described with reference to FIG. 9. The quad word outputs from each of the W and X halves of the cache memory array 611 are coupled to respective quad-word line registers 612 and 616. Quad word registers 612 and 616 are each independently coupled to the nDT bus, for coupling to the processor/cache bus via the CPU interface 600 or the system bus via the system interface 640.

The real address outputs from the W and X halves of the cache memory array 611 are coupled to one input each of comparators 615 and 617, respectively, each of which provides a hit/miss signal output. The other inputs of each of the comparators 615 and 617 are coupled to the output of multiplexer 618. The multiplexer 618 outputs a real address. The real address inputs are coupled to the multiplexer 618 from the system bus interface 640 via the SYRA bus therefrom, and from multiplexer 622 of the TLB subsystem 620, which provides a translated real address from its TLB memory array 621 responsive to a virtual address received from the processor/cache bus via the CPU interface 600.
The quad word registers 612 and 616 each have independent outputs coupling to multiplexer 614. Multiplexer 614 selectively outputs the word of selected information to multiplexer 602 for selective coupling to the cache output register 601.
As discussed with reference to FIG. 9, multiplexer 613 selectively couples a lower portion of a real address, either from the CPU interface 600 or from the TLB 620, for selective output and coupling to the cache memory array 611, to select a line therein.
The TLB memory array 621 selectively provides output from a selected line therein responsive to either an address from the nDT bus or an address supplied from the CPU interface 600 as output via the address input register 605. A portion (i.e. lower portion bits 12 to 0) of the virtual address output of address input register 605 is coupled to the TLB memory subsystem 621, and a more significant portion (i.e. bits 31 to 22) is coupled to one input each of comparators 623 and 624 of the TLB 620. The translated virtual address outputs from the TLB memory array subsystem 621, for each of the W and X halves, as discussed with regard to FIG. 9, are coupled to the other inputs of comparators 623 and 624.

Comparators 623 and 624 each provide independent hit/miss signal outputs. The multiplexer 622 has Real Address inputs coupling thereto as output from the W and X halves of the TLB memory subsystem 621. The multiplexer 622 selectively provides output of the translated real address to the input of multiplexer 618 of the cache memory subsystem 610, responsive to the hit/miss outputs of comparators 623 and 624.
The address protection logic 625 provides selective protection of read and write access for certain TLB lines, responsive to information as initially loaded from the page table entry as discussed with reference to FIG. 19.
The register stack 630 provides for storage of segment table origin values in two segment table origin registers. The register stack 630 includes segment table origin supervisor and user registers, a fault address register F, and other registers, such as an error address register.
The control and DAT logic 650 provides direct address translation logic, fetch logic, write logic, read logic, and I/O command operational logic.
Referring to FIG. 22, a detailed block diagram of the control logic microengine 650 of FIG. 21 is illustrated. The microengine is comprised of a read-only memory 700 and a microengine operational subsystem comprising program counter 710, stack pointer 715, instruction register 720, vector generator 730, condition code signal selector 740, signal controller and instruction decoder 750, and output register 760.
The program counter 710 is comprised of a program counter-accumulator register 712, a multiplexer 713, and increment logic 711. The multiplexer 713 provides a signal output to the program counter-accumulator register 712 responsive to a multiplex select signal MUXSLT, as output from the signal controller/instruction decoder 750. This selects one of: the eight-bit vector address output from the vector generator 730; the output of the next sequential program counter address from the increment logic 711, responsive to a PC increment signal PCINC as output from the signal controller/instruction decoder system 750; or a branch address as output from the branch address register of the instruction register 720. The output of the multiplexer 713 is coupled to the program counter accumulator register 712 for selective output therefrom as a PC output address PCOUT. PCOUT is coupled to the increment logic 711, to the stack pointer 715, and to the address selection inputs of the read-only memory subsystem 700.
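The three-way selection feeding the program counter-accumulator can be sketched in C; the enumeration, signal widths, and routine name are assumptions for illustration, while the three sources and the MUXSLT select come from the description above.

    #include <stdint.h>

    typedef enum { SEL_VECTOR, SEL_INCREMENT, SEL_BRANCH } pc_mux_select_t;

    /* Multiplexer 713: choose the next value for program counter-accumulator 712.
     * An 8-bit PC is assumed, matching the 256-line read-only memory 700.        */
    static uint8_t next_pc(pc_mux_select_t muxslt,
                           uint8_t vector_addr,   /* from vector generator 730   */
                           uint8_t pc_out,        /* current PCOUT                */
                           uint8_t branch_addr)   /* from branch register 721     */
    {
        switch (muxslt) {
        case SEL_VECTOR:    return vector_addr;
        case SEL_INCREMENT: return (uint8_t)(pc_out + 1);  /* increment logic 711 */
        case SEL_BRANCH:    return branch_addr;
        }
        return pc_out;
    }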
As illustrated in FIG. 22, the memory 700 includes 256 lines of 52 bits each, each line having an instruction and/or data value to be output to the instruction register 720 and/or the output register 760. The most significant bit positions (i.e. output bits 51 to 48) are coupled from the read-only memory subsystem 700 to the Type of Instruction register 723 of the Instruction Register 720. These bits indicate whether the remaining bits of the line comprise an instruction or control signal output. The remaining bits of the line (i.e. bits 47 to 0) are coupled to the output register 760, and to the instruction register 720. These bits are coupled to the branch address register 721 (i.e. bits 40 to 47 of the read-only memory 700 output) and to the condition code register 722 (i.e. bits 26 to 0).
The output from the instruction register 723 is coupled from the instruction register 723 to the signal controller 750. The instruction register 723 outputs instruction type information, responsive to an IR hold signal as output from the signal controller 750. For example, utilizing bits 48 to 51 of the read-only memory 700 output, a 000 could indicate an output instruction, 001 a branch instruction, 010 a call instruction, 011 a wait instruction, 100 a return instruction, 101 and 110 vector operations, and 111 a no-op operation.
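That example encoding can be captured directly in C; the sketch below uses the codes exactly as enumerated in the text (which occupy three bits of the four-bit 51:48 field), with the enum and function names being illustrative assumptions.

    #include <stdint.h>

    typedef enum {
        UOP_OUTPUT = 0x0, UOP_BRANCH   = 0x1, UOP_CALL     = 0x2, UOP_WAIT = 0x3,
        UOP_RETURN = 0x4, UOP_VECTOR_A = 0x5, UOP_VECTOR_B = 0x6, UOP_NOP  = 0x7
    } uop_type_t;

    /* Extract the type field, bits 51:48 of a 52-bit microinstruction line. */
    static uop_type_t decode_type(uint64_t rom_line)
    {
        return (uop_type_t)((rom_line >> 48) & 0xFu);
    }

    /* Bits 47:40 of the line feed the branch address register 721. */
    static uint8_t branch_field(uint64_t rom_line)
    {
        return (uint8_t)((rom_line >> 40) & 0xFFu);
    }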

The output of the condition code register 722 is coupled to the condition signal selection logic 740. The condition code decoder 740 also has condition code and status inputs coupled to it. These signals indicate a cache or TLB miss, a function code to tell the status of the operation such as read or write, status and condition code information, etc. The condition code decoder 740 provides a "token" output to the signal controller 750 to indicate status, and further outputs a vector number to the vector generator 730. The combination of the miss and/or function code information defines the destination address for the vector process.
The signal controller 750 provides vector signal timing outputs (i.e. VCTs, VCTc) coupled to the vector generator 730. Where a vector operation is indicated, the vector address is loaded from the vector generator 730 into the program counter accumulator 712 via multiplexer 713, and the PC counter 710 is incremented to sequence instructions until the vector routine is completed.
The branch address register 721 selectively outputs branch address signals to the program counter 710 for utilization thereby in accordance with control signals as output from the signal controller and instruction decoder 750. Output of signals from the output register 760 is responsive to the selective output of an output register hold "OR hold" signal from signal controller 750 to the output register 760. The signals as output from the output register 760 are coupled to other areas of the cache-MMU system (i.e. control signals and/or data) for utilization by the other areas of the cache-MMU system.

While there have been described above various embodiments of the present invention, for the purposes of illustrating the manner in which the invention may be used to advantage, it will be appreciated that the invention is not limited to the disclosed embodiments. Accordingly, any modification, variation or equivalent arrangement within the scope of the accompanying claims should be considered to be within the scope of the invention.

Claims (13)

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A method for communicating instructions between an instruction cache and a processor comprising the steps of:
(a) storing a first address value in a counter;
(b) addressing the instruction cache with the value stored in the counter;
(c) communicating said instruction stored in the addressed location of the instruction cache to a multistage instruction buffer;
(d) serially communicating the instructions stored in the instruction buffer to the processor;
(e) generating a cache advance signal when a stage in the instruction buffer is empty;
(f) independently incrementing the counter in response to the cache advance signal;
(g) repeating steps (b) through (f) until the occurrence of either one of a context switch or a branch.
2. The method according to claim 1 further comprising the steps of:
(h) storing a second address value in the counter upon the occurrence of either one of a context switch or a branch; and (i) repeating steps (b) through (f) until the occurrence of another one of either a context switch or a branch.
3. A microprocessor comprising:
an addressable cache memory;
execution means for processing digital information received from the cache memory; and interface means, coupled to the cache memory and to the execution means, for retrieving digital information from the cache memory and for communicating the retrieved digital information to the execution means, the interface means comprising:
a counter coupled to the memory, a value stored in the counter being used for addressing the cache memory;
address communicating means coupled to the counter for communicating an address from the execution means to the counter;
address storing means, coupled to the counter and to the address communicating means, for storing the address in the counter; and incrementing means coupled to the counter for selectively incrementing the initial address stored in the counter;
wherein the address communicating means operates independently of the incrementing means.
4. The microprocessor according to claim 3 wherein the address communicating means communicates the address to the counter in response to the occurrence of a prescribed event in the execution means.
5. The microprocessor according to claim 4 wherein the address communicating means communicates the address to the counter only in response to the occurrence of the prescribed event in the execution means.
6. The microprocessor according to claim 4 wherein the prescribed event is one of a context switch or a branch.
7. The microprocessor according to claim 6 wherein the incrementing means generates a cache advance signal for incrementing the counter.
8. The microprocessor according to claim 7 wherein the cache memory comprises a separate addressable instruction cache memory and a separate addressable data cache memory, and wherein the counter is coupled to the instruction cache memory for addressing instructions stored therein.
9. The microprocessor according to claim 8 wherein the interface means further comprises instruction retrieval means for retrieving instructions addressed by the counter, and for communicating the retrieved instructions to the execution means.
10. The microprocessor according to claim 9 wherein the incrementing means repetitively generates the cache advance signal after an address is stored in the counter so that the instruction retrieval means repetitively communicates instructions from the instruction cache to the execution means independently of any further address being stored in the counter by the address storing means.
11. The microprocessor according to claim 10 wherein the instruction retrieval means further comprises:
means for receiving clock signals from the execution means; and a multistage instruction buffer for serially storing instructions received from the instruction cache memory and for communicating the stored instructions to the execution means in response to the clock signals.
12. The microprocessor according to claim 11 wherein the instruction buffer further comprises:
buffer advance means for serially shifting the plurality of instructions through the buffer stages; and cache advance means, coupled to the buffer advance means, for generating the cache advance signal when the plurality of instructions are shifted through the buffer stages by a prescribed amount.
13. The microprocessor according to claim 12 wherein the operation of the address communicating means and the incrementing means are mutually exclusive.
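By way of illustration only, and not as a limitation of the claims, the fetch loop recited in claims 1 and 2 may be sketched in C as follows; every helper name and type is an assumption introduced for the example.

    #include <stdbool.h>
    #include <stdint.h>

    extern uint32_t cache_read(uint32_t counter);        /* steps (b)-(c)        */
    extern void     buffer_push(uint32_t instruction);   /* multistage buffer    */
    extern bool     buffer_has_empty_stage(void);        /* step (e) condition   */
    extern bool     context_switch_or_branch(uint32_t *new_address);

    /* Runs indefinitely, as the fetch hardware does. */
    void instruction_fetch(uint32_t first_address)        /* step (a)            */
    {
        uint32_t counter = first_address;
        for (;;) {
            if (buffer_has_empty_stage()) {               /* cache advance signal */
                buffer_push(cache_read(counter));         /* steps (b)-(d)        */
                counter++;                                /* step (f)             */
            }
            uint32_t new_address;
            if (context_switch_or_branch(&new_address))   /* claim 2, step (h)    */
                counter = new_address;
        }
    }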
CA000612701A 1989-09-22 1989-09-22 Microprocessor having separate instruction and data interfaces Expired - Lifetime CA1283222C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA000612701A CA1283222C (en) 1989-09-22 1989-09-22 Microprocessor having separate instruction and data interfaces

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA000612701A CA1283222C (en) 1989-09-22 1989-09-22 Microprocessor having separate instruction and data interfaces

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA000502414A Division CA1283221C (en) 1985-02-22 1986-02-21 Microprocessor having separate instruction and data interfaces

Publications (1)

Publication Number Publication Date
CA1283222C true CA1283222C (en) 1991-04-16

Family

ID=4140663

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000612701A Expired - Lifetime CA1283222C (en) 1989-09-22 1989-09-22 Microprocessor having separate instruction and data interfaces

Country Status (1)

Country Link
CA (1) CA1283222C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219804A (en) * 2016-12-28 2019-01-15 华为技术有限公司 Nonvolatile memory access method, device and system
CN109219804B (en) * 2016-12-28 2023-12-29 华为技术有限公司 Nonvolatile memory access method apparatus and system

Similar Documents

Publication Publication Date Title
US4933835A (en) Apparatus for maintaining consistency of a cache memory with a primary memory
US4860192A (en) Quadword boundary cache system
US5255384A (en) Memory address translation system having modifiable and non-modifiable translation mechanisms
US4884197A (en) Method and apparatus for addressing a cache memory
US5091846A (en) Cache providing caching/non-caching write-through and copyback modes for virtual addresses and including bus snooping to maintain coherency
US4899275A (en) Cache-MMU system
US4868738A (en) Operating system independent virtual memory computer system
US5379394A (en) Microprocessor with two groups of internal buses
JP3987577B2 (en) Method and apparatus for caching system management mode information along with other information
JP3176129B2 (en) Monitor structure and monitoring method for on-chip cache of microprocessor
US5666509A (en) Data processing system for performing either a precise memory access or an imprecise memory access based upon a logical address value and method thereof
JPS58212694A (en) Memory system
EP0303648B1 (en) Central processor unit for digital data processing system including cache management mechanism
JPS63208963A (en) Digital data processing system
US5544344A (en) Apparatus for caching smram in an intel processor based computer system employing system management mode
EP0887738B1 (en) Interface bridge between a system bus and local buses with translation of local addresses for system space access programmable by address space
JPH0260012B2 (en)
US6754779B1 (en) SDRAM read prefetch from multiple master devices
WO2008031678A1 (en) Dmac address translation miss handling mechanism
CA1283220C (en) Cache-memory management unit system
US6098113A (en) Apparatus and method for address translation and allocation for a plurality of input/output (I/O) buses to a system bus
EP0428149A2 (en) Cache controller
CA1269176A (en) Multiple bus system including a microprocessor having separate instruction and data interfaces and caches
US6032234A (en) Clustered multiprocessor system having main memory mapping shared expansion memory addresses and their accessibility states
EP0377431A2 (en) Apparatus and method for address translation of non-aligned double word virtual addresses

Legal Events

Date Code Title Description
MKEX Expiry