EP0591405A1 - Multiprocessor array

Info

Publication number: EP0591405A1
Application number: EP92914447A
Authority: EP
Inventors: Kin M. Ho; Dietmar M. Kurpanek; Adam W. K. Li; Jonathan W. Liu; Brian J. Sassone; Tahir Q. Sheikh; Sam Tam
Original assignee: Unisys Corp
Current assignee: Unisys Corp
Priority date: 1991-06-20
Filing date: 1992-06-19
Publication date: 1994-04-13
Also published as: WO1993000639A1; JPH06508707A

Abstract

A multiprocessor computer system in which a base processor is coupled in asynchronous logical union to an associated I/O adapter, the processor and the I/O adapter each including an associated private cache through which both are connected to a common shared MP bus via a single connector channel.

Description

MULTI-PROCESSOR ARRAY

Field of Invention:

This invention relates to computer systems, and more particularly to techniques for arranging central processor means thereof including techniques for arranging a single input/output channel for several processors in a single system.

Background, Features:

In data processing systems utilizing a number of processors it is typically advantageous to intercouple these via a single shared common system bus. In addition, data processing systems utilizing peripheral bus means for a number of processors typically use a single common input/output channel and intercouple it with all processors via a single shared common system bus. Yet this is not easy, and presents grave problems of overloading the shared bus and slowing the system. The instant approach addresses this problem and offers some solutions, such as by providing the I/O channel and each processor with its own dedicated cache memory, coupling the main (base) processor to the input/output (I/O) channel in O-Ring fashion, and coupling the main processor and I/O channel to the common system bus via a single channel.

An object hereof is to address at least some of the foregoing problems and to provide at least some of the mentioned, and other, advantages.

Brief Description of the Drawings:

These and other features and advantages of the present invention will be appreciated by workers as they become better understood by reference to the following detailed description of the present preferred embodiments, which should be considered in conjunction with the accompanying drawings, wherein like reference symbols denote like elements:

Fig. 1 is a generalized simplified block diagram of a preferred embodiment;

Fig. 2 is a more detailed version of this embodiment, while Figs. 2A, 2B, 2C are enlargements of portions of Fig. 2.

Generalized Embodiment, Fig. 1:

Generally, the present invention comprises a multiprocessor system in which the processors, and likewise an input/output adapter (hereinafter "I/O"), are coupled by a common bus arrangement to common memory. A "main memory" is coupled, by the common bus arrangement, to the several central processors (hereinafter "CPUs"), the common bus thus being shared between such CPUs. One of the processors is a base processor coupled to the I/O adapter (by the common bus arrangement). The I/O adapter is the only one for the system, and is coupled, by output bus means, to I/O devices, and by a ring coupling to the base processor. The I/O adapter, and each CPU, has its own dedicated cache memory.

Each cache memory is preferably operated as a "write-back" cache which updates "main memory" only upon a CPU-initiated "read miss" or "write miss" (on a dirty cache line).
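The write-back behaviour just described can be illustrated with a minimal sketch. It assumes a toy direct-mapped cache with one value per line; the class name, line count and addresses are hypothetical and are not taken from the specification. The point shown is that main memory is written only when a miss evicts a dirty line.

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache (illustrative; sizes are assumptions)."""

    def __init__(self, main_memory, num_lines=4):
        self.memory = main_memory      # dict: address -> value
        self.num_lines = num_lines
        self.lines = {}                # index -> [address, value, dirty]
        self.memory_writes = 0         # number of write-backs to main memory

    def _fill(self, address):
        """Handle a miss: write back a dirty victim, then load the new line."""
        index = address % self.num_lines
        victim = self.lines.get(index)
        if victim is not None and victim[2]:
            self.memory[victim[0]] = victim[1]
            self.memory_writes += 1
        self.lines[index] = [address, self.memory.get(address, 0), False]
        return self.lines[index]

    def _lookup(self, address):
        line = self.lines.get(address % self.num_lines)
        if line is not None and line[0] == address:
            return line                # hit: main memory is not involved
        return self._fill(address)     # read miss or write miss

    def read(self, address):
        return self._lookup(address)[1]

    def write(self, address, value):
        line = self._lookup(address)
        line[1] = value
        line[2] = True                 # mark dirty; main memory is now stale


memory = {0: 10, 4: 40}
cache = WriteBackCache(memory)
cache.write(0, 11)        # write miss on an empty line: nothing to write back
cache.write(0, 12)        # write hit: only the cache is updated
stale = memory[0]         # main memory still holds the old value
loaded = cache.read(4)    # address 4 maps to the same line, evicting dirty address 0
```

In this sketch the two writes reach main memory as a single write-back, which is the kind of bus-traffic saving the description attributes to private write-back caches.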

Fig. 1 particularly represents a generalized, simplified block diagram of a multiple microprocessor computer system along the foregoing lines, being characterized by a base CPU 1 (central processing unit, e.g., preferably using an Intel 80486 microprocessor chip), preferably coupled to its own associated private cache memory unit 1-C and, thence, via a connect channel 8, to a single, shared MP (multiprocessor) bus 21. Workers will understand that CPU 1 may preferably comprise a single processor card (printed circuit board, including cache 1-C) and that the entire computer system preferably involves a number of other similar (application) processor units (up to five, here), such as like CPU/cache units 23, 25, all similar to CPU 1/cache 1-C and linked by the shared MP bus 21. Preferably, one or several "main memory" units 9 are also coupled to this MP bus 21. Each CPU may be understood as connected to its cache memory to accommodate direct transmission therebetween (cf. requests, data). CPU 1 is also coupled, in "O-Ring fashion" (see further below), to the I/O adapter (channel) 5 (input-output control unit; the only one in the system, preferably on a single board), which is, in turn, coupled to I/O bus means (preferably to a SCSI bus 30 and an EISA bus 40, as illustrated, there being an associated EISA chip set on I/O card 5, along with an associated private cache memory 5-C). Notably, I/O unit 5 (with its cache 5-C) is coupled to MP bus 21 via the common single channel (connector) 8 that also couples CPU 1 (and its cache 1-C) to MP bus 21. This O-Ring coupling logically ties CPU 1 and I/O 5 together (exclusively) in a ring-like configuration, for bilateral, asynchronous intercommunication. As a salient feature, CPU 1 and I/O-5 are so connected in "O-Ring" fashion ("O-Ring architecture") to "talk bilaterally and asynchronously" over two intercouplings (channels).

Preferably, I/O registers R-0 are provided on the base and application processor units to facilitate communication with, and between, the processors. This O-Ring logically ties CPU 1 to I/O-5, these being shown in more detail in Figs. 2A, 2B (see inside respective dotted lines). Note (Fig. 2A) that the line (connector channel 8) connecting CPU 1 and I/O 5 with MP bus 21 is preferably coupled thereto via a SAD (system address) bus. Preferably, channel 8 comprises a MAD bus and a bidirectional buffer-register (BCT-652-M) A, linking MP bus 21 with SAD bus and thus linking I/O-5 with all processors. Workers will recognize the advantageous use of such O-Ring architecture with such a common access-line to MP bus 21.

Buses MAD, SAD are depicted in Fig. 2A, along with two private buses VD, VA from base processor 1 to I/O board 5.

Some novel, advantageous functions of such an O-Ring coupling are:

1 - It reduces traffic load on the common system bus (MP bus); and so

2 - opens up access for the bus user; and

3 - base processor 1 can "talk" asynchronously with I/O board 5 (not synchronously, as is conventional).

4 - The base processor cache can operate asynchronously with its processor, and also with MP bus, thus reducing access-time to the cache, and improving performance of the base processor; and

5 - MP bus is free to monitor cache-access while the base processor and I/O channel are otherwise occupied (e.g., accessing one another).

SAD bus also links MP bus with a Cache Address bus (CA bus), with a Cache Data bus (CD bus), with a Cache Tag data bus (TD bus) and with a DMA ASIC portion of I/O-5 ("Direct Memory Access" chip to move data between disks and CP-memory or cache memory 1-C?). Register BCT-652-M is a bidirectional buffer/register coupled between SAD bus and MP bus to transfer data to the CPU 1, I/O-5 ring (e.g., from main memory, from other CPUs).

A composite "bus tag" arrangement is provided with two tags for each CPU (board); as part of this, we provide a "cache tag" aa (Fig. 2A) and "bus tag" bb (Fig. 2A); their contents (they are static RAMs) should always be the same. Also provided is a math co-processor dd (e.g., preferably a "4167" chip by Weitek Corp. or an Intel 80387).
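The requirement that the "cache tag" and "bus tag" always hold the same contents can be sketched as follows. The specification does not say why two copies are kept; the sketch assumes the usual motivation in snooping designs, namely that the bus side can check tags without contending with the processor side. The class, method names and address split are hypothetical.

```python
class DualTagStore:
    """Two mirrored tag arrays: one for CPU lookups, one for bus snoops."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.cache_tag = [None] * num_lines   # consulted on CPU accesses
        self.bus_tag = [None] * num_lines     # consulted on bus snoops

    def _split(self, address):
        # Assumed address split: low part selects the line, high part is the tag.
        return address % self.num_lines, address // self.num_lines

    def install(self, address):
        """Record a newly cached line in BOTH tag arrays."""
        index, tag = self._split(address)
        self.cache_tag[index] = tag
        self.bus_tag[index] = tag

    def invalidate(self, address):
        """Drop a line from BOTH tag arrays (e.g., after a snoop hit)."""
        index, tag = self._split(address)
        if self.bus_tag[index] == tag:
            self.cache_tag[index] = None
            self.bus_tag[index] = None

    def cpu_hit(self, address):
        index, tag = self._split(address)
        return self.cache_tag[index] == tag

    def snoop_hit(self, address):
        index, tag = self._split(address)
        return self.bus_tag[index] == tag

    def consistent(self):
        """The invariant stated in the text: both arrays hold the same contents."""
        return self.cache_tag == self.bus_tag


tags = DualTagStore()
tags.install(0x13)
hit_from_cpu = tags.cpu_hit(0x13)
hit_from_bus = tags.snoop_hit(0x13)
other_line = tags.snoop_hit(0x1B)    # same index, different tag
tags.invalidate(0x13)
still_cached = tags.cpu_hit(0x13)
```

Every update goes through `install` or `invalidate`, so the two arrays cannot diverge in this model.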

There is also a Processor Address bus (PA bus; preferably for Intel 80486 microprocessor) and a Processor Data bus (PD bus, or 80486 data bus). All the above five buses are indicated in Figs. 2A, 2B. An address buffer 543 is provided to isolate CA bus and PA bus. A latch 573 is provided to store the SAD address, while a D flip-flop 574 is provided to "evict" addresses to the SAD bus. A bidirectional buffer 245 provides the path for updating tag data and for "snooping" the CPU (i.e., to invalidate data elsewhere when it is updated in a given memory site). A comparator unit 521 is provided to check for cache "hit"/"miss".

Features:

The subject embodiment has several noteworthy features. For instance, as noted in Fig. 1, the base processor CPU 1 is coupled for bilateral asynchronous communication with the associated I/O unit 5 (cf. asynchronous I/O-CPU interface 2). It is conventional for such an interface to be synchronous (e.g., clock-synchronized; e.g., see US 4,669,043 to Kaplinsky, or US 4,591,977 to Nissen et al., where I/O is directly coupled to a CPU; or see US 4,931,984 to Ny or US 4,161,024 to Joyce et al.).

This "O-Ring" architecture may be contrasted with U.S. Patent 4,351,025, where a control CPU is coupled by a separate bus to a master control I/O unit; or with more conventional arrangements coupling I/O and CPU via a single shared bus, and doing so in a synchronous, tightly-coupled interface. This O-Ring coupling, with its two channels, enhances system performance, for instance by allowing MP bus to access CP-cache 1-C (or I/O cache 5-C) while CPU 1 simultaneously accesses EISA bus, with no problematic contention. And the asynchronous nature of this CPU-I/O interface allows I/O to operate at any frequency, thus enhancing system modularity and design flexibility. Thus O-Ring couple 2 will be assumed to preferably comprise a pair of bidirectional connection channels between CPU 1 and I/O adapter 5 (e.g., vs. U.S. 4,459,655, using a master bus 17 along with a conversation bus 16 which allows two-way communication between slave modules 12, 13, 14).

Similarly, each CPU in the system is provided with private cache memory means (e.g., 1-C for CPU 1) to which it is asynchronously coupled; likewise our I/O unit 5 is coupled, asynchronously, with an associated cache means 5-C. And, each cache is also coupled asynchronously with MP bus 21. By contrast the cited Joyce patent teaches no such private caching and no such bilateral asynchronous cache interfacing either.

Also, our multiple processor system preferably uses a base processor (e.g., CPU 1) along with application processor CPUs (each preferably with associated cache memory) to share a single MP bus (21 in Fig. 1).

More conventional systems use a number of CPU buses in such a situation (e.g., as in U.S. 4,459,655 to Willemin, which uses two buses, or U.S. 4,351,025; or U.S. 4,161,024; or see U.S. 4,692,862 to Cousin et al., which teaches an interconnect network for communication between processors).

Also noteworthy is our coupling each CPU with the common shared bus via a single channel, and also via the respective associated private cache memory (e.g., vs. the multiprocessor system in U.S. 4,591,977 to Nissen which teaches other local memory units for each CPU, these unconnected to a common bus and also using a common memory that must be time-shared and is not asynchronous with any CPU or any common bus).

Our single-channel coupling of CPU 1 and I/O 5 to MP bus 21 reduces the capacitive load on MP bus, and so accelerates its operation. The asynchronous coupling of each cache to its CPU, and to MP bus 21 (vs. a synchronous interface), provides great versatility: e.g., the cache can run at CPU frequency when CPU "owns" it, and at MP bus frequency when MP bus "owns" it, thus eliminating "excess synchronization time" (to access cache from either side), and so increasing throughput of the overall system. And providing a private, dedicated write-back cache for each processor significantly reduces traffic on MP bus, enabling MP bus to support more processors. [Note coupling of I/O-5 and control CPU 1, via respective caches, to common MP bus 21; i.e., CPU 1 to 1-C to channel 8, and I/O-5 to 5-C to channel 8, and via channel 8 to MP bus 21 in Fig. 1.]

Our system is also constructed with a "separate supervisor mode" whereby each processor in the system (e.g., 23, 25) can schedule and spin off operations by itself, unlike more conventional "master-slave" multiprocessor systems (e.g., in cited Willemin).

Detailed Preferred Embodiment (Figs. 2):

Fig. 2 expands on the described simplified arrangement of Fig. 1 in a multiprocessor system adapted for file server and OLTP (on-line transaction processing) applications. In this preferred detailed embodiment, using an arrangement generally like that in Fig. 1, note the comparable elements: base processor CPU 1' (including processor chip CP-11' and associated private cache memory 1-C), and I/O adapter card 5' (including its own private cache memory). Processor CPU 1' and I/O adapter 5' will be understood as configured in O-Ring fashion, being intercoupled along VA bus and VD bus, via interface control IFC, and being both coupled to MP bus 21' along a single common channel 8' (including MAD bus and buffer/register BCT-652), with I/O adapter 5' coupled to access SCSI bus and EISA bus.

As best seen in Fig. 2A, a replica of Fig. 2, a system address bus (SAD bus) couples channel 8' (register BCT-652 thereof) to a pair of tag-check units H-A, H-B, via a latch 573, as well as to base processor array CP-1' (via register/buffers BCT-652: -C, -B₁, -B₂, and via evict unit 574-F); and also couples channel 8' to I/O adapter (via control IFC, and DMA-ASIC-DD, and BSAD bus and a tag-snoop unit 245-E).

Registers BCT-652-B₁ and -B₂ are coupled by a cache address bus (CA bus) to data cache 3' and to MOESI cache 3", as well as to a pair of isolation buffers 543-G', 543-G", and also to a comparator stage 521-h (Fig. 2A).

A cache data bus (CD bus) couples data cache 3' with address buffer 543-G and register BCT-652-C. A cache tag data bus (TD bus) couples evict unit 574-F with tag cache TC, with comparator 521-h and with buffer 543-G" (Fig. 2B).

Thus, in effect, SAD bus provides linkage between MP bus on the one hand, and the following: BSAD bus, CA bus, CD bus, TD bus, and DMA ASIC DD (140, via tag-snoop unit 245-E).

In Fig. 2B, also note that an 80486 data bus (PD bus) links buffer 543-G and tag-snoop unit 245-E' with base processor chip CP-11' and with math coprocessor chip dd. Also, note processor address bus (PA bus, cf. address bus for Intel 80486 microprocessor), which links processors CP-11' and dd with evict unit 574-F' and with buffer 543-G' and tag cache TC.

I/O System (I/O-5', Fig. 2C):

Fig. 2C is an enlargement of the I/O-5' portion of Fig. 2, which will be better understood by the following description of salient component parts (Intel chip designations):

82358 Bus Controller (5-B):

The 82358 EBC is the central component of the EISA system. The EBC performs the translations between host CPU cycles, AT cycles, and EISA cycles. Masters on any of the three buses communicate with the other buses through the EBC. It takes care of all necessary timing alignments and translations for the different buses to communicate.
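The EBC's role as the crossing point between the buses can be sketched as a small routing function, using only the two forwarding rules stated for the EBC in this section (host-initiated cycles with no responding host slave, and EISA-initiated cycles). The function name and string labels are hypothetical; cycles initiated by AT-bus masters are not described here, so the sketch deliberately refuses to guess at them.

```python
def ebc_forward_targets(initiator, host_slave_responded=False):
    """Return the buses to which a cycle is forwarded, per the rules in the text.

    initiator: "host" or "EISA" (labels are assumptions for illustration).
    host_slave_responded: only meaningful for host-initiated cycles.
    """
    if initiator == "host":
        # A host cycle stays on the host bus if a host slave claims it;
        # otherwise it is forwarded to the EISA and AT buses.
        return [] if host_slave_responded else ["EISA", "AT"]
    if initiator == "EISA":
        # All EISA-master cycles are forwarded to the host and AT buses.
        return ["host", "AT"]
    # Behaviour for other initiators is not specified in this description.
    raise ValueError(f"forwarding rule not specified for initiator {initiator!r}")


claimed_locally = ebc_forward_targets("host", host_slave_responded=True)
unclaimed = ebc_forward_targets("host")
from_eisa = ebc_forward_targets("EISA")
```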

The EBC sits between the fast host (CPU) bus and the 8 MHz EISA/AT buses. It watches cycles initiated on all buses. When a host bus master initiates a cycle and no host slave responds, the EBC forwards the cycle to the EISA and AT buses. All cycles initiated by EISA bus masters are forwarded to the host and AT buses. It also provides the control for the address and data buffers between the buses and takes care of inserting delays between back-to-back I/O cycles coming from the host bus to the EISA bus.

82357 Integrated System Peripheral (5-A):

The 82357 ISP is a multi-function support peripheral that is designed to work in conjunction with the 82358 EISA bus controller to provide most of the system functions necessary in EISA-specific applications. The 82357 ISP comprises several computer system functions that are typically found in separate LSI and VLSI components. They include: a high-performance 7-channel programmable DMA controller; an arbitration scheme that allows efficient bus sharing among multiple EISA masters and DMA devices; a 15-level programmable interrupt controller which provides level- or edge-triggered interrupt capability on a channel-by-channel basis; non-maskable interrupt logic for multiple NMI control and generation; refresh address generation and control; 5 counter/timers which provide a system timer interrupt for a time of day, diskette time-out, DRAM refresh requests, and other system timing operations.

82352 EISA Bus Buffer (5-C):

The 82352 EBB is used to integrate the data swap logic and the address buffers. This integrates approximately 17 components and lowers the system board chip count. Additionally, the EBB is designed to meet some of the timing requirements of EISA that would be difficult to meet with discrete components, and to eliminate excess EMI for FCC testing requirements.

82355 Bus Master Interface Controller (5-E):

For add-in board support, the 82355 EISA Bus Master Interface Controller (BMIC) provides a simple, yet powerful and flexible interface between the local functions on the bus master board and the EISA bus master protocol. With the help of external buffer devices, the BMIC provides all of the control signals, address lines, and data lines necessary for an EISA bus master to interface to the EISA bus.

The 82355 BMIC greatly simplifies the design of 32-bit EISA bus masters. With the BMIC, an expansion board can be implemented with simple logic similar to that used in traditional AT DMA designs; however, the BMIC also allows the designer to take full advantage of the advanced features of EISA bus masters. Features available when using the BMIC are the burst mode for data transfer rates up to 33 megabytes/sec, EISA automatic configuration, and a 32-bit address bus which covers the entire 4-gigabyte EISA address space.
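The two figures quoted above follow from simple arithmetic, sketched below. The sketch assumes a nominal EISA bus clock of about 8.33 MHz (the description rounds this to 8 MHz) and one 32-bit transfer per bus clock during a burst; both are assumptions stated here for illustration, and the variable names are hypothetical.

```python
BUS_WIDTH_BITS = 32
BCLK_HZ = 8_333_333   # assumption: nominal EISA bus clock, about 8.33 MHz

# Burst rate, assuming one full-width transfer per bus clock.
bytes_per_transfer = BUS_WIDTH_BITS // 8
burst_rate_mb_per_s = bytes_per_transfer * BCLK_HZ / 1_000_000

# Addressable space with a 32-bit address bus, in gigabytes (2**30 bytes).
address_space_gb = 2 ** BUS_WIDTH_BITS / 2 ** 30
```

Under these assumptions the burst rate comes out at roughly 33 megabytes/sec and the address space at 4 gigabytes, matching the figures in the text.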

In conclusion, it will be understood that the preferred embodiments described herein are only exemplary, and that the invention is capable of many modifications and variations in construction, arrangement and use without departing from the spirit of the claims. For example, the means and methods disclosed herein are also applicable to other related computer systems. Also, the present invention is applicable for enhancing other multiprocessor arrangements. The above examples of possible variations of the present invention are merely illustrative. Accordingly, the present invention is to be considered as including all possible modifications and variations coming within the scope of the invention as defined by the appended claims.

Claims

What is claimed is:

1. A multiprocessor computer system including multiprocessor bus means, base processor means and input/output adapter means for accessing output buses and related equipment, wherein said processor means and said adapter means are inter-coupled in asynchronous O-Ring fashion by a pair of connector means and are both coupled to said multiprocessor bus means via single channel connector means.

2. The invention of claim 1, wherein said base processor means and said input/output adapter means each include an associated private cache memory coupled directly and asynchronously thereto.

3. The invention of claim 2, wherein each said cache memory is also coupled asynchronously and directly to said multiprocessor bus means.

4. The invention of claim 1, wherein said processor means and said adapter means each comprises a separate circuit board.

5. The invention of claim 4, wherein said pair of connections between said processor and adapter means are asynchronously mediated via interface control means.

6. The invention of claim 3, wherein said multiprocessor bus means is so coupled to said processor means and adapter means via intermediate buffer means and associated system address bus means.

7. The invention of claim 6, wherein said multiprocessor bus means is coupled with said system address bus means via bidirectional buffer/register means.

8. The invention of claim 1, wherein said system also includes application processor means and wherein all said processor means are intercoupled by said multiprocessor bus means.

9. The invention of claim 8, wherein said application processor means each includes a private, associated cache memory coupled directly, and asynchronously, thereto.

10. A multi-processor data processing system comprising: system bus means; main memory means coupled to said bus means; base processor means coupled asynchronously to said bus via prescribed single channel means; other processor means coupled asynchronously to said bus;

I/O adapter means coupled asynchronously, directly to said base processor means via a pair of asynchronous, bi-lateral connect channels adapted for asynchronous bilateral intercommunication; and also coupled asynchronously to said bus means via said single channel means.

11. The system of claim 10, wherein said base processor means includes associated private cache memory means coupled directly thereto, and to said system bus, for asynchronous communication, also being coupled to said bus means via said single channel means, shared in common with said I/O adapter means.

12. The system of claim 11, wherein said system also includes system-bus-monitoring means for producing a predetermined output to said main processor means when said system bus is transmitting data corresponding to that in said cache memory means associated with, and coupled directly to, said main processor means.

13. The system of claim 12, wherein said cache memory also includes replacement and update means responsive to said predetermined output for replacing data in a specific address in said cache memory corresponding to said specific address in main memory with the data on said system bus.

14. Multiprocessor apparatus comprising in combination: a plurality of microprocessor processing units adapted to receive and process data signals, said units arranged in parallel with one another and connected to a common system bus in asynchronous fashion; a plurality of local private cache memory means, one being directly, asynchronously connected to each said processing unit and being adapted for storing data, instruction and control signals; common memory means coupled directly to said bus means for providing data, instruction and control signals to each processing unit; input/output control means coupled to a main, controlling one of said processing units for communicating directly, asynchronously therewith via a pair of channels, while being also coupled asynchronously with said bus means in common with said main processing unit via a single shared channel means.

15. A method of arranging the input/output adapter means in a computer system, this system also including base processor means, and associated multiprocessor bus means for communication with, and between, memory and other processor means, this method including: intercoupling said base processor means with said input/output adapter means in bilateral asynchronous O-Ring fashion with at least two interconnect means; and interconnecting this O-Ring array with all other processor means and shared memory means using said multiprocessor bus means.

16. The method of claim 15, wherein said O-Ring array is coupled to said multiprocessor bus means via single channel connect means.

17. The method of claim 16, wherein private cache memory means is directly coupled asynchronously to each processor means, and to said adapter means, and to said multiprocessor bus means as well.

18. The method of claim 15, where said base processor means and said adapter means each comprises a separate circuit board.

19. The method of claim 18, wherein said two interconnect means between said base processor and adapter means are asynchronously mediated via interface control means.

20. The method of claim 17, wherein said multiprocessor bus means is so coupled to said base processor means and adapter means via intermediate buffer means and via associated system address bus means.

21. The method of claim 20, wherein said multiprocessor bus means is coupled with said system address bus means via bidirectional buffer/register means.

22. The method of claim 15, wherein said other processor means comprise one or more independent application processor means and wherein all said processor means are intercoupled by said multiprocessor bus means.

23. The method of claim 22, wherein said application processor means also each include private, associated cache memory means coupled directly, and asynchronously, thereto.

24. A method of arranging a data processing system which includes input/output channel means, system bus means, main memory means coupled to said bus means, base processor means coupled asynchronously to said bus means via prescribed single channel means, and other processor means coupled asynchronously to said bus means; said method comprising: coupling said input/output channel means directly to said base processor means via a pair of connect channels adapted for asynchronous bilateral intercommunication and thus creating an O-Ring array; and coupling this O-Ring array asynchronously to said system bus means via said single channel means.

25. The system of claim 24, wherein said input/output channel means and base processor means each include associated private cache memory means coupled directly thereto, and to said single channel means for asynchronous communication, said cache memory means being coupled to said bus means via said single channel means.

26. The system of claim 25, wherein said system also includes system-bus-monitoring means for producing a predetermined output to said base processor means when said system bus means is transmitting data corresponding to that in said cache memory means associated with, and coupled directly to, said base processor means.

27. The system of claim 26, wherein said cache memory also includes replacement and update means responsive to said predetermined output for replacing data in a specific address in said cache memory corresponding to said specific address in main memory with the data on said system bus means.

28. A data processing arrangement comprising, in combination: input/output channel means; a plurality of independent data processing units adapted to receive and process data signals, said units and said input/output channel means being inter-connected via a common system bus in asynchronous fashion; a plurality of local private cache memory means, one being directly, asynchronously connected to said I/O channel means and each said processing unit, and being adapted for storing data, instruction and control signals; common memory means coupled directly to said bus for providing data, instruction and control signals to each processing unit; said input/output channel means being coupled to a base one of said processing units for communicating directly, asynchronously therewith via a pair of channels to form an O-Ring array, this array being also coupled asynchronously with said bus via a single shared bus-channel means.