GB2249243A - Message routing in a multiprocessor computer system - Google Patents

Message routing in a multiprocessor computer system

Info

Publication number
GB2249243A
Authority
GB
United Kingdom
Prior art keywords
strobe
information
router
nodes
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9118463A
Other versions
GB2249243B (en)
GB9118463D0 (en)
Inventor
Steven F Nugent
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Publication of GB9118463D0 publication Critical patent/GB9118463D0/en
Publication of GB2249243A publication Critical patent/GB2249243A/en
Application granted granted Critical
Publication of GB2249243B publication Critical patent/GB2249243B/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17368 Indirect interconnection networks non hierarchical topologies
    • G06F15/17381 Two dimensional, e.g. mesh, torus

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A parallel processing computer system as in GB 2227341 has an improved architecture for communication of information between nodes. The computer system comprises at least three nodes, each comprising means for processing information and a routing means for routing information between nodes. The routing means allow reservation of a route through the network of nodes. Messages may then be transmitted from an origin node to a destination node over the reserved route. Use of a route reservation system reduces requirements for buffering of information at intermediate nodes on a route, improves message passing latency and increases node-to-node bandwidth. The present invention teaches communication of messages between nodes in a synchronous manner using a common strobe (clock) signal, which is modified by regenerating alternate edges of the signal, in order to eliminate pulse shrinkage of the strobe signal.

Description

MESSAGE ROUTING IN A MULTIPROCESSOR COMPUTER SYSTEM
BACKGROUND OF THE INVENTION
This application is a continuation-in-part patent application of copending United States Patent Application Serial No. 07/298,551, filed January 18, 1989, titled Message Routing in a Multiprocessor Computer System.
1. Field of the Invention.
The present invention relates to the field of parallel processing computer systems.
2. Prior Art.
A number of parallel processing computer systems are well known in the prior art. Generally, in such systems a large number of processors are interconnected in a network. In such networks each of the processors may execute instructions in parallel. In general, such parallel processing computer systems may be divided into two categories: (1) a single instruction stream, multiple data stream (SIMD) system and (2) a multiple instruction stream, multiple data stream (MIMD) system. In a SIMD system, each of the plurality of processors simultaneously executes the same instruction on different data. In a MIMD system, each of the plurality of processors may simultaneously execute a different instruction on different data.
In either a SIMD or MIMD system, some means is required to allow communication between processors in the computer system. In such systems, it is known to logically organize processors in an n-cube.
A discussion of such n-cube systems may be found in Herbert Sullivan and T.R. Bashkow, A Large Scale Homogeneous, Fully Distributed Parallel Machine, Proceedings of the 4th Annual Symposium on Computer Architecture, pp. 105-117, 1977. Sullivan et al. discusses a number of interconnection structures, including connection of processors on a boolean n-cube. The described boolean n-cube is an interconnection of N (N = 2^n) processors which may be thought of as being placed at the corners of an n-dimensional cube. Sullivan et al. discloses that the location of a processor may be described by designating one processor as the origin with a binary address of (0,0,...,0) of n bits. Other processors may then have their locations expressed as an n bit binary number in which each bit position is regarded as a coordinate along one of the n dimensions. In such a system, when one processor is directly linked to another, their addresses will differ by just one bit. The position of this bit indicates the direction in n-space along which communication between the processors takes place. Thus, the address of one processor with respect to a neighboring processor differs by only one bit.
Sullivan et al. describes that in such a system a relative address may be computed by taking the bit-by-bit sum (modulo 2) of the addresses of two processors. This bit-by-bit summation is the equivalent of taking an exclusive OR of the two addresses. The number of non-zero bits in the resulting relative address represents the number of links which must be traversed to get from one processor to another.
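The relative address computation described above can be illustrated with a minimal Python sketch. The function names relative_address and link_count are hypothetical and chosen only for this illustration; they do not appear in Sullivan et al.

```python
def relative_address(src: int, dst: int) -> int:
    """Bit-by-bit sum modulo 2 (exclusive OR) of two node addresses."""
    return src ^ dst


def link_count(src: int, dst: int) -> int:
    """Number of links to traverse: the count of non-zero bits
    in the relative address."""
    return bin(relative_address(src, dst)).count("1")


if __name__ == "__main__":
    # Two processors of a 3-cube with addresses 100 and 001.
    rel = relative_address(0b100, 0b001)
    print(format(rel, "03b"), link_count(0b100, 0b001))
```

For addresses 100 and 001 the relative address is 101, so two links must be traversed.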
United States Patent Number 4,598,400 (Hillis) describes a similar n-cube parallel processing computer system in which an array of nodes is interconnected in a pattern of two or more dimensions and communication between the nodes is directed by addresses indicating displacement of the nodes. Hillis specifically discloses a system in which a message packet may be routed from one node to another in an n-cube network. The message packet comprises relative address information and information to be communicated between the nodes.
Many known parallel processing computer systems utilize a store-and-forward mechanism for communicating messages from one node to another. The Hillis system describes such a store-and-forward mechanism. Such store-and-forward mechanisms are more clearly described in Parviz Kermani and Leonard Kleinrock, Virtual Cut-Through: A New Computer Communication Switching Technique, Computer Networks, Vol. 3, 1979, pp. 267-286. Kermani et al. distinguishes store-and-forward systems from circuit switching systems. Specifically, a circuit switching system is described as a system in which a complete route for communication between two nodes is set up before communication begins. The communication route is then tied up during the entire period of communication between the two nodes. In store-and-forward (or message) switching systems, messages are routed to a destination node without establishing a route beforehand. In such systems, the route is established dynamically during communication of the message, generally based on address information in the message. Generally, messages are stored at intermediate nodes before being forwarded to a selected next node. Kermani et al. further discusses the idea of packet switching systems. A packet switching system recognizes that improved utilization of resources and reduction of network delay may be realized in some network systems by dividing a message into smaller units termed packets. In such systems, each packet (instead of each message) carries its own addressing information.
Kermani et al. observes that extra delay is incurred in known systems because a message (or packet) is not permitted to be transmitted from one node to the next before the message is completely received. Therefore, Kermani et al. discloses an idea termed "virtual cut-through" for establishing a communication route.
The virtual cut-through system is a hybrid of circuit switching and packet switching techniques in which a message may begin transmission on an outgoing channel upon receipt of routing information in the message packet and selection of an outgoing channel. This system leads to throughput times exactly the same as in a store-and-forward system when all intermediate channels are busy. When all intermediate nodes are idle, this system leads to throughput times similar to a circuit switched system. However, the system disclosed by Kermani et al. still requires sufficient buffering to allow an entire message to be stored at each node when all channels are busy.
W.J. Dally, A VLSI Architecture for Concurrent Data Structures, Ph.D. Thesis, Department of Computer Science, California Institute of Technology, Technical Report 5209, March 1986, discusses a message-passing concurrent architecture to achieve a reduced message passing latency. In Chapter 3, Dally discusses a balanced binary n-cube architecture.
In Chapter 5, Dally discusses an application for reducing message latency. In general, Dally discloses use of a wormhole routing method, rather than a store-and-forward method. A wormhole routing method is characterized by a node beginning to forward each byte of a message to the next node as the bytes of the message arrive, rather than waiting for the arrival of the entire packet before beginning transmission to the next node. Wormhole routing thus results in a message latency which is the sum of two terms, one of which depends on the message length L and the other of which depends on the number of communication channels traversed D. Store-and-forward routing yields latency depending upon the product of L and D. (See Dally at page 153).
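The difference between a latency that grows with the sum of L and D and one that grows with their product can be made concrete with a small sketch. The per-byte and per-hop cost parameters below are illustrative assumptions and are not taken from Dally; only the sum-versus-product shape comes from the passage above.

```python
def wormhole_latency(length: int, hops: int,
                     t_byte: float = 1.0, t_hop: float = 1.0) -> float:
    """Sum of a term depending on message length L and a term
    depending on channels traversed D (unit costs are assumed)."""
    return length * t_byte + hops * t_hop


def store_and_forward_latency(length: int, hops: int,
                              t_byte: float = 1.0) -> float:
    """Latency depending on the product of L and D: the whole
    message is received at each node before being forwarded."""
    return length * hops * t_byte


if __name__ == "__main__":
    L, D = 100, 5
    print(wormhole_latency(L, D), store_and_forward_latency(L, D))
```

Under these assumed unit costs, a 100-byte message crossing 5 channels costs 105 time units with wormhole routing and 500 with store-and-forward routing.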
A further advantage of a wormhole routing method is that communications do not use up the memory bandwidth of intermediate nodes. In the Dally system, packets do not interact with the processor or memory of intermediate nodes along the route, but rather remain strictly within a routing chip network until they reach their destination.
However, Dally discloses a self-timed system, permitting each processing node to operate at its own rate with no global synchronization. (See Dally at page 153).
Dally at pages 154-157 further discloses a message packet comprising relative X and Y address fields, a variable size data field comprising a plurality of non-zero data bytes, and a tail byte.
It is desired to develop an improved method of communication between nodes in a parallel processing computer system.
As another objective of the present invention, it is desired to develop a parallel processing computer system having reduced message passing latency and increased node-to-node channel bandwidth.
As another object of the present invention, it is desired to develop a system which efficiently passes messages without requiring buffering for message packets at each node.
As another object of the present invention, it is desired to develop a system in which data communicated within a system is controlled by a clock communicated with the data.
An inherent limitation in a system controlled by a clock (strobe) communicated with the data is the ability to extend the system topology without bound. This limitation occurs due to the fact that the strobe signal is not regenerated as it is routed through each node of a path as is the data. Since the strobe is not regenerated, it is susceptible to a phenomenon known as pulse shrinkage.
Pulse shrinkage occurs when a signal is buffered through devices that have unequal rise and fall times. Pulse shrinkage can cause severe asymmetry in the strobe signal and ultimately can cause data errors.
Data errors can occur when data hold times are violated due to pulse shrinkage. Data hold time in the present invention is guaranteed by the frequency of the strobe clock. Lower frequencies will create more hold time. Since the data is validated on both edges of the strobe, any asymmetry in this signal will increase the effective frequency and reduce the available data hold time. As the length of the route is increased, the effects of pulse shrinkage become more pronounced and will eventually cause errors.
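How asymmetry accumulates along a route can be sketched with a simple model. The model is an assumption made for illustration only: every buffering stage delays rising edges by rise_delay_ns and falling edges by fall_delay_ns, the strobe starts with a 50% duty cycle, and the shortest spacing between consecutive edges stands in for the time available to each data bit. None of the numeric values come from the patent.

```python
def high_width_after(stages: int, period_ns: float,
                     rise_delay_ns: float, fall_delay_ns: float) -> float:
    """Width of the high pulse after passing through `stages` buffers,
    starting from a symmetric strobe (assumed model)."""
    skew = rise_delay_ns - fall_delay_ns      # shrinkage per stage
    high = period_ns / 2 - stages * skew
    return min(period_ns, max(0.0, high))     # clamp to a physical range


def min_edge_spacing(stages: int, period_ns: float,
                     rise_delay_ns: float, fall_delay_ns: float) -> float:
    """Shortest interval between consecutive strobe edges. Since data is
    validated on both edges, this bounds the time left for each bit."""
    high = high_width_after(stages, period_ns, rise_delay_ns, fall_delay_ns)
    low = period_ns - high
    return min(high, low)


if __name__ == "__main__":
    for n in (0, 5, 10, 25):
        print(n, min_edge_spacing(n, 100.0, 3.0, 1.0))
```

With the assumed 2 ns of skew per stage and a 100 ns period, the shortest edge spacing falls from 50 ns at the origin to 30 ns after ten stages and reaches zero after twenty-five, which is the unbounded-topology limitation the passage describes.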
It is therefore another objective of the present invention to provide a system wherein pulse shrinkage of the strobe is eliminated.
SUMMARY OF THE INVENTION
A parallel processing computer system is described. The present invention comprises a computer system having a plurality of processing nodes which are interconnected in a binary n-cube. Each node comprises a processing means for processing information and a router means for routing information between nodes in the n-cube.
The router means accepts address information from the processing means and communicates the address information from node-to-node to establish a communication route for information from an origin node to a destination node. After a communication route is established, the destination node responds over the same route, in reverse order, with an acknowledgement that it is ready to accept information.
Communication of information then commences over the reserved route. At the completion of communication of information, the route is released and channels used by the route are made available for communication between other nodes. In the present invention, each router means comprises two channels for communication of information. A first channel is utilized to transmit information from a node to an adjacent node and a second channel is utilized to receive information from adjacent nodes. The present invention allows communication of information between nodes under control of a clock "strobe" transmitted with the information.
Each of the channels comprises means for communication of data information (both actual message data and status/control information) and for communication of clocking information for controlling transmission and reception of the data information.
The alternate-edge regeneration circuit described herein eliminates the pulse shrinkage hazard by regenerating every other edge of the strobe signal as it is routed through each node of the route. All odd edges of the strobe signal are unmodified by the routing hardware. They are used to validate the data received at each router and to clock the data through to the next router. The buffered odd clock edges are then transmitted to the next router. Upon reception of a message, the even edges of the strobe signal validate the data so that it can be stored in a receive register. This is the same as for the odd edges. The even edges, however, are handled differently. After the data has been latched in the receiving register, it is clocked to the next register within the router using a modified or "synthetic" even clock edge rather than the received even strobe edges. The synthetic clock edge is generated by delaying the odd clock edges an amount approximately equal to half the period of the strobe signal. This positions the even edges ideally in time and compensates for any pulse shrinkage that might be present in the received strobe waveform. The strobe is in effect "regenerated" and its symmetry when transmitted from the router will be consistent from one router to the next.
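The effect of alternate-edge regeneration can be sketched in software by treating the strobe as a list of edge times. This is only a behavioural illustration under assumed names (regenerate, half_period, buffer_delay); the patent describes a hardware circuit, not this function.

```python
def regenerate(edges, half_period, buffer_delay=0):
    """Return strobe edge times as retransmitted by one router.

    Odd edges (1st, 3rd, ...) pass through with only a buffer delay.
    Each even edge is replaced by a synthetic edge placed half a strobe
    period after the preceding odd edge, so the received even edge
    times are ignored for retransmission.
    """
    out = []
    for i in range(0, len(edges), 2):        # indices 0, 2, ... are odd edges
        odd = edges[i] + buffer_delay
        out.append(odd)
        if i + 1 < len(edges):
            out.append(odd + half_period)    # synthetic even edge
    return out


if __name__ == "__main__":
    # Received strobe with a 100-unit period whose high pulse has
    # shrunk from 50 to 40 units (illustrative numbers).
    shrunk = [0, 40, 100, 140, 200, 240]
    print(regenerate(shrunk, half_period=50))
```

In this example the shrunken edges at 40, 140 and 240 are replaced by synthetic edges at 50, 150 and 250, restoring a symmetric waveform before the strobe is passed to the next router.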
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a diagram illustrating an n-cube network of processors as may be utilized by the present invention.
Figure 2(a) is a block diagram illustrating a router architecture of the present invention.
Figure 2(b) is a block diagram illustrating organization of a status route as may be utilized by the present invention.
Figure 3 is a block diagram illustrating a physical channel between nodes of the present invention.
Figure 4(a) is a timing diagram illustrating channel timing as may be utilized by the present invention.
Figure 4(b) is an illustration of a data format for transmission of data and status information as may be utilized by the present invention.
Figure 4(c) is an illustration of a format for transmission of status information as may be utilized by the present invention.
Figure 5 is a diagram illustrating establishment of a communication route in a networked computer system as may be accomplished by the present invention.
Figure 6 is a diagram illustrating acknowledgement of establishment of a route in a networked computer system as may be accomplished by the present invention.
Figure 7 is a diagram illustrating message transmission in a networked computer system as may be accomplished by the present invention.
Figure 8 is a diagram illustrating release of a route in a networked computer system as may be accomplished by the present invention.
Figure 9 illustrates the data latching circuitry in a typical router of the invention described in the parent patent application.
Figure 10 illustrates a first embodiment of a router of the present invention including synthetic clock generator circuitry for regenerating the strobe signal.
Figure 11 illustrates a second embodiment of a router of the present invention including synthetic clock generator circuitry for regenerating the strobe signal.
Figure 12 illustrates a preferred embodiment of the synthetic clock generator circuitry of the present invention.
Figure 13 illustrates a timing diagram of the synthetic clock generator circuit of the present invention.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
A parallel processing computer system is described. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to unnecessarily obscure the present invention.
The present invention relates to parallel processing computer systems. The preferred embodiment of the present invention is commercially available under the tradename iPSC/2(TM) from Intel Corporation of Santa Clara, California. The iPSC/2(TM) computer is a second generation concurrent processing computer system. The routing system of the iPSC/2(TM) is more fully described in Steven F. Nugent, The iPSC/2(TM) Direct-Connect Communications Technology, Intel Scientific Computers, distributed Hypercube Conference, January 19-20, 1988.
OVERVIEW OF THE PRESENT INVENTION
The present invention discloses a direct connection routing mechanism which provides for improved performance over known parallel processing computer systems. The direct connection mechanism enhances performance in parallel processing computer systems by reducing message passing latency, increasing node-to-node bandwidth and allowing for simultaneous bidirectional message traffic between any two nodes.
The direct connect routing system is a hardware controlled message passing system comprising a plurality of routers, each router coupled with computation nodes, the routers for allowing passing of messages of arbitrary size between pairs of computational nodes. The routers form a circuit-switched network that dynamically creates a route from a source node to a destination node. The route remains open for the duration of the message. The route comprises a series of channels that form a unique route from the source node to the destination node. The route may pass through some number of intermediate nodes in defining the route. The route allows transmission of data and a clock controlling transmission of the data over the same route.
Channels in the preferred embodiment of the present invention are bit-serial and full duplex and provide connection from one node to its nearest neighbor nodes in n-space. In the preferred embodiment, a router supports connections for up to eight full duplex channels and may be interconnected to form networks of up to seven dimensions containing 128 nodes. It is apparent to one of ordinary skill in the art that alternative embodiments may be constructed having a greater or fewer number of dimensions and/or nodes.
Each of the eight channels is routed independently, allowing up to eight messages to be routed simultaneously. In the preferred embodiment, one channel per router is dedicated to act as an external route into the network and allows remote devices to access the full routing capabilities of the network.
The router communicates with its computational node over two unidirectional parallel buses.
Routing in the preferred embodiment is based on the n-cube routing algorithm discussed in Sullivan et al. This algorithm guarantees a deadlock free network. As will be described in more detail below, in the present invention, routes are dynamically constructed for each message prior to its transmission. A complete route is built in a step-by-step process in which route segments are arbitrated for at each router. After a route is defined, the channels which constitute the route are held for the duration of the message. Transmission of a message begins when the destination node is ready to begin accepting the message, and channels are released when the end of the message passes through the routers connected by the channel.
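The step-by-step building, holding and release of a route might be modelled as follows. This is a loose sketch under stated assumptions: a channel is identified by a hypothetical (sending node, channel number) pair, the Network class and its methods are invented for illustration, and the behaviour when a segment is busy (here the build simply stops and reports the segments acquired so far) is an assumption, since this passage does not specify it.

```python
from typing import List, Set, Tuple

# Hypothetical channel identifier: (sending node address, channel number).
Channel = Tuple[int, int]


class Network:
    """Tracks which channels are currently held by reserved routes."""

    def __init__(self) -> None:
        self.held: Set[Channel] = set()

    def build_route(self, segments: List[Channel]) -> List[Channel]:
        """Acquire route segments one at a time, in order.

        Stops at the first busy segment (assumed behaviour) and returns
        the segments acquired so far, which remain held.
        """
        acquired: List[Channel] = []
        for ch in segments:
            if ch in self.held:
                break
            self.held.add(ch)
            acquired.append(ch)
        return acquired

    def release(self, segments: List[Channel]) -> None:
        """Free channels once the end of the message has passed."""
        for ch in segments:
            self.held.discard(ch)


if __name__ == "__main__":
    net = Network()
    first = net.build_route([(4, 0), (5, 2)])   # route of two segments
    blocked = net.build_route([(5, 2)])         # contends for a held channel
    print(first, blocked)
    net.release(first)
    print(net.build_route([(5, 2)]))            # succeeds after release
```

The second request obtains no segments while the first route holds channel (5, 2), and succeeds once the first route has been released.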
The direct connection routing system of the present invention is a variation on wormhole routing discussed by Dally. As one inventive aspect of the present invention, the message is transmitted after the route is established, rather than establishing the route as part of the transmission of the message, as discussed by Dally. This aspect of the present invention allows the system to operate completely synchronously and reduces or eliminates the need for flow control buffering in intermediate routers.
In the present invention, a routing probe comprising relative address information is first transmitted from router to router in the network in order to establish a route between an origin node and a destination node. After the route is established, the message is communicated between the two nodes. Further, the present invention provides separate circuitry, different from the computation circuitry of a node, for controlling routing. Using these aspects of the present invention, message passing latency is significantly reduced over known systems. As discussed above, such known systems largely utilize store-and-forward, packet switching networks.
Using the above described techniques, messages are routed from an origin node to a destination node encountering minimal delays in routing through intermediate nodes. Further, the routing of messages through intermediate nodes does not require interruption of processes on those nodes or flow control buffering at the intermediate nodes.
The preferred embodiment of the present invention implements routers using programmable gate arrays.
The preferred embodiment comprises a collection of single board processors or nodes interconnected with full duplex, bit-serial channels to form a cube where each node has N nearest neighbor nodes. The system is then said to have a dimension N. The preferred embodiment comprises 128 nodes where N equals 7. Referring to Figure 1, channel and node naming conventions used herein are illustrated. Figure 1 illustrates a cube having a dimension 3.
In the preferred embodiment, nodes are assigned unique addresses so that the addresses of any two nearest neighbor nodes differ by one binary digit. For example, the address of node 0 100 is 000. The address of node 1 101, one of node 0's 100 nearest neighbors, is 001. Therefore, these two nodes' addresses differ only in one binary digit.
The present invention defines the dimension of the channel between any two nodes by taking the binary Exclusive OR of the addresses of the two nodes. After taking the binary Exclusive OR, the bit position remaining a one (bit position 0, 1 or 2 in the case of Figure 1) is the channel number. For example, after taking the Exclusive OR of 000 and 001, the addresses of node 0 100 and node 1 101, respectively, the result is 001. As can be seen in the result, bit position 0 is a one. Therefore, these two nodes are connected by a channel being designated as having dimension 0, channel 0 102.
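The channel numbering rule above amounts to finding the single set bit in the Exclusive OR of two neighboring addresses. A minimal sketch follows; channel_number is a hypothetical helper name, and rejecting non-neighbors with an exception is a choice made for this illustration.

```python
def channel_number(addr_a: int, addr_b: int) -> int:
    """Channel (dimension) connecting two nearest neighbor nodes:
    the position of the one bit left after the Exclusive OR."""
    diff = addr_a ^ addr_b
    # Nearest neighbors differ in exactly one binary digit.
    if diff == 0 or diff & (diff - 1):
        raise ValueError("nodes are not nearest neighbors")
    return diff.bit_length() - 1


if __name__ == "__main__":
    print(channel_number(0b000, 0b001))   # node 0 and node 1
    print(channel_number(0b000, 0b100))   # node 0 and node 4
```

Node 0 (000) and node 1 (001) give channel 0, matching the example in the text; node 0 and node 4 (100) give channel 2.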
Although the preferred embodiment calculates a relative address at the origin node and transmits the relative address from node-to-node to establish the route, several alternative embodiments are available.
For example, in one alternative embodiment, the destination node's absolute address is routed from node-to-node. At each node, the relative address is computed based on the destination node's absolute address and the address of the current node. This relative address is used for determining the channel on which to transmit to the next node.
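The alternative embodiment, in which each node recomputes the relative address from the destination's absolute address, can be sketched as below. Selecting the lowest-numbered non-zero bit as the next channel is an assumption made here for illustration (it agrees with the node 4 to node 1 example given later in the channel description); the names next_channel and walk are hypothetical.

```python
from typing import List, Optional


def next_channel(current: int, destination: int) -> Optional[int]:
    """Channel on which the current node forwards toward the destination,
    or None if the current node is the destination."""
    rel = current ^ destination            # relative address at this node
    if rel == 0:
        return None
    return (rel & -rel).bit_length() - 1   # lowest set bit (assumed order)


def walk(origin: int, destination: int) -> List[int]:
    """Sequence of node addresses visited from origin to destination."""
    path = [origin]
    node = origin
    while True:
        ch = next_channel(node, destination)
        if ch is None:
            return path
        node ^= 1 << ch                    # cross the selected channel
        path.append(node)


if __name__ == "__main__":
    print(walk(0b100, 0b001))
```

Starting at node 4 with destination node 1, the walk visits nodes 4, 5 and 1.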
ROUTER ARCHITECTURE
Figure 2(a) illustrates a router of the present invention. The router of the preferred embodiment comprises eight independent routing elements 201-208, one for each of eight incoming channels (numbered 0-7) 211-218. The routing elements 201-208 dynamically create message routes through modules of the computer system of the present invention. Each routing element 201-208 is capable of driving several outgoing channels 221-228, one at a time. Since more than one routing element 201-208 may request the same outgoing channel 221-228 simultaneously, an arbitration mechanism 230 is provided for resolving conflicts.
The router of the preferred embodiment further comprises two unidirectional parallel channels, a node source 231 and a node sink 232. Any of the routing elements 201-208 may request the node sink channel 232 for output and, likewise, the node source channel 231 has access to all outgoing channels 221-228.
In the preferred embodiment, the channel 7 routing element 208 operates as a remote I/O port. This provides an I/O gateway into and out of the network for remote devices such as disk farms, graphics devices and real time I/O. In the preferred embodiment, channel 7 of node 0 serves as the host interface. Channel 7 in other nodes is general purpose and used in the currently preferred embodiment as an I/O gateway to disk farms.
As will be explained in more detail below, the present invention provides for routing of a routing probe from an origin node to a destination. The routing of the routing probe acts to reserve a route for subsequent transmission of a message. This reserved route may be referred to as a primary message route.
STATUS ROUTES
In addition to the primary message routes, the preferred embodiment provides a secondary route, referred to as the status route, which routes status information from the destination node to the source node of each message. The status route is used in the preferred embodiment to provide flow control for messages. To pass status information between routers, status information is multiplexed onto the channels during message transmission. In the absence of messages, status information is passed continuously.
To support establishment of status routes, routers of the preferred embodiment comprise "send status" logic. This status logic is illustrated with reference to Figure 2(b). The "send status" logic allows status information, indicating the destination node is ready to receive a message, to be routed from the destination node through intermediate nodes back to the origin node. Each router is capable of routing status information for eight simultaneous messages. The "destination ready" status information is passed from the destination node back to the origin node over the same intermediate nodes in the opposite direction from the message.
As discussed above, in the preferred embodiment status information is multiplexed with data during message transmission. As can be seen with reference to Figure 2(b), send status information is provided from status switch 256 on send status lines 257 to the output channels 258. This status information is multiplexed with the data on channel out lines Ch0-Ch7 251.
In the absence of message traffic, status generator 250 provides status information to be sent out over lines Ch0-Ch7 251. Status generator 250 provides the same send status as provided to the routers on input channels 259 for all channels that are idle. This status information is provided to status generator 250 over send status lines 254.
It will be apparent to one of ordinary skill in the art that alternative techniques may be utilized for communication of status information. For example, direct wiring of nodes may be utilized for communication of status information. Alternatively, explicit status messages may be transmitted. Each of these techniques will have various advantages and disadvantages.
Responsive to receiving a routing probe at the destination node, the destination ready signal is originated by the destination node, generated by the deserializer and output on line 252. After passing through any intermediate routers, the signal arrives at the source router serializer as an Allow Data control signal on line 253. The Allow Data signal, as the name implies, controls the transmission of data from the source router serializer.
CHANNEL DESCRIPTION
Channels in the preferred embodiment connect a router coupled with a node with each of the node's nearest neighbors' routers. In the preferred embodiment, each channel comprises four conductors 301-304, as shown in Figure 3. Labelling of the conductors in Figure 3 may be understood with reference to node 0. Strobe out conductor 301 transmits strobe signals out from node 0. Data out conductor 302 transmits data signals from node 0. Strobe in conductor 303 is coupled to allow node 0 to receive strobe signals. Data in conductor 304 is coupled to allow node 0 to receive data signals. Thus, the conductors 301-304 may be thought of as comprising two pairs of conductors for each channel: a first pair comprising strobe out conductor 301 and data out conductor 302, and a second pair comprising strobe in conductor 303 and data in conductor 304. The pairs operate independently of each other.
Serial data, control and status bits are transferred across the data lines. The strobe lines are used to validate the data lines and also provide a clock source for the subsequent router. As can be seen with reference to Figure 4(a), both rising edges, such as edges 411 and 421, and falling edges, such as edges 412 and 422, of strobe signals 401 and 403 are used to validate data lines 402 and 404.
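Validation of the data line on both strobe edges can be pictured with a simplified software model. The sketch assumes the strobe and data lines are given as equal-length lists of sampled logic levels and that a bit is captured at the sample where the strobe changes level; the function name sample_on_both_edges and the waveforms are illustrative only and do not reproduce Figure 4(a).

```python
from typing import List


def sample_on_both_edges(strobe: List[int], data: List[int]) -> List[int]:
    """Capture the data line at every strobe transition, rising or falling."""
    bits = []
    for i in range(1, min(len(strobe), len(data))):
        if strobe[i] != strobe[i - 1]:     # any edge validates the data
            bits.append(data[i])
    return bits


if __name__ == "__main__":
    strobe = [0, 1, 1, 0, 0, 1, 1, 0]
    data = [0, 1, 1, 0, 0, 0, 0, 1]
    print(sample_on_both_edges(strobe, data))
```

Four strobe transitions occur in the sample waveform, so four bits are captured, two on rising edges and two on falling edges.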
In the present invention, the clock signal communicated over lines 301 and 303 is used to clock the associated data on lines 302 and 304, respectively. This clock signal is transmitted with the data over the entire message route. Using this technique of transmitting a clock signal with a data signal, data may always be controlled by a single clock while each node (and the router associated with the node) may operate on its own clock. A channel at a given node is controlled by the clock signal transmitted with the data it is receiving.
For example, with reference to Figure 1, assume data is to be communicated from node 4 (address 100) to node 1 (address 001). Taking the Exclusive OR of 100 and 001 yields a relative address of 101. Therefore, the data will be routed from the serializer of node 4 to the channel 0 routing element of node 5. The data will then be routed out of the channel 0 routing element of node 5 on channel 2 to the channel 2 routing element of node 1.
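The route in this example can be reproduced with a short Python sketch. It is illustrative only: the function name `ecube_route` and the list-of-hops representation are assumptions introduced here and do not describe the router hardware.

```python
def ecube_route(src: int, dst: int) -> list[tuple[int, int]]:
    """Return the hops (channel, next_node) taken from src to dst.

    Channels are used in increasing order of the bits set in the
    relative address (src XOR dst).
    """
    relative = src ^ dst
    hops = []
    node = src
    channel = 0
    while relative:
        if relative & 1:
            node ^= 1 << channel      # crossing channel c flips address bit c
            hops.append((channel, node))
        relative >>= 1
        channel += 1
    return hops


# Node 4 (100) to node 1 (001): relative address 101
print(ecube_route(4, 1))  # [(0, 5), (2, 1)]
```

The result matches the text: channel 0 to node 5, then channel 2 to node 1.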
During this process, a clock signal is generated by the serializer on node 4 which is transmitted along with the data over the strobe out line 301 of Figure 3. This clock signal is received by the channel 0 routing element of node 5 and is used to control the channel 0 routing element. The clock is retransmitted with the data over channel 2 from the channel 0 routing element and is received by the channel 2 routing element of node 1. Thus, the clock follows the data throughout its transmission along the route.
One advantage of the routing technique of the present invention over full handshake protocols is that the technique of the present invention allows for a generally higher data transfer rate. Transfer rates of handshake protocols are generally lower because of latency caused by the required node-to-node acknowledgements and speed degradation as the channels are made physically longer. In the present invention, the use of FIFO buffers at the message destinations and clock signals following data signals throughout data transmission eliminates the need for handshake protocols. Consequently, the throughput is not a function of channel length or acknowledgment delay. The data bandwidth of the preferred embodiment is 2.8 Mbytes/second.
In the present invention, two status/control bits are passed on a continuous repetitive basis between nearest neighbor nodes, whether or not message transmission is occurring. These bits are END OF MESSAGE (EOM) and READY STATUS (RDY). The EOM bit indicates that the last word of the message has been transmitted. This bit is ignored unless a message is in progress. The RDY bit represents the state of readiness of the destination node of an established route.
EOM and RDY bits are passed in one of two formats: (1) a first format, illustrated in Figure 4(B), allows the EOM bit 431 and RDY bit 432 to be interspersed within a data message 430 and (2) a second format, illustrated in Figure 4(C), allows EOM bit 441 and RDY bit 442 to be passed in the absence of message traffic.
The first format comprises, in addition to the above-mentioned EOM bit 431 and RDY bit 432, two bits 433 and 434 for indicating the transfer is a data message transfer and sixteen data bits 435. In the preferred embodiment, the two bits 433 and 434 are set to 0 to indicate the message is a data transfer message.
The second format, referred to as a "status nibble", comprises four bits: the EOM bit 441, the RDY bit 442 and two bits 443 and 444 for indicating the format is a status only transfer. In the preferred embodiment, these two bits 443 and 444 are set to 1. During transmission of status nibbles, the EOM bit is ignored. Status nibbles are repetitively transmitted by all routers in the absence of a data transfer.
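The two formats can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the text gives the fields of each format but not their order on the wire, so the bit order used here (two start bits, then EOM, then RDY, then data with the most significant bit first) is assumed, as are the names `Frame`, `encode` and `decode`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    eom: int                    # END OF MESSAGE bit
    rdy: int                    # READY STATUS bit
    data: Optional[int] = None  # 16-bit data word, or None for a status nibble


def encode(frame: Frame) -> list[int]:
    """Encode a frame as a list of bits (ASSUMED order, see lead-in)."""
    if frame.data is None:
        return [1, 1, frame.eom, frame.rdy]            # status nibble: start bits are 1
    data_bits = [(frame.data >> i) & 1 for i in range(15, -1, -1)]
    return [0, 0, frame.eom, frame.rdy] + data_bits    # data format: start bits are 0


def decode(bits: list[int]) -> Frame:
    """Inverse of encode(); the two start bits select the format."""
    start, eom, rdy = bits[:2], bits[2], bits[3]
    if start == [1, 1]:
        return Frame(eom=eom, rdy=rdy)
    word = 0
    for b in bits[4:20]:
        word = (word << 1) | b
    return Frame(eom=eom, rdy=rdy, data=word)
```

Under this layout a status nibble occupies 4 bit times and a data transfer occupies 20, of which 16 carry message data.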
The RDY bit is stored as it is received at each router in a "Destination Ready" register and is used for flow control in the system as described above.
As described above, the present invention utilizes two "start bits", 433 and 434 or 443 and 444, on both the status and data formats. Two bits are utilized because the message is processed in two halves by the routers. Odd numbered bits are processed independently from the even numbered bits in the router. This allows for higher data transfer rates than otherwise possible in the gate arrays of the preferred embodiment.
As one advantage of status information being interspersed with message data in the message format of the present invention, the end of a message can easily be detected by routers on the fly. This eliminates the need for a message size counter in the routers and, thereby, removes any limits to maximum message size. Therefore, messages in the present invention may be of any arbitrary size.
Each message in the present invention involves one sending node and one receiving node. The routes that messages take through the network are unique between any two nodes. The combination of channels that compose a route are defined by the binary-cube routing algorithm as described by Herbert Sullivan and T.R. Bashkow, A Large Scale Homogeneous, Fully Distributed Parallel Machine, Proceedings of the 4th Annual Symposium on Computer Architecture, pp. 105-117, 1977. This algorithm is further described with reference to C.R. Lang, Jr., The Extension of Object-Oriented Languages to a Homogeneous, Concurrent Architecture, Department of Computer Science, California Institute of Technology, Technical Report Number 5014, May, 1982. Using such a binary-cube algorithm guarantees that no circular routes will occur in the message routing and, thus, prevents deadlock from occurring.
The algorithm states that in order to guarantee against deadlock, messages in binary cubes can be routed in increasingly higher dimensions until the destination is reached. The channel numbering defined above corresponds to these dimensions. Routes may consist of increasingly higher numbered channels, but are not necessarily contiguous. Routing of messages from higher numbered channels to lower numbered channels (or channels of the same dimension) is not allowed. For instance, a route may consist of channel 0, channel 2 and channel 3, which involves the routers of nodes 0, 1, 5 and 13. In this case the source router is at node 0, the intermediate routers are at nodes 1 and 5 and the destination router is at node 13.
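The ordering rule can be stated as a simple check on the sequence of channels in a route. The following Python sketch is illustrative; the helper names `is_deadlock_free_route` and `nodes_on_route` are assumptions made for this example.

```python
def is_deadlock_free_route(channels: list[int]) -> bool:
    """True if each channel is of strictly higher dimension than the one before."""
    return all(a < b for a, b in zip(channels, channels[1:]))


def nodes_on_route(src: int, channels: list[int]) -> list[int]:
    """List the nodes visited when leaving src over the given channels in order."""
    nodes = [src]
    for c in channels:
        nodes.append(nodes[-1] ^ (1 << c))
    return nodes


print(is_deadlock_free_route([0, 2, 3]))  # True
print(is_deadlock_free_route([2, 0]))     # False: higher to lower is not allowed
print(nodes_on_route(0, [0, 2, 3]))       # [0, 1, 5, 13]
```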
A routing operation of the present invention can be broken into four phases: establishing a route, acknowledgement of the destination node being ready to receive a message, message transmission and releasing connections. To initiate the routing of a message, the source node transfers a minimum of one 32 bit word to its router. The low order 16 bits of this first 32 bit word comprise a routing probe. The routing probe comprises addressing information and is used to establish the connections through intermediate routers which make up the route that the message takes. In the preferred embodiment, the high order eight bits of the routing probe are set to zeros.
The low order eight bits of the routing probe are calculated by taking the Exclusive OR of the binary address of the destination node and the source node. Each bit of the routing probe corresponds to a channel that the message can be routed on. (The preferred embodiment comprises a 7-dimensional binary cube; the eighth bit is used for addressing the external I/O channel.)
The first segment of the route is established when the serializer in the source router requests the outgoing channel that corresponds to the lowest order bit set in the routing probe. Requests for the same channel are arbitrated among local requestors by the arbiter. The arbiter grants one request at a time, using a "round robin" arbitration scheme. When the channel is granted, the routing probe is sent by the source router before any message transmission takes place.
For example, if a routing probe is transferred to the router in which bit N is the lowest order bit set, channel N will be requested. When the arbiter grants channel N, the routing probe will be transmitted to the intermediate router that is the nearest neighbor to the source node on channel N.
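Round robin arbitration for one outgoing channel can be sketched as follows. The text states only that the arbiter grants one request at a time in round robin fashion; the class name, the numbering of requestors and the starting position are assumptions made for this illustration.

```python
from typing import Optional


class RoundRobinArbiter:
    """Grants one outgoing channel to one of n local requestors at a time."""

    def __init__(self, n_requestors: int):
        self.n = n_requestors
        self.last = n_requestors - 1   # so that requestor 0 is considered first

    def grant(self, requests: set[int]) -> Optional[int]:
        """Return the next requestor after the last one granted, or None."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None
```

With three requestors and requests pending from 0 and 2, successive grants alternate between 0 and 2, so neither requestor can starve the other.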
In the preferred embodiment, upon receiving the routing probe, the intermediate router stores the routing probe and discards the high order 8 bits (all of which are zeroes), thus creating a short routing probe. The discarded bits will be reconstructed at the destination router. The short routing probe is passed between intermediate routers, reserving additional segments of the route.
The intermediate routers examine bits N+1 to bit 7 of the short routing probe to determine the lowest order bit that is set. The outgoing channel, corresponding to the first bit that is set, will be requested and the short routing probe will wait. When the outgoing channel is granted, the short routing probe is transmitted to the next router in the route. As illustrated in Figure 5, this process repeats until the routing probe is received by the destination router.
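The per-router decision described above can be sketched in Python. The function names (`make_probe`, `shorten`, `next_channel`) and the use of `None` to mark the source router and the destination are assumptions made for this example, not terms from the specification.

```python
from typing import Optional


def make_probe(src: int, dst: int) -> int:
    """16-bit routing probe: low 8 bits are src XOR dst, high 8 bits are zero."""
    return (src ^ dst) & 0x00FF


def shorten(probe: int) -> int:
    """Intermediate routers drop the high-order 8 (all-zero) bits."""
    return probe & 0xFF


def next_channel(short_probe: int, arrived_on: Optional[int]) -> Optional[int]:
    """Lowest set bit above the arrival channel, or None at the destination.

    arrived_on is None at the source router, which scans from bit 0.
    """
    first = 0 if arrived_on is None else arrived_on + 1
    for bit in range(first, 8):
        if short_probe & (1 << bit):
            return bit
    return None
```

For a probe from node 4 to node 1 (value 101), the source requests channel 0, the router reached over channel 0 requests channel 2, and the router reached over channel 2 finds no further set bits and is therefore the destination.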
Referring to Figure 5, a message is to be transmitted from source node 2 (binary address 10) 501 to destination node 1 (binary address 01) 503 in a 2-dimensional cube. The source node 501 transfers a routing probe to its router 511. As described above, the routing probe comprises the relative address of the source and destination nodes; thus, in the example in Figure 5, the routing probe contains the address 11 (10 XOR 01 = 11). In this case, bit zero, corresponding to channel 0, is a 1. Thus, as described above, the routing probe requests channel 0 for transmission.
When the routing probe is granted access to channel 0, the routing probe is sent over channel 0 to router 512 corresponding to intermediate node 502. As described above, the routing algorithm of the present invention requires router 512 to send the routing probe out on a channel of higher dimension than it was received on. Therefore, router 512 begins examining bits of the routing probe for 1 bits beginning with the bit in bit position 1 (the routing probe was received by router 512 on channel 0). After finding the first 1 bit, a request is made for the channel corresponding to the 1 bit. In this particular example, the first 1 bit is in bit position 1 and a request is made for channel 1.
The routing probe is transmitted on channel 1 to router 513 corresponding with destination node 503. Router 513 examines the routing probe beginning with the bits of higher dimension than the channel the routing probe was received on. In the illustrated example, all remaining bits are 0. Therefore, router 513 determines the routing probe has reached its final destination.
Router 513 pads the routing probe with eight zeros to restore it to its original state. If the destination router can accept a message, it will signal an acknowledgement (the RDY bit). This begins the acknowledgement phase of the routing operation.
The acknowledgement phase requires that a deterministic connection be made from the destination router back to the source router for the purpose of carrying flow control information. This is termed the "status route" and follows exactly through the same intermediate nodes as the message route, but in the opposite direction, from the destination node to the source node.
For example, if a message is routed from CHANNEL 2 IN to CHANNEL 4 OUT at an intermediate router, a connection from CHANNEL 4 IN to CHANNEL 2 OUT is made for the status route. The status route, like the message route, maintains its connection for the duration of the message.
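The relationship between the message route and the status route can be written down directly. This is a minimal sketch; the tuple representation of a connection and the function names are assumptions made for illustration.

```python
def status_connection(message_connection: tuple[int, int]) -> tuple[int, int]:
    """Given a message connection (channel_in, channel_out) at a router,
    return the status connection, which runs the opposite way."""
    channel_in, channel_out = message_connection
    return (channel_out, channel_in)


def status_route(message_nodes: list[int]) -> list[int]:
    """The status route visits the same nodes as the message route, in reverse."""
    return list(reversed(message_nodes))


print(status_connection((2, 4)))  # (4, 2)
print(status_route([2, 3, 1]))    # [1, 3, 2]
```

The second call uses the Figure 5 route, in which a message from node 2 reaches node 1 through node 3.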
Figure 6 illustrates the acknowledgement phase of the routing operation of the present invention. In Figure 6, node 601 corresponds to node 501 of Figure 5; router 611 corresponds to router 511, node 602 corresponds to node 502, etc. Figures 7 and 8 have similar labelling correspondence.
As shown by Figure 6, an acknowledgement is sent from router 613 (corresponding with destination node 1 603) over channel 1 to intermediate router 612. Intermediate router 612 forwards the acknowledgement to origin router 611 over channel 0 where it is received by node 2 601. As will be understood by one of ordinary skill in the art, RDY status information is transmitted in the formats discussed above. Therefore, message information from a different origin node may be transmitted simultaneously with the status information over the same channel. If there are no requests to use the same channel, a status nibble (discussed above) is transmitted.
When the RDY bit finally reaches the source node 601, the message transmission phase begins. The source router can transmit data continuously into the network (in the format described above) until the end of the message is sent or a not ready indication is received over the status route. In the preferred embodiment, messages are not buffered in the intermediate routers.
As can be seen with reference to Figure 7, the message information is transferred from node 2 701 to router 711 and then out the serializer of router 711. The message information is then transmitted over the reserved route (CHANNEL 0 to intermediate router 712, CHANNEL 1 to destination router 713). The message is then deserialized at router 713 and transmitted to destination node 703.
If, during transmission of the message, the source router 711 receives a not ready indication on incoming channel 0, it will discontinue transmission of the message and transmit status nibbles. When a ready indication is again received on incoming channel 0, the source router will again begin transmission of the message. In the preferred embodiment, the destination router stores any message information which is in transit at the time the not ready indication is active. Therefore, when a message is throttled by a not ready indication, no data bits remain stored on the network, but rather are stored in the FIFO buffer of the destination router. This method of throttling message transmission by receiving an indication the destination node is not ready and suspending transmission of the message in response to such an indication provides for flow control in the network of the present invention.
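The throttling behaviour can be explored with a toy simulation. Everything numerical here is an assumption made for illustration and is not taken from the specification: one word moves one hop per tick, the RDY indication takes the same number of ticks to travel back, the destination node drains its FIFO more slowly than words arrive, and the destination drops RDY while it still has room for the words already in flight.

```python
from collections import deque


def simulate(words, hops, fifo_capacity, drain_every, max_ticks=10_000):
    """Toy model of RDY throttling; returns (delivered_words, peak_fifo_occupancy).

    ASSUMED policy: RDY is dropped once free FIFO space falls to 2 * hops
    words, which is enough to absorb everything sent before the source
    sees the not ready indication.
    """
    pending = deque(words)
    in_flight = deque([None] * hops)   # slots travelling toward the destination
    rdy_pipe = deque([1] * hops)       # RDY values travelling toward the source
    fifo, delivered, peak = deque(), [], 0
    for tick in range(max_ticks):
        # The source sends a word only while the RDY value it sees is 1;
        # otherwise the slot is idle (a status nibble in the real system).
        rdy_at_source = rdy_pipe.popleft()
        in_flight.append(pending.popleft() if (rdy_at_source and pending) else None)
        # A word (or idle slot) arrives at the destination FIFO.
        arrived = in_flight.popleft()
        if arrived is not None:
            fifo.append(arrived)
        peak = max(peak, len(fifo))
        # The destination node consumes words more slowly than they arrive.
        if fifo and tick % drain_every == 0:
            delivered.append(fifo.popleft())
        # The destination reports whether it can still absorb in-flight words.
        rdy_pipe.append(1 if fifo_capacity - len(fifo) > 2 * hops else 0)
        if not pending and not fifo and all(w is None for w in in_flight):
            break
    return delivered, peak
```

In this model no word is ever held in an intermediate router: every word sent before the source sees the not ready indication ends up in the destination FIFO, and the message is delivered complete and in order.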
After completion of transmission of a message, the source router appends a checksum word to the message. The data format which contains the checksum word has the EOM bit set. The checksum provides a means to verify message integrity in order to detect hardware failures should they occur.
As shown in Figure 8, the transmission of a word with the EOM bit set causes the source router 811 to release the outgoing channel (channel 0) reserved for the message. At each intermediate router in the route (router 812 in the illustrated example), the channel reserved for the message is released when the word with the EOM bit set is retransmitted. Those channels are then free to be used for other messages.
When a word with the EOM bit set is received at the destination router 813, it is assumed that the accompanying data information is the checksum for the message. The checksum information is used to verify the integrity of the message. Since the checksum information is not part of the original message, it is stripped off by the destination router 813. The result is stored for further inspection at the destination node 803.
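The append, verify and strip sequence can be sketched as follows. The specification does not state how the checksum is computed, so the XOR of all 16-bit words used here is purely an assumed stand-in, and the function names are hypothetical.

```python
from functools import reduce


def checksum(words: list[int]) -> int:
    """ASSUMED algorithm: XOR of all 16-bit words (the source does not specify one)."""
    return reduce(lambda a, b: a ^ b, words, 0) & 0xFFFF


def send(words: list[int]) -> list[tuple[int, int]]:
    """Source router: emit (eom, word) pairs, appending the checksum with EOM set."""
    return [(0, w) for w in words] + [(1, checksum(words))]


def receive(frames: list[tuple[int, int]]) -> tuple[list[int], bool]:
    """Destination router: strip the EOM word and report whether the checksum matched."""
    words = [w for eom, w in frames if not eom]
    received_sum = frames[-1][1]
    return words, received_sum == checksum(words)
```

The receiver returns the original words without the checksum word, together with a pass or fail result that the destination node can inspect.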
As described earlier, strobe lines are used to validate data lines and also to provide a clock source for a subsequent router. Both rising edges and falling edges are used to validate data. Rising and falling edges may be denoted as even and odd edges, since the direction of the strobe's transition can be equivalently implemented in either direction. Figure 9 illustrates a system using a strobe in this manner. Referring to Figure 9, a plurality of latches (901, 902) is depicted. These latches are used to hold the routing probe as it is received by the router. Each latch is coupled to a strobe line 903 and a data line 904 as shown in Figure 9. The plurality of latches (901, 902) is logically divided into two banks: an odd bank 901 triggered on an odd edge of the strobe signal appearing on strobe line 903 and an even bank 902 triggered on an even edge of the strobe signal. In this way, bits of the routing probe can be latched in and stored on both rising and falling edges of the strobe. The strobe signal on strobe line 903 is amplified to produce a strobe out signal 905 which is coupled to each of the routers in the network. This common clock signal is used to provide a synchronous data transfer between routers.
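The division of a serial bit stream between the odd and even latch banks can be illustrated in Python. Which bit is captured first is not stated in this passage, so the sketch assumes that the first strobe edge counts as odd; the function names are hypothetical.

```python
def latch_on_both_edges(bits: list[int]) -> tuple[list[int], list[int]]:
    """Split a serial bit stream into the odd and even latch banks.

    ASSUMPTION: the first strobe edge is counted as odd (edge 1), so bits at
    list positions 0, 2, 4, ... go to the odd bank.
    """
    odd_bank = bits[0::2]    # captured on edges 1, 3, 5, ...
    even_bank = bits[1::2]   # captured on edges 2, 4, 6, ...
    return odd_bank, even_bank


def reassemble(odd_bank: list[int], even_bank: list[int]) -> list[int]:
    """Interleave the two banks back into the original bit order."""
    bits = []
    for i in range(len(odd_bank)):
        bits.append(odd_bank[i])
        if i < len(even_bank):
            bits.append(even_bank[i])
    return bits
```

Because each bank sees only every other bit, each bank runs at half the bit rate of the data line.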
The system for connecting a plurality of routers to a common strobe line as depicted in Figure 9 may be subject to a limitation on the number of routers thus connected. After the strobe signal on strobe line 903 is supplied to and amplified by a number of routers, the strobe signal may be subject to pulse shrinkage. Pulse shrinkage occurs when a signal is buffered through devices that have unequal rise and fall times. Pulse shrinkage can cause severe asymmetry in the strobe signal and ultimately can cause data errors.
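A simple numerical model shows why pulse shrinkage limits the number of routers. The sketch treats each buffer as delaying rising and falling edges by different fixed amounts; the delay values and the active-high pulse are illustrative assumptions, not figures from the specification.

```python
def pulse_width_after(stages: int, width_ns: float,
                      t_rise_ns: float, t_fall_ns: float) -> float:
    """Width of an active-high pulse after passing through identical buffers.

    Each buffer delays the rising edge by t_rise_ns and the falling edge by
    t_fall_ns, so the width changes by (t_fall_ns - t_rise_ns) per stage.
    """
    width = width_ns
    for _ in range(stages):
        width += t_fall_ns - t_rise_ns
        if width <= 0:
            return 0.0          # the pulse has vanished
    return width


print(pulse_width_after(10, 50.0, 3.0, 2.0))  # 40.0
print(pulse_width_after(60, 50.0, 3.0, 2.0))  # 0.0
```

With a 1 ns mismatch per stage, a 50 ns pulse loses a fifth of its width after ten buffers and disappears entirely after fifty.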
Referring to Figures 10 and 11, an improved method for providing a common strobe signal to a plurality of routers is illustrated. This improved means and method is denoted herein as alternate edge strobe regeneration. The alternate edge regeneration circuit eliminates the pulse shrinkage problem by regenerating every other edge of the strobe signal as it is routed through each router. All even edges of the strobe signal are modified by the routing hardware while odd edges of the strobe are sent unmodified to the next router. It will be apparent to those skilled in the art that odd edges may coincide with rising edges of a signal while even edges may correspond to the falling edge of a signal. Similarly, the reverse configuration may equivalently be implemented, that is, even edges may coincide with rising edges of a signal while odd edges may correspond to the falling edge of a signal.
Figure 10 illustrates a first alternative embodiment of alternate edge strobe regeneration. Figure 11 illustrates a second alternative embodiment of alternate edge strobe regeneration.
Referring to Figure 10, a plurality of latches (1001 and 1002) is shown. These latches are similar to those depicted in Figure 9 and used for receiving and storing the routing probe. As in Figure 9, data line 904 is coupled to each latch.
Strobe line 903 is connected to latches 1001 and 1002 and a synthetic clock generator circuit 1003 in two equivalent ways as depicted in a first embodiment illustrated in Figure 10 and a second embodiment illustrated in Figure 11. The improved system includes a synthetic clock generator circuit 1003 used for modifying the strobe signal transmitted on strobe line 903. The design and operation of synthetic clock generator 1003 is described below and illustrated in Figure 12.
Referring to Figure 10, strobe line 903 is coupled directly to odd latches 1001 and directly to synthetic clock generator 1003. Strobe line 903 is also directly connected to the first even latch of the set of even latches 1002. The remaining even latches 1002 are coupled to the output side 1004 of synthetic clock generator circuit 1003. By coupling the latches to strobe line 903 as depicted in Figure 10, the odd edges of the strobe signal drive latches 1001 as directly received on strobe line 903 while the modified even edges produced by the synthetic clock generator 1003 drive all but the first of the even latches 1002. Similarly, the modified even edges of the strobe signal as output by the synthetic clock generator 1003 are amplified to produce a strobe out signal 1005 which is then sent on to the next router. This method of modifying the strobe signal using alternate edge regeneration eliminates pulse shrinkage.
Referring now to Figure 11, a second alternative embodiment of the strobe connection is illustrated. In Figure 11, strobe line 903 is directly connected to synthetic clock generator 1003. In addition, strobe line 903 is directly connected to the first odd latch of the set of odd latches 1001 and the first even latch of the set of even latches 1002. Thereafter, all latches on the odd side 1001 and on the even side 1002 are connected to the output side 1004 of synthetic clock generator 1003. As configured in this second embodiment, as illustrated in Figure 11, both the unmodified odd edge of the strobe signal and the modified even edge of the strobe signal are supplied by the output side 1004 of synthetic clock generator 1003. This second embodiment is functionally equivalent to the first embodiment depicted in Figure 10. The modified strobe signal is amplified to produce a strobe out signal 1005 which is passed on to the next router. Using the configuration of either the first or second embodiments, the synthetic clock generator 1003 operates to prevent pulse shrinkage of the strobe signal from corrupting data received by latches 1001 and 1002 in each router of the network.
Referring now to Figure 12, a preferred embodiment of the synthetic clock generator circuit 1003 is illustrated. The circuit 1003 includes delay devices, flip-flop devices, and logic devices, each of which is independently well known in the art. Strobe line 903 provides the single input to synthetic clock generator 1003. In the preferred embodiment of the present invention, the strobe signal carried on strobe line 903 is an active low signal that initiates an active pulse by the occurrence of a falling edge. It will be apparent to those skilled in the art that an equivalent alternative embodiment may be implemented where the strobe signal is an active high signal where an active pulse is initiated by a rising edge.
Initially, flip-flop devices FF1 1203 and FF2 1204 are in a reset condition after a reset signal 1320 is applied to the clear input of both flip-flops 1203 and 1204. Both FF1 and FF2 output a low Q signal when reset. Since the Q output 1215 of FF2 1204 is low and the signal on line 1211 is initially high, the output of gate 1205 is low. Thus, once the reset signal 1320 is removed (goes low), the clear input 1216 to FF1 1203 becomes inactive. A timing diagram depiction of reset signal 1320 is depicted in Figure 13.
Referring again to Figure 12 and Figure 13, the strobe signal on strobe line 903 is initially in a high state. Thus, after the strobe signal is inverted on the clock input to FF1 1203, the output Q 1212 of FF1 1203 remains in a reset state. Delay components 1201 and 1202 are used for delaying both the rising and the falling (odd and even) edges of a signal as it passes through the component. The delay components 1201 and 1202 are well known to those of ordinary skill in the art.
When a falling edge 1302 occurs on strobe line 903, both delay component 1201 and delay component 1202 receive the falling edge at nearly the same time. After a delay of j nanoseconds (j and k are positive numbers), the clock signal 1210 applied to FF1 1203 transitions to an active state thereby causing the Q output 1212 of FF1 1203 to transition to a high state, as shown by rising edge 1304 in Figure 13.
Referring again to Figure 12, the output Q 1212 of FF1 1203 is also inverted and received by the clock input of FF2 1204. When the output Q 1212 of FF1 1203 goes to a low state, the clock input 1212 to FF2 1204 becomes active. However, when the clear input 1211 to FF2 1204 is high, the Q output 1215 of FF2 1204 remains low.
After a delay of j+k nanoseconds referenced from the time of the falling edge 1302 on the strobe signal 903, a falling edge appears on the output side 1211 of delay 1202. This falling edge signal on line 1211 is supplied to both gate 1205 and the clear input to FF2 1204. As the signal on line 1211 transitions to a low state, both inputs to gate 1205 are low. Since both inputs to gate 1205 are low, the output 1214 produced by gate 1205 transitions to a high state. This high state on line 1214 is passed through gate 1241 and serves to clear FF1 1203 thereby causing a transition at the Q output 1212 of FF1 1203 from a high state to a low state. Thus, a falling edge appears on synthetic strobe output 1213. This synthetic falling edge 1306 is depicted in Figure 13. Since the clear input to FF2 1204 is inactive and the clock input 1212 is now active, the Q output 1215 of FF2 1204 transitions to a high state. This transition serves to render inactive the clear signal 1216 to FF1 1203 so that it is not active when the next rising transition on line 1210 is received.
Thus, summarizing the operation of the circuit depicted in Figure 12, a falling edge on strobe line 903 causes the output Q 1212 of FF1 1203 to rise after j nanoseconds. This can occur because the clear input to FF1 1203 is disabled. After j+k nanoseconds, the signal on line 1211 supplied to gate 1205 transitions to a low state thereby setting the clear input 1216 to FF1 1203. This clear signal serves to reset FF1 1203 causing the synthetic falling edge to appear on synthetic strobe output line 1213. FF2 1204 then negates the clear signal to FF1 1203 ensuring that FF1 1203 can respond to the next strobe low-to-high transition.
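The timing summarized above can be captured in a behavioural model. This sketch ignores gate and flip-flop propagation delays and models only the two delay values j and k; the function names and the example values are assumptions made for illustration.

```python
def synthetic_strobe(falling_edges_ns: list[float],
                     j: float, k: float) -> list[tuple[float, float]]:
    """Each input falling edge at time t yields an output pulse that rises
    at t + j and falls at t + j + k (gate and flip-flop delays ignored)."""
    return [(t + j, t + j + k) for t in falling_edges_ns]


def output_widths(falling_edges_ns: list[float], j: float, k: float) -> list[float]:
    """The output pulse width depends only on k, not on the input pulse width."""
    return [fall - rise for rise, fall in synthetic_strobe(falling_edges_ns, j, k)]


print(synthetic_strobe([0.0, 100.0], j=5.0, k=20.0))  # [(5.0, 25.0), (105.0, 125.0)]
print(output_widths([0.0, 100.0], j=5.0, k=20.0))     # [20.0, 20.0]
```

Because the second edge of every output pulse is generated locally at a fixed delay after the first, any width distortion accumulated on the incoming strobe is not passed on to the next router.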
Using a preferred embodiment of the synthetic clock generator circuit as depicted in Figure 12 and described above, a synchronous strobe signal can be supplied to an unlimited number of routers in a parallel processing network without the danger of experiencing data loss due to a pulse shrinkage condition occurring on the strobe line.
Thus, an improved parallel processing computer system with alternate edge strobe regeneration is described. Although the present invention has been described with specific reference to a number of details of the preferred embodiment, it will be apparent that a number of modifications and variations may be employed without departure from the scope and spirit of the present invention. Accordingly, all such variations and modifications are included within the intended scope of the invention as defined by the following claims.

Claims (20)

  1. A computer system having at least three nodes, each of said nodes comprising:
    a processing means for processing information;
    a router means for routing information between nodes, said router means having:
    (a) means for accepting address information from said processing means;
    (b) data communication means for receiving and transmitting data and status information with an adjacent node;
    (c) strobe communication means for receiving strobe signals with said information and transmitting said strobe signals with said information to a next router means until a destination node is reached; and
    (d) strobe modification means for modifying said strobe signals being received with said information by said strobe communication means.
  2. The computer system as claimed in claim 1 wherein said strobe modification means includes a synthetic strobe generator.
  3. The computer system as recited by claim 1 wherein said strobe communication means provides an even clock edge and an odd clock edge, said even clock edge indicating valid data for said data communication means, said odd clock edge further indicating valid data for said data communication means.
  4. The computer system as claimed in claim 1 wherein said router means further including:
    latching means for receiving said information sent to said router, said latching means coupled to said strobe communication means and said strobe modification means.
  5. The computer system as claimed in claim 1 wherein said router means further including: even latching means for receiving an even portion of said information communicated to said router, said even latching means coupled to said strobe communication means and said strobe modification means, wherein said even latching means receives a modified strobe signal; and odd latching means for receiving an odd portion of said information communicated to said router, said odd latching means coupled to said strobe communication means and said strobe modification means, wherein said odd latching means receives an unmodified strobe signal.
  6. The computer system as claimed in claim 5 wherein said odd latching means receives a modified strobe signal.
  7. The computer system as claimed in claim 1 wherein said strobe modification means further includes delay means for delaying the communication of said strobe signals for a fixed period of time.
  8. The computer system as claimed in claim 3 wherein said even clock edges are synthetically generated by said strobe modification means based on said odd edges.
  9. The computer system as claimed in claim 3 wherein said strobe modification means leaves said odd edges substantially unmodified while said strobe modification means synthetically generates said even edges.
  10. A parallel processing computer system having a plurality of nodes for processing information, said plurality of nodes interconnected in a binary n-cube, each of said nodes coupled with adjacent nodes for providing a communication route between said nodes, each of said communication routes comprising: a first line for communicating information from a first node to a second node; a second line for communicating synthetic strobe information between said first node and said second node, said synthetic strobe information for clocking said data information; and alternate edge strobe regeneration means for synthetically generating alternate portions of said strobe information.
  11. In a computer system having at least three nodes, each of said nodes having a processing means for processing information and a router means for routing information between nodes, a process for communicating information comprising the steps of:
    (a) accepting address information from said processing means;
    (b) receiving strobe signals from a router means;
    (c) modifying said strobe signals being received in said receiving step to produce modified strobe signals;
    (d) receiving data and status information from an adjacent router means using unmodified strobe signals; and
    (e) transmitting data and status information using said modified strobe signals to an adjacent router means.
  12. The process as claimed in claim 11 wherein said modifying step is performed by a synthetic strobe generator.
  13. The process as recited by claim 11 wherein said strobe signals include an even clock edge and an odd clock edge, said even clock edge indicating valid data for said receiving data step and said transmitting data step, said odd clock edge further indicating valid data for said receiving data step and said transmitting data step.
  14. The process as claimed in claim 11 further including the step of:
    latching said information sent to said router, said latching step performed using an unmodified strobe signal.
  15. The process as claimed in claim 11 further including the steps of:
    latching an even portion of said information communicated to said router, said even latching step performed using an unmodified strobe signal; transmitting a modified even edge of said strobe signal to a next router; and latching an odd portion of said information communicated to said router, said odd latching step performed using an unmodified strobe signal.
  16. The process as claimed in claim 11 further including the steps of:
    latching an even portion of said information communicated to said router, said even latching step performed using said modified strobe signal; and latching an odd portion of said information communicated to said router, said odd latching step performed using said modified strobe signal.
  17. The process as claimed in claim 11 further including a step of:
    delaying the communication of said strobe signals for a fixed period of time.
  18. The process as claimed in claim 13 wherein said even clock edges are synthetically generated by said strobe modification means based on said odd edges.
  19. The process as claimed in claim 13 wherein said odd edges are substantially unmodified in said modifying step while said even edges are synthetically generated.
  20. In a parallel processing computer system having a plurality of nodes for processing information, said plurality of nodes interconnected in a binary n-cube, each of said nodes coupled with adjacent nodes for providing a communication route between said nodes, a process for communicating information comprising the steps of:
    communicating data and status information from a first node to a second node; communicating synthetic strobe information between said first node and said second node, said synthetic strobe information for clocking said data and status information; and synthetically generating alternate portions of said strobe information.
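
The following is an illustrative sketch only, not part of the claims and not the patented circuit. It models two ideas the claims rely on: adjacency in a binary n-cube (node addresses that differ in exactly one bit) and the regeneration of alternate strobe edges, where odd edges pass through unmodified and each even edge is synthesized from the preceding odd edge. The function names, the use of timestamps to represent edges, and the choice to place each even edge a fixed delay after its odd edge are assumptions made for illustration.

```python
from typing import List, Tuple


def ncube_neighbors(node: int, dimensions: int) -> List[int]:
    """Return the nodes adjacent to `node` in a binary n-cube.

    Two nodes are adjacent when their addresses differ in exactly one bit,
    so each neighbor is obtained by flipping one address bit.
    """
    return [node ^ (1 << d) for d in range(dimensions)]


def regenerate_strobe(odd_edges: List[float],
                      fixed_delay: float) -> List[Tuple[str, float]]:
    """Model alternate-edge strobe regeneration (hypothetical timing model).

    Odd edges are forwarded at their original times. Each even edge is
    generated synthetically a fixed delay after the preceding odd edge,
    rather than being forwarded from the incoming strobe.
    """
    edges: List[Tuple[str, float]] = []
    for t in odd_edges:
        edges.append(("odd", t))                 # unmodified odd edge
        edges.append(("even", t + fixed_delay))  # synthetic even edge
    return edges


if __name__ == "__main__":
    # Neighbors of node 0b101 in a 3-dimensional cube.
    print(ncube_neighbors(0b101, 3))
    # Odd edges at t = 0 and t = 10, even edges synthesized 5 units later.
    print(regenerate_strobe([0.0, 10.0], 5.0))
```

In this model a receiving router would latch one portion of the information on each "odd" entry and the other portion on each "even" entry, so the spacing between the two latch points is set by the local delay rather than by the incoming even edge.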
GB9118463A 1990-10-15 1991-08-29 Message routing in a multiprocessor computer system Expired - Lifetime GB2249243B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US59707390A 1990-10-15 1990-10-15

Publications (3)

Publication Number Publication Date
GB9118463D0 GB9118463D0 (en) 1991-10-16
GB2249243A true GB2249243A (en) 1992-04-29
GB2249243B GB2249243B (en) 1994-10-05

Family

ID=24389979

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9118463A Expired - Lifetime GB2249243B (en) 1990-10-15 1991-08-29 Message routing in a multiprocessor computer system

Country Status (4)

Country Link
DE (1) DE4134012B4 (en)
GB (1) GB2249243B (en)
HK (1) HK56595A (en)
SG (1) SG10795G (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1329855C (en) * 2003-11-14 2007-08-01 华为技术有限公司 Double-CPU micro-kernel based on MIPS64

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0132826A1 (en) * 1983-07-25 1985-02-13 Ciba-Geigy Ag N-(1-nitrophenyl)-5-aminopyrimidine derivatives, their preparation and use
WO1987000374A1 (en) * 1985-06-27 1987-01-15 American Telephone & Telegraph Company Reliable synchronous inter-node communication in a self-routing network
US4933933A (en) * 1986-12-19 1990-06-12 The California Institute Of Technology Torus routing chip
GB2227341A (en) * 1989-01-18 1990-07-25 Intel Corp Message routing in a multiprocessor computer system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4598400A (en) * 1983-05-31 1986-07-01 Thinking Machines Corporation Method and apparatus for routing message packets
JPH0640649B2 (en) * 1986-04-16 1994-05-25 株式会社日立製作所 Multi-stage regenerative repeater

Also Published As

Publication number Publication date
SG10795G (en) 1995-09-01
GB2249243B (en) 1994-10-05
DE4134012B4 (en) 2005-10-27
HK56595A (en) 1995-04-21
DE4134012A1 (en) 1992-04-16
GB9118463D0 (en) 1991-10-16

Similar Documents

Publication Publication Date Title
US5347450A (en) Message routing in a multiprocessor computer system
Nugent The iPSC/2 direct-connect communications technology
US5398317A (en) Synchronous message routing using a retransmitted clock signal in a multiprocessor computer system
US5175733A (en) Adaptive message routing for multi-dimensional networks
US4623996A (en) Packet switched multiple queue NXM switch node and processing method
Tamir et al. Dynamically-allocated multi-queue buffers for VLSI communication switches
Duato et al. Performance evaluation of adaptive routing algorithms for k-ary n-cubes
KR900006791B1 (en) Packet switched multiport memory nxm switch node and processing method
US5594866A (en) Message routing in a multi-processor computer system with alternate edge strobe regeneration
US6272134B1 (en) Multicast frame support in hardware routing assist
US6304568B1 (en) Interconnection network extendable bandwidth and method of transferring data therein
JP2533223B2 (en) Multi-stage communication network
JPH10506736A (en) Adaptive routing mechanism for torus interconnection networks.
US7773616B2 (en) System and method for communicating on a richly connected multi-processor computer system using a pool of buffers for dynamic association with a virtual channel
JP2004525449A (en) Interconnect system
US20080107106A1 (en) System and method for preventing deadlock in richly-connected multi-processor computer system using dynamic assignment of virtual channels
JPH06214965A (en) Digital computer
US20080109586A1 (en) System and method for arbitration for virtual channels to prevent livelock in a richly-connected multi-processor computer system
GB2249243A (en) Message routing in a multiprocessor computer system
Aoyama Design issues in implementing an adaptive router
Mu et al. A 9.6 GigaByte/s throughput Plesiochronous routing chip
Rekha et al. Analysis and Design of Novel Secured NoC for High Speed Communications
Golota et al. A universal, dynamically adaptable and programmable network router for parallel computers
KR0164966B1 (en) The multistage interconnection network with the folded structure and loop-back function
Corporaal et al. Design and evaluation of communication processors supporting message passing in distributed memory systems

Legal Events

Date Code Title Description
PE20 Patent expired after termination of 20 years

Expiry date: 20110828