WO1994016500A2

WO1994016500A2 - A structured programmable datapath for a digital processor

Info

Publication number: WO1994016500A2
Application number: PCT/US1993/012573
Authority: WO
Inventors: Le Trong Nguyen; Ho Dai Truong
Original assignee: Seiko Epson Corporation
Priority date: 1992-12-31
Filing date: 1993-12-23
Publication date: 1994-07-21
Also published as: WO1994016500A3

Abstract

A digital processor datapath comprising a plurality of bit slices (202) arranged on a chip in mirrored pairs (204), each bit slice (202) comprising a plurality of basic cells (203), wherein adjacent bit slices (202) form channelless boundaries therebetween. Each basic cell (203) comprises a plurality of device layers. The plurality of device layers are identical for each basic cell (203). A plurality of mask programmable conducting layers are formed over said device layers. The mask programmable conducting layers and the device layers are selectively interconnected, so that each basic cell (203) forms one of the electronic functions comprising multiplexing, inverting, latching, NANDing, NORing, exclusive ORing and exclusive NORing.

Description

A Structured Programmable Datapath for a Digital Processor

Inventors

Le Nguyen; Ho Dai Truong

Mask Work Notice

The drawings of this patent document contain material to which a claim of mask work and/or copyright is made. The mask work and/or copyright owner, assignee herein, has no objection to the duplication of the drawings, but reserves all other mask work and/or copyright rights whatsoever.

Background of the Invention

1. Field of the Invention

The present invention relates to digital processors, and more particularly, the present invention relates to structured programmable datapaths for digital processors.

2. Related Art

This section includes a brief introduction to the Central Processing Unit (CPU) datapath, a discussion of datapath design goals, and a discussion of conventional approaches to Very Large Scale Integration (VLSI) design. A more general discussion of datapaths is found in the book VLSI RISC Architecture and Organization (Marcel Dekker, Inc., 1989; Chapters 1 and 4), by Stephen B. Furber.

A. The CPU Datapath

The CPU of a computer is the part that executes instructions and does the computing. It consists essentially of one or more functional units, registers, and control logic. Typically, VLSI processors include functional units such as an Arithmetic and Logic Unit (ALU), a shifter, and the like. The registers include user registers, plus others which are not user accessible. The control logic is a mixture of programmable logic arrays (PLAs), random logic, and read only memory (ROM) for storing microcode. The section of the CPU through which data flows is called the "datapath." The datapath includes functional units, and those registers that contain data. Datapaths are usually organized around "buses," which are common routes for moving data from one register to another or to a functional unit. A 32-bit CPU, for example, will use a 32-bit wide bus so that 32 bits may be moved "in parallel." The number of available pathways comprising a bus is an important determinant of cost and performance. User registers are usually arranged into a regular block (or bank) where the design of the block can be highly optimized for speed and space. Block size depends on the number of pathways that pass through it.

B. Datapath Design Goals

Two major datapath design goals are: 1) minimizing the turnaround time from initial to final design, including design modifications, and 2) maximizing the speed at which the datapath can process data. A third goal, which is common to semiconductor chip design in general, is to achieve high packing density on the chip.

Today's microprocessor developers typically design one basic datapath for a chip (or a small family of chips) that is mass produced and sold to many different customers. Large design teams may take years to conceptualize, design, layout and prototype the datapath for such a chip. Additionally, if a chip fails to work properly, it may take months to identify the problem and develop a solution.

There is a great incentive to produce chips that function properly the first time. This is not a simple task for a chip with the complexity of a 32-bit processor, because accurately verifying a VLSI design prior to fabrication is extremely time intensive. Once a design is correct, it is tempting to assume that mass production is straightforward. This assumption is false, however. When a large integrated circuit is manufactured in volume, at least half of the manufactured chips will not work. The defects that cause failure are varied and random, and all chips must be thoroughly tested to identify the rejects. Many test vectors must be used to exercise all the transistors on the chip to ensure they are functioning. Critical paths must be measured to make sure that transistors are up to the required speed and strength.

Designing for testability is particularly vital for conventional datapath design, because testing costs can be a significant proportion of the total chip cost. Design and development of test programs absorbs as much effort as the logic design of the datapath, and can be greatly eased by careful consideration of test issues during the design phase. Furthermore, hardware, usually in the form of added logic, is sometimes added to simplify testing. This addition increases cost in the form of chip area.

The ability to easily modify or change an existing design can alleviate the pressure of having to produce a device that works the first time. For datapaths, however, modification is difficult because of conventional datapath design techniques. In conventional datapath design, the inability to easily modify or change a design stems from the fact that the designs are optimized for speed. Because performance of the CPU is ultimately limited by datapath cycle time, design is optimized by looking at the principal components that determine that cycle time. This usually requires handcrafting sections of the datapath.

C. Conventional Approaches to VLSI Design

Once the datapath design is chosen and data routes have been determined for every instruction, control logic is added. Control logic controls the flow of data through the datapath. It is important that control logic does not limit the datapath cycle time. It is not necessary, however, to make the control logic any faster than is necessary to keep the datapath cycling at its maximum speed. Design methodology for the control logic is more orientated towards fast implementation than fast operation. VLSI designers strive to build regular structures using standard cells, gate arrays, and the like. However, conventional regular structures tend not to meet the above noted datapath speed goals. Gate arrays are conventionally used for control logic, not for datapaths. They have been ignored by datapath designers because the implementation of a datapath in a conventional gate array is inefficient, and therefore the operation is slow.

Because datapath redesign is time consuming, control logic redesigns are implemented using gate arrays because of their repetitive cell structure and their ease of redesign. Quick design turnaround of control logic has been achieved using "channelless gate array" structures. For example, U.S. Pat. No. 5,079,614 discloses a channelless gate array including comb-shaped gate electrodes and a basic cell layout that permits simplified interconnection patterns for high utilization rates.

ROM's and programmable logic arrays (PLA's) are also used to implement control logic. The overall structure of ROM's and PLA's is regular and the functionality is defined by the presence or absence of a minor feature (e.g., a fused link) in the array. These devices are usually generated automatically from a tabular or Boolean expression of the desired function, and are therefore very easy to lay out and to modify if necessary. A drawback to these devices is that they tend to be slower and require more layout area than a similar implementation based on gates.

The intermediate approach to random control logic design is to use a cell library, where a set of standard gates, latches, flip-flops, and the like, is designed and characterized. The circuit designer selects appropriate cells and wires them together to implement a design. This approach does not yield the smallest possible layout, but datapath speed can be good.

A datapath layout is needed that would facilitate non-trivial modifications thereto without costly redesign. According to conventional datapath wisdom, simply correcting the datapath when a logic flaw is detected after layout is taboo. Rather than re-routing data using extra control logic, what is desired is a datapath that is less rigid in design, fast, and easily modified.

Summary of the Invention

The present invention is directed to a structured CPU datapath comprising basic cells. The basic cells are laid out in a channelless matrix. The matrix is partitioned into repetitive, mirrored pair bit slices. The bit slices are arranged in mirrored pairs so that adjacent bit slices can share power and ground buses. Each mirrored pair shares a center power bus and adjacent mirrored pairs share ground buses, or vice versa.

The basic cells are programmable and easily modified. In addition, the basic cells are configurable to realize a plurality of standard operations with the necessary speed to implement a CPU datapath. Programmability is achieved by configuring the datapath using the upper layers of the chip, including polysilicon and metalization layers. The lower layers of the chip comprise the active devices (i.e., transistors) of the datapath, and include conductive and semiconductive layers. The upper and lower layers need not be laid out according to conventional standard cell library techniques. In contrast to conventional CPU datapath layout techniques, the steps necessary for manufacturing a datapath according to the present invention are the same for different datapath designs. Therefore, changes to the datapath can be made without time consuming redesign of the active device layers.

Although it is counter-intuitive according to conventional CPU design theory to implement a datapath using a basic cell approach, the present invention permits the size of the datapath, number and type of functional units, number of registers, and the like, to be changed rather easily after completion of original datapath layout and wiring. This is not possible using conventional datapath layout techniques.

The present invention eliminates a significant portion of the time required for conventional datapath design. By eliminating the time for laying out the datapath elements and buses, the present invention provides for quicker turn around time for custom CPU designs compared to conventional techniques.

Custom design of datapaths according to the present invention permits manufacturers to custom design datapaths based on the customers' specific needs. Thus, the present invention allows production of many microprocessor chips having different datapaths for separate customers in a short period of time, rather than one chip for many customers as in today's marketplace.

The foregoing and other features and advantages of the present invention will be apparent from the following more particular description of the preferred embodiments of the invention, as illustrated in the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a block diagram of a conventional CPU datapath.

FIG. 2 shows a high level block diagram of a datapath having mirrored pair bit slices comprising basic cells in accordance with the present invention.

FIG. 3 shows a representative transistor-level diagram of a preferred embodiment of a basic cell of the present invention.

FIG's. 4A-C and 5A-C show examples of a basic cell layout in accordance with the present invention.

FIG's. 6A-B and 7A-B show circuit diagrams corresponding to the basic cells in FIG's. 4A-C and 5A-C, respectively.

FIG's. 8A-B and 9A-B show further examples that can be implemented using the basic cell of FIG. 3.

FIG. 10A shows a representative transistor-level diagram of another basic cell of the present invention.

FIG's. 10B-C show layouts of the basic cell in FIG. 10A without and with metalization layers.

In the drawings, like reference numbers indicate identical or functionally similar elements.

Detailed Description of the Invention

The present invention will now be discussed in connection with a VLSI microprocessor chip's CPU datapath. However, the present invention can be applied to any system having a datapath, as will become evident to those skilled in the art. The present invention is therefore not limited to a microprocessor per se. The terms microprocessor, host, CPU, and digital processor are often used interchangeably in this field. The term "CPU" is used hereafter with the understanding that other similar terms could be substituted therefor without changing the underlying meaning of this disclosure.

The terms chip, integrated circuit, semiconductor device and microelectronic device are often used interchangeably in this field. The present invention is applicable to all of the above as they are generally understood in the field.

The expression, rail-to-rail, is generally understood to mean switching the voltage magnitude of a signal from the most positive to the most negative power supply voltage available in the device, or vice versa. This is the meaning given to this expression throughout the instant description of the invention.

The terms metal line, trace, wire, conductor, signal path and signalling medium are all related. These terms are generally interchangeable, and appear in order from most specific to most general. In this field, metal lines are sometimes referred to as traces, wires, lines, interconnect or simply metal. Metal lines, generally Al or an alloy of Al and Cu, are conductors which provide signal paths for coupling, or interconnecting, electrical circuitry. Conductors other than metal are available in microelectronic devices. Materials such as doped polysilicon, doped single-crystal silicon (often referred to simply as diffusion, regardless of whether such doping is achieved by thermal diffusion or ion implantation), Ti, Mo, or refractory metal silicides are examples of other conductors. Signalling medium is the most general term and encompasses the others.

The term power bus(es) as used in this application refers collectively to metal lines which connect circuitry, substrate or wells to voltage supplies such as VDD, VSS, ground or any other voltage supply used by the chip.

The terms pass gate, pass device, pass transistor, transfer gate, transfer device and transmission gate are used interchangeably for the purposes of this disclosure, and are used to describe a transistor circuit which electrically couples/decouples a first node to/from a second node under control of a signal applied to at least one MOSFET gate electrode.

I. Basic Operation of a Datapath.

A typical datapath operation reads data from two registers, combines the data in an ALU, and writes the result back to a register, requiring three accesses in all. Normally, therefore, the register bank will have at least two read buses and one write bus. (Some designs use the same physical bus for both reading and writing.) If a CPU is to achieve a data STORE operation in a single cycle using base-plus-index addressing (a normal addressing mode on a Complex Instruction Set Computer (CISC)), then it must read three registers in the cycle, which requires three read buses.

Datapath design therefore usually originates with the register bank. The number and width of the registers is generally fixed in the instruction set design (though not always), as are the required operations and addressing modes. If the design calls for a very high number of register bits, then the pressure to keep the number of physical buses low is great. If the number of register bits is moderate, it may be possible to gain performance by providing many access buses, thus allowing greater parallelism. Once the register bus structure is decided, the functional units must be connected to those or other buses, and an analysis performed of the data routes required by the various instructions. The bus structure design is iterated until the cost/performance balance is appropriate.

A simple CPU datapath is shown in FIG. 1. A register bank 102 has two read ports 104 and 106 which are the sources of ALU operands, and one write port 108 for the ALU results. ALU results may also be written to a memory address register 110. For a store instruction where the address has been set up in a register, that address is placed onto a B bus 111 and fed to memory address register 110 via an ALU 112 (which is configured for this operation to feed the B bus 111 input directly to the result), and the result to be stored is placed onto A bus 113 and sent to memory (not shown). A load could use the same route to generate the address, and then feed the loaded value from A bus 113 and ALU 112 into the destination register.
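The register-and-bus arrangement described above can be illustrated with a minimal behavioral sketch. It is not part of the disclosure: the class name `SimpleDatapath`, the 16-register bank, the 32-bit width, and the helper `read_buses_needed` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MASK32 = 0xFFFFFFFF  # assumption: a 32-bit wide datapath


@dataclass
class SimpleDatapath:
    """Toy model of a two-read-port, one-write-port datapath (hypothetical)."""

    # Assumption: 16 general registers; the text does not fix a number.
    registers: List[int] = field(default_factory=lambda: [0] * 16)
    memory_address_register: int = 0

    def alu_op(self, op: Callable[[int, int], int],
               src_a: int, src_b: int, dest: int) -> int:
        """Read two registers, combine them in the ALU, write one result."""
        a_bus = self.registers[src_a]      # first read port drives the A bus
        b_bus = self.registers[src_b]      # second read port drives the B bus
        result = op(a_bus, b_bus) & MASK32
        self.registers[dest] = result      # single write port
        return result

    def store_setup(self, addr_reg: int, data_reg: int) -> int:
        """Route a store: B bus passes through the ALU to the address
        register, while the A bus carries the value sent to memory."""
        b_bus = self.registers[addr_reg]
        self.memory_address_register = b_bus  # ALU passes B through unchanged
        return self.registers[data_reg]       # value on the A bus


def read_buses_needed(address_registers: int, data_registers: int) -> int:
    """Registers that must be read in one cycle, hence read buses required."""
    return address_registers + data_registers
```

Under these assumptions, a base-plus-index store reads two address registers and one data register, so `read_buses_needed(2, 1)` gives three, matching the three read buses mentioned in the text.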

This datapath will handle internal operations and memory data accesses satisfactorily, but will be very inefficient at fetching instructions. Unless a separate instruction fetch unit is used, every new instruction will require an ALU operation to increment the program counter (which must be kept somewhere in the general register bank). If the instruction set is very complex and uses many cycles per instruction, the overhead of processing the program counter in the main ALU may be acceptable. A RISC processor is certain to contain some additional functional unit (for instance an address incrementer) to reduce contention between the instruction fetch and execution activities.

Once a datapath design is chosen similar to that shown in FIG. 1, additional buses, registers, functional units and the like, are then added to avoid the bottlenecks that are identified in the course of determining the data routes for each instruction in the instruction set. It is useful to have a clear understanding of which instructions are most important to the eventual performance of the CPU, so that the datapath is optimized to these instructions in preference to less critical instructions. It is also useful to identify the theoretical minimum number of cycles for each instruction.

II. Layout Regularity

An important aspect of general VLSI design layout is regularity. The present invention achieves a very high level of regularity for a CPU datapath. Although datapaths permit the logic for all bit slices to be identical in most respects, carry look-ahead circuitry tends to break this regularity (repeating typically only every four bits across the ALU). As a consequence of such layout considerations and routability between cells, as well as speed constraints, basic cell techniques have not been used by designers to implement datapaths.

III. The Datapath with Mirrored Pair Bit Slices

A high level block diagram of a channelless datapath having mirrored pair bit slices comprising basic cells is shown in FIG. 2. FIG. 2 shows datapath bit slices 202 laid out adjacent to one another. Each datapath bit slice 202 comprises a plurality of basic cells 203. Mirrored pairs of datapath bit slices 202 are shown generally at reference numerals 204. Each mirrored pair of bit slices shares a center power or ground bus and adjacent mirrored pairs 204 also share power or ground buses at their borders. For example, voltage supply buses 206 found in the center of each mirrored bit slice pair are labeled VSS. Thus, each bit slice of a mirrored pair shares a single VSS bus. In this example, adjacent mirrored pairs 204 share a bus 208 such as voltage supply VDD. FIG. 2 is a very general drawing and does not show data lines which would generally lie in the vertical direction, nor control lines which would generally lie in the horizontal direction.
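The bus sharing between mirrored pairs can be sketched as a simple ordering of slices and supply buses. This is only an illustration: the function name `mirrored_slice_layout`, the string labels, and the choice of a ground-type bus (written here as VSS) at pair centers with a VDD bus at pair borders are assumptions following the example in the text.

```python
from typing import List


def mirrored_slice_layout(num_bit_slices: int,
                          center_bus: str = "VSS",
                          border_bus: str = "VDD") -> List[str]:
    """Return the side-by-side order of supply buses and bit slices.

    Each mirrored pair shares one center bus, and neighbouring pairs
    share the bus on their common border (hypothetical model).
    """
    if num_bit_slices <= 0 or num_bit_slices % 2 != 0:
        raise ValueError("mirrored pairs require a positive, even slice count")

    layout = [border_bus]  # bus on the outer edge of the first pair
    for pair in range(num_bit_slices // 2):
        first = 2 * pair
        layout += [
            f"slice{first}",
            center_bus,                      # shared by both slices of the pair
            f"slice{first + 1} (mirrored)",
            border_bus,                      # shared with the next pair, if any
        ]
    return layout


def shared_bus_count(num_bit_slices: int) -> int:
    """Number of supply buses in the shared arrangement."""
    layout = mirrored_slice_layout(num_bit_slices)
    return sum(1 for item in layout if not item.startswith("slice"))
```

In this model a datapath of N bit slices needs N + 1 supply buses, since every bus other than the two outermost ones serves the slices on both of its sides.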

IV. The Datapath Basic Cell

A representative transistor-level diagram of a preferred embodiment of basic cell 203 is shown generally in FIG. 3. Basic cell 203 in FIG. 3 comprises an N-well and a P-well shown at 302 and 304, respectively. Three P-channel MOSFET's labeled P1 through P3 are formed in N-well 302. Four N-channel MOSFET's labeled N1 through N4 are formed in P-well 304.

In a preferred embodiment of the present invention, a single polysilicon dual metal process is used for manufacturing of the datapath. A first set of metal interconnect (called metalization M1) runs horizontally across the basic cells. A second set of metal interconnect (called metalization M2) runs vertically across the basic cells. The directions for metal interconnects M1 and M2 are shown by the arrows in FIG. 3.

Each basic cell 203 of the present invention is configurable to realize a plurality of standard datapath operations. Two example basic cells are shown in FIG's. 4A and 5A. FIG's. 4A and 5A show the diffusion regions for transistors P1-P3 and N1-N4, as well as the polysilicon gate electrodes for those transistors. The diffusion regions, polysilicon layers, and substrate of the basic cell 203 shown in FIG's. 4A and 5A are identical. The polysilicon gates corresponding to each transistor are pointed to by arrows labeled with the respective transistor's name. In addition, high concentration N+ diffusion regions 402 and P+ diffusion regions 404 are also shown. Regions 402 and 404 function as low resistivity contact regions for biasing wells 302 and 304, respectively.

Upon inspection of FIG's. 4A and 5A, those skilled in the art will readily identify those diffusion regions which are common to adjacent transistors. For example, a diffusion region 406 is common to transistors P1 and P3. Region 406 functions as the source for P3 and a drain for P1. As a further example, N diffusion region 408 in P-well 304 serves as a common source or drain for transistors N2-N4. FIG's. 4B-C and 5B-C show metalization layers M1 and M2 along with contacts and vias which are used to interconnect transistors P1-P3 and N1-N4 to realize two separate circuits having the same basic cell 203.

FIG. 4A shows the basic cell for two interconnected inverters which form a buffer circuit. FIG. 4B shows the basic cell of FIG. 4A along with metal interconnection M1 for the buffer circuit. FIG. 4C shows the basic cell of FIG. 4A with the metalization M1 of FIG. 4B together with a second metalization layer M2. Thus, FIG. 4C shows a completed buffer circuit based on the basic cell 203.

FIG. 5A shows a basic cell which is identical to the basic cell of FIG. 4A, but will be configured to form a two-input multiplexer (MUX2) as demonstrated in FIG's. 5B and 5C. FIG. 5B shows the basic cell of FIG. 5A and a first metalization layer M1. FIG. 5C shows the basic cell 203, metalization layer M1 and a second metalization layer M2 for forming MUX2.

Turning again to FIG. 4B, six M1 metalizations with contacts to polysilicon gate electrodes or diffusion regions are shown generally at 410-420. Similarly, FIG. 5B shows eight M1 metalizations with contacts to gate electrode polysilicon or diffusion regions. The eight M1 metalizations of FIG. 5B are labeled 502-516.

FIG. 4C shows the second metalization M2 for implementing the buffer circuit. A total of 14 M2 lines vertically traverse the basic cell. However, only four M2 lines are used to connect to various M1 locations. The four M2 lines and their via connections to M1 are shown at 422-430. Note that the vias shown at 426 and 430 both connect to a single M2 line which is connected to voltage supply VSS. A circuit diagram of the buffer circuit of FIG's. 4A-4C is shown at FIG. 6A. A schematic diagram of the two inverters forming the buffer is shown at FIG. 6B.

The second layer of metalization M2 for MUX2 of FIG's. 5A and 5B is shown in FIG. 5C. In this example, five M2 lines are utilized. The five M2 interconnect lines and their respective vias to M1 are shown generally at 518-526. A final circuit diagram of MUX2 in FIG's. 5A-5B is shown at FIG. 7A. A schematic diagram of MUX2 is shown at FIG. 7B. FIG. 7A shows two inputs A and B at the left side of the figure and a single output (OUT) at the right side of the figure. Two control inputs C1 and C2 are provided to transistors N2 and N3, respectively, which function as pass transistors. As shown in FIG. 7B, transistors P1 and N1 form an inverter for outputting the input selected by control signals C1 and C2. Transistors P3 and N4 form a small feedback inverter for stabilizing the output of the inverter formed by transistor P1 and transistor N1.
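The behavior of the pass-transistor multiplexer can be approximated at the logic level as follows. This is a hedged sketch rather than a description of the actual circuit: the class name `Mux2Cell` is hypothetical, the output is assumed to be the logical inverse of the selected input because a single inverter drives it, and the feedback inverter is modeled simply as holding the internal node when neither control input is asserted.

```python
from dataclasses import dataclass


@dataclass
class Mux2Cell:
    """Logic-level model of a two-input pass-transistor multiplexer."""

    # Internal node between the pass transistors and the output inverter.
    node: bool = False

    def evaluate(self, a: bool, b: bool, c1: bool, c2: bool) -> bool:
        """Return the output for inputs A, B and control inputs C1, C2."""
        if c1 and c2:
            # Both pass transistors on: A and B would fight over the node.
            raise ValueError("C1 and C2 must not be asserted together")
        if c1:
            self.node = a        # first pass transistor conducts input A
        elif c2:
            self.node = b        # second pass transistor conducts input B
        # With neither control asserted, the feedback inverter is assumed
        # to hold the previous node value.
        return not self.node     # assumption: the output inverter inverts
```

For example, selecting A with C1 while A is high yields a low output in this model; deasserting both controls afterwards leaves the output unchanged.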

In addition to the buffer circuit of FIG's. 4A-4C and the multiplexer of FIG's. 5A-5C, basic cell 203 may also be configured using metalization layers M1 and M2 to form a two input NAND circuit, a two input NOR circuit, an exclusive OR, an exclusive NOR, or a simple latch. Circuit diagrams and representative layout configurations for the NAND and NOR embodiments are shown in FIG's. 8A and 8B and FIG's. 9A and 9B, respectively. Interconnections for the latch and the exclusive OR and exclusive NOR circuits should become evident to those skilled in the art based on the above description of the present invention. In addition, multiple input multiplexers can be built using two or more vertically adjacent basic cells 203. Similarly, more complex circuits such as flip-flops, barrel shifters, and combinational logic circuits can be realized in a datapath according to the present invention as will become evident to those skilled in the art.
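The idea that one fixed cell takes on different logic functions depending only on its metal programming can be sketched as a lookup from a configuration name to a Boolean function, applied identically in every bit slice. The names `CELL_FUNCTIONS` and `apply_across_slices`, and the configuration labels, are hypothetical and chosen only for this illustration.

```python
from typing import Callable, Dict

# Each entry stands for one metal programming of the same underlying cell.
CELL_FUNCTIONS: Dict[str, Callable[[bool, bool], bool]] = {
    "NAND2": lambda a, b: not (a and b),
    "NOR2": lambda a, b: not (a or b),
    "XOR2": lambda a, b: a != b,
    "XNOR2": lambda a, b: a == b,
}


def apply_across_slices(config: str, word_a: int, word_b: int,
                        width: int = 32) -> int:
    """Apply one cell configuration to every bit slice of two words.

    Models a row of identically programmed cells, one per bit slice.
    The default width of 32 is an assumption.
    """
    function = CELL_FUNCTIONS[config]
    result = 0
    for bit in range(width):
        a = bool((word_a >> bit) & 1)
        b = bool((word_b >> bit) & 1)
        if function(a, b):
            result |= 1 << bit
    return result
```

Changing the function of a row then amounts to changing only the `config` key, which loosely mirrors changing the metal masks while leaving the device layers untouched.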

A further exemplary basic cell 1002 is shown in FIG's. 10A-10C. In this further embodiment, the basic cell is larger than basic cell 203. Basic cell 1002 comprises four P-channel transistors P1-P4 and five N-channel transistors N1-N5.

Basic cells 1002 can be arranged in a similar fashion as basic cells 203 of FIG. 2. FIG. 10B shows the diffusion regions and polysilicon gates of transistors P1-P4 and transistors N1-N5 for basic cell 1002. FIG. 10C shows a complete two input NOR circuit for basic cell 1002 having two metalization layers and respective interconnects therebetween.

The present invention may easily be extended to other types of basic cells that are configurable to implement circuits with functional requirements different than a general datapath without departing from the scope of the above description. A digital signal processor's datapath may require basic cells capable of being configured to perform a set of logic functions that are different from those required by the basic cells in a graphics processor's datapath, for example. The basic cells of the digital signal processor datapath may be laid out so that the cells can be configured as various latch devices. Alternatively, basic cells of the graphics processor datapath may be laid out so that the cells can be configured to implement devices for arithmetic operations.

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. Thus the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

What is claimed is:

1. A digital processor datapath comprising: a plurality of bit slices arranged on a chip in mirrored pairs, each bit slice comprising a plurality of basic cells, wherein adjacent bit slices form channelless boundaries therebetween and each basic cell comprises: a plurality of device layers, said plurality of device layers being identical for each basic cell; and a plurality of mask programmable conducting layers formed over said device layers; and means for selectively interconnecting at least one of said plurality of mask programmable conducting layers and at least one of said device layers, wherein each basic cell forms one of the electronic functions comprising multiplexing, inverting, latching, NANDing, NORing, exclusive ORing and exclusive NORing.

2. The digital processor datapath according to claim 1, further comprising: voltage supply buses, wherein each bus is positioned above a bit slice boundary, and said buses are implemented using a subset of said plurality of mask programmable conducting layers.

3. The digital processor datapath according to claim 2, wherein each basic cell includes a p-type conductivity well and an n-type conductivity well formed adjacent one another and having a dividing boundary oriented parallel to said bit slice boundaries.

4. A method of manufacturing a digital processor datapath comprising the steps of:

1) forming a plurality of device layers in a plurality of bit slices arranged in mirrored pairs, each bit slice comprising a plurality of basic cells, adjacent bit slices having boundaries therebetween, wherein said plurality of device layers are identical for each basic cell;

2) forming a plurality of mask programmable conducting layers over said device layers; and

3) selectively interconnecting at least one of said plurality of mask programmable conducting layers and at least one of said device layers, wherein each basic cell forms one of the electronic functions comprising multiplexing, inverting, latching, NANDing, NORing, exclusive ORing and exclusive NORing.

5. The method according to claim 4, further comprising the step of: forming voltage supply buses above the bit slice boundaries using a subset of said plurality of mask programmable conducting layers.