EP1161722A1 - A data processing device with distributed register file - Google Patents

A data processing device with distributed register file

Info

Publication number
EP1161722A1
EP1161722A1 EP00904914A EP00904914A EP1161722A1 EP 1161722 A1 EP1161722 A1 EP 1161722A1 EP 00904914 A EP00904914 A EP 00904914A EP 00904914 A EP00904914 A EP 00904914A EP 1161722 A1 EP1161722 A1 EP 1161722A1
Authority
EP
European Patent Office
Prior art keywords
output
input
register
crossbar
processing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00904914A
Other languages
German (de)
French (fr)
Inventor
Jean-Paul Theis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
THEIS JEAN PAUL
Original Assignee
THEIS JEAN PAUL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by THEIS JEAN PAUL
Publication of EP1161722A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30105 Register structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134 Register stacks; shift registers

Definitions

  • a data processing device with distributed register file
  • the present invention relates to the field of architecture design of data processing devices. More specifically, the invention is dealing with architecture design issues at register-transfer level and is focusing on data path architectures of processing devices.
  • the term 'data processing device' has a very broad meaning and can stand for terms like (micro)processor, micro-controller, central processing unit (CPU), digital signal processor (DSP), application specific integrated circuit (ASIC), application specific standard product (ASSP), application specific instruction set processor (ASIP).
  • a register-transfer level architecture of a processing device can be thought of as consisting of a limited number of elementary building blocks with which the processing device is built up.
  • the register transfer-level architecture of a processing device typically consists of Processing Elements (PEs), register files, busses, crossbars and a control unit which are arranged and connected to each other in a well defined manner.
  • a crossbar is a building block that makes connections between its inputs and outputs.
  • a fully connected crossbar is able to connect any input to one, more or even all outputs.
  • a partially connected crossbar is able to connect any input to one or more but not all outputs.
  • Multiplexers/demultiplexers are crossbars with one input/output and one or more outputs/inputs respectively.
  • the (register-transfer-level) data path architecture of a processing device comprises only building blocks directly involved in the data processing, e.g. PEs, register files, busses and crossbars, but not any control units used to control the building blocks of the data path. Therefore in all the following figures, control signals for crossbars, PEs and register files will only be shown when they are relevant in the context of the present invention. Furthermore, in all the figures that follow, arrows represent either bussed connections between building blocks or bussed inputs and bussed outputs of building blocks and processing devices, where the bus width of a bussed connection or of a bussed input/output is equal to one or more bits.
  • control signals for register files, for example, determine the addresses of the register locations to/from which data are written/read respectively, or represent clocking signals.
  • Register file inputs are also called write ports and register file outputs are also called read ports. Read/write ports may have simultaneous access to all register locations in the register file.
  • Control signals for PEs, for example, select the operations to be performed.
  • Control signals for crossbars determine the connections to be made between crossbar inputs and crossbar outputs.
  • Figures 2, 3 and 4 can be used to retrace briefly the evolutionary steps of register-transfer-level architectures of processing devices with a distributed register file.
  • FIG. 2 shows one of the first data path architectures of a processing device with a distributed register file. It was called the polycyclic processor and was developed by ESL Inc. in the early 80's.
  • the data path architecture at register-transfer-level is shown and consists of a set of PEs whose inputs and outputs are connected to a crossbar with delay elements at each cross point.
  • the delay elements can be thought of as a particular implementation of a register file.
  • PEs, crossbar and delay elements are connected in the following way : (1) each PE (data) output is connected to as many independent cross points in a row of the crossbar as there are PE inputs (2) each PE input is connected to and selected out of as many cross points in a column as there are PE outputs.
  • a next step in the evolution of data path architectures with a distributed register file consisted in integrating the crossbar with delay elements as shown in figure 2 directly into the data path architecture of a microprocessor. This step was done in the data path architecture (again at register-transfer-level) of a video signal processor which was developed in the late 80's by Philips Research and which is shown in figure 3.
  • the crossbar is called a switch matrix
  • the delay elements are called Silos
  • the PEs are called ALEs (Arithmetic Logic Elements), where ALE is yet another word for ALU.
  • the Silos are used for slightly different data storage purposes : 1) as Memory Elements (MEs) which contain in addition to the Silos conventional memory for program data and logic for address calculation 2) as Buffer Elements (BEs) for buffering data 3) as Output Elements (OEs) for buffering data before they leave the processor.
  • MEs Memory Elements
  • BEs Buffer Elements
  • OEs Output Elements
  • the PEs can be of different type, and with several data inputs and data outputs.
  • the register files can be of different types as well, for example stacks, FIFOs and register files with a rotating property, in which the data rotate within the register file, and they may have several read and write ports from and to which data can be read and written simultaneously.
  • the crossbar can be fully or only partially connected.
  • outputs of register files may be connected to PE inputs and/or to processor outputs.
  • Data path architectures with distributed register file try to overcome these shortcomings by using several and smaller register files with only a few read/write ports. All these register files together are of about the same size as a big single register file.
  • the price that is paid to overcome the problems linked to a single register file consists in bigger code size.
  • data path architectures with distributed register files are typically VLIW processor architectures where a compiler is optimizing the program code statically in order to optimally exploit the multiple register files.
  • the program code of VLIW processors is typically twice as large as for 'conventional' processors (processing devices) with a single register file.
  • Figure 1 shows the data path architecture of a 'conventional' processor with a single register file.
  • FIG. 2 shows the data path architecture of the polycyclic processor developed by ESL Inc.
  • Figure 3 shows the data path architecture of a video signal processor developed by Philips Research.
  • Figure 4 shows the data path architecture of a processor with a distributed register file according to the prior art.
  • Figure 5 shows the data path architecture of a processing device with a distributed register file based on the present invention.
  • Figure 6 shows a specific example of the data path architecture of a processing device with a distributed register file based on the present invention.
  • Figure 7 shows two variants of a specific type of register file containing a shift register connected to a crossbar. One variant of this type of register file is shown at lower right, the other variant is shown at lower left.
  • Figure 8 shows a specific example of an array of processing devices built up according to the rules based on the present invention.
  • Figure 9 shows two processing devices of an array and visualizes the rules concerning a) processing device inputs which are connected to an array input and b) processing device inputs which are connected to an output of a processing device of the array.
  • Data path architectures of processing devices with a distributed register file based on the present invention differ significantly from the data path architectures of the prior art and are obtained by applying a set of building rules to a set of building blocks. The differences with the prior art will become clear when discussing these building rules.
  • a processing device comprising one or more inputs, one or more outputs and one or more processing elements, each processing element having one or more inputs and one or more outputs.
  • the terms 'data path architecture', 'crossbar', 'register file' and 'processing element' always refer to the considered processing device.
  • the first type of data path architecture of a processing device based on the present invention contains :
  • each processing device input is connected to the input of a register file
  • each output of each processing element is connected to the input of a register file
  • any output of any processing element and any other output of any processing element are not connected to the same input of a register file
  • each register file is connected either to an output of a processing element or to a processing device input
  • each output of each register file is connected to an input of a crossbar
  • any output of any register file and any other output of any other register file are not connected to a same input of a crossbar
  • each input of each crossbar is connected to an output of a register file
  • any input of any crossbar and any input of any other crossbar are not connected to the same output of any register file
  • each processing device output is connected to the output of a crossbar
  • each input of each processing element is connected to the output of a crossbar
  • any processing device output and any other processing device output are not connected to the same output of a crossbar
  • any processing element input and any other processing element input are not connected to the same output of a crossbar
  • the output of each crossbar is connected either to a processing device output or to an input of a processing element
  • the output of any crossbar and the output of any other crossbar are neither connected to a same processing device output nor to a same input of a processing element
  • the data values appearing on one or more inputs of the register file may be written/read into/from register locations according to similar rules as for the connections to be done inside a crossbar, depending on the bus width of the register file inputs, of the register cells contained in the register file and of the register file outputs.
  • A processing device with a data path architecture built up according to the rules mentioned above is shown in figure 5.
  • Figure 5 aims at visualizing the above rules, therefore the number of processing device inputs and outputs, the number of PEs as well as the number of PE inputs and PE outputs is not further specified.
  • figure 6 shows a specific example of a processing device with such a data path architecture : it contains two PEs, two processing device inputs and two processing device outputs. Each PE has two inputs and two outputs. Register files have either one, two or three outputs. Furthermore the number of existing connections between outputs of register file and inputs of crossbars differ from register file to register file and from crossbar to crossbar, in other words not all connections that are allowed by the rules are effectively realized.
  • the second type of data path architecture of a processing device based on the present invention differs slightly from the first type in that it contains one or more register files of a same type, this type of register file being shown in figure 7 and denoted by 'SR + #'.
  • the shift register contains one or more register cells
  • the shift register has one input and as many outputs as there are register cells contained in the shift register, each register cell having one input and one output
  • the crossbar is either partially or fully connected and has as many outputs as there are register file outputs
  • in one variant, the crossbar has as many inputs as there are register cells contained in the shift register
  • in the other variant, the crossbar has as many inputs as the number obtained by incrementing by one the number of register cells contained in the shift register
  • in one variant, the register file input is connected to the input of the shift register
  • in the other variant, the register file input is connected to the input of the shift register and to an input of the crossbar
  • each shift register output is connected to an input of the crossbar
  • any shift register output and any other shift register output are not connected to a same input of the crossbar
  • in case of one variant, each input of the crossbar is connected to a shift register output
  • in case of the other variant, each input of the crossbar is connected either to a shift register output or to the register file input
  • any input of the crossbar and any other input of the crossbar are neither connected to the same shift register output nor to the register file input
  • each output of the crossbar is connected to a register file output
  • the difference between the two variants lies in the fact that in case of the variant shown at lower left in figure 7, the register file input can directly be forwarded to one or more register file outputs without traversing a cell of the shift register.
  • the shift register contained in the register file may have a gated clock input, in other words the contents of the register cells are only then shifted by one position in the shift direction within every clock cycle of some clock used in the processing device if some signals generated in the control unit(s) of the processing device have a specific value.
  • the value of these signals may change from clock cycle to clock cycle of some clock used in the processing device and generally depend on the program code, on the instructions that are executed by the processing device, on results of operations performed by the PEs and on data values stored in the register files.
  • the first cell of the shift register is the register cell with label 1
  • the last cell of the shift register is the cell with label m. Note that concerning the bus width of any connections between any inputs and outputs of the shift register, of the crossbar, of any register cell of the shift register and of the register file itself the same remark holds as for the connections done inside a processing device with a data path architecture of the first type as described above.
  • the present invention is also dealing with arrays of processing devices.
  • the data path architecture of the processing devices used in these arrays is closely related to the data path architecture of the first and second type as described above.
  • An array comprising two or more processing devices and one or more array inputs and one or more array outputs.
  • Each processing device of the considered array has one or more inputs and one or more outputs.
  • the term 'processing device' always refers to the considered array.
  • the array is built up according to the following rules :
  • each array input is connected to one or more inputs of one or more processing devices
  • each output of each processing device is connected to one or more inputs of one or more processing devices or to one or more array outputs
  • any output of any processing device and any output of any other processing device are neither connected to a same input of a processing device nor to a same array output
  • the first and second type of data path architecture as described above are used inside 'stand alone' processing devices, in other words processing devices which are not part of an array of several processing devices.
  • the type of data path architecture of processing devices which are part of an array slightly differs from the first and second type of data path architecture of a 'stand alone' processing device. The difference consists in the number of register files used inside each processing device of the array as well as in the way that inputs of register files are connected to processing device inputs.
  • the difference is as follows : if an input of any processing device of the considered array is not connected to an output of a processing device of the considered array but is connected to an array input, then it is connected to the input of a register file in the same way as for the data path architecture of the first and second type described above; if an input of any processing device of the considered array is connected to an output of a processing device of the considered array but is not connected to an array input, then it is directly connected to one or more inputs of one or more crossbars of the considered processing device.
  • Figure 9 shows thereby two processing devices of an array.
  • the input of the processing device at the right side which is connected to an output of the processing device at the left side, is not connected to an input of a register file but directly connected to one or more inputs of one or more crossbars of that processing device.
  • the input of the processing device at the right side which is connected to an array input, is connected to an input of a register file in the same way as for the data path architecture of the first or second type described above.
  • the terms 'processing device input', 'processing device output', 'crossbar(s)', 'register file(s)' and 'processing element(s)' always refer to the considered processing device.
  • each processing device of the considered array contains :
  • processing element outputs correspond to all the outputs of all the processing elements of the considered processing device
  • marked processing device inputs correspond to all those inputs of the considered processing device which are connected to an array input
  • each register file has one input and one or more outputs
  • each crossbar has one output and one or more inputs and has a register-transfer-level data path architecture which is built up according to the following rules :
  • each processing device input which is connected to an array input is connected to the input of a register file
  • each processing device input which is connected to an output of a processing device of the considered array is connected to one or more inputs of one or more crossbars
  • any processing device input and any other processing device input are neither connected to the same input of a register file nor to a same input of a crossbar
  • each output of each processing element is connected to the input of a register file
  • any output of any processing element and any other output of any processing element are not connected to the same input of a register file
  • the input of each register file is connected either to an output of a processing element or to a processing device input
  • the input of any register file and the input of any other register file are not connected to a same output of any processing element
  • the input of any register file and the input of any other register file are not connected to a same processing device input
  • each output of each register file is connected to an input of a crossbar
  • any output of any register file and any other output of any other register file are not connected to a same input of a crossbar
  • each input of each crossbar is connected either to an output of a register file or to a processing device input
  • any input of any crossbar and any input of any other crossbar are neither connected to the same output of any register file nor to a same processing device input
  • each processing device output is connected to the output of a crossbar
  • each input of each processing element is connected to the output of a crossbar
  • any processing device output and any other processing device output are not connected to the same output of a crossbar
  • any processing element input and any other processing element input are not connected to the same output of a crossbar
  • the output of each crossbar is connected either to a processing device output or to an input of a processing element
  • the output of any crossbar and the output of any other crossbar are neither connected to a same processing device output nor to a same input of a processing element
  • the application domain of 'stand alone' processing devices with a data path architecture based on the present invention is the same as the application domain of arrays of processing devices with data path architectures based on the present invention and consists of applications within image/multimedia/signal processing, graphics processing and linear algebra.
  • the present invention concerns a processing device according to claim 1 and an array of processing devices according to claim 8.
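
The 'SR + #' register file described in the list above (a shift register whose cell outputs feed a crossbar) can be illustrated with a minimal Python sketch. The class name, method names and the `bypass` flag are hypothetical and do not come from the patent; the sketch also assumes that the register file input feeds cell 1 and that data shift towards cell m. Setting `bypass=True` models the variant in which the register file input is additionally connected to an input of the crossbar.

```python
class ShiftRegisterFile:
    """Toy model of an 'SR + #' register file (all names are illustrative)."""

    def __init__(self, num_cells, num_outputs, bypass=False):
        self.cells = [None] * num_cells   # cells[0] is cell 1, cells[-1] is cell m
        self.bypass = bypass              # True: input is also a crossbar input
        self.select = [0] * num_outputs   # crossbar control: source index per output

    def read(self, data_in=None):
        # Crossbar inputs: the m cell outputs, plus the raw register file
        # input (index m) in the bypass variant.
        sources = list(self.cells) + ([data_in] if self.bypass else [])
        return [sources[s] for s in self.select]

    def clock(self, data_in, shift_enable=True):
        # Gated clock: contents move by one position only when enabled.
        if shift_enable:
            self.cells = [data_in] + self.cells[:-1]


rf = ShiftRegisterFile(num_cells=3, num_outputs=2)
for value in (1, 2, 3):
    rf.clock(value)
rf.select = [0, 2]            # output 0 reads cell 1, output 1 reads cell 3
print(rf.read())              # [3, 1]

rf.clock(4, shift_enable=False)
print(rf.read())              # still [3, 1], the clock was gated off

fwd = ShiftRegisterFile(num_cells=2, num_outputs=1, bypass=True)
fwd.select = [2]              # index 2 is the forwarded register file input
print(fwd.read(data_in=9))    # [9]
```

In the bypass variant the crossbar has one more input than there are register cells, which is why index 2 is valid for a two-cell shift register.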

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The present invention introduces data path architectures of processing devices with a distributed register file. These data path architectures are obtained by applying a set of building rules to a set of building blocks. The number of register files corresponds to the number of processing device inputs and processing element outputs. Specific rules for connecting the register files to the processing elements via distributed crossbars are given. Arrays of processing devices are considered as well. Data path architectures of processing devices used in these arrays slightly differ from those of stand alone processing devices.

Description

A data processing device with distributed register file
1. Field of the invention
The present invention relates to the field of architecture design of data processing devices. More specifically, the invention is dealing with architecture design issues at register-transfer level and is focusing on data path architectures of processing devices.
2. Conventions, definition of terms, terminology
First, it should be noted that in the literature the two expressions 'distributed register file' and 'distributed register files' (files with an 's') are used synonymously and stand for two or more register files.
The term 'data processing device' has a very broad meaning and can stand for terms like (micro)processor, micro-controller, central processing unit (CPU), digital signal processor (DSP), application specific integrated circuit (ASIC), application specific standard product (ASSP), application specific instruction set processor (ASIP). As mentioned before, the present invention is dealing with architecture design issues at register-transfer level. A register-transfer-level architecture of a processing device can be thought of as consisting of a limited number of elementary building blocks with which the processing device is built up. The register-transfer-level architecture of a processing device typically consists of Processing Elements (PEs), register files, busses, crossbars and a control unit which are arranged and connected to each other in a well defined manner. The way in which these building blocks are arranged and connected together determines the features of an architecture of such a processing device. The term 'PE' is frequently used in the same sense as 'data processing device'. However in the text that follows, the term 'PE' has a more restricted meaning and will represent, unless specified otherwise, either Arithmetic Logic Units (ALUs), floating point units (FPUs) or other functional units (FUs) of a processing device. A crossbar is a building block that makes connections between its inputs and outputs. A fully connected crossbar is able to connect any input to one, more or even all outputs. A partially connected crossbar is able to connect any input to one or more but not all outputs. Multiplexers/demultiplexers are crossbars with one input/output and one or more outputs/inputs respectively. The meaning of the other aforementioned building blocks is identical to the one normally described in the literature.
The (register-transfer-level) data path architecture of a processing device comprises only building blocks directly involved in the data processing, e.g. PEs, register files, busses and crossbars, but not any control units used to control the building blocks of the data path. Therefore in all the following figures, control signals for crossbars, PEs and register files will only be shown when they are relevant in the context of the present invention. Furthermore, in all the figures that follow, arrows represent either bussed connections between building blocks or bussed inputs and bussed outputs of building blocks and processing devices, where the bus width of a bussed connection or of a bussed input/output is equal to one or more bits. Unless specified otherwise, all the inputs and outputs of building blocks and of a processing device itself refer to data and not to control signals. It is assumed that all the control signals for all the building blocks are generated from one or more control units of the processing device, these control units typically comprising instruction decode and execution units as well as memory management units. Control signals for register files, for example, determine the addresses of the register locations to/from which data are written/read respectively, or represent clocking signals. Register file inputs are also called write ports and register file outputs are also called read ports. Read/write ports may have simultaneous access to all register locations in the register file. Control signals for PEs, for example, select the operations to be performed. Control signals for crossbars determine the connections to be made between crossbar inputs and crossbar outputs.
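
To make the distinction between fully and partially connected crossbars concrete, the following is a minimal Python sketch of a crossbar whose control signals select, for each output, the input it is connected to. The class, its method names and the `allowed` parameter are hypothetical illustrations and are not taken from the patent.

```python
class Crossbar:
    """Toy crossbar: each output is driven by at most one input."""

    def __init__(self, n_inputs, n_outputs, allowed=None):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # Set of (input, output) pairs that can be connected.
        # None stands for a fully connected crossbar.
        if allowed is None:
            allowed = {(i, o) for i in range(n_inputs) for o in range(n_outputs)}
        self.allowed = set(allowed)
        self.select = {}  # control signals: output index -> input index

    def connect(self, inp, out):
        if (inp, out) not in self.allowed:
            raise ValueError(f"input {inp} cannot reach output {out}")
        self.select[out] = inp

    def route(self, values):
        """Return the value seen on each output (None if unconnected)."""
        return [values[self.select[o]] if o in self.select else None
                for o in range(self.n_outputs)]


# Fully connected: one input may drive one, several or all outputs.
full = Crossbar(3, 2)
full.connect(1, 0)
full.connect(1, 1)
print(full.route(["a", "b", "c"]))   # ['b', 'b']

# Partially connected: input 0 can reach output 0 only.
partial = Crossbar(2, 2, allowed={(0, 0), (1, 0), (1, 1)})
partial.connect(0, 0)
partial.connect(1, 1)
print(partial.route(["x", "y"]))     # ['x', 'y']
```

Attempting `partial.connect(0, 1)` raises `ValueError`, which is how this sketch models a connection that a partially connected crossbar does not provide.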
3. Prior Art
Before investigating the prior art in processor architectures with a distributed register file, it is worthwhile to have in mind the data path architecture of a 'conventional' processor with a single register file as it is used in today's microprocessors and as shown in figure 1. It is characterized by the fact that all the PE outputs are connected to the same one register file and that all the PEs may have simultaneous access (for reading and writing data) to any register location in the register file.
Figures 2, 3 and 4 can be used to retrace briefly the evolutionary steps of register-transfer-level architectures of processing devices with a distributed register file.
Figure 2 shows one of the first data path architectures of a processing device with a distributed register file. It was called the polycyclic processor and was developed by ESL Inc. in the early 1980s. The data path architecture at register-transfer-level is shown and consists of a set of PEs whose inputs and outputs are connected to a crossbar with delay elements at each cross point. The delay elements can be thought of as a particular implementation of a register file. PEs, crossbar and delay elements are connected in the following way : (1) each PE (data) output is connected to as many independent cross points in a row of the crossbar as there are PE inputs (2) each PE input is connected to and selected out of as many cross points in a column as there are PE outputs. For the configuration shown in figure 2, with two PEs each having 2 inputs and 1 output, this implies a fully connected crossbar with 4×2 independent cross points, in other words 2 rows of 4 cross points each or equivalently 4 columns of 2 cross points each, and with as many delay elements as cross points. Note that this is drawn symbolically in figure 2. The detailed architecture of the crossbar with the delay elements is not shown. It is important to note that ESL Inc. had microprocessors in mind when speaking of PEs. Therefore, the crossbar with delay elements is first of all an efficient method of exchanging data between microprocessors, hence an efficient method to build multi-processor systems. A next step in the evolution of data path architectures with a distributed register file consisted in integrating the crossbar with delay elements as shown in figure 2 directly into the data path architecture of a microprocessor. This step was done in the data path architecture (again at register-transfer-level) of a video signal processor which was developed in the late 1980s by Philips Research and which is shown in figure 3.
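
The cross-point count quoted above for the figure 2 configuration follows from multiplying the number of PE outputs (rows) by the number of PE inputs (columns). A small Python sketch of that arithmetic, using a hypothetical helper name and assuming every PE has the same number of inputs and outputs:

```python
def cross_points(num_pes, inputs_per_pe, outputs_per_pe):
    """Rows, columns and cross points of a fully connected crossbar."""
    rows = num_pes * outputs_per_pe     # one row per PE output
    columns = num_pes * inputs_per_pe   # one column per PE input
    return rows, columns, rows * columns


rows, columns, total = cross_points(num_pes=2, inputs_per_pe=2, outputs_per_pe=1)
print(rows, columns, total)   # 2 4 8
```

With one delay element per cross point, this configuration therefore needs 8 delay elements.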
Although the terminology used in figure 3 is slightly different from that used in figure 2, the building blocks in question and the way in which they are connected together are identical : in figure 3 the crossbar is called a switch matrix, the delay elements are called Silos, the PEs are called ALEs (Arithmetic Logic Elements), where ALE is yet another word for ALU. In figure 3, the Silos are used for slightly different data storage purposes : 1) as Memory Elements (MEs) which contain in addition to the Silos conventional memory for program data and logic for address calculation 2) as Buffer Elements (BEs) for buffering data 3) as Output Elements (OEs) for buffering data before they leave the processor. As mentioned above, the way in which these building blocks are connected together is the same as in figure 2, with the only difference lying in a more explicit separation and drawing of crossbar (switch matrix) and delay elements (Silos).
Finally, replacing delay elements (Silos) with conventional register files leads to a data path architecture with distributed register file as shown in figure 4. In figure 4, the PEs can be of different types and can have several data inputs and data outputs. The register files can be of different types as well, for example stacks, FIFOs and register files with a rotating property, where the data rotate in the register file, and they may have several read and write ports from and to which data can be read and written simultaneously. The crossbar can be fully or only partially connected. Furthermore, outputs of register files may be connected to PE inputs and/or to processor outputs.
It is interesting to see that the way in which the building blocks, consisting of crossbar, register files (Silos, delay elements) and PEs (microprocessors, ALUs, ALEs), are connected together in figures 2, 3 and 4 appears to be identical and is based on the following rules : 1) take the data outputs of the PEs (ALUs, ALEs) and connect them to the crossbar inputs 2) take the crossbar outputs and connect them to the inputs of the register file, delay elements and Silos 3) take the outputs of the register files, delay elements and Silos and connect them to the (data) inputs of the PEs.
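For a fully connected crossbar wired according to these rules, the number of cross points (and hence of delay elements in the polycyclic arrangement of figure 2) is the product of the number of PE outputs and the number of PE inputs. The following minimal Python sketch illustrates that count; the function names and the representation of a PE as an (inputs, outputs) pair are assumptions made for illustration and do not appear in the cited designs.

```python
def crosspoint_grid(pes):
    """Return (columns, rows) of a fully connected crossbar.

    `pes` is a list of (num_inputs, num_outputs) pairs, one per PE
    (hypothetical representation chosen for this sketch).
    Each PE output drives one row; each PE input reads one column.
    """
    rows = sum(n_out for _, n_out in pes)
    columns = sum(n_in for n_in, _ in pes)
    return columns, rows


def delay_element_count(pes):
    # One delay element sits at every cross point.
    columns, rows = crosspoint_grid(pes)
    return columns * rows
```

For the two PEs of figure 2, each with 2 inputs and 1 output, `crosspoint_grid([(2, 1), (2, 1)])` yields 4 columns and 2 rows, that is 8 cross points and 8 delay elements.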
Before closing this section on the prior art in data path architectures of processing devices with a distributed register file, their shortcomings and major points for improvement will be briefly discussed.
Two major shortcomings of a 'conventional' data path architecture with a single register file are the VLSI design challenge of the single register file and the power consumption of the single register file. Today's microprocessors (f.ex. Pentium, PowerPC) have single register files containing at least 128 80-bit floating point registers and having at least 4 read and 4 write ports. This leads to a big silicon area of the register file, which in turn leads to an increase in read/write/access cycle times due to the long wire lines to be charged and discharged. In order to compensate for this effect, special design techniques have to be used to keep the read/write/access cycle times down to an acceptable level. This however, together with the big silicon area, comes at the expense of power consumption, and therefore big single register files with multiple read/write ports are not very power efficient.
Data path architectures with distributed register file try to overcome these shortcomings by using several smaller register files with only a few read/write ports. All these register files together are of about the same size as a big single register file. However, in case of a data path architecture like that in figure 4, the price that is paid to overcome the problems linked to a single register file is a bigger code size. This is due to the fact that data path architectures with distributed register files are typically VLIW processor architectures where a compiler optimizes the program code statically in order to optimally exploit the multiple register files. For a certain number of reasons however, the program code of VLIW processors is typically twice as large as for 'conventional' processors (processing devices) with a single register file.
Another major point for improvement of processing devices with a single register file as well as with a distributed register file concerns the implementation costs. It was already mentioned that for a certain number of reasons single register files are always of big size and therefore have high implementation costs in terms of silicon area. However the same is true for distributed register files if they are used as in figure 4, because they require a large crossbar to make the connections between the multiple register files and the PEs.
It is the goal of the present invention to overcome these shortcomings of existing data path architectures with a single register file as well as with a distributed register file.
4. Brief description of the drawings
Figure 1 shows the data path architecture of a 'conventional' processor with a single register file.
Figure 2 shows the data path architecture of the polycyclic processor developed by ESL Inc.
Figure 3 shows the data path architecture of a video signal processor developed by Philips Research.
Figure 4 shows the data path architecture of a processor with a distributed register file according to the prior art.
Figure 5 shows the data path architecture of a processing device with a distributed register file based on the present invention.
Figure 6 shows a specific example of the data path architecture of a processing device with a distributed register file based on the present invention.
Figure 7 shows two variants of a specific type of register file containing a shift register connected to a crossbar. One variant of this type of register file is shown at lower right, the other variant is shown at lower left.
Figure 8 shows a specific example of an array of processing devices built up according to the rules based on the present invention.
Figure 9 shows two processing devices of an array and visualizes the rules concerning a) processing device inputs which are connected to an array input and b) processing device inputs which are connected to an output of a processing device of the array.
5. Detailed description of the drawings
The main aspects of the present invention are described by referring to the figures mentioned in this section.
Data path architectures of processing devices with a distributed register file based on the present invention differ significantly from the data path architectures of the prior art and are obtained by applying a set of building rules to a set of building blocks. The differences with the prior art will become clear when discussing these building rules.
Considered is a processing device comprising one or more inputs, one or more outputs and one or more processing elements, each processing element having one or more inputs and one or more outputs. In the following, unless mentioned explicitly, the terms 'data path architecture', 'crossbar', 'register file' and 'processing element' always refer to the considered processing device.
The first type of data path architecture of a processing device based on the present invention contains :
(a) as many register files as there are processing device inputs and processing element outputs, where processing element outputs correspond to all the outputs of all the processing elements of the considered processing device and where all the register files have each one input and one or more outputs
(b) as many crossbars as there are processing device outputs and processing element inputs, where processing element inputs correspond to all the inputs of all the processing elements of the considered processing device and where all the crossbars have each one output and one or more inputs
and has a register-transfer-level data path architecture which is built up according to the following rules :
(c) each processing device input is connected to the input of a register file
(d) any processing device input and any other processing device input are not connected to the same input of a register file
(e) each output of each processing element is connected to the input of a register file
(f) any output of any processing element and any other output of any processing element are not connected to the same input of a register file
(g) the input of each register file is connected either to an output of a processing element or to a processing device input
(h) the input of any register file and the input of any other register file are not connected to a same output of any processing element
(i) the input of any register file and the input of any other register file are not connected to a same processing device input
(j) each output of each register file is connected to an input of a crossbar
(k) any output of any register file and any other output of any other register file are not connected to a same input of a crossbar
(l) each input of each crossbar is connected to an output of a register file
(m) any input of any crossbar and any input of any other crossbar are not connected to the same output of any register file
(n) each processing device output is connected to the output of a crossbar
(o) each input of each processing element is connected to the output of a crossbar
(p) any processing device output and any other processing device output are not connected to the same output of a crossbar
(q) any processing element input and any other processing element input are not connected to the same output of a crossbar
(r) the output of each crossbar is connected either to a processing device output or to an input of a processing element
(s) the output of any crossbar and the output of any other crossbar are neither connected to a same processing device output nor to a same input of a processing element
Note that the rules as described in (c)-(s) do not imply that each output of each register file has necessarily to be connected to all the inputs of all the crossbars. It is left up to the designer to decide which connections between outputs of register files and inputs of crossbars he wants to implement. Therefore, any crossbar has as many inputs as there are outputs of register files connected to that crossbar.
Furthermore, it should be noted that normally all inputs and all outputs of all register files, crossbars and processing elements as well as all processing device inputs and all processing device outputs have the same bus width, the bus width being equal to one or more bits; in other words, all connections as specified in (c)-(s) have the same bus width. However it is also conceivable that the bus width differs from connection to connection and from input/output to input/output of building blocks. In case of PEs, the bus width of the PE inputs may well be different from the bus width of the PE outputs, depending on the operations that are performed in the PEs. In case of a crossbar, the connections may be made according to some rule, for example connecting only the most/least significant bits of an input whose bus width is wider than that of the crossbar output to which it is connected, or for example filling the most/least significant bits of a crossbar output, whose bus width is wider than that of the crossbar input to which it is connected, with some specific values. In case of a register file, the data values appearing on one or more inputs of the register file may be written/read into/from register locations according to similar rules as for the connections made inside a crossbar, depending on the bus width of the register file inputs, of the register cells contained in the register file and of the register file outputs.
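The two width-adaptation rules mentioned for crossbar connections (keeping only the most or least significant bits when narrowing, filling the most or least significant bits when widening) can be sketched as a single helper. This is only an illustration under stated assumptions: the function name `adapt_width` and the parameters `keep` and `fill_bit` are hypothetical, and the text does not prescribe any particular fill value.

```python
def adapt_width(value, src_width, dst_width, keep="lsb", fill_bit=0):
    """Map a src_width-bit value onto a dst_width-bit connection.

    Narrowing: keep="lsb" keeps the least significant bits of the
    source, keep="msb" keeps the most significant bits.
    Widening: keep="lsb" places the source in the least significant
    bits and fills the most significant bits with fill_bit; keep="msb"
    places it in the most significant bits and fills the low bits.
    All parameter names are assumptions made for this sketch.
    """
    if keep not in ("lsb", "msb"):
        raise ValueError("keep must be 'lsb' or 'msb'")
    value &= (1 << src_width) - 1          # clip to the source width
    if dst_width <= src_width:
        if keep == "lsb":
            return value & ((1 << dst_width) - 1)
        return value >> (src_width - dst_width)
    pad = dst_width - src_width
    fill = ((1 << pad) - 1) if fill_bit else 0
    if keep == "lsb":
        return (fill << src_width) | value  # fill the upper bits
    return (value << pad) | fill            # fill the lower bits
```

For example, narrowing the 8-bit value `0b10110110` to 4 bits gives `0b0110` with `keep="lsb"` and `0b1011` with `keep="msb"`.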
A processing device with a data path architecture built up according to the rules mentioned above is shown in figure 5. Figure 5 aims at visualizing the above rules; therefore the number of processing device inputs and outputs, the number of PEs as well as the number of PE inputs and PE outputs are not further specified. In contrast, figure 6 shows a specific example of a processing device with such a data path architecture : it contains two PEs, two processing device inputs and two processing device outputs. Each PE has two inputs and two outputs. Register files have either one, two or three outputs. Furthermore the number of existing connections between outputs of register files and inputs of crossbars differs from register file to register file and from crossbar to crossbar; in other words, not all connections that are allowed by the rules are effectively realized.
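The block counts required by (a) and (b), and the pairing that rules (c)-(i) and (n)-(s) impose, can be checked mechanically. The sketch below is an illustration only: the function names and the string labels for ports are hypothetical, and it reads each group of rules as a one-to-one pairing between sources and sinks, which is an interpretation rather than wording taken from the rules themselves.

```python
from collections import Counter


def block_counts(num_device_inputs, num_device_outputs, pes):
    """Count register files and crossbars for the first type.

    `pes` is a list of (num_inputs, num_outputs) pairs, one per PE
    (hypothetical representation chosen for this sketch).
    """
    pe_inputs = sum(n_in for n_in, _ in pes)
    pe_outputs = sum(n_out for _, n_out in pes)
    return {
        "register_files": num_device_inputs + pe_outputs,  # item (a)
        "crossbars": num_device_outputs + pe_inputs,       # item (b)
    }


def check_one_to_one(links, sources, sinks):
    """True if `links` pairs every source with exactly one sink and
    every sink with exactly one source. `links` is a list of
    (source, sink) pairs of port labels."""
    src_count = Counter(src for src, _ in links)
    dst_count = Counter(dst for _, dst in links)
    return (all(src_count[s] == 1 for s in sources)
            and all(dst_count[d] == 1 for d in sinks)
            and set(src_count) <= set(sources)
            and set(dst_count) <= set(sinks))
```

For the configuration of figure 6 (two PEs with two inputs and two outputs each, two processing device inputs, two processing device outputs), `block_counts(2, 2, [(2, 2), (2, 2)])` gives 6 register files and 6 crossbars.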
The second type of data path architecture of a processing device based on the present invention slightly differs from the first type in the way that this second type of data path architecture contains one or more register files of a same type, this type of register file being shown in figure 7 and denoted by ' SR + # '. There are basically two slightly different variants of this type of register file, one variant shown at lower right in figure 7 and the other variant shown at lower left in figure 7. Both variants contain a shift register and a crossbar, the crossbar being denoted by ' # ' in figure 7, and where
(a) the shift register contains one or more register cells
(b) the shift register has one input and as many outputs as there are register cells contained in the shift register, each register cell having one input and one output
(c) the crossbar is either partially or fully connected and has as many outputs as there are register file outputs
(d) in case of one variant, the crossbar has as many inputs as there are register cells contained in the shift register
(e) in case of the other variant, the crossbar has as many inputs as the number obtained by incrementing by one the number of register cells contained in the shift register
(f) in case of one variant, the register file input is connected to the input of the shift register
(g) in case of the other variant, the register file input is connected to the input of the shift register and to an input of the crossbar
(h) the input of the shift register is connected to the input of the first register cell of the shift register
(i) the inputs and outputs of the register cells of the shift register are connected in such a way as to form a shift register
(j) the output of each register cell of the shift register is connected to a shift register output
(k) the output of any register cell of the shift register and the output of any other register cell of the shift register are not connected to a same shift register output
(l) each shift register output is connected to an input of the crossbar
(m) any shift register output and any other shift register output are not connected to a same input of the crossbar
(n) in case of one variant, each input of the crossbar is connected to a shift register output
(o) in case of the other variant, each input of the crossbar is connected either to a shift register output or to the register file input
(p) any input of the crossbar and any other input of the crossbar are neither connected to the same shift register output nor to the register file input
(q) each output of the crossbar is connected to a register file output
(r) any output of the crossbar and any other output of the crossbar are not connected to a same register file output
(s) each register file output is connected to a crossbar output
(t) any register file output and any other register file output are not connected to a same crossbar output
The difference between the two variants lies in the fact that in case of the variant shown at lower left in figure 7, the register file input can be forwarded directly to one or more register file outputs without traversing a cell of the shift register. Furthermore, the shift register contained in the register file may have a gated clock input; in other words, in any given cycle of a clock used in the processing device, the contents of the register cells are shifted by one position in the shift direction only if certain signals generated in the control unit(s) of the processing device have a specific value. The value of these signals may change from clock cycle to clock cycle and generally depends on the program code, on the instructions that are executed by the processing device, on results of operations performed by the PEs and on data values stored in the register files. A shift register with m cells has a 'shift direction'; in other words, there exists an increasing order of register cells labeled 1, 2, ..., m such that when the shift register is clocked the content of the register cell with label i is shifted into the register cell with label i+1, for i = 1, 2, ..., m-1.
The first cell of the shift register is the register cell with label 1, the last cell of the shift register is the cell with label m. Note that concerning the bus width of any connections between any inputs and outputs of the shift register, of the crossbar, of any register cell of the shift register and of the register file itself, the same remark holds as for the connections made inside a processing device with a data path architecture of the first type as described above.
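The behaviour of the 'SR + #' register file described above can be illustrated with a small behavioural model. This is a minimal sketch, not a description of an actual implementation: the class name, the `taps` list that stands in for the current crossbar setting, the `bypass` flag selecting the variant with the direct input path, and the `enable` argument modelling the gated clock are all assumptions introduced for illustration.

```python
class ShiftRegisterFile:
    """Behavioural sketch of a shift register followed by a crossbar.

    num_cells : number of register cells m (cell 1 is list index 0)
    taps      : for each register file output, the crossbar input it
                selects; 0..m-1 select a cell output and, for the
                bypass variant only, index m selects the register
                file input directly
    bypass    : True models the variant whose crossbar has m+1 inputs
    """

    def __init__(self, num_cells, taps, bypass=False, reset_value=0):
        limit = num_cells + 1 if bypass else num_cells
        if any(not 0 <= t < limit for t in taps):
            raise ValueError("tap selects a non-existent crossbar input")
        self.cells = [reset_value] * num_cells
        self.taps = list(taps)
        self.bypass = bypass

    def outputs(self, data_in=None):
        """Read the register file outputs through the crossbar."""
        crossbar_inputs = list(self.cells)
        if self.bypass:
            crossbar_inputs.append(data_in)  # direct path from the input
        return [crossbar_inputs[t] for t in self.taps]

    def clock(self, data_in, enable=True):
        """One clock cycle; with enable False nothing is shifted."""
        if enable:
            # cell i moves into cell i+1, the input enters cell 1
            self.cells = [data_in] + self.cells[:-1]
```

In this model the crossbar setting is fixed when the object is created; in a real device it would presumably be driven by control signals, in the same way as the clock gating.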
The present invention is also dealing with arrays of processing devices. The data path architecture of the processing devices used in these arrays is closely related to the data path architecture of the first and second type as described above. Considered is an array comprising two or more processing devices and one or more array inputs and one or more array outputs. Each processing device of the considered array has one or more inputs and one or more outputs. In the following, unless mentioned explicitly, the term 'processing device' always refers to the considered array.
The array is built up according to the following rules :
(a) each array input is connected to one or more inputs of one or more processing devices
(b) any array input and any other array input are not connected to a same input of a processing device
(c) each output of each processing device is connected to one or more inputs of one or more processing devices or to one or more array outputs
(d) any output of any processing device and any output of any other processing device are neither connected to a same input of a processing device nor to a same array output
Here again, concerning the bus width of any connections between any inputs and outputs of the array itself and/or of any processing devices of the array, the same remark holds as for the connections done inside a processing device with a data path architecture of the first or second type as described above. Finally, it should be noted that the rules as described in (a)-(d) allow for regular and irregular connections as it is exemplified by the array shown in figure 8.
Furthermore, the rules as described in (a)-(d) do not imply that all possible connections, which are allowed by the rules, between inputs/outputs of processing devices and array inputs/outputs are effectively realized. It is left up to the designer to decide which connections he wants to implement.
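The array rules (a)-(d) can be checked on a list of connections. The following sketch is illustrative only: the function name and the string labels for ports are hypothetical. It also enforces the slightly stricter condition that every sink has at most one driver, which additionally excludes an array input and a processing device output sharing one processing device input; that stricter reading is an assumption and is not stated in the rules.

```python
from collections import defaultdict


def check_array(array_inputs, device_outputs, links):
    """Return a list of rule violations (empty if none are found).

    `links` is a list of (source, sink) pairs. Sources are array
    inputs or processing device outputs; sinks are processing device
    inputs or array outputs.
    """
    drivers = defaultdict(set)   # sink -> set of sources driving it
    fanout = defaultdict(set)    # source -> set of sinks it drives
    for source, sink in links:
        drivers[sink].add(source)
        fanout[source].add(sink)

    problems = []
    # Rules (a) and (c): every source is connected to at least one sink.
    for source in list(array_inputs) + list(device_outputs):
        if not fanout[source]:
            problems.append(f"{source} is not connected")
    # Rules (b) and (d): no sink is shared by several sources.
    for sink, sources in drivers.items():
        if len(sources) > 1:
            problems.append(f"{sink} has several drivers: {sorted(sources)}")
    return problems
```

A source may still fan out to several sinks, so regular as well as irregular connection patterns pass the check.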
The first and second type of data path architecture as described above are used inside 'stand alone' processing devices, in other words processing devices which are not part of an array of several processing devices. The type of data path architecture of processing devices which are part of an array differs slightly from the first and second type of data path architecture of a 'stand alone' processing device. The difference consists in the number of register files used inside each processing device of the array as well as in the way that inputs of register files are connected to processing device inputs. In a few words, the difference is as follows : if an input of any processing device of the considered array is not connected to an output of a processing device of the considered array but is connected to an array input, then it is connected to the input of a register file in the same way as for the data path architecture of the first and second type described above; if an input of any processing device of the considered array is connected to an output of a processing device of the considered array but is not connected to an array input, then it is directly connected to one or more inputs of one or more crossbars of the considered processing device. This rule is visualized in figure 9, which shows two processing devices of an array. As one can see, the input of the processing device at the right side, which is connected to an output of the processing device at the left side, is not connected to an input of a register file but directly connected to one or more inputs of one or more crossbars of that processing device. On the other hand, the input of the processing device at the right side, which is connected to an array input, is connected to an input of a register file in the same way as for the data path architecture of the first or second type described above.
In the following, unless mentioned explicitly, the terms 'processing device input', 'processing device output', 'crossbar(s)', 'register file(s)' and 'processing element(s)' always refer to the considered processing device.
In detail, this means that each processing device of the considered array contains :
(a) one or more processing device inputs and one or more processing device outputs
(b) one or more processing elements, each processing element having one or more inputs and one or more outputs
(c) as many register files as the considered processing device has processing element outputs and marked processing device inputs, where
1. processing element outputs correspond to all the outputs of all the processing elements of the considered processing device
2. marked processing device inputs correspond to all those inputs of the considered processing device which are connected to an array input
3. each register file has one input and one or more outputs
(d) as many crossbars as there are processing device outputs and processing element inputs, where processing element inputs correspond to the inputs of all the processing elements of the considered processing device and where each crossbar has one output and one or more inputs
and has a register-transfer-level data path architecture which is built up according to the following rules :
(e) each processing device input which is connected to an array input is connected to the input of a register file
(f) each processing device input which is connected to an output of a processing device of the considered array is connected to one or more inputs of one or more crossbars
(g) any processing device input and any other processing device input are neither connected to the same input of a register file nor to a same input of a crossbar
(h) each output of each processing element is connected to the input of a register file
(i) any output of any processing element and any other output of any processing element are not connected to the same input of a register file
(j) the input of each register file is connected either to an output of a processing element or to a processing device input
(k) the input of any register file and the input of any other register file are not connected to a same output of any processing element
(l) the input of any register file and the input of any other register file are not connected to a same processing device input
(m) each output of each register file is connected to an input of a crossbar
(n) any output of any register file and any other output of any other register file are not connected to a same input of a crossbar
(o) each input of each crossbar is connected either to an output of a register file or to a processing device input
(p) any input of any crossbar and any input of any other crossbar are neither connected to the same output of any register file nor to a same processing device input
(q) each processing device output is connected to the output of a crossbar
(r) each input of each processing element is connected to the output of a crossbar
(s) any processing device output and any other processing device output are not connected to the same output of a crossbar
(t) any processing element input and any other processing element input are not connected to the same output of a crossbar
(u) the output of each crossbar is connected either to a processing device output or to an input of a processing element
(v) the output of any crossbar and the output of any other crossbar are neither connected to a same processing device output nor to a same input of a processing element
Here again, concerning the bus width of any connections between any inputs and outputs of building blocks of a processing device of the array, the same remark holds as for the connections done inside a processing device with a data path architecture of the first or second type described above.
Furthermore, concerning any processing device of the array, the rules as described in (e)-(v) above do not imply that each processing device input (which is connected to an array input) or each output of each register file has necessarily to be connected to all the inputs of all the crossbars. It is left up to the designer to decide which connections between outputs of register files and inputs of crossbars he wants to implement. Therefore, any crossbar has as many inputs as there are outputs of register files and processing device inputs connected to that crossbar.
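The effect of the 'marked input' rule on the number of building blocks of a processing device inside an array can be illustrated as follows. This is a sketch under stated assumptions: the function name, the use of the strings "array" and "device" to describe what drives each processing device input, and the (inputs, outputs) pairs for PEs are hypothetical conventions chosen for illustration.

```python
def array_device_block_counts(input_drivers, num_device_outputs, pes):
    """Count the building blocks of one processing device of an array.

    input_drivers : one entry per processing device input, either
                    "array" (driven by an array input, i.e. a marked
                    input) or "device" (driven by an output of a
                    processing device of the array)
    pes           : list of (num_inputs, num_outputs) pairs per PE
    """
    if any(kind not in ("array", "device") for kind in input_drivers):
        raise ValueError("driver must be 'array' or 'device'")
    marked = sum(1 for kind in input_drivers if kind == "array")
    pe_inputs = sum(n_in for n_in, _ in pes)
    pe_outputs = sum(n_out for _, n_out in pes)
    return {
        # item (c): one register file per PE output and marked input
        "register_files": pe_outputs + marked,
        # item (d): one crossbar per device output and PE input
        "crossbars": num_device_outputs + pe_inputs,
        # rule (f): the remaining inputs go straight to crossbar inputs
        "direct_crossbar_inputs": len(input_drivers) - marked,
    }
```

For a hypothetical device with one input driven by an array input, one input driven by a neighbouring device, one output and a single PE with two inputs and one output, the sketch yields 2 register files, 3 crossbars and 1 input wired directly to crossbar inputs.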
It should be mentioned that current semiconductor process technology allows arrays containing several processing devices to be integrated onto a single chip. The application domain of 'stand alone' processing devices with a data path architecture based on the present invention is the same as the application domain of arrays of processing devices with data path architectures based on the present invention and consists of applications within image/multimedia/signal processing, graphics processing and linear algebra.
Before closing this section, it is important to mention that for a certain number of reasons concerning code density, compiler optimization, power consumption and computing performance, it is particularly interesting to let all the processing elements of a processing device with a data path architecture based on the present invention be of the same type, whether the considered processing device is a 'stand alone' processing device or is part of an array of processing devices based on the present invention.
6. Summary of the invention
The present invention concerns a processing device according to claim 1 and an array of processing devices according to claim 8.

Claims

What is claimed is :
1. A processing device comprising :
(a) one or more processing device inputs
(b) one or more processing device outputs
(c) one or more processing elements, each processing element having one or more inputs and one or more outputs
(d) as many register files as there are processing device inputs and processing element outputs, where processing element outputs correspond to all the outputs of all the processing elements of the considered processing device and where all the register files have each one input and one or more outputs
(e) as many crossbars as there are processing device outputs and processing element inputs, where processing element inputs correspond to all the inputs of all the processing elements of the considered processing device and where all the crossbars have each one output and one or more inputs
and having a register-transfer-level data path architecture which is built up according to the following rules :
(f) each processing device input is connected to the input of a register file
(g) any processing device input and any other processing device input are not connected to the same input of a register file
(h) each output of each processing element is connected to the input of a register file
(i) any output of any processing element and any other output of any processing element are not connected to the same input of a register file
(j) the input of each register file is connected either to an output of a processing element or to a processing device input
(k) the input of any register file and the input of any other register file are not connected to a same output of any processing element
(l) the input of any register file and the input of any other register file are not connected to a same processing device input
(m) each output of each register file is connected to an input of a crossbar
(n) any output of any register file and any other output of any other register file are not connected to a same input of a crossbar
(o) each input of each crossbar is connected to an output of a register file
(p) any input of any crossbar and any input of any other crossbar are not connected to the same output of any register file
(q) each processing device output is connected to the output of a crossbar
(r) each input of each processing element is connected to the output of a crossbar
(s) any processing device output and any other processing device output are not connected to the same output of a crossbar
(t) any processing element input and any other processing element input are not connected to the same output of a crossbar
(u) the output of each crossbar is connected either to a processing device output or to an input of a processing element
(v) the output of any crossbar and the output of any other crossbar are neither connected to a same processing device output nor to a same input of a processing element
2. A processing device as claimed in claim 1 , where one or more or all register files are of a same type, this type of register file comprising :
(a) a shift register containing one or more register cells, the shift register having one input and as many outputs as there are register cells contained in the shift register, each register cell having one input and one output
(b) a crossbar which is either partially or fully connected and which has as many inputs as there are register cells contained in the shift register and which has as many outputs as there are register file outputs and where
(c) the register file input is connected to the input of the shift register
(d) the input of the shift register is connected to the input of the first register cell of the shift register
(e) the inputs and outputs of the register cells of the shift register are connected in such a way as to form a shift register
(f) the output of each register cell of the shift register is connected to a shift register output
(g) the output of any register cell of the shift register and the output of any other register cell of the shift register are not connected to a same shift register output
(h) each shift register output is connected to an input of the crossbar
(i) any shift register output and any other shift register output are not connected to a same input of the crossbar
(j) each input of the crossbar is connected to a shift register output
(k) any input of the crossbar and any other input of the crossbar are neither connected to the same shift register output nor to the register file input
(l) each output of the crossbar is connected to a register file output
(m) any output of the crossbar and any other output of the crossbar are not connected to a same register file output
(n) each register file output is connected to a crossbar output
(o) any register file output and any other register file output are not connected to a same crossbar output
3. A processing device as claimed in claim 1 , where one or more or all register files are of a same type, this type of register file comprising :
(a) a shift register containing one or more register cells, the shift register having one input and as many outputs as there are register cells contained in the shift register, each register cell having one input and one output
(b) a crossbar which is either partially or fully connected and which has as many inputs as the number obtained by incrementing by one the number of register cells contained in the shift register and as many outputs as there are register file outputs and where
(c) the register file input is connected to the input of the shift register and to an input of the crossbar
(d) the input of the shift register is connected to the input of the first register cell of the shift register
(e) the inputs and outputs of the register cells of the shift register are connected in such a way as to form a shift register
(f) the output of each register cell of the shift register is connected to a shift register output
(g) the output of any register cell of the shift register and the output of any other register cell of the shift register are not connected to a same shift register output
(h) each shift register output is connected to an input of the crossbar
(i) any shift register output and any other shift register output are not connected to a same input of the crossbar
(j) each input of the crossbar is connected either to a shift register output or to the register file input
(k) any input of the crossbar and any other input of the crossbar are neither connected to the same shift register output nor to the register file input
(l) each output of the crossbar is connected to a register file output
(m) any output of the crossbar and any other output of the crossbar are not connected to a same register file output
(n) each register file output is connected to a crossbar output
(o) any register file output and any other register file output are not connected to a same crossbar output
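The difference between claim 3 and claim 2 is the extra crossbar input that is wired directly to the register file input, so the crossbar has one more input than there are register cells. A minimal sketch of that arrangement follows; the function names and the convention that crossbar input 0 is the direct (bypass) input are assumptions for illustration, not something the claim specifies.

```python
from typing import List, Optional, Sequence

Value = Optional[int]


def shift(file_input: Value, cells: Sequence[Value]) -> List[Value]:
    """Next state of the shift register: the input enters the first cell."""
    return [file_input, *cells[:-1]]


def read_with_bypass(
    file_input: Value, cells: Sequence[Value], select: Sequence[int]
) -> List[Value]:
    """Crossbar with len(cells) + 1 inputs.

    Assumed numbering: input 0 is the register file input itself,
    inputs 1..N are the outputs of the N register cells.
    """
    sources = [file_input, *cells]
    return [sources[s] for s in select]


cells: List[Value] = [None] * 4
cells = shift(1, cells)
cells = shift(2, cells)
# The value 3 is present at the input but has not been clocked in yet;
# selecting crossbar input 0 still makes it visible at an output.
print(read_with_bypass(3, cells, [0, 2]))
```

Under these assumptions a value arriving at the register file input can be forwarded to a register file output without first being stored in a register cell.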
4. A processing device as claimed in claim 2, where the shift register of said type of register file contains at least 4 register cells
5. A processing device as claimed in claim 3, where the shift register of said type of register file contains at least 4 register cells
6. A processing device as claimed in claim 4, where said type of register file has at least two outputs
7. A processing device as claimed in claim 5, where said type of register file has at least two outputs
8. An array comprising :
(a) two or more processing devices, each processing device of the considered array having one or more inputs and one or more outputs
(b) one or more array inputs and one or more array outputs
where in the following, unless mentioned explicitly, the term 'processing device(s)' always refers to the considered array and where the array is built up according to the following rules :
(c) each array input is connected to one or more inputs of one or more processing devices
(d) any array input and any other array input are not connected to a same input of a processing device
(e) each output of each processing device is connected to one or more inputs of one or more processing devices or to one or more array outputs
(f) any output of any processing device and any output of any other processing device are neither connected to a same input of a processing device nor to a same array output
where in the following, unless mentioned explicitly, the terms 'processing device input', 'processing device output', 'crossbar(s)', 'register file(s)' and 'processing element(s)' always refer to the considered processing device and where each processing device of the array contains :
(g) one or more processing device inputs and one or more processing device outputs
(h) one or more processing elements, each processing element having one or more inputs and one or more outputs
(i) as many register files as the considered processing device has processing element outputs and marked processing device inputs, where
i. processing element outputs correspond to all the outputs of all the processing elements of the considered processing device
ii. marked processing device inputs correspond to all those inputs of the considered processing device which are connected to an array input
iii. each register file has one input and one or more outputs
(j) as many crossbars as there are processing device outputs and processing element inputs, where processing element inputs correspond to the inputs of all the processing elements of the considered processing device and where each crossbar has one output and one or more inputs
and where each processing device of the array has a register-transfer-level data path architecture which is built up according to the following rules :
(k) each processing device input which is connected to an array input is connected to the input of a register file
(l) each processing device input which is connected to an output of a processing device of the considered array is connected to one or more inputs of one or more crossbars
(m) any processing device input and any other processing device input are neither connected to the same input of a register file nor to a same input of a crossbar
(n) each output of each processing element is connected to the input of a register file
(o) any output of any processing element and any other output of any processing element are not connected to the same input of a register file
(p) the input of each register file is connected either to an output of a processing element or to a processing device input
(q) the input of any register file and the input of any other register file are not connected to a same output of any processing element
(r) the input of any register file and the input of any other register file are not connected to a same processing device input
(s) each output of each register file is connected to an input of a crossbar
(t) any output of any register file and any other output of any other register file are not connected to a same input of a crossbar
(u) each input of each crossbar is connected either to an output of a register file or to a processing device input
(v) any input of any crossbar and any input of any other crossbar are neither connected to the same output of any register file nor to a same processing device input
(w) each processing device output is connected to the output of a crossbar
(x) each input of each processing element is connected to the output of a crossbar
(y) any processing device output and any other processing device output are not connected to the same output of a crossbar
(z) any processing element input and any other processing element input are not connected to the same output of a crossbar
(aa) the output of each crossbar is connected either to a processing device output or to an input of a processing element
(bb) the output of any crossbar and the output of any other crossbar are neither connected to a same processing device output nor to a same input of a processing element
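Many of the rules in claim 8 follow one pattern: two different sources must never be connected to the same sink. The counts in items (i) and (j) are also simple sums. The sketch below shows how such constraints could be checked on a netlist; the function names, the representation of a netlist as (source, sink) pairs and the port names are all hypothetical and are not taken from the application.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_shared_sinks(
    connections: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return every sink that is driven by more than one distinct source."""
    drivers: Dict[str, Set[str]] = defaultdict(set)
    for source, sink in connections:
        drivers[sink].add(source)
    return {sink: srcs for sink, srcs in drivers.items() if len(srcs) > 1}


def required_counts(
    pe_inputs: int, pe_outputs: int, marked_device_inputs: int, device_outputs: int
) -> Dict[str, int]:
    """Component counts per processing device according to items (i) and (j)."""
    return {
        # one register file per processing element output and per marked input
        "register_files": pe_outputs + marked_device_inputs,
        # one crossbar per processing device output and per processing element input
        "crossbars": device_outputs + pe_inputs,
    }


# Hypothetical netlist fragment: two processing element outputs, each
# feeding the input of its own register file, as rules (n) and (o) require.
ok = [("pe0.out", "rf0.in"), ("pe1.out", "rf1.in")]
bad = ok + [("pe1.out", "rf0.in")]  # violates rule (o)

print(find_shared_sinks(ok))
print(find_shared_sinks(bad))
print(required_counts(pe_inputs=4, pe_outputs=2, marked_device_inputs=1, device_outputs=1))
```

A complete checker would also need the "each ... is connected to ..." rules, which require every port to appear in at least one connection; that part is omitted here.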
9. An array as claimed in claim 8, where one or more or all register files of all the processing devices of the array are of a same type, this type of register file comprising :
(a) a shift register containing one or more register cells, the shift register having one input and as many outputs as there are register cells contained in the shift register, each register cell having one input and one output
(b) a crossbar which is either partially or fully connected and which has as many inputs as there are register cells contained in the shift register and which has as many outputs as there are register file outputs and where
(c) the register file input is connected to the input of the shift register
(d) the input of the shift register is connected to the input of the first register cell of the shift register
(e) the inputs and outputs of the register cells of the shift register are connected in such a way as to form a shift register
(f) the output of each register cell of the shift register is connected to a shift register output
(g) the output of any register cell of the shift register and the output of any other register cell of the shift register are not connected to a same shift register output
(h) each shift register output is connected to an input of the crossbar
(i) any shift register output and any other shift register output are not connected to a same output of the crossbar
(j) each input of the crossbar is connected to a shift register output
(k) any input of the crossbar and any other input of the crossbar are neither connected to the same shift register output nor to the register file input
(l) each output of the crossbar is connected to a register file output
(m) any output of the crossbar and any other output of the crossbar are not connected to a same register file output
(n) each register file output is connected to a crossbar output
(o) any register file output and any other register file output are not connected to a same crossbar output
10. An array as claimed in claim 8, where one or more or all register files of all the processing devices of the array are of a same type, this type of register file comprising :
(a) a shift register containing one or more register cells, the shift register having one input and as many outputs as there are register cells contained in the shift register, each register cell having one input and one output
(b) a crossbar which is either partially or fully connected and which has as many inputs as the number obtained by incrementing by one the number of register cells contained in the shift register and as many outputs as there are register file outputs and where
(c) the register file input is connected to the input of the shift register and to an input of the crossbar
(d) the input of the shift register is connected to the input of the first register cell of the shift register
(e) the inputs and outputs of the register cells of the shift register are connected in such a way as to form a shift register
(f) the output of each register cell of the shift register is connected to a shift register output
(g) the output of any register cell of the shift register and the output of any other register cell of the shift register are not connected to a same shift register output
(h) each shift register output is connected to an input of the crossbar
(i) any shift register output and any other shift register output are not connected to a same input of the crossbar
(j) each input of the crossbar is connected either to a shift register output or to the register file input
(k) any input of the crossbar and any other input of the crossbar are neither connected to the same shift register output nor to the register file input
(l) each output of the crossbar is connected to a register file output
(m) any output of the crossbar and any other output of the crossbar are not connected to a same register file output
(n) each register file output is connected to a crossbar output
(o) any register file output and any other register file output are not connected to a same crossbar output
11. An array as claimed in claim 9, where the shift register of said type of register file contains at least 4 register cells
12. An array as claimed in claim 10, where the shift register of said type of register file contains at least 4 register cells
13. An array as claimed in claim 11, where said type of register file has at least two outputs
14. An array as claimed in claim 12, where said type of register file has at least two outputs
EP00904914A 2000-01-14 2000-01-14 A data processing device with distributed register file Withdrawn EP1161722A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2000/000259 WO2001052060A1 (en) 2000-01-14 2000-01-14 A data processing device with distributed register file

Publications (1)

Publication Number Publication Date
EP1161722A1 (en) 2001-12-12

Family

ID=8163794

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00904914A Withdrawn EP1161722A1 (en) 2000-01-14 2000-01-14 A data processing device with distributed register file

Country Status (2)

Country Link
EP (1) EP1161722A1 (en)
WO (1) WO2001052060A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100947446B1 (en) * 2002-03-28 2010-03-11 엔엑스피 비 브이 Vliw processor
EP1499957B1 (en) 2002-04-10 2009-09-23 Nxp B.V. Data processing system with multiple register banks
US9367462B2 (en) * 2009-12-29 2016-06-14 Empire Technology Development Llc Shared memories for energy efficient multi-core processors

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5692139A (en) * 1988-01-11 1997-11-25 North American Philips Corporation, Signetics Div. VLIW processing device including improved memory for avoiding collisions without an excessive number of ports
US6067613A (en) * 1993-11-30 2000-05-23 Texas Instruments Incorporated Rotation register for orthogonal data transformation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0152060A1 *

Also Published As

Publication number Publication date
WO2001052060A1 (en) 2001-07-19

Similar Documents

Publication Publication Date Title
US6175892B1 (en) Registers and methods for accessing registers for use in a single instruction multiple data system
US5301340A (en) IC chips including ALUs and identical register files whereby a number of ALUs directly and concurrently write results to every register file per cycle
EP0539595A1 (en) Data processor and data processing method
US6925553B2 (en) Staggering execution of a single packed data instruction using the same circuit
US6002880A (en) VLIW processor with less instruction issue slots than functional units
US7506135B1 (en) Histogram generation with vector operations in SIMD and VLIW processor by consolidating LUTs storing parallel update incremented count values for vector data elements
US20080215855A1 (en) Execution unit for performing shuffle and other operations
US20040111590A1 (en) Self-configuring processing element
US5268856A (en) Bit serial floating point parallel processing system and method
US20100023730A1 (en) Circular Register Arrays of a Computer
US7500089B2 (en) SIMD processor with exchange sort instruction operating or plural data elements simultaneously
CA2478570A1 (en) Data processing apparatus and system and method for controlling memory access
US20220083342A1 (en) Multiplier-Accumulator Circuitry having Processing Pipelines and Methods of Operating Same
WO2001052060A1 (en) A data processing device with distributed register file
JP2002529847A (en) Digital signal processor with bit FIFO
EP1632845B1 (en) Processor with a register file that supports multiple-issue execution
US7178008B2 (en) Register access scheduling method for multi-bank register file of a super-scalar parallel processor
NZ207326A (en) Associative data processing array
US9317474B2 (en) Semiconductor device
US7596678B2 (en) Method of shifting data along diagonals in a group of processing elements to transpose the data
US20040128475A1 (en) Widely accessible processor register file and method for use
EP4100860A1 (en) Method and device for the conception of a computational memory circuit
US20060242213A1 (en) Variable Precision Processor
US5928350A (en) Wide memory architecture vector processor using nxP bits wide memory bus for transferring P n-bit vector operands in one cycle
Sherburne et al. A 32b NMOS microprocessor with a large register file

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17P Request for examination filed

Effective date: 20020403

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB IT

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20050802