GB2230627A

GB2230627A - Recursive processor for multiplication

Info

Publication number: GB2230627A
Application number: GB8913525A
Authority: GB
Inventors: Simon Christian Knowles; John Graham Mcwhirter; John Vincent Mccanny; Roger Francis Woods
Original assignee: UK Secretary of State for Defence
Current assignee: UK Secretary of State for Defence
Priority date: 1988-04-06
Filing date: 1989-06-13
Publication date: 1990-10-24
Also published as: GB2218545A; CA1311062C; GB8808025D0; GB8913525D0; GB2218545B

Abstract

A recursive processor suitable for infinite impulse response (IIR) filter applications incorporates multiplier cells 16 connected to form rows 12 and columns 14. Each row is arranged to multiply by a coefficient. It begins with accumulator cells 18 and 20, and continues with multiplier cells 16 arranged to multiply by individual coefficient digits and disposed in descending order of digit significance. Columns 14 other than the first column 141 begin with a multiplier cell 16, and the higher significance columns 141 to 144 terminate at respective accumulator cells 20. Any intervening multiplier cells are arranged in ascending order of multiplier digit significance. The multiplier and accumulator cells 16 to 20 operate in accordance with signed digit number representation arithmetic involving digit redundancy. They generate sum and transfer digits for output down columns and along rows respectively to neighbouring cells in the direction of increasing coefficient digit significance. The cell arithmetic employed makes it possible to compute results most significant digit first. Each result digit is recycled when formed to provide a multiplicand digit input for all multiplier cells of a respective row selected in accordance with result digit significance. The processor is also arranged to add successive non-recursive input terms to multiplier products. This allows one processor to provide a first order IIR filter section, and two cascaded processors to provide a second order IIR filter section. The recursive computation required for a first order IIR filter section is effected irrespective of word length with a latency of only two clock cycles.

Description

RECURSIVE PROCESSOR

This invention relates to a recursive processor, ie a processor arranged to recycle output results to its input for the production of further results.

Digital data processors for multiplication of numbers, vectors and matrices are available in the prior art, as are related devices for correlation and convolution.

British Patent No 2,106,257B (Reference (1)) describes bit-level systolic arrays for (a) number-number multiplication, (b) matrix-vector multiplication and (c) convolution. British Patent No 2,144,245B (Reference (2)) describes a similar array for matrix-matrix multiplication, and British Patent No 2,147,721B (Reference (3)) relates to further developments for improvement of array efficiency. References (1) to (3) disclose arrays of logic cells with nearest neighbour row and column interconnections for bit propagation. Figure 1 of Reference (1) shows additional diagonal connections between second nearest neighbour cells. Each cell is a gated full adder with single bit inputs. It generates the product of two multiplicand bits, adds the product to input sum and carry bits and produces new sum and carry bits. The sum bits accumulate in cascade down array columns (or diagonals in Reference (1) Figure 1).

Multiplicand bits propagate along array rows. One-bit intercell latches activated by clock signals provide for bit storage and advance between cells, and ensure that the arrays are pipelined at the cell or bit level. Where appropriate, the arrays include column output accumulators arranged to sum separately computed contributions to output terms.

Published British Patent Application Nos 2,168,509A, 2,187,579A and 2,192,474A (References (4), (5) and (6) respectively) demonstrate further bit-level systolic arrays which exhibit improved properties by the use of stationary multiplicative coefficients. Each coefficient remains associated with a respective cell, unlike References (1) to (3). However, data bits propagate along array rows for multiplication at gated full adder cells as before, and sum bits accumulate in cascade down array columns. Stationary array coefficients are also disclosed by Urquhart and Wood in the GEC Journal of Research, Vol 2, No 1, 1984, pp 52-55 (Reference (7)) and Proc IEE Part F, Vol 131, No 6, 1984, pp 623-31 (Reference (8)). These arrays also employ gated full adder cells with row and column interconnections.

One major area of application of bit-level systolic arrays is in the field of digital filters. Correlators and convolvers disclosed in References (1), (5) and (6) are examples of non-recursive, finite impulse response (FIR) filters. In digital signal processing, the correlation operation is defined by:

$$y_n = \sum_{i=0}^{N-1} a_i x_{n+i} \qquad (1)$$

where $a_i$ ($i = 0$ to $N-1$) represents a set of N correlation coefficients, $x_{n+i}$ is the $(n+i)$th input value, and $y_n$ is the nth correlation result.

Successive values of $x_{n+i}$ form an input data stream, and successive $y_n$ values the filtered output stream.

Digital filters based on the prior art of References (1) to (8) are unsuitable for recursive filter applications, as the following analysis will show. These prior art arrays are pipelined at the bit level by clock-activated latches in the lines interconnecting neighbouring logic cells. This allows each array cell to compute a bit-level contribution to an output result while other cells are computing other contributions. Accordingly, data may be input on every clock cycle of operation without waiting for successive results to emerge from the array. Furthermore, the operating speed is not governed by the time taken for the whole array to compute a result. It is governed by the maximum clock rate of the latches associated with a single logic cell, which is much higher. However, it is a basic feature of prior art systolic arrays that there is a time delay between data input and result output. In a typical case such as Reference (5), one row of logic cells is required per multiplicative coefficient in a coefficient set for convolution or correlation. In addition, an array output accumulator may be required to sum separately computed contributions to individual output terms.

There is typically a delay of one clock cycle per row for arrays accumulating results down columns. To this must be added any output accumulator delay. In the case of a digital filter based on the Reference (5) bit-serial data input device, an N-point convolution or correlation with N p-bit coefficients gives a delay of N+2(p-1) clock cycles between input of a data bit and output of a result bit. Furthermore, for output results q bits in length, there is a delay of N+2(p-1)+(q-1) clock cycles between initiation of data input and output of the final bit of a result from the output accumulator. In the case of a 16-point convolution with 8-bit coefficients and data, which produces 20-bit results, the delay will be 49 clock cycles. If the array is clocked at 5 MHz, the delay is about 10 microseconds. This delay is referred to as the "latency" of the processor.
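As a quick check of the figures above, the latency expression N+2(p-1)+(q-1) can be evaluated directly. This is a minimal Python sketch that takes the expression as quoted for the Reference (5) device; the function names are illustrative only.

```python
def bit_serial_latency(n_points: int, coeff_bits: int, result_bits: int) -> int:
    """Clock cycles from start of data input to the final result bit,
    using the expression N + 2(p - 1) + (q - 1) quoted in the text."""
    return n_points + 2 * (coeff_bits - 1) + (result_bits - 1)


def latency_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to seconds at a given clock rate."""
    return cycles / clock_hz


cycles = bit_serial_latency(n_points=16, coeff_bits=8, result_bits=20)
print(cycles)                                          # 49
print(f"{latency_seconds(cycles, 5e6) * 1e6:.1f} us")  # 9.8 us
```

At 5 MHz the 49 cycles come to 9.8 microseconds, which the text rounds to about 10 microseconds.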

Latency does not give rise to difficulty in the case of FIR filters, since it merely introduces an insignificant delay between initiation of data input and that of result output. Thereafter, input and output proceed at the same rate; ie input is received and output is generated on each clock cycle.

However, the latency of prior art digital processors gives rise to difficulty in the area of recursive processing, as required in infinite impulse response (IIR) filters.

The simplest form of IIR filter is that in which the output depends both on the input and on an earlier output. It is known as a "first-order section". The computation can be expressed in the form:

$$y_n = a_0 x_n + a_1 x_{n-1} + b_1 y_{n-1} \qquad (2)$$

where $x_{n-1}$ and $x_n$ are successive data values in a continuously sampled stream, $a_0$, $a_1$ and $b_1$ are coefficients determining the filter response function, and $y_{n-1}$ and $y_n$ are successive output results.

Equation (2) may be rewritten:

$$y_n = u_n + b_1 y_{n-1} \qquad (3)$$

where

$$u_n = a_0 x_n + a_1 x_{n-1} \qquad (4)$$

Equation (3) demonstrates that $y_n$ is the sum of a non-recursive term $u_n$ (depending only on input data) and a recursive term arising from the immediately preceding result $y_{n-1}$. This can be rewritten to express $y_n$ in terms of $y_{n-k}$ ($k = 2, 3, \ldots$) if required.
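For reference, Equations (3) and (4) can be evaluated sample by sample in a few lines. This Python sketch assumes zero initial conditions ($x_{-1} = y_{-1} = 0$) and floating-point arithmetic; it models only the data dependence, not any hardware, and the function name first_order_iir is hypothetical.

```python
from typing import List, Sequence


def first_order_iir(x: Sequence[float], a0: float, a1: float, b1: float) -> List[float]:
    """Evaluate Equations (3) and (4) sample by sample.

    Assumption: zero initial conditions, x_{-1} = y_{-1} = 0."""
    x_prev, y_prev = 0.0, 0.0
    y: List[float] = []
    for xn in x:
        un = a0 * xn + a1 * x_prev   # non-recursive term, Equation (4)
        yn = un + b1 * y_prev        # recursive term needs y_{n-1}, Equation (3)
        y.append(yn)
        x_prev, y_prev = xn, yn      # y_n must be complete before y_{n+1} can start
    return y


# Impulse response with a1 = 0 gives y_n = a0 * b1**n
print(first_order_iir([1.0, 0.0, 0.0, 0.0], a0=1.0, a1=0.0, b1=0.5))
# [1.0, 0.5, 0.25, 0.125]
```

The loop makes the recursive dependence explicit: no iteration can begin until the previous one has produced $y_{n-1}$, which is why processor latency directly limits the result rate.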

Any processor arranged to implement Equation (3) requires access to $y_{n-1}$ (or an earlier result) in order to compute $y_n$. Accordingly the processor must compute and output $y_{n-1}$ before beginning the computation of $y_n$. The latency of prior art processors now becomes much more serious, since an interval of many clock cycles must intervene between the computation of each pair of successive results. Instead of producing a new result every clock cycle, results are therefore spaced by the latency interval, which may be 50 or more clock cycles. A latency of 50 cycles in a parallel recursive computation corresponds to the processor being only 2% efficient, or alternatively having an operating rate which is 1/50th that of a similar non-recursive processor.

The construction of digital filters has been discussed by R F Lyon in VLSI Signal Processing: A Bit Serial Approach, P B Denyer and D Renshaw, Addison Wesley, 1985, pp 253-262. It is also described by Jackson et al in IEEE Trans on Audio and Electroacoustics, Vol AU-16, No 3, 1968, pp 421-423. Neither of these addresses the problem of latency and inefficiency in IIR filters.

The latency problem is discussed by Parhi and Messerschmitt in ICASSP 87, pp 1855-1858. Their basic approach is to rearrange the algorithm expressed by Equations (3) and (4) above so that $y_n$ becomes expressed in terms of $y_{n-k}$.

They point out that the latency problem is inherent in recursive algorithms, but describe how it can be tolerated by a so-called "look-ahead" approach. In essence, this amounts to coping with latency by arranging the algorithm to employ as a recursive input whatever output is available. A parallel processor with a latency of k clock cycles will have $y_{n-k}$ available at its output when $u_n$ in Equation (3) is to be input. Since Equation (2) gives

$$y_n = a_0 x_n + a_1 x_{n-1} + b_1 y_{n-1}$$

then

$$y_{n-1} = a_0 x_{n-1} + a_1 x_{n-2} + b_1 y_{n-2} \qquad (5)$$

and

$$y_{n-2} = a_0 x_{n-2} + a_1 x_{n-3} + b_1 y_{n-3} \qquad (6)$$

Expressing $y_n$ in terms of $y_{n-3}$:

$$y_n = a_0 x_n + a_1 x_{n-1} + b_1\left[a_0 x_{n-1} + a_1 x_{n-2} + b_1\left\{a_0 x_{n-2} + a_1 x_{n-3} + b_1 y_{n-3}\right\}\right] \qquad (7)$$

By induction, $y_n$ in terms of $y_{n-k}$ is given by

$$y_n = \sum_{i=0}^{k-1} b_1^{\,i}\left(a_0 x_{n-i} + a_1 x_{n-i-1}\right) + b_1^{\,k} y_{n-k} \qquad (8)$$

The right hand side of Equation (8) consists of a non-recursive summation term together with a recursive term consisting of the product of $y_{n-k}$ and a coefficient. Parhi and Messerschmitt have therefore dealt with the latency problem by choosing the feedback term $y_{n-k}$ to be sufficiently early in the output series for latency to be accommodated. However, the price they pay for this approach is the requirement to evaluate the Equation (8) summation term. This requires the addition of k terms, each of which involves a respective coefficient $b_1^i$ multiplying the sum of two products of multibit numbers. This rapidly becomes unmanageable as k increases, since each product such as $a_0 x_{n-i}$ alone would require a processing array as described in Reference (1). The Parhi et al approach consequently deals with latency, but only at the price of requiring an undesirably large non-recursive processor. For example, if k is 50 as for a typical prior art processor, the procedure requires the summation of fifty terms, each requiring two multiplications, an addition and a further multiplication.
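The look-ahead form can be checked numerically against the one-step recursion of Equation (2). The sketch below is illustrative only: it assumes zero initial conditions, uses arbitrary coefficient values (a0 = 0.3, a1 = -0.2, b1 = 0.6) and random inputs, and the helper names x_at, direct and look_ahead are hypothetical. The generator inside look_ahead runs over k terms, which is the growth in non-recursive work discussed above.

```python
import math
import random
from typing import List, Sequence


def x_at(x: Sequence[float], i: int) -> float:
    """Input sample x_i; assumption: samples before the stream starts are zero."""
    return x[i] if 0 <= i < len(x) else 0.0


def direct(x: Sequence[float], a0: float, a1: float, b1: float) -> List[float]:
    """y_n by the one-step recursion of Equation (2), zero initial state."""
    y: List[float] = []
    y_prev = 0.0
    for n in range(len(x)):
        y_prev = a0 * x_at(x, n) + a1 * x_at(x, n - 1) + b1 * y_prev
        y.append(y_prev)
    return y


def look_ahead(x: Sequence[float], y_n_minus_k: float, a0: float, a1: float,
               b1: float, n: int, k: int) -> float:
    """y_n from y_{n-k} via Equation (8): k non-recursive terms plus b1**k * y_{n-k}."""
    summation = sum(
        b1 ** i * (a0 * x_at(x, n - i) + a1 * x_at(x, n - i - 1)) for i in range(k)
    )
    return summation + b1 ** k * y_n_minus_k


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(60)]
a0, a1, b1 = 0.3, -0.2, 0.6
y = direct(x, a0, a1, b1)
n, k = 55, 50    # k = 50, as for a typical prior art latency
print(math.isclose(look_ahead(x, y[n - k], a0, a1, b1, n, k), y[n],
                   rel_tol=1e-9, abs_tol=1e-12))   # True
```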

It is an object of the present invention to provide an alternative form of recursive processor.

The present invention provides a recursive processor for multiplying successive recycled output results by a coefficient and adding their products to respective input terms, the processor including multiplier cells and accumulating means connected to form rows and columns, and wherein:
(1) each row is arranged to multiply a respective recycled digit by at least the more significant coefficient digits;
(2) the multiplier cells are associated with individual coefficient digits decreasing in significance along rows and increasing in significance down columns containing more than one such cell in each case;
(3) each multiplier cell is arranged to compute output sum and, where necessary, transfer digits corresponding to recycle and coefficient digits multiplied together and, where necessary, added to input digits;
(4) each row begins with a respective accumulating means having at least such of the following functions as are required by availability of relevant neighbouring processor elements: (i) receipt of its most significant cell's transfer digit output, (ii) receipt of output sum digits from the respective preceding row's accumulating means and most significant multiplier cell, (iii) processing received digits to generate output digits of differing significance, passing the most significant of such digits to a respective processor output and passing the remaining one or more digits to accumulating means in a respective succeeding row;
(5) the multiplier cells and accumulating means operate in accordance with signed digit number representation arithmetic involving digit redundancy and providing most significant digit first computation;
(6) the processor includes row neighbour interconnection lines for passage of transfer digits between adjacent multiplier cells in the direction of increasing coefficient digit significance and between each row's most significant multiplier cell and accumulating means;
(7) the processor includes column neighbour interconnection lines arranged for sum digit transfer down each column between accumulating means, between multiplier cells and between most significant multiplier cells and accumulating means where available in each case, the column interconnection lines including clock-activated storing means for sum digit storage and advance between rows; and
(8) each accumulating means has a respective most significant digit output connectable as a recycled multiplicand digit input to all multiplier cells of a respective row selected in accordance with accumulation arithmetic, recycled digit significance and coefficient magnitude.

The invention provides the advantage that it is suitable for performing recursive computations without prior art latency disadvantages. In one embodiment in particular, it is capable of performing the Equation (3) recursive computation (first order IIR filter section) with a latency of only two clock activation cycles irrespective of input word length or number of processor stages. Furthermore, unlike Parhi and Messerschmitt, it is necessary only to compute $u_n = a_0 x_n + a_1 x_{n-1}$ for input to the processor of the invention, instead of the unmanageably large non-recursive term given in Equation (8).

The invention achieves these advantages by being configured to operate in accordance with signed digit number representation arithmetic involving digit redundancy. In this form of arithmetic, each digit may take positive or negative values and there is more than one way of representing a number. A processor in accordance with the invention implementing this arithmetic computes output results most significant digit first and subsequent digits in descending order of significance. This means that the most important part of a result is available first for recycling to produce a successive result. By contrast, prior art devices compute least significant digits first, and must await propagation of carry digits up to more significant computations before producing a result for recycling.

A further consequence and advantage of the invention is that, in computations requiring accuracy only to digits of higher significance, circuitry devoted to lower significance computations may be omitted, unlike prior art devices. Each succeeding row of a processor configured in this way has progressively fewer processing elements, which leads to circuitry savings compared to the prior art.

Each first row multiplier cell may be arranged to add a respective digit of an input term to the product of its associated coefficient digit and multiplicand input. Alternatively, the most significant multiplier cell in each row may be arranged to add a respective input term digit to the product of its coefficient digit and multiplicand input. This latter arrangement provides for input to and output from the processor with like temporal skew, ie delay between adjacent digits. It facilitates connection of first and second processors of the invention in series to form a cascaded arrangement. Such an arrangement includes recycle connections from the first processor's accumulating means most significant digit outputs to multiplier cell multiplicand inputs of respective rows of both processors. Moreover, the second processor has accumulating means most significant digit outputs connected to respective most significant multiplier cell addition inputs of the second processor. It is suitable for use in computation of the second order IIR filter section relation.

In one embodiment, the invention is configured in accordance with radix 4 signed digit number representation arithmetic. It incorporates multiplier cells with multiply, full add and half add functions. Where required to accept temporally skewed input digits, it incorporates most significant multiplier cells with multiply and two full add functions. Processors which are incomplete because lower significance outputs are not computed may incorporate multiplier cells some of which have reduced addition functions. The multiply operation involves input digits in the maximally redundant radix 4 set (-3 to +3), coefficient digits and output sum digits in the range (-2 to +2) (minimally redundant set), and transfer digits in the range (-1 to +1). The accumulating means operation involves summation of transfer digits output from the most significant multiplier cell in a row with (where appropriate) the output sum digit from the most significant multiplier cell of the row above. This generates a sum digit for output to the row below, and a transfer digit for addition to the preceding row's accumulating means output sum digit to provide the row output or recycle digit.

In order that the invention might be more fully understood, embodiments thereof will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a schematic drawing of a recursive processor of the invention; Figure 2 illustrates input and output terms for cells in the Figure 1 processor; Figure 3 schematically shows an alternative form of processor of the invention adapted for increased coefficient magnitude and temporally skewed input; Figures 4, 5, 6 and 7 are simplified drawings of processors of the invention illustrating implementation as building blocks for various IIR filter functions; Figure 8 is a more detailed drawing of the processor of Figure 6; Figure 9 is a generalised representation of a digital filter; Figure 10 is a further embodiment of a processor of the invention arranged to implement radix 2 arithmetic; and Figure 11 is an alternative version of the Figure 10 embodiment arranged for skewed input and increased coefficient magnitude.

Referring to Figure 1, there is shown in schematic form a recursive processor 10 of the invention. The processor 10 incorporates processing cells in four rows 121 to 124 indicated by horizontal chain lines. The processing cells are of varying kinds to be referenced later. Successive rows 122 to 124 are right shifted compared to respective rows 121 to 123 immediately above, and this structure defines cells in seven columns 141 to 147 indicated by vertical chain lines. In the following description, cell location in the processor 10 will be identified by indices appended to corresponding reference numerals; ie Xij would indicate an ith row, jth column cell of type X, where X is a reference numeral.

The processor 10 comprises three types of cell 16, 18 and 20, of which 16 is a composite such as first row, third column cell 1613 indicated within dotted lines.

The cell 1613 incorporates three subcells, types A, B and C, which have differing functions. The subcell structure is shown to assist later analysis of mode of operation. Not all cells 16 incorporate all three subcells A, B and C.

The last one or two cells 16 on the right hand end of each row lack type C or type B and C subcells. The purpose of this representation is to illustrate an important advantage of the invention. Unlike the prior art, it computes output digits in descending order of significance, the most significant digit appearing first as will be described later. Accordingly, when accuracy to a given number of digits is required, an array of the invention need only incorporate those cells and parts of cells which contribute to those digits. Other cells are omitted, which gives rise to the apparently incomplete processor structure illustrated.

All four rows 121 to 124 include respective type D cells 1811 and 1822 to 1844.

The second to fourth rows 122 to 124 include respective type E cells 2021 to 2043.

Referring now also to Figure 2, the five cell and subcell types A to E are illustrated with labelled input and output digits. The type A subcell is a multiplier. It receives an input digit $y_{in}$ from vertically above, and multiplies this by a single digit coefficient $b_1^m$ (m = 1, 2, 3 or 4) with which it is labelled.

It provides an intermediate sum output digit $w_{out}$ vertically below, and a transfer digit $t_{out}$ of relatively higher significance below and to the left. The type B subcell is equivalent in arithmetic terms to the type D cell; both are full adders, but they perform different roles in the processor 10. The expression "full adder" is used by analogy with binary logic to describe a three input device, a "half adder" being a two input device. Both types B and D receive three inputs from above, and add them to produce intermediate sum and transfer digits to be passed on below and below left respectively. However, type B has inputs $y'_{in}$, $w_{in}$ and $t_{in}$ together with outputs $w'_{out}$ and $t'_{out}$, whereas the equivalents for type D are $y'_{in}$, $t_{in}$ and $t'_{in}$ together with $t''_{out}$ and $w''_{out}$. The digit nomenclature is intended to indicate the progress of a computation in the processor 10. Thus an unprimed digit arises prior to a primed digit, which in turn arises prior to a doubly primed digit. Moreover, a digit such as $t'_{out}$ from a type B subcell is $t'_{in}$ for a respective type D cell or type C subcell connected thereto. The nomenclature accordingly assists the analysis to be given later.

The A subcells in each cell 16 receive multiplicand inputs recycled via feedback lines 301 to 304 connected to respective rows 121 to 124. Each A subcell generates product and transfer digits $w_{out}$ and $t_{out}$, of which $w_{out}$ becomes a vertical input to the associated B subcell immediately below; $t_{out}$ passes diagonally down to the left hand neighbouring B subcell where available, or to a D cell otherwise. Each B subcell therefore also receives a transfer digit output from a right hand neighbouring A subcell, and an input from above. For first row cells 16, this last input is the respective digit $u_n^1$ to $u_n^4$ on lines 321 to 324. For second, third and fourth row processing cells 16 containing B subcells, this input is received from the like cell in the same column of the row immediately above. Signals pass between rows (ie down columns) via clock-activated latches or storing devices indicated by triangles such as 34, all latches being clocked in synchronism.

Each B subcell adds its three inputs together to produce output sum and transfer digits $w'_{out}$ and $t'_{out}$. The sum $w'_{out}$ passes to the respective C subcell if available. The transfer digit $t'_{out}$ passes diagonally down to a left hand neighbouring C subcell where available, or to a D cell otherwise. Each C subcell adds its two inputs from above and above right to produce a single digit output. This output passes via a latch 34 to a B subcell or a D cell in the same column and in the row immediately below. Like the E cells, each C subcell is a half adder without any transfer digit output.

Each D cell adds its three inputs from above, above upper right and above lower right ($y'_{in}$, $t_{in}$, $t'_{in}$) to produce intermediate sum and transfer digits $w''_{out}$ and $t''_{out}$. The first row D cell 1811 has one input permanently zero, and for this cell $t''_{out}$ must also be zero. It could in fact be replaced by a half adder type E cell. Each D cell provides $w''_{out}$ as a $w''_{in}$ input to the E cell of the row below where available, or to a lowermost array output 364 via two latches in series in the case of the last row D cell. Transfer digits $t''_{out}$ from second to fourth row D cells pass as diagonal inputs to respective like row E cells for addition to $w''_{out}$ from previous row D cells. Each E cell output is a single digit $y_{out}$ which passes via a latch 34 to a respective array output 361 to 363. The array outputs 361 to 364 are connected to respective feedback lines 301 to 304.

The processor 10 operates on totally different principles from prior art devices. These employ conventional binary arithmetic, in which successive digits (bits in binary arithmetic) are computed in ascending order of significance. In particular, least significant bits are computed first and carry bits propagate to computations of higher significance. In consequence, the greater the significance or importance of a digit, the longer it takes to emerge in prior art devices.

Furthermore, the value of the least significant digit affects those of higher significance; eg adding 0001 to 0111 produces 1000 in binary arithmetic. If the least significant bit is not computed, so as to reduce processor size and processing time, the most significant bit is wrong in this example. The processor of the invention operates in accordance with so-called redundant number arithmetic based on signed digit number representations (SDNRs). These differ from conventional arithmetic in that any digit may have a positive or a negative value, and also in that there is more than one way of representing a number. Furthermore, there is normally a choice of digit sets. Before discussing the way in which the processor 10 executes a computation, this form of arithmetic will be discussed.
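The binary example of 0001 added to 0111 can be reproduced directly. In this sketch, "not computing" the least significant bit is modelled by discarding bit 0 of each operand before the addition; that modelling choice is an assumption made for illustration.

```python
a, b = 0b0111, 0b0001

full = a + b
print(format(full, "04b"))      # 1000: most significant bit is 1

# Assumed model of skipping the LSB: drop bit 0 of each operand,
# add the remaining bits, then restore their weights.
no_lsb = ((a >> 1) + (b >> 1)) << 1
print(format(no_lsb, "04b"))    # 0110: most significant bit is now 0
```

The carry generated in the lowest position ripples all the way to the top, so the most significant bit depends on the least significant one.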

In conventional radix 10 (decimal) arithmetic, the digit set is 0 to 9 and the digits are all of the same sign. The processor 10 is constructed to operate with an SDNR of radix 4 and a digit set such as -3, -2, -1, 0, 1, 2, 3. This is known as the maximally redundant set. It is also possible to employ other digit sets with this radix, such as the minimally redundant set -2 to 2. The redundancy aspect is exemplified by the decimal number 15, which in the maximally redundant radix 4 digit set may be written (most significant digit first) as (1,0,-1), (0,3,3) or (1,-3,0,-1), among others.

Generally speaking, for a radix r the maximally and minimally redundant sets have (2r-1) and (r+1) digits respectively. For radix 2 these are the same, a set of three digits 1, 0 and -1.
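The redundant representations of 15 and the digit set sizes above can be checked with a short sketch. It assumes digits are listed most significant first, and it takes the minimally redundant set for an even radix r to be -r/2 to r/2, which matches the radix 4 (-2 to 2) and radix 2 (-1 to 1) sets named in the text; the function names are hypothetical.

```python
from typing import List, Sequence


def sdnr_value(digits: Sequence[int], radix: int = 4) -> int:
    """Integer value of a signed-digit string given most significant digit first."""
    value = 0
    for d in digits:
        value = value * radix + d    # Horner's rule; digits may be negative
    return value


def maximally_redundant(radix: int) -> List[int]:
    """Digit set -(r-1) .. (r-1), which has 2r - 1 digits."""
    return list(range(-(radix - 1), radix))


def minimally_redundant(radix: int) -> List[int]:
    """Assumption: for even r, the digit set -r/2 .. r/2, which has r + 1 digits."""
    return list(range(-(radix // 2), radix // 2 + 1))


digit_set = set(maximally_redundant(4))
for rep in [(1, 0, -1), (0, 3, 3), (1, -3, 0, -1)]:
    assert set(rep) <= digit_set     # every digit is in the -3..3 set
    print(rep, sdnr_value(rep))      # each representation evaluates to 15

print(len(maximally_redundant(4)), len(minimally_redundant(4)))  # 7 5
print(maximally_redundant(2) == minimally_redundant(2))          # True
```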

The processor 10 operates on fractional quantities, so that recursion or repeated feedback from output to input does not produce overflow. Thus inputs $u_n^1$ to $u_n^4$ (the digits of the nth value of u) at 321 to 324 have digit significance $4^{-1}$ to $4^{-4}$ respectively, ie 1/4 to 1/256. Similarly, the (n-1)th result output digits $y^0_{n-1}$ to $y^3_{n-1}$ at 361 to 364 have significance $4^0$ to $4^{-3}$ respectively, or 1 to 1/64.

In addition, the digits $b_1^1$ to $b_1^4$ of the multiplying coefficient $b_1$ have the same significance as $u_n^1$ to $u_n^4$. In consequence, each column 141 to 147 of the processor 10 is of constant digit significance, and this significance reduces by one level per column from left to right. In cell 1613 for example, subcell A multiplies an input $y^0_{n-1}$ by $b_1^2$ to produce a lower significance digit $w_{out}$ to be passed on in the same column, together with a higher significance digit $t_{out}$ to be passed to the left. The digits $w_{out}$ and $t_{out}$ have significances $4^{-2}$ and $4^{-1}$ respectively. The columns 141 to 147 therefore correspond to digit significances $4^0$ to $4^{-6}$ respectively.
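The column assignment described above can be expressed as a small index calculation. This sketch is derived from the statements in the text rather than taken from the patent: it assumes row 12i receives $y^{i-1}_{n-1}$, that coefficient digit $b_1^m$ has significance $4^{-m}$, and that column 14j holds digits of significance $4^{-(j-1)}$. The function name is hypothetical.

```python
from typing import Tuple


def a_subcell_columns(row: int, m: int) -> Tuple[int, int]:
    """Columns 14j receiving w_out and t_out from the A subcell in row 12row.

    Assumptions (from the text's description of Figure 1): row 12row multiplies
    y^{row-1}_{n-1} (significance 4**-(row-1)) by b_1^m (significance 4**-m),
    and column 14j holds digits of significance 4**-(j-1).
    """
    exponent = (row - 1) + m      # product significance is 4**-exponent
    w_col = exponent + 1          # w_out stays in the column of its own significance
    t_col = w_col - 1             # t_out is one level more significant: one column left
    return w_col, t_col


print(a_subcell_columns(1, 2))                            # (3, 2): cell 1613 feeds 1612
print([a_subcell_columns(2, m)[0] for m in range(1, 5)])  # [3, 4, 5, 6]: cells 1623 to 1626
```

The one-column shift between rows 121 and 122 falls out of the `row - 1` term, since each successive recycled digit is one level lower in significance.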

The processor 10 is designed for values of b1 in the range -0.2222 to +0.2222 in radix 4, equivalent approximately to -0.66 to +0.66 in decimal.
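The quoted coefficient range follows from evaluating the radix 4 fraction 0.2222 exactly. This minimal sketch uses Python's fractions module and assumes, as stated above, that digit m after the radix point has significance $4^{-m}$; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def fractional_sdnr_value(digits: Sequence[int], radix: int = 4) -> Fraction:
    """Exact value of 0.d1 d2 d3 ..., where digit d_m has significance radix**-m."""
    return sum((Fraction(d, radix ** m) for m, d in enumerate(digits, start=1)),
               Fraction(0))


b1_max = fractional_sdnr_value([2, 2, 2, 2])   # 0.2222 in radix 4
print(b1_max, float(b1_max))                   # 85/128 0.6640625
```

The exact value 85/128 = 0.6640625 agrees with the approximate decimal bound of 0.66.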

Individual digits of the kinds y and u may take any value in the maximally redundant radix 4 set (-3 to 3). However, digits of the kinds b1 and w are restricted in this example of the invention to the minimally redundant radix 4 set (-2 to 2). Moreover, transfer digits of the kind t are restricted to values (-1 to 1). These digit value restrictions apply irrespective of whether or not the relevant digit is primed or carries a subscript.

Having clarified the arithmetic rules implemented by the processor 10, its operation will now be discussed in more detail. Figure 1 illustrates the first row 121 of the processor 10 receiving input of the four digits $u_n^1$ to $u_n^4$ representing the nth non-recursive input $u_n$. Each row 12m (m = 1 to 4) is also receiving a respective feedback digit $y^{m-1}_{n-1}$ generated in the preceding or (n-1)th computation. The processor 10 is designed to implement the first order IIR filter section relation given in Equation (3) and repeated below for convenience:

$$y_n = u_n + b_1 y_{n-1} \qquad (3)$$

The first row 121 of the processor 10 implements multiplication of the most significant digit $y^0_{n-1}$ of the (n-1)th output result $y_{n-1}$ by the coefficient $b_1$ having digits $b_1^1$ to $b_1^4$. Moreover, the first row 121 provides for addition of the non-recursive term $u_n$ with digits $u_n^1$ to $u_n^4$. The first step in this row is for each type A subcell to receive input of $y^0_{n-1}$ and multiply it by the respective digit $b_1^m$ (m = 1 to 4). Processing cell 1613 within dotted lines will be considered first. Its multiplication operation involves a digit $y^0_{n-1}$ in the range (-3 to 3) and a digit $b_1^2$ in the range (-2 to 2). The product is therefore in the range (-6 to 6). However, the allowed range for $w_{out}$ from a type A subcell is (-2 to 2), so a transfer digit $t_{out}$ is generated which is one level higher in significance. This is expressed by:

$$w_{out} + r\,t_{out} = b_1^m y_{in} \qquad (9)$$

where r is the radix and m is the multiplier digit significance; r = 4, and m = 2 in the present case.

Since r = 4 and $t_{out}$ = ±1 or 0, $r\,t_{out}$ = ±4 or 0. Values of the product $b_1^m y_{in}$ from -2 to 2 are expressed by $t_{out} = 0$ and $w_{out} = b_1^m y_{in}$. Values ±3 of the product are expressed by $w_{out} = \mp 1$ and $t_{out} = \pm 1$. Values ±4 to ±6 are expressed by $w_{out}$ = 0 to ±2 and $t_{out} = \pm 1$. All possible products of the A subcell multiplication are therefore accommodated within the digit limits previously given.
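The type A subcell rule of Equation (9) and the value table above can be written as a small function and checked exhaustively over all allowed digit pairs. This is a behavioural sketch, not a gate-level design; the function name and the choice to reject out-of-range digits with an exception are assumptions.

```python
from typing import Tuple

R = 4  # radix


def a_subcell(y_in: int, b_m: int) -> Tuple[int, int]:
    """Split the product b_m * y_in into (w_out, t_out) per Equation (9),
    following the value table in the text."""
    # Assumption: out-of-range digits are treated as an error.
    if not (-3 <= y_in <= 3 and -2 <= b_m <= 2):
        raise ValueError("digit outside the allowed set")
    p = b_m * y_in                                        # product lies in -6..6
    t_out = 0 if -2 <= p <= 2 else (1 if p > 0 else -1)   # +-3..+-6 need a transfer
    w_out = p - R * t_out                                 # so that w_out + R*t_out == p
    return w_out, t_out


# Exhaustive check that every product fits the digit limits
for y in range(-3, 4):
    for b in range(-2, 3):
        w, t = a_subcell(y, b)
        assert w + R * t == b * y and -2 <= w <= 2 and -1 <= t <= 1

print(a_subcell(3, 1), a_subcell(3, 2), a_subcell(-2, 2))   # (-1, 1) (2, 1) (0, -1)
```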

The A subcell of cell 1613 passes the transfer digit $t_{out}$ diagonally down to the left hand neighbouring B subcell of cell 1612, where it becomes $t_{in}$. It is added to $u_n^1$ and to $w_{out}$ of the A subcell of cell 1612, with which it shares like digit significance. These three digits have value ranges (-1 to 1), (-3 to 3) and (-2 to 2) respectively. Their sum consequently lies in the range (-6 to 6), which is the same as the output range from a type A multiplier subcell. It can accordingly be accommodated by similar output digits $w'_{out}$ and $t'_{out}$ in ranges (-2 to 2) and (-1 to 1) respectively. The B subcell function is expressed by

$$w'_{out} + r\,t'_{out} = y'_{in} + t_{in} + w_{in} \qquad (10)$$

($y'_{in}$ is replaced by $u_{in}$ for top row B subcells.) The sum digit $w'_{out}$ from the B subcell passes to the associated C subcell where available, which has the function

$$y'_{out} = w'_{in} + t'_{in}$$

Of these, $t'_{in}$ arises from a neighbouring B subcell, and $y'_{out}$ passes as input to the row below.
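The B and C subcell functions can be sketched the same way. The text states only that the B subcell sum (-6 to 6) can be accommodated like the A subcell product, so this sketch assumes the same selection rule (transfer digit ±1 only when the magnitude exceeds 2); because of the redundancy, other valid choices exist. The function names are hypothetical.

```python
from typing import Tuple

R = 4  # radix


def split_digit(s: int) -> Tuple[int, int]:
    """Return (w, t) with w in -2..2, t in -1..1 and w + R*t == s, for s in -6..6.
    Assumption: same thresholds as the type A subcell value table."""
    t = 0 if -2 <= s <= 2 else (1 if s > 0 else -1)
    return s - R * t, t


def b_subcell(y_in: int, t_in: int, w_in: int) -> Tuple[int, int]:
    """Equation (10): w'_out + R t'_out = y'_in + t_in + w_in (y'_in is u_in in row 1)."""
    return split_digit(y_in + t_in + w_in)


def c_subcell(w_in: int, t_in: int) -> int:
    """Half adder: y'_out = w'_in + t'_in, with no transfer output."""
    return w_in + t_in


for y in range(-3, 4):
    for t in range(-1, 2):
        for w in range(-2, 3):
            w_out, t_out = b_subcell(y, t, w)
            assert w_out + R * t_out == y + t + w
            assert -2 <= w_out <= 2 and -1 <= t_out <= 1

# The C subcell output always fits the y digit range (-3 to 3)
assert all(-3 <= c_subcell(w, t) <= 3 for w in range(-2, 3) for t in range(-1, 2))
print(b_subcell(3, 1, 2))   # (2, 1): the sum 6 is split as 2 + 4*1
```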

The transfer digit $t'_{out}$ of the B subcell passes to the adjoining column's D cell 1811, where it is added to the multiplier or A subcell transfer digit $t_{out}$. Other D cells 1822 to 1844 receive input of $y'_{out}$ (range -3 to 3) from a C subcell in the row above. Cell 1811 has no row above, however, and its $y'_{in}$ input is set permanently to zero. Moreover, since it is adding two transfer digits both in the range -1 to 1, its output is in the range -2 to 2. Consequently, its transfer digit output $t''_{out}$ is permanently zero, and does not need to be connected. Hence cell 1811 has a $y'_{in}$ input and a $t''_{out}$ output which are unconnected. More generally, for other D cells 1822 to 1844, the function is given by

$$w''_{out} + r\,t''_{out} = y'_{in} + t_{in} + t'_{in} \qquad (11)$$

Equation (11) is similar to Equation (10); $y'_{in}$ is in the range (-3 to 3), and the equivalent for $t_{in}$ and $t'_{in}$ is (-1 to 1). The combined range is (-5 to 5), which can be accommodated by r = 4, $t''_{out}$ in the range (-1 to 1) and $w''_{out}$ in the range (-2 to 2) as previously described regarding the B subcell.
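Equation (11) and the special case of D cell 1811 can be checked in the same way. This sketch again assumes the A subcell selection rule for the transfer digit, and the function name is hypothetical. With $y'_{in}$ tied to zero the sum lies in -2 to 2, so $t''_{out}$ comes out as zero, as the text states.

```python
from typing import Tuple

R = 4  # radix


def d_cell(y_in: int, t_in: int, t_in_p: int) -> Tuple[int, int]:
    """Equation (11): w''_out + R t''_out = y'_in + t_in + t'_in (sum in -5..5).
    Assumption: transfer digit chosen with the A subcell thresholds."""
    s = y_in + t_in + t_in_p
    t_out = 0 if -2 <= s <= 2 else (1 if s > 0 else -1)
    return s - R * t_out, t_out


# General D cells: every input combination fits the stated output ranges
for y in range(-3, 4):
    for t in range(-1, 2):
        for tp in range(-1, 2):
            w2, t2 = d_cell(y, t, tp)
            assert w2 + R * t2 == y + t + tp and -2 <= w2 <= 2 and -1 <= t2 <= 1

# First row D cell 1811: y'_in tied to zero, so t''_out is always zero
assert all(d_cell(0, t, tp)[1] == 0 for t in range(-1, 2) for tp in range(-1, 2))
```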

The D cell 1811 provides the sum $w''_{out}$ of both first row, second column transfer digits $t_{in}$ and $t'_{in}$ as an input $w''_{in}$ to the second row E cell 2021. This E cell also receives input of a transfer digit $t''_{in}$ generated as $t''_{out}$ by the second row D cell 1822. The inputs to E cell 2021 are both of digit significance 0, since both are one level higher than that of column 142; ie E cell 2021 is summing $w''_{out}$ (which arose from two first row, second column transfer digits) with a second row, second column transfer digit $t''_{out}$. Since $w''_{out}$ and $t''_{out}$ are in the ranges (-2 to 2) and (-1 to 1) respectively, E cell 2021 produces their sum $y_{out}$ in the range (-3 to 3). This involves no digit significance increase, since the range is acceptable for terms of type y. The sum $y_{out}$ was computed inter alia from the most significant digits $y^0_{n-1}$ and $u^1_n$, and forms the most significant digit $y^0_n$ of the result $y_n$ succeeding that illustrated in Figure 1. The computation producing $y^0_n$ passes through the two left hand first row latches 34 in parallel, and subsequently through the left hand second row latch 34 under E cell 2021; $y^0_n$ accordingly takes two clock cycles to emerge from the processor 10 after the preceding $y^0_{n-1}$ is generated.

The digit $y^1_n$ of second highest significance of the succeeding result $y_n$ is formed similarly. The equivalent preceding digit $y^1_{n-1}$ is fed back from the third row output 362 via the line 302 to each of the second row A subcells of cells 1623 to 1626. It is formed one clock cycle later than $y^0_{n-1}$, and is one level lower in digit significance. Accordingly, its multiplication by $b_1^1$ to $b_1^4$ will yield products one level lower than those involving like multiplying coefficients in row 121 above. This is accommodated as shown in Figure 1 by the shift of row 122 by one column to the right relative to row 121.

The computation of $y^1_n$ is similar to that of $y^0_n$, and so it will be described in outline only. It is produced as $y_{out}$ from third row E cell 2032, and is the sum of the transfer digit $t''_{out}$ from the third row D cell 1833 and the digit $w''_{out}$ from the second row D cell 1822. Of these, cell 1822 sums the lower order digit $y'_{out}$ from its column neighbour 1612 in the row above with transfer digits from its row neighbour 1623. In turn, cell 1623 receives two transfer digits from its neighbour 1624. The other input to third row E cell 2032, from its row neighbour D cell 1833, arises from the C subcell of cell 1623 in the row above and transfer digits from like row member 1634. In turn, cell 1623 receives input from its column neighbour C subcell in the row above.

The digit $y^2_n$, third in order of significance, is formed similarly to $y^1_n$, except that rows and columns further down and to the right become involved in the computation.

It is output from fourth row E cell 2043. As has been discussed, its value can be affected only by outputs from cells up to three steps to the right in row 124 and up to four steps to the right in row 123. Of these, subcells B and C (not shown) of cells 1637 and 1846 and subcells C (not shown) of cells 1636 and 1845 cannot contribute to $y^2_n$ and are omitted from the processor 10. This omission does not affect the value of $y^2_n$, or indeed those of $y^0_n$ and $y^1_n$, which are correctly calculated. The processor 10 therefore contains the minimum of circuitry necessary to calculate the three highest significance result digits.

The least significant result digit $y^3_n$ is derived as $w''_{out}$ from the fourth row D cell. It passes via two latches 34 to the lowermost processor output 364. It may be inaccurate by 1, ie if its calculated value is x in the range (-2 to 2), its actual value will be x-1, x or x+1. This corresponds to a truncation error in the final result. The lowermost latch 34 in Figure 1 is the only remaining item of a fifth row of cells, and corresponds to a latch following a type E cell output. It acts as a single digit accumulating means, there being no row neighbours generating other digits.

The foregoing discussion of mode of operation has been restricted to details of computation. Timing of processor operation will now be described. Initially, the processor outputs 361 to 364 are zero. The digits $u^1_1$ to $u^4_1$ of $u_1$, the first non-recursive input term, are fed to the processor's first row B subcells. Time is subsequently allowed for the first row cells to settle to final states. The processor latches (eg 34) are then clocked in synchronism by a common system clock (not shown) connected thereto. This advances the first row cell outputs to the second row for a second cycle of computation. After the settling time interval, the latches 34 are clocked once more. This advances the most significant digit $y^0_1$ of the first result $y_1$ from E cell 2021 to the uppermost processor output 361, where it is fed back to first row cells. In addition, intermediate results from other second row cells pass via latches 34 to third row cells. At the same time, ie after two clock cycles, the digits $u^1_2$ to $u^4_2$ of the second non-recursive term $u_2$ are input to first row cells. The processor latches are clocked a third time to advance first row outputs to second row cells. This also brings the second most significant digit $y^1_1$ to the second row output 362, which is one cycle later than the appearance of $y^0_1$ at 361. On its appearance at 362, $y^1_1$ is fed back as an input to second row cells.

The latches 34 continue to be clocked in synchronism, with successive values of $u_n$ being input to the first processor row 121 every two clock cycles. Similarly, successive values of output digits $y^m_n$ emerge (m+2) clock cycles later, where m = 0 to 3; ie there is a one clock cycle delay between emergence of digits differing in significance by one. In general, the digit $y^m_n$ of mth significance of the nth result $y_n$ appears at output 36m+1 after (2n+m) clock cycles, and is fed back for input to processor row 12m+1. This illustrates an important feature of the processor 10: each output digit $y^m_n$ is recycled for subsequent use as soon as it is generated. Furthermore, as has been said, there is a delay of two clock cycles between input of $u_n$ and output of $y^0_n$, but this is independent of input or output word length (number of digits per term $u_n$ or $y_n$). Successive values of each of $y^0_n$ to $y^3_n$ appear every two clock cycles, so the processor 10 is 50% efficient. The efficiency may be improved to 100% when two independent operations are to be executed, since they may be interleaved and computed on alternate cycles. The input non-recursive term would be $u_n$ on even clock cycles and, say, $u'_n$ on odd clock cycles. These efficiency values of 50% or 100% compare very favourably with the much lower values of prior art digital processors that calculate least significant digits (bits) first. Efficiency may also be improved by combining the functions of two neighbouring rows into that of a single row. Alternatively, $y_n$ may be expressed in terms of $y_{n-2}$ as described with reference to Equations (4) to (7). The advantages of the invention arise from processor design based on signed digit number representation arithmetic with digit redundancy. This allows most significant digits to be computed first, and limits transfer digit propagation to a short, finite length.
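The clock schedule described above can be tabulated with a small Python sketch. It assumes that $u_1$ is applied at clock cycle 0 and each later $u_n$ two cycles after its predecessor, which reproduces the (2n+m) rule quoted in the text; `output_cycle` is a hypothetical helper, and the interleaved second stream is an illustrative assumption.

```python
def output_cycle(n, m):
    """Clock cycle at which digit m of result y_n reaches output 36(m+1).

    Assumes u_1 is applied at cycle 0 and u_n at cycle 2*(n - 1), i.e. one
    input every two clock cycles, with digit m emerging (m + 2) cycles later.
    """
    input_cycle = 2 * (n - 1)
    return input_cycle + m + 2


# The schedule agrees with the closed form (2n + m) quoted in the text.
assert all(output_cycle(n, m) == 2 * n + m for n in range(1, 10) for m in range(4))

# Digits of one result are skewed by one cycle per significance step.
print([output_cycle(3, m) for m in range(4)])  # [6, 7, 8, 9]

# Hypothetical interleaving: a second, independent stream entered one cycle
# later fills the idle slots, so most significant digits emerge every cycle.
msd_cycles = sorted([output_cycle(n, 0) for n in range(1, 5)]
                    + [output_cycle(n, 0) + 1 for n in range(1, 5)])
print(msd_cycles)  # [2, 3, 4, 5, 6, 7, 8, 9]
```

With a single stream only every other cycle produces a new most significant digit (50% efficiency); the interleaved list shows how the idle cycles are filled.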

The foregoing analysis demonstrates that the processor 10 executes the recursive first order IIR filter section computation given by Equation (3) as $y_n = u_n + b_1 y_{n-1}$ (3), provided that $u_n$ is computed separately, where $u_n = a_0 x_n + a_1 x_{n-1}$ (4). Here $x_n$ represents a continuous data stream and $a_0$, $a_1$ and $b_1$ are coefficients determining the filter frequency response; $u_n$ may be calculated repeatedly, separately from the processor 10, employing the data stream $x_n$ as input to simple prior art multipliers described for example in British Patent No 2,106,287B. It is non-recursive, so that feedback or recursion loop delay problems do not arise. Moreover, it involves only two multiplication operations followed by addition.
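The split between the non-recursive Equation (4) and the recursive Equation (3) can be illustrated numerically. This is a minimal Python sketch using ordinary floating-point arithmetic rather than the processor's signed-digit arithmetic; the function names and the coefficient values are illustrative assumptions, with $b_1$ chosen small enough for the recursion to decay.

```python
def fir_terms(x, a0, a1):
    """Equation (4): u_n = a0*x_n + a1*x_(n-1), taking x_(-1) = 0."""
    u, prev = [], 0.0
    for xn in x:
        u.append(a0 * xn + a1 * prev)
        prev = xn
    return u


def first_order_iir(u, b1):
    """Equation (3): y_n = u_n + b1*y_(n-1), taking y_(-1) = 0."""
    y, prev = [], 0.0
    for un in u:
        prev = un + b1 * prev
        y.append(prev)
    return y


x = [1.0, 0.0, 0.0, 0.0]           # unit impulse
u = fir_terms(x, a0=0.5, a1=0.25)  # non-recursive part, computed separately
print(first_order_iir(u, b1=0.5))  # [0.5, 0.5, 0.25, 0.125]
```

Only `first_order_iir` contains a feedback loop; `fir_terms` can be pipelined freely, which is why the text assigns it to conventional multipliers.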

The processor 10 incorporates latches 34 between adjacent rows 12 but not between adjacent columns 14. It is in fact possible to introduce equivalent latches between columns, provided that inputs to first row cells have relative delays increasing with decreasing digit significance. However, analysis of this configuration shows that the effect is to slow down processor operation. Prior art devices normally employ latches between both row and column neighbour cells.

The processor 10 accepts successive input terms $u_n$ with digits $u^m_n$ (m = 1 to 4) supplied in synchronism to first row multiplier cells 1612 to 1615. It accepts an input and produces an output every two latch or clock activation cycles. It produces output terms $y_n$ with digits $y^m_n$ (m = 0 to 3) (2+m) clock cycles after input of $u_n$; ie the most significant digit $y^0_n$ arrives after two clock cycles and successive digits are increasingly more delayed. This is referred to in the art of array processors as a "temporal skew". The processor 10 is therefore characterised by synchronous input but temporally skewed output. It is more convenient in some circumstances to provide a processor characterised by input and output of like temporal skew to facilitate connection in cascade. Moreover, the processor 10 is restricted, as has been said, to values of coefficient digits $|b_1^m|$ not greater than 2. The maximum value of the coefficient $b_1$ is therefore 0.2222 in radix 4, or about 0.66 when converted to radix 10. It may be convenient to employ larger values of $b_1$ to provide more freedom of choice of filter response characteristics. The penalty for this is that input data values may then produce overflow.
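The radix 10 equivalent of the coefficient limit can be checked exactly. This minimal Python sketch evaluates a radix 4 signed-digit number from an integer digit and a list of fractional digits; `radix4_value` is a hypothetical helper, and the standard `fractions` module is used so that no rounding occurs.

```python
from fractions import Fraction


def radix4_value(int_digit, frac_digits):
    """Exact value of the radix 4 signed-digit number d0.d1d2d3..."""
    value = Fraction(int_digit)
    for k, d in enumerate(frac_digits, start=1):
        value += Fraction(d, 4 ** k)
    return value


b1_max = radix4_value(0, [2, 2, 2, 2])  # 0.2222 in radix 4 (processor 10)
print(b1_max, float(b1_max))            # 85/128 0.6640625
```

Because the digits are signed, the same routine also handles negative digits, for example `radix4_value(0, [1, -2])` gives 1/8.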

Referring now to Figure 3, there is shown a further processor 50 of the invention. It is a modified version of that shown in Figure 1 to implement skewed input of $u_n$ terms and multiplication by coefficients in the larger range of about -2.6 to +2.6 in radix 10. Cell inputs and outputs are illustrated in a box 51. The processor 50 is of very similar construction to the processor 10, and consequently its description will be largely restricted to aspects of difference. It multiplies by coefficient digits $b_1^0$ to $b_1^3$ contained in respective multiplier cells of each row. Each digit can take any value in the range (-2 to +2), and so the value of $b_1$ is in the range -2.222 to +2.222 in radix 4.

The processor 50 contains multiplier cells 52 and type D' and E accumulator cells 54 and 56 arranged in rows 581 to 584 and columns 601 to 607. Latches indicated by triangles such as 62 provide pipelining as before between column neighbours but not row neighbours. Ignoring subcells which are absent since not required for higher output digit significance, most of the multiplier cells 52 have the same structure as cells 16. They incorporate subcells A, B, B' and C as shown at 51 providing multiply, full add and half add functions. However, the most significant multiplier cell 52j,j+1 (j = 1 to 4) in each row 58j incorporates an A subcell together with successive B and B' subcells. The C subcell of other multiplier cells is therefore replaced by a type B' subcell, and this provides for an additional addition function via input lines 631 to 634. The B' subcells have an identical function to that of B subcells, but input and output signals differ as shown in the box 51. The lines 631 to 634 supply respective input digits $u^0_n$ to $u^3_n$ to the most significant multiplier cells 5212 etc. However, synchronous input digits v1 to v4 may also be supplied to top row B subcells as illustrated, in a similar manner to the inputs $u^1_n$ etc in the Figure 1 processor 10.

The effect of exchanging a C subcell for a B' subcell is to introduce an extra transfer digit $t_{out}$ from the most significant multiplier cells 5212 etc. These cells now produce three transfer digits, as opposed to two for other multiplier cells such as 5213. The type D' accumulator cells in at least the second row onwards must therefore accept four inputs: one sum digit of type y' (range -3 to +3) and three transfer digits (range -1 to +1). The range of the sum of these digits is (-6 to +6), which can be accommodated by output sum and transfer digits of types w and t. As previously described with reference to Equation (11), D cells in the processor 10 had inputs with a sum in the range (-5 to +5), whereas the output expression $w + r\,t$ has a range (-6 to +6). The full output range of the D cell was therefore not exercised, and it can accommodate an extra transfer digit input as provided in the processor 50. The prime superscript to D' cells accordingly indicates a similar function to that of D cells, but including an additional transfer digit input.
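The range argument for D and D' cells can be verified by enumeration. This Python sketch assumes only the digit sets stated in this section (w in -2 to 2, t in -1 to 1, y' in -3 to 3, r = 4); the variable names are illustrative.

```python
from itertools import product

RADIX = 4
W_DIGITS = range(-2, 3)  # sum digit w
T_DIGITS = range(-1, 2)  # transfer digit t
Y_DIGITS = range(-3, 4)  # sum digit of type y'

# Every value an output pair (w, t) can represent: w + r*t.
capacity = {w + RADIX * t for w, t in product(W_DIGITS, T_DIGITS)}

# D cell (processor 10): one y' digit plus two transfer digits.
d_sums = {y + t1 + t2 for y, t1, t2 in product(Y_DIGITS, T_DIGITS, T_DIGITS)}

# D' cell (processor 50): one y' digit plus three transfer digits.
d_prime_sums = {y + sum(ts) for y in Y_DIGITS for ts in product(T_DIGITS, repeat=3)}

print(min(capacity), max(capacity))          # -6 6
print(min(d_sums), max(d_sums))              # -5 5
print(min(d_prime_sums), max(d_prime_sums))  # -6 6
assert d_sums <= capacity and d_prime_sums <= capacity
```

The D cell leaves the values $\pm 6$ of the output capacity unused, and the D' cell's extra transfer digit uses exactly that headroom.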

In consequence of the increase in maximum size of the coefficient $|b_1|$ to 2.66, recycled output digits $y^p_{n-1}$ generate products of higher digit significance than the equivalent for the processor 10. To compensate for this, output digits are fed back to the next but one preceding row (as opposed to the preceding row in the processor 10). Each output digit $y^p_{n-1}$ (p = 0 to 3) is recycled via a line 64p+1 to the respective row 58p+1. The most significant output digit $y^0_{n-1}$ will be multiplied inter alia by the most significant coefficient digit $b_1^0$. This produces a product with two digits, $t_{out}$ of significance +1 and $w_{out}$ of significance 0. The latter is of the same significance as $y^0_{n-1}$. It must therefore be produced in the same processor column, since, as previously indicated, each column is associated with sum and transfer digits of constant significance.

The selection of the row to which each output digit is recycled is in accordance with the significance of the recycled digit in combination with that of the most significant coefficient digit, the latter indicating coefficient magnitude. To illustrate this, consider the recycling of $y^0_{n-1}$. When multiplied by $b_1^0$, this digit will provide transfer and sum digits of significance 1 and 0 respectively. The transfer digit will be added to other digits to produce an output digit of significance 1; ie the multiply and add operations introduce a digit significance increase of 2, as compared to that of the sum digit arising from multiplication and equal to the digit significances of $y^0_{n-1}$ and $b_1^0$ added together. Each recycled digit therefore generates a contribution to another digit two levels higher in significance, in consequence firstly of multiplication by $b_1$ and secondly of accumulation with other digits in accordance with radix 4 arithmetic. Each recycled digit must accordingly be fed back to the next but one preceding row.

In comparison, the processor 10 operates with like arithmetic, but its most significant coefficient digit $b_1^1$ is one level lower in digit significance. Each of its recycled digits can only produce a contribution to a respective output digit one level higher in significance. It therefore requires feedback of each recycled digit only to the respective preceding row. If the most significant coefficient digit were to be of significance 2, 3 or smaller, feedback or recycling would be to the same row or a subsequent row to compensate for decreasing coefficient magnitude. Moreover, other processors of the invention using radices other than 4 may produce a recycled digit significance increase of more than 2 in consequence of accumulation.

The selection of the row to receive feedback is adjusted to compensate for this.

Generally speaking, recycle row selection is in accordance with accumulation arithmetic, coefficient magnitude and recycle digit significance.

Because the processor 50 recycles between non-adjacent rows, contributions to output results must pass through three latches 62. A new result is therefore generated three cycles after input, and results are generated and inputs accepted once every three cycles. This is slower than the processor 10, and is a consequence of increasing coefficient size.

The processor 50 also differs from the processor 10 in that it has two overflow outputs O/F. These are the transfer digit output of the first row D' cell 5411 and the sum output of the second row E cell 5621. The need for this arises as follows. The coefficient $b_1$ may take values in the range +2.66 to -2.66 (approximately) in radix 10, and each $u_n$ has a maximum value approaching 4.

Equation (3) provides $y_n = u_n + b_1 y_{n-1}$. Operation of Equation (3) allows $y_n$ to increase without limit as n increases if $b_1$ and $u_n$ are large enough. This is a basic characteristic of any recursive processor such as an IIR filter. The net effect of increasing the allowed maximum value of $|b_1|$ from 0.66 (processor 10) to 2.66 (processor 50) is to produce the possibility of overflow. To avoid overflow, either the input non-recursive terms $u_n$ (and/or $v_n$) or the coefficient $b_1$ must be restricted in value. A non-zero value at either of the outputs O/F indicates that this restriction has not been observed, and provides an alarm.
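The growth described above is easy to reproduce with a floating-point model of Equation (3). In this Python sketch, `first_overflow` is a hypothetical helper, and `limit` is an assumed stand-in for the largest magnitude the output digits can represent; it is not a value taken from the processor design.

```python
def first_overflow(u, b1, limit):
    """Index of the first y_n (Equation (3)) with |y_n| > limit, else None.

    limit is a hypothetical stand-in for the largest magnitude the output
    digits can represent; y_(-1) is taken as 0.
    """
    y = 0.0
    for n, un in enumerate(u):
        y = un + b1 * y
        if abs(y) > limit:
            return n
    return None


u = [1.0] * 50
print(first_overflow(u, b1=2.5, limit=4.0))  # 2    (y = 1, 3.5, 9.75, ...)
print(first_overflow(u, b1=0.5, limit=4.0))  # None (y approaches 2)
```

With $|b_1| > 1$ even a small constant input eventually exceeds any fixed limit, which is the condition the O/F outputs are intended to flag.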

The output digits $y^2_{n-1}$ and $y^3_{n-1}$ are output from cells 5444 and 5245 via two and three latches 62 respectively. These latches preserve the temporal output skew, ie the one cycle delay between output digits of adjacent significance. They also correspond to vestigial extra rows lacking multiply/add functions as previously described.

The processor 50 contains a respective B subcell in each first row multiplier cell for the purpose of adding a non-skewed input $v_n$. If this is not required, the B subcells may be omitted. First row cells would then contain only A and C subcells, apart from the most significant first row cell, which would retain a B' subcell instead of a C subcell to add $u_n$.

Referring now to Figures 4 to 7, in which like parts are like-referenced, there are shown schematic functional drawings illustrating use of the processor 50 as a building block for use in IIR filter applications. For illustrational clarity, individual digits of terms being processed are not shown, and inputs of the kind $v_n$ are not employed.

The processor 50 is illustrated in open loop form as a building block allowing any required connection scheme, ie there is (as yet) no recycle connection. It receives an input term u with temporally skewed digits indicated by a diagonal line 70. It receives a second skewed input y' (diagonal line 72), and y' is multiplied by a coefficient b at 74. The product by' is added to u at 76, and the sum passes via a system delay represented by clock-activated storing means 78 to provide an output y at 80. The output signal y has a temporal skew indicated by a line 81 and equivalent to those of inputs u and y'.

In Figure 5, the processor 50 is shown arranged as a first order IIR filter section. This merely requires a connection 82 from the output 80 to provide a multiplicand input at 74. By virtue of the delay introduced at 78, $u_{n+1}$ provides an input 70 simultaneously with output of $y_n$ at 80, $y_n$ being given by $y_n = u_n + b_1 y_{n-1}$, which is Equation (3) as before. On the subsequent processing cycle (three clock cycles later), ie one processing cycle after that illustrated in Figure 5, $u_{n+2}$ will provide the input 70, and the output at 80 will be $y_{n+1}$ given by $y_{n+1} = u_{n+1} + b_1 y_n$ (12). Accordingly, by induction, a stream of input terms $u_n$ (n = 0, 1, 2, ...) gives rise to a stream of output terms $y_n$ representing the input filtered by a first order IIR filter section.

Figure 6 schematically illustrates two processors 50 and 50' connected in cascade to form a second order filter section. They are each equivalent to that shown in Figure 4, and like parts are like-referenced. However, parts of the upper processor 50' have prime indices to distinguish them from the accompanying processor 50, and they are associated with respective multiplier coefficients b2 and b1.

The output 80 of the processor 50 is connected to a recycle line 82 providing multiplicand inputs at 74 and 74' to both processors 50 and 50' simultaneously.

The output 80 is illustrated at a time when it has received $y_n$ from storing means 78. At this time, the upper processor 50' is receiving an input 70' of $u_{n+2}$, which is consequently added by adder 76' to $b_2 y_n$ from multiplier 74'. The output of adder 76' input to storing means 78' is therefore $u_{n+2} + b_2 y_n$. Simultaneously, the output of storing means 78' is the earlier equivalent of this involving the input term $u_{n+1}$, ie $u_{n+1} + b_2 y_{n-1}$. The output of storing means 78' is the input to adder 76, where it is added to $b_1 y_n$ generated by multiplier 74 from $y_n$ recycled via line 82. This produces $y_{n+1} = u_{n+1} + b_1 y_n + b_2 y_{n-1}$ at the input to storing means 78, simultaneously with output therefrom of $y_n$. The expression for $y_n$ is given by replacing n+1 by n in that for $y_{n+1}$, which provides $y_n = u_n + b_1 y_{n-1} + b_2 y_{n-2}$ (13). Equation (13) is the general expression for the recursive portion of a second order IIR filter section, which demonstrates that two cascaded processors of the invention can provide such a filter. As illustrated in Figure 7, three processors 50, 50' and 50" of the invention may be cascaded to provide a third order IIR filter section, and additional processors may be added to construct higher order filters.
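The cascade argument can be checked by simulating the two storing means step by step and comparing the result with Equation (13) evaluated directly. This Python sketch models arithmetic only (floating point, one processing cycle per step); the function names are hypothetical, and samples before $u_0$ are assumed to be zero.

```python
def cascade_second_order(u, b1, b2):
    """Step model of the Figure 6 cascade of processors 50' and 50."""
    u = list(u)
    y_out = 0.0                     # storing means 78 holds y_(k-1)
    upper = u[0] if u else 0.0      # storing means 78' holds u_k + b2*y_(k-2)
    ys = []
    for k in range(len(u)):
        y_new = upper + b1 * y_out  # adder 76 with multiplier 74
        u_next = u[k + 1] if k + 1 < len(u) else 0.0
        upper = u_next + b2 * y_out  # adder 76' with multiplier 74'
        y_out = y_new
        ys.append(y_new)
    return ys


def direct_second_order(u, b1, b2):
    """Equation (13): y_n = u_n + b1*y_(n-1) + b2*y_(n-2)."""
    ys = []
    for n, un in enumerate(u):
        y1 = ys[n - 1] if n >= 1 else 0.0
        y2 = ys[n - 2] if n >= 2 else 0.0
        ys.append(un + b1 * y1 + b2 * y2)
    return ys


u = [1.0, 0.0, 0.0, 0.0, 0.0]
print(cascade_second_order(u, 0.5, 0.25))  # [1.0, 0.5, 0.5, 0.375, 0.3125]
print(direct_second_order(u, 0.5, 0.25))   # [1.0, 0.5, 0.5, 0.375, 0.3125]
```

The two functions add the same terms in a different order, so for general inputs they agree to within floating-point rounding rather than bit for bit.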

Referring now to Figure 8, the second order IIR filter section of Figure 6 is shown in more detail. Cell reference numerals have been omitted to reduce complexity. It comprises two processors equivalent to 50 and 50' and referenced accordingly, but these are modified as previously described by omission of unnecessary top row B sub-cells, there being no $v_n$ input. In consequence, the top row D' cell of each processor 50/50' cannot produce an overflow output.

This can only occur at a second row E cell. Successive digits $u^m_n$ are input to the most significant multiplier B' sub-cells of the left hand processor 50'.

Intermediate output digits are fed from the processor 50' via lines 901 to 904 to respective most significant multiplier B' sub-cells of the processor 50. Recycle or feedback digits $y^0_{n-1}$ to $y^3_{n-1}$ are fed from the processor 50 as multiplicand inputs to respective rows of both processors 50 and 50' via lines 921 to 924.

Referring now to Figure 9, there is shown a schematic representation of a general digital filter 120. It has a left hand or input portion 122 comprising a chain of latches 1240 to 124N-1 connected in series. An input $x_n$ is connected to multiplier 1260 and latch 1240. The outputs of the latches are connected to respective multipliers 1261 to 126N, the multipliers 1260 to 126N being associated with multiplicative coefficients $a_0$ to $a_N$ respectively. The filter 120 also has an output portion 130 comprising a second series chain of latches 1320 to 132M-1 with outputs connected to respective multipliers 1341 to 134M. The multipliers 1341 to 134M are associated with multiplicative coefficients $b_1$ to $b_M$ respectively.

The outputs of both sets of multipliers 1260 to 126N and 1341 to 134M are summed by an adder 138. The filter 120 has an input 140 which is shown receiving $x_n$, this being the nth in a series $x_1, x_2, x_3, \dots$. The latches 1240 to 124N-1 and 1320 to 132M-1 correspond to delays between computation stages, and are assumed to be equal. This is easily ensured in practice by providing additional internal delays for faster computation stages.

As illustrated, when multiplier 1260 receives $x_n$, multiplier 1261 is receiving $x_{n-1}$, which was input to multiplier 1260 one processing cycle earlier and delayed at latch 1240.

Similarly, the ith multiplier 126i receives $x_{n-i}$ (i = 0 to N). Simultaneously, the adder 138 has a sum output 142 providing $y_n$ and connected to the first output latch 1320. By virtue of delay at successive latches 1320, 1321, etc, the ith output multiplier 134i receives $y_{n-i}$ (i = 1 to M) when $y_n$ is output by the adder 138. By inspection, it can be seen that $y_n$, the output of adder 138, is given by $y_n = a_0 x_n + a_1 x_{n-1} + \dots + a_N x_{n-N} + b_1 y_{n-1} + b_2 y_{n-2} + \dots + b_M y_{n-M}$ (14).
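Equation (14) corresponds to what is commonly called a direct form I difference equation. A minimal floating-point sketch in Python, assuming zero initial conditions; `general_filter` is a hypothetical name, `a` holds $a_0$ to $a_N$ and `b` holds $b_1$ to $b_M$.

```python
def general_filter(x, a, b):
    """Equation (14): y_n = sum_i a[i]*x_(n-i) + sum_j b[j-1]*y_(n-j).

    a holds a_0..a_N and b holds b_1..b_M; samples before n = 0 are zero.
    """
    y = []
    for n in range(len(x)):
        # Non-recursive (FIR) part, multipliers 1260 to 126N.
        fir = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        # Recursive (IIR) part, multipliers 1341 to 134M.
        iir = sum(b[j - 1] * y[n - j] for j in range(1, len(b) + 1) if n - j >= 0)
        y.append(fir + iir)
    return y


print(general_filter([1.0, 0.0, 0.0, 0.0], a=[0.5, 0.25], b=[0.5]))
# [0.5, 0.5, 0.25, 0.125]
```

In the arrangement the text proposes, only the `iir` sum would be mapped onto processors of the invention; the `fir` sum can use conventional multipliers.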

The non-recursive summation term (FIR filter relation) involving $x_{n-i}$ in Equation (14) may be computed by prior art multipliers. The recursive term (IIR filter relation) involving $y_{n-i}$ is computed by building block processors 50 of the invention as illustrated in Figures 4 to 7. The invention accordingly provides for the general digital filter to be realised.

Referring now to Figure 10, there is shown a further embodiment 200 of a processor of the invention. The processor 200 operates in accordance with a radix 2 signed digit number representation arithmetic. It has a structure very similar to those of processors 10 and 50 described earlier, and is illustrated to indicate the consequences of a change of radix. Its description will accordingly be restricted to areas where it differs from previous embodiments. Figure 10 includes a box 202 illustrating each cell's input and output digit sets, together with a respective digital circuit for its implementation.

The processor 200 incorporates rows of multiplier cells such as third row cell 204, most of which consist of P, Q, R and T cells. For convenience, the distinction between cells and subcells will not be preserved. The third row begins with accumulator cells consisting of T and Q cells in cascade, an R cell and a T cell. The accumulator cells sum transfer digits from the most significant third row multiplier cell 204 with sum digits output from Q and R accumulator cells and the T cell of the most significant multiplier in the row immediately above (the second row).

Each P cell is arranged to multiply a respective digit $b_1^m$ by a feedback input digit y. Both digits are in the radix 2 redundant digit set (-1, 0, 1). The digit significance of $b_1^m$ ranges from 2 to 5. The multiplication produces a product in the range (-1, 0, 1). The product is passed to the associated Q cell, where it is added to an input digit in the range -1 to 1. The input digit is $u^m_n$ (m = 2 to 5) for top row multiplier cells, or the upper column neighbour sum output for second to fourth row multiplier cells. The Q cell produces a sum in the range (-2 to 2). This is represented by a sum digit $w_{out}$ in the digit set (-2, -1, 0) and a transfer digit t equal to 0 or 1. The sum digit $w_{out}$ is passed to the associated R cell, which adds it to a first transfer digit (equal to 0 or 1) from the adjacent multiplier cell. This produces a sum digit $w_{out}$ equal to 0 or 1, which passes to the associated T cell for addition to a second transfer digit (equal to -1 or 0) from the adjacent multiplier cell. The multiplier T cell output is a single digit in the range -1 to 1, and passes down to the respective column neighbour cell below. The most significant multiplier cells such as 204 produce (as has been said) two transfer digits of value 0 or 1 and -1 or 0.
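The radix 2 digit sets of the Q, R and T cells can be checked with a short Python sketch. Where the text leaves a representation open (for example a Q cell sum of 0, which could be written as w = 0, t = 0 or as w = -2, t = 1), the sketch makes one assumed choice; the function names are hypothetical and model the arithmetic only.

```python
def q_cell(p, d):
    """Add a product digit p (-1..1) to an input digit d (-1..1).

    Returns (w, t) with p + d == w + 2*t, w in {-2, -1, 0}, t in {0, 1}.
    """
    s = p + d
    t = 1 if s > 0 else 0
    return s - 2 * t, t


def r_cell(w, t_in):
    """Add w (-2..0) to a transfer digit t_in (0 or 1).

    Returns (w2, t2) with w + t_in == w2 + 2*t2, w2 in {0, 1}, t2 in {-1, 0}.
    """
    s = w + t_in
    t2 = -1 if s < 0 else 0
    return s - 2 * t2, t2


def t_cell(w2, t_in):
    """Add w2 (0 or 1) to a transfer digit t_in (-1 or 0); result is -1..1."""
    return w2 + t_in


# Exhaustive checks of the digit sets quoted in the text.
for p in (-1, 0, 1):
    for d in (-1, 0, 1):
        w, t = q_cell(p, d)
        assert w in (-2, -1, 0) and t in (0, 1) and w + 2 * t == p + d
for w in (-2, -1, 0):
    for t_in in (0, 1):
        w2, t2 = r_cell(w, t_in)
        assert w2 in (0, 1) and t2 in (-1, 0) and w2 + 2 * t2 == w + t_in

print(q_cell(1, 1), r_cell(-1, 0), t_cell(1, -1))  # (0, 1) (1, -1) 0
```

The asymmetric digit sets (w non-positive after Q, non-negative after R) are what guarantee that the final T cell sum never leaves the range -1 to 1.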

These are added in the row neighbour T accumulator cell to produce a value in the range -1 to 1. This value is passed to the associated Q accumulator cell, which adds it to the column neighbour T cell output from above, also in the range -1 to 1. The Q accumulator cell provides a transfer digit output equal to 0 or 1 to its row neighbour R accumulator cell. It also provides a sum output digit in the range -2 to 0 to its column neighbour R accumulator cell in the row below.

Each R accumulator cell sums the transfer digit (0, 1) from its right with the sum output (-2, -1, 0) from above. It provides a transfer digit equal to -1 or 0 to its row neighbour T accumulator cell, together with a sum output digit equal to 0 or 1 to its column neighbour T accumulator in the row below. Each T accumulator cell receives transfer digits from above and to its right with respective values 0 or 1 and -1 or 0. It adds these to produce a single output digit in the range -1 to 1. This output digit becomes the row output, and is fed back as a multiplicand input to the respective immediately preceding row.

Figure 10 demonstrates that radix 2 processing increases the number of significance levels involved in accumulation as compared to earlier radix 4 processors; ie the processor 200 provides for an increase of up to two significance steps in accumulation as compared to one for the processor 10.

However, since the processor 200 is employed with multiplicative coefficient digits b1 of maximum significance 2 (as opposed to 1 for the processor 10), recycled digits are still fed back to the preceding row as in the earlier device.

The invention may be implemented with signed digit number representation arithmetic of any radix. Further details of such arithmetic are described by A Avizienis in "Signed-digit Number Representations for Fast Parallel Arithmetic", IRE Trans Computers Vol EC-10 pp 389-400, Sept 1961.

Referring now to Figure 11, this shows a version 220 of the Figure 10 radix 2 processor 200 with modifications in accordance with Figure 3; ie the processor 220 incorporates skewed inputs $u_n$ etc and skewed outputs $y^0_n$ etc. Coefficient digits $b_1^0$ etc have maximum significance of zero. Top row multiplier cells contain only multiplication subcells (type P), there being no addition of terms such as $v_n$ etc. These multiplier cells therefore do not generate transfer digits, although such a digit is generated in top row cell types Q and X. Multiplier cells in the second and subsequent rows do however generate transfer digits. Moreover, most significant multiplier cells each include a respective type Q subcell for addition of a non-recursive input digit $u^m_n$ etc. This is analogous to the extra type B subcell in the most significant multiplier cells in Figure 3.

Claims

1. A recursive processor for multiplying successive recycled output results by a coefficient and adding their products to respective input terms, the processor including multiplier cells and accumulating means connected to form rows and columns, and wherein: (1) each row is arranged to multiply a respective recycled digit by at least the more significant coefficient digits; (2) the multiplier cells are associated with individual coefficient digits decreasing in significance along rows and increasing in significance down columns containing more than one such cell in each case; (3) each multiplier cell is arranged to compute output sum and where necessary transfer digits corresponding to recycled and coefficient digits multiplied together and where necessary added to input digits; (4) each row begins with a respective accumulating means having at least such of the following functions as are required by availability of relevant neighbouring processor elements: (i) receipt of its most significant cell's transfer digit output, (ii) receipt of output sum digits from the respective preceding row's accumulating means and most significant multiplier cell, (iii) processing received digits to generate output digits of differing significance, passing the most significant of such digits to a respective processor output and passing the remaining one or more digits to accumulating means in a respective succeeding row; (5) the multiplier cells and accumulating means operate in accordance with signed digit number representation arithmetic involving digit redundancy and providing most significant digit first computation; (6) the processor includes row neighbour interconnection lines for passage of transfer digits between adjacent multiplier cells in the direction of increasing coefficient digit significance and between each row's most significant multiplier cell and accumulating means; (7) the processor includes column neighbour interconnection lines arranged for sum digit transfer down each column between accumulating means, between multiplier cells and between most significant multiplier cells and accumulating means where available in each case, the column interconnection lines including clock-activated storing means for sum digit storage and advance between rows; and (8) each accumulating means has a respective most significant digit output connectable as a recycled multiplicand digit input to all multiplier cells of a respective row selected in accordance with accumulation arithmetic, recycled digit significance and coefficient magnitude.

2. A processor according to Claim 1 wherein each first row multiplier cell is arranged to add a respective digit of a processor input term to the product of its associated coefficient digit and recycled multiplicand input digit.

3. A processor according to Claim 1 including most significant multiplier cells arranged to add respective processor input term digits to their respective multiplication products.

4. A processor according to Claim 3 arranged in cascade with a second like processor, and including recycle connections from the first processor's accumulating means most significant digit outputs to multiplicand inputs of respective rows of both processors, together with connections from the second processor's accumulating means most significant digit outputs to respective most significant multiplier cell addition inputs of the first processor.

5. A processor according to any preceding claim wherein the multiplier cells and accumulating means are arranged to operate in accordance with radix 4 or radix 2 arithmetic.

Amendments to the claims have been filed as follows:

1. A processor for performing multiplication operations comprising multiplying input terms by a coefficient to form products, the processor (eg 10) including an array of multiplier cells (16) arranged to multiply by coefficient digits and to add product digits arising from multiplication, together with accumulating means (18,20,34) arranged to add array output digits, characterised in that the accumulating means (18,20,34) is arranged to compute output result digits most significant digit first and in descending order of digit significance in accordance with a redundant arithmetic scheme.
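Amended Claim 1 turns on producing result digits most significant digit first under a redundant scheme. The minimal sketch below shows why redundancy makes this possible, assuming a generic radix 2 digit selection rule of the kind used in SRT division and online arithmetic rather than the patent's own cell logic: each output digit is chosen from a coarse estimate of the scaled residual, truncated to a multiple of 1/2, yet the residual stays within [-1, 1], so the error after k digits never exceeds 2^-k. The function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List


def msd_first_digits(v: Fraction, k: int) -> List[int]:
    """Emit k radix-2 signed digits of v (|v| <= 1), most significant first.

    Digit j (counting from 1) has weight 2**-j. The residual obeys
    w_j = 2*w_{j-1} - d_j and stays in [-1, 1]. Each digit is selected from a
    low-precision estimate of 2*w, which the redundant digit set makes safe.
    """
    if not -1 <= v <= 1:
        raise ValueError("v must lie in [-1, 1]")
    w = Fraction(v)
    digits = []
    for _ in range(k):
        two_w = 2 * w
        # Coarse estimate: truncate 2*w down to a multiple of 1/2.
        # Its error is below 1/2, which the redundancy absorbs.
        est = Fraction(math.floor(two_w * 2), 2)
        if est >= Fraction(1, 2):
            d = 1
        elif est <= -1:
            d = -1
        else:
            d = 0
        digits.append(d)
        w = two_w - d
    return digits


def sd_fraction_value(digits: List[int]) -> Fraction:
    """Value of MSD-first signed digits with weights 1/2, 1/4, 1/8, ..."""
    return sum((Fraction(d, 2 ** (j + 1)) for j, d in enumerate(digits)), Fraction(0))


ds = msd_first_digits(Fraction(1, 3), 8)
print(ds, float(sd_fraction_value(ds)))  # → [1, -1, 1, -1, 1, -1, 1, -1] 0.33203125
```

Because the selection thresholds tolerate an estimate error of up to 1/2, each digit can be committed before the lower-order parts of the residual are exactly known, which is the property a most significant digit first accumulator needs.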

2. A processor according to Claim 1 wherein the multiplier cells (16) are connected to form rows (12) each arranged to multiply by at least the more significant coefficient digits, characterised in that each result digit is connected by a respective feedback line (30) to a respective multiplier cell row (12) as a common multiplicand input term digit for each multiplier cell (16) of the row (12) as appropriate to implement recursive computations.

3. A processor according to Claim 1 or 2 characterised in that the array is arranged to add respective second input terms to the products.

4. A processor according to Claim 1, 2 or 3 wherein the accumulating means (18,20,34) and multiplier cells (16) are connected to form rows (12) and columns (14) of the array, characterised in that:
(a) each row (12) containing multiplier cells (16) is arranged to multiply by at least the more significant coefficient digits with multiplier coefficient digit significance diminishing along the row;
(b) the columns (14) are associated with digits of respective significances, and columns (14) containing multiplier cells (16) exhibit multiplier coefficient digit significance increasing down the columns (14);
(c) the array is arranged to add second input terms to products by at least one of the following means:
(i) addition functions for first row multiplier cells,
(ii) addition functions for multiplier cells associated with most significant coefficient digits;
(d) intercell connections are arranged for movement down columns (14) of lower significance digits generated in the accumulating means (18,20,34) and multiplier cells (16), and for movement to neighbouring higher significance columns, where available, of transfer digits generated likewise;
(e) clock-activated latches (34) are arranged in intercell connections between column neighbour accumulating means parts (eg 1822,2032) and multiplier cells (eg 1614,1624) to provide for storage and advance of digits from row (12m) to row (12m+1);
(f) multiplier cells (16) in at least the second and subsequent rows have addition functions for adding digits received from respective column neighbours to multiplication product digits and digits received from respective neighbouring columns where available; and
(g) the columns (14) terminate in respective accumulating means parts (18,20,34) arranged to generate respective output result digits in accordance with a redundant arithmetic scheme.

5. A processor according to Claim 1, 2, 3 or 4 characterised in that multiplier cells (16) required to add transfer digits received from respective neighbouring columns (14) are arranged to accept each such digit at a later computational stage than that in which it was generated, as appropriate to restrict transfer digit propagation within any row (12).

6. A processor according to Claim 5 characterised in that multiplier cells (eg 316) associated with coefficient digits of lesser significance are connected to pass transfer digits between adjacent rows in accordance with a carry-save structure.
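Claims 5 and 6 restrict transfer digit propagation by handing transfer digits on to a later stage instead of rippling them along a row, which for the lower significance cells takes the form of a carry-save structure. The sketch below illustrates the general carry-save idea on ordinary non-negative binary integers only (it does not model the signed digit cells): each 3-to-2 step yields a sum word and a shifted carry word with no carry propagation, and a single carry-propagate addition is deferred to the end. The function names are hypothetical.

```python
from typing import Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """Reduce three non-negative binary words to a (sum, carry) pair.

    Every bit position acts as an independent full adder: nothing ripples
    sideways, and the carries are simply shifted one place left for the
    next stage to absorb.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def multiply_carry_save(x: int, coefficient: int) -> int:
    """Shift-and-add multiplication keeping partial products in carry-save form."""
    s, c = 0, 0
    for i in range(coefficient.bit_length()):
        if (coefficient >> i) & 1:
            s, c = carry_save_add(s, c, x << i)
    return s + c  # the only carry-propagate addition


print(multiply_carry_save(13, 11))  # → 143
```

Each row of the loop costs one full-adder delay regardless of word length; only the final `s + c` needs a full carry chain, which is the cost the claimed signed digit accumulators avoid altogether.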

7. A processor according to Claim 6 characterised in that it includes multiplier cells (eg 304,316) arranged to receive multiplicand inputs in signed binary digit form and to perform addition in accordance with non-redundant binary arithmetic.

8. A processor according to Claim 3 characterised in that it is a first processor (50) arranged in cascade with a second like processor (50'), and in that the first processor (50) has accumulating means outputs connected to multiplicand inputs of both processors (50,50'), and the second processor (50') has accumulating means outputs connected to first processor additive inputs as appropriate for implementing second order recursive computations.

9. A processor according to Claim 3 characterised in that it is a member of a set of like processors (50,50',50") connected in cascade, and in that one processor (50) has accumulating means outputs connected to multiplicand inputs of the other processors (50',50"), and the other processors (50',50") each have accumulating means outputs connected to additive inputs of a respective succeeding processor (50',50).
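Claims 8 and 9 cascade like processors to implement second and higher order recursive computations. The word-level behavioural sketch below assumes that a cascade of N processors with coefficients a1 to aN realises the all-pole recurrence y[n] = x[n] + a1 y[n-1] + ... + aN y[n-N], and it ignores all digit-level pipelining and latency. The recycled output is the common multiplicand of every section, and each section's partial sum feeds the additive input of its neighbour, down to the section that adds the input term. This transposed arrangement and the function names are illustrative assumptions; the sketch is checked against direct evaluation of the recurrence.

```python
from fractions import Fraction
from typing import List, Sequence


def direct_recurrence(x: Sequence[Fraction], a: Sequence[Fraction]) -> List[Fraction]:
    """y[n] = x[n] + sum over k of a[k-1] * y[n-k], with zero initial conditions."""
    y: List[Fraction] = []
    for n, xn in enumerate(x):
        acc = Fraction(xn)
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def cascaded_sections(x: Sequence[Fraction], a: Sequence[Fraction]) -> List[Fraction]:
    """The same recurrence computed by len(a) chained sections (transposed form).

    partial[k] holds section k's output from the previous step: the recycled
    output times a[k] plus the delayed partial sum from section k + 1.
    """
    if not a:
        raise ValueError("at least one coefficient is required")
    order = len(a)
    partial = [Fraction(0)] * order
    y: List[Fraction] = []
    for xn in x:
        yn = Fraction(xn) + partial[0]  # first section adds the input term
        y.append(yn)
        # yn is recycled as the multiplicand of every section; the right-hand
        # side uses the previous step's partial sums throughout.
        partial = [
            a[k] * yn + (partial[k + 1] if k + 1 < order else 0)
            for k in range(order)
        ]
    return y


x = [Fraction(1)] + [Fraction(0)] * 5
a = [Fraction(1, 2), Fraction(-1, 4)]  # a second order example
print(cascaded_sections(x, a) == direct_recurrence(x, a))  # → True
```

With a single coefficient the model reduces to the first order section y[n] = x[n] + a1 y[n-1]; adding sections raises the order without changing the first section's own work.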