WO1998015947A2

WO1998015947A2 - Parallel spectral reed-solomon encoder and decoder

Info

Publication number: WO1998015947A2
Application number: PCT/US1997/018108
Authority: WO
Inventors: Mark A. Neifeld; Satish Sridharan
Original assignee: Arizona Board Of Regents On Behalf Of The University Of Arizona
Priority date: 1996-10-08
Filing date: 1997-10-07
Publication date: 1998-04-16
Also published as: AU5894498A; WO1998015947A3

Abstract

In the disclosed error correcting scheme, information data is encoded (110) and decoded (130) in parallel and in the spectral or frequency domains based on a Reed-Solomon (RS) code. As a result, when compared with space domain decoding, the spectral decoding scheme of the present invention shifts some of the computationally intensive modules into the encoder (110) thus reducing decoder (130) complexity. Thus, integrated circuit implementations of the error correcting scheme of the present invention are faster, have reduced power dissipation and occupy less chip area than serial encoders and decoders.

Description

PARALLEL SPECTRAL REED-SOLOMON ENCODER AND DECODER

This application claims priority of U.S. Provisional Application Serial No. 60/027,952 filed October 8, 1996. The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of contract nos. AFOSR F496209310477 and AASERT 496209410303 awarded by the United States Air Force.

BACKGROUND OF THE INVENTION

Error correction plays an important role in both communication and storage systems.

Optical communication systems typically operate at very high speeds and may involve a limited degree of parallelism via wavelength and/or polarization multiplexing. Space or time division multiplexing may significantly increase the parallelism of optical communication channels for use within computer interconnect environments. To ensure efficient utilization of channel bandwidth and avoid unwanted data bottlenecks, it is necessary to implement high speed error decoders within such systems.

Optical memories offer high storage capacities and, with volume storage techniques, can achieve very high aggregate data rates via parallel access. Optical memories, however, like other storage media, are prone to errors owing to media defects, noise and faulty read/write systems. Conventional error correction techniques involve decoding in a time-sequential (i.e., serial) fashion; however, the highly parallel nature of the data retrieved from page access optical memory, for example, requires an alternate solution, since such a serial decoding scheme can produce a severe bottleneck in the system.

SUMMARY OF THE INVENTION

Consistent with the present invention, a system for transmitting or storing user

information data is provided, which comprises an encoder configured to convert the user

information data into space or time domain encoded data in accordance with a Reed-Solomon

code. The encoder further supplies the encoded data to a medium for either transmission or

storage. A decoder is also provided which is coupled to the medium for decoding the encoded

data in the spectral or frequency domain and in a parallel format to thereby obtain corrected user

information data.

BRIEF DESCRIPTION OF THE DRAWINGS

Advantages of the present invention will be apparent from the following detailed

description of the presently preferred embodiments thereof, which description should be

considered in conjunction with the accompanying drawings in which:

Fig. 1 illustrates a data encoding/decoding system in accordance with the present

invention;

Fig. 2 illustrates a block diagram of a parallel Berlekamp Algorithm circuit in accordance

with the present invention;

Fig. 3 is a block diagram of a parallel Fourier transform circuit;

Fig. 4 is a detailed block diagram of the parallel Berlekamp Algorithm circuit;

Fig. 5 is a detailed block diagram of a parallel Recursive Extension circuit in accordance

with the present invention;

Figs. 6(a) and 6(b) are block diagrams illustrating a serial decoder and a parallel decoder in accordance with the present invention, respectively;

Fig. 7 illustrates plots of very large scale integrated circuit area as a function of code-rate for both serial decoders and decoders in accordance with the present invention;

Fig. 8 illustrates plots of power dissipation as a function of block size (n) for serial

decoders and parallel decoders in accordance with the present invention;

Fig. 9 illustrates plots of information rate as a function of block size for serial decoders

and parallel decoders in accordance with the present invention; and

Fig. 10 illustrates plots of VLSI area for parallel space domain decoders and decoders in accordance with the present invention.

DETAILED DESCRIPTION

In accordance with the error correcting scheme of the present invention, information data

is encoded and decoded in parallel and in the spectral or frequency domain based on a Reed-

Solomon (RS) code. As a result, when compared with space or time domain decoding, the

spectral decoding scheme of the present invention shifts some of the computationally intensive

modules into the encoder thus reducing decoder complexity. Thus, integrated circuit

implementations of the error correcting scheme of the present invention are faster, have reduced

power dissipation and occupy less chip area than conventional serial encoders and decoders.

Turning to the drawings in which like reference characters indicate the same or similar

elements in each of the several views, Fig. 1 illustrates a functional block diagram of an

encoding/decoding system 100 in accordance with the present invention. Encoding/decoding

system 100 includes an encoder 110 which encodes received user information data in accordance

with an RS code, and supplies the encoded data through a medium or channel 120 (e.g., an optical storage medium or transmission line) to a decoder 130. The encoded data, usually in the form of bits of electronic data, can be supplied to an electrical-to-optical conversion element (not shown) prior to transmission through channel 120. Upon receipt of the encoded binary data, decoder 130, including optical-to-electrical conversion elements (not shown), decodes the encoded binary data and outputs the information data.

As further shown in Fig. 1, user information data is typically segmented into symbols,

each symbol being m bits in length, where m is an integer. Typically, k symbols (k being

another integer) of user information data are input to encoder 110. The k symbols of user information data are treated as spectral domain data, and a plurality of error correcting symbols, each typically having a value of zero, are attached to the k symbols of information data by an appending circuit 111. The resulting group of symbols is referred to as a codeword vector C of length n, n being an integer greater than k, such that there are n-k zero symbols appended to the k symbols of information data. Circuit 112 next acts on the symbols of codeword C in parallel to generate an inverse finite field Fourier transform (F⁴T⁻¹) that converts spectral domain codeword C into a space domain vector c, which is then supplied to medium or channel 120.
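By way of non-limiting illustration, the encoding step described above (appending n-k zero symbols and applying the inverse transform) can be sketched in software for the (15,9) code over GF(16) used in the example later in this description. The field representation (4-bit integers, x⁴ = x + 1) and the names `EXP`, `LOG`, `gf_mul`, `inverse_f4t` and `encode` are assumptions made for this sketch; the direct O(n²) sum is used for clarity and is not the parallel circuit 112 itself.

```python
# Illustrative sketch only: GF(16) built from x^4 = x + 1, alpha = 0b0010.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    if v & 0b10000:          # reduce modulo x^4 + x + 1
        v ^= 0b10011
    EXP.append(v)
LOG = {v: i for i, v in enumerate(EXP)}

N, K = 15, 9                 # (15,9) code, so n - k = 6 appended zeros


def gf_mul(a, b):
    """Multiply two GF(16) elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def inverse_f4t(C):
    """c_i = sum_j alpha^(-i*j) C_j. The 1/n factor is 1 because n = 15 is odd
    and the field has characteristic 2."""
    out = []
    for i in range(N):
        acc = 0
        for j, Cj in enumerate(C):
            acc ^= gf_mul(EXP[(-i * j) % 15], Cj)   # addition is XOR
        out.append(acc)
    return out


def encode(info):
    """Append n-k zero symbols (circuit 111), then inverse-transform (circuit 112)."""
    assert len(info) == K
    C = list(info) + [0] * (N - K)
    return inverse_f4t(C)


# Information symbols of the worked example, written as powers of alpha.
info = [EXP[p] for p in (9, 1, 0, 5, 11, 9, 13, 0, 8)]
c = encode(info)
```

In this sketch the zero symbols occupy the last n-k spectral positions, which is what later allows the decoder to read the syndromes directly from the transformed received vector.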

Vector r, which is the signal received from channel 120, is next detected at receiver 130.

Any errors occurring during transmission can be represented by a vector e, such that r = c + e,

where c is the inverse F⁴T of codeword vector C. Circuit 131 constitutes spectral domain

conversion means, which in this example, typically performs an F⁴T on the received vector r to

obtain spectral domain vector R, which is used in the decoding process, discussed in greater

detail below. Other circuits that perform space to spectral domain conversion are also considered

within the scope of the present invention. Since the F⁴T is linear, transformed vector R equals

the sum of codeword vector C and the F⁴T of e, i.e., the error vector E comprising a plurality of

error symbols. Since the encoding was performed in the spectral domain, the last 2t (i.e., 2t = n-k) symbols of the codeword C are zeros. Accordingly, the last 2t symbols of the error vector E equal the last 2t symbols of the vector R. That is,

$$E_j = R_j, \quad j = k, k+1, \ldots, n-1,$$

where E_j is the jth symbol of the spectral error vector E. The last 2t symbols or syndromes of the

error vector E are thus obtained directly from the F⁴T of the received vector r, and supplied to

parallel Berlekamp Algorithm (BA) circuit 132, which outputs error coefficients of an error

locator polynomial Λ(x). As discussed in greater detail below, the coefficients of the error

locator polynomial are used to calculate each component of the error vector E. Once E is known,

it is then added to R to obtain the codeword C and the original k symbols of user information

data.
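The relationship R = C + E, and the fact that the last 2t symbols of R are the syndromes, can be checked numerically with a short sketch. It assumes the (15,9) code over GF(16) with x⁴ = x + 1 and the codeword and error pattern of the worked example given later in this description; the helper names (`transform`, `gf_mul`) are hypothetical and the transform is evaluated directly rather than in the parallel form of circuit 131.

```python
# Illustrative sketch only: GF(16) from x^4 = x + 1, elements as 4-bit integers.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}
N, K = 15, 9


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]


def transform(v, sign=1):
    """sign=+1: V_j = sum_i alpha^(i*j) v_i; sign=-1: the inverse (1/n = 1 here)."""
    out = []
    for j in range(N):
        acc = 0
        for i, vi in enumerate(v):
            acc ^= gf_mul(EXP[(sign * i * j) % 15], vi)
        out.append(acc)
    return out


# Spectral codeword C: nine information symbols followed by 2t = 6 zeros.
C = [EXP[p] for p in (9, 1, 0, 5, 11, 9, 13, 0, 8)] + [0] * 6
c = transform(C, sign=-1)                 # space domain vector sent on the channel

# Three-symbol error pattern at positions 6, 7 and 13.
e = [0] * N
e[6], e[7], e[13] = EXP[14], EXP[7], EXP[0]

r = [ci ^ ei for ci, ei in zip(c, e)]     # r = c + e
R = transform(r)                          # spectral domain received vector
E = transform(e)                          # spectral domain error vector

syndromes = R[K:]                         # last 2t symbols of R
```

Because the transform is linear and the last six symbols of C are zero, `R[K:]` and `E[K:]` coincide in this sketch, which is the property the decoder relies on.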

The operation of parallel BA circuit 132 will now be described in greater detail. Given

that v ≤ t symbol errors occur, the error locator polynomial is defined as:

$$\Lambda(x) = \prod_{l=1}^{v} \left(1 - x\,\alpha^{i_l}\right). \qquad [1]$$

As is generally understood, α in the above equation is the nth root of unity over a particular mathematical set known as a finite Galois Field GF(q^m), and i_l is the location of the lth error in the error vector. The error locator polynomial is defined such that if the ith symbol of the error vector e is non-zero (i.e., an error has occurred at the ith symbol), then α⁻ⁱ is a root of Λ(x). The inverse F⁴T of Λ(x) can be calculated to obtain the polynomial λ(x), defined as:

$$\lambda(x) = \mathrm{F^4T}^{-1}[\Lambda(x)] = 1 + \lambda_1 x + \lambda_2 x^2 + \ldots + \lambda_{n-1} x^{n-1}. \qquad [2]$$

This polynomial is characterized whereby λ_i e_i = 0, for all i = 1, 2, . . ., t, and facilitates determination of the error vector E through a process referred to as recursive extension, to be discussed in greater detail below.

For a t symbol error correcting RS code, the error vector can have at most t symbol

errors. This means that the error locator polynomial is at most a degree t polynomial. The BA, which is used to compute the error locator polynomial, is defined by the following set of recursive equations with recursion index r:

$$\Delta_r = \sum_{j=0}^{L_{r-1}} \Lambda_j^{(r-1)} S_{r-j},$$

$$L_r = \delta_r\,(r - L_{r-1}) + (1 - \delta_r)\,L_{r-1}, \qquad [3]$$

$$\begin{pmatrix} \Lambda^{(r)}(x) \\ B^{(r)}(x) \end{pmatrix} = \begin{pmatrix} 1 & -\Delta_r x \\ \Delta_r^{-1}\delta_r & (1-\delta_r)\,x \end{pmatrix} \begin{pmatrix} \Lambda^{(r-1)}(x) \\ B^{(r-1)}(x) \end{pmatrix},$$

where S₁, S₂, . . ., S_2t are the syndrome symbols, Λ(x) the error locator polynomial and B(x) an intermediate polynomial. The initial conditions are Λ⁽⁰⁾(x)=1, B⁽⁰⁾(x)=1 and L₀=0. This set of recursive equations is executed for 2t iterations and, for each iteration, δ_r = 1 if both Δ_r ≠ 0 and 2L_{r-1} ≤ r-1; otherwise δ_r = 0.
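As a minimal software sketch of the recursion above, the following routine iterates the discrepancy, length and polynomial updates for 2t steps. It is a serial loop, not the unfolded 2t-stage pipeline of BA circuit 132, and it assumes GF(16) with x⁴ = x + 1; the names `berlekamp`, `gf_mul` and `gf_inv` and the coefficient-list layout (lowest order first) are choices made for this illustration. In a field of characteristic 2, the subtraction in the polynomial update is an XOR.

```python
# Illustrative sketch only: GF(16) from x^4 = x + 1, elements as 4-bit integers.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]


def gf_inv(a):
    """Finite field inverse of a non-zero element (a look-up in hardware)."""
    return EXP[(15 - LOG[a]) % 15]


def berlekamp(S):
    """S[0] holds S_1. Returns (coefficients of Lambda, lowest order first; L)."""
    two_t = len(S)
    lam = [1] + [0] * two_t          # Lambda^(0)(x) = 1
    B = [1] + [0] * two_t            # B^(0)(x) = 1
    L = 0
    for r in range(1, two_t + 1):
        # Discrepancy Delta_r = sum_j Lambda_j^(r-1) * S_(r-j)
        delta = 0
        for j in range(L + 1):
            delta ^= gf_mul(lam[j], S[r - j - 1])
        d = 1 if (delta != 0 and 2 * L <= r - 1) else 0
        xB = [0] + B[:-1]            # x * B^(r-1)(x)
        new_lam = [l ^ gf_mul(delta, b) for l, b in zip(lam, xB)]
        if d:
            inv = gf_inv(delta)
            B = [gf_mul(inv, l) for l in lam]   # Delta_r^-1 * Lambda^(r-1)(x)
            L = r - L
        else:
            B = xB
        lam = new_lam
    return lam[:L + 1], L


# Syndromes of the worked example: S1..S6 = a^13, a^9, a^14, 0, a^1, 0.
S = [EXP[13], EXP[9], EXP[14], 0, EXP[1], 0]
locator, L = berlekamp(S)
```

For the syndromes of the worked example, the returned coefficients correspond to Λ(x) = α¹¹x³ + α³x² + α⁹x + 1 with L = 3.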

There are 2t stages of BA circuit 132. The initial value of the error locator polynomial is

a constant equal to one, for example, and the degree of this polynomial typically increases with

each stage of the algorithm. Hence the architectural complexity of the algorithm increases from

Stage 1 until Stage t and from then on remains the same until the last stage. Alternatively, in a

serial decoding implementation, a single Berlekamp stage is used repeatedly 2t times to obtain

the error locator polynomial. In the parallel mode, this time multiplexing is unfolded to obtain a

pipeline processing architecture.

Fig. 2 illustrates the parallel pipeline architecture of BA circuit 132 in accordance with the present invention. As shown in Fig. 2, syndrome symbols S₁, S₂, . . ., S_2t are supplied in

parallel to the input Stage 1 of BA circuit 132, which outputs the syndromes and polynomials

Λ⁽¹⁾(x) and B⁽¹⁾(x) in parallel to Stage 2. The outputs of Stage 2 are supplied to Stage 3, and the

outputs of each successive stage are supplied to the next stage of the BA circuit 132.

Since the degrees of the polynomials involved in the BA computation increase from

Stage 1 to Stage t, the associated integrated circuit area also increases. After the t-th stage, the degrees of these polynomials remain constant and hence all the subsequent stages require the same area, except for the last stage, i.e., Stage 2t, whereby the polynomial B(x) need not be computed. Accordingly, the integrated circuit area occupied by Stage 2t is nearly half the area of the second to the last stage, Stage 2t-1.

The parallel structure shown in Fig. 2 advantageously achieves a data pipeline

architecture and localized connectivity. Due to the pipeline architecture, each stage of the BA

processes a different code word at each clock cycle. For instance, if during the first clock cycle a syndrome is loaded into Stage 1, this stage performs its task and the results are loaded into Stage 2 during the next clock cycle. In the meantime, a new syndrome is loaded into Stage 1.

Thus, during any single clock cycle, there are as many as 2t different code words being processed

simultaneously. Due to the localized connectivity of this architecture, efficient VLSI area

utilization, which is not wire dominated, can be realized. Further, although the operational speed

of the circuit may be limited by gate delays, long metal wire capacitances do not significantly

slow the speed of the circuit.

Returning to Fig. 1, coefficients Λ of the error locator polynomial output from BA circuit

132 and selected syndrome symbols are used by Recursive Extension (RE) circuit 133 to

calculate error vector E. As noted above, error vector E is added to vector R to obtain codeword

vector C, and thus, the k symbols of information data. The calculation of error vector E will now

be described in greater detail.

Error vector E can be obtained from the error locator polynomial using RE. In the space

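domain, vectors λ and e satisfy λ • e = 0. Accordingly, in the spectral domain the RE is defined by the convolution:

$$\sum_{j=0}^{t} \Lambda_j E_{i-j} = 0, \quad i = 0, 1, \ldots, n-1. \qquad [4]$$

Equation 4 above takes into account that the maximum degree of the error locator polynomial is t, yielding Λ_j=0 for j>t. Since Λ₀ equals one, further simplification of this equation yields:

$$E_i = \sum_{j=1}^{t} \Lambda_j E_{i-j} = \Lambda_1 E_{i-1} + \Lambda_2 E_{i-2} + \ldots + \Lambda_t E_{i-t}, \quad i = 0, 1, \ldots, n-1, \qquad [5]$$

where the subscripts of E are taken modulo n and, the field having characteristic 2, subtraction coincides with addition.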

Furthermore, since C_j = R_j - E_j = 0, for j = k, k + 1, . . ., n-1, therefore E_j = R_j, for j = k, k + 1, . . .,

n-1. These known 2t symbols and the t coefficients of the error locator polynomial are used in

the above equation to obtain the remaining k symbols of the error vector. As in the case of the

BA, a serial implementation of this decoder would use a single stage of the RE with time

multiplexing to obtain the error vector E. In the parallel implementation of the present invention,

however, this time multiplexing scheme is unfolded to obtain a k stage parallel processing pipeline architecture. RE circuit 133 is described in greater detail below with reference to Fig. 5. The architecture of the RE functional module also has a data pipeline

architecture and localized connectivity and results in an efficient implementation of this

functional module. Due to its pipeline structure, at each clock cycle there are as many as k

different code words being processed by the k stages of the RE. In addition, all k stages of the

RE typically have a similar construction, thus greatly simplifying the design process.
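The recursive extension described above can be sketched in software as a loop of k steps, each step playing the role of one RE stage. The sketch assumes GF(16) with x⁴ = x + 1, takes subscripts modulo n, and uses the locator coefficients and syndromes of the worked example given later in this description; the function name `recursive_extension` and the list layout are assumptions made for illustration.

```python
# Illustrative sketch only: GF(16) from x^4 = x + 1, elements as 4-bit integers.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]


def recursive_extension(lam, syndromes, n=15):
    """lam = [1, Lambda_1, ..., Lambda_t]; syndromes = E_k .. E_(n-1).
    Returns the full spectral error vector E_0 .. E_(n-1)."""
    t = len(lam) - 1
    k = n - len(syndromes)
    E = [0] * n
    E[k:] = syndromes                        # known symbols E_j = R_j, j >= k
    for i in range(k):                       # one iteration per RE stage
        acc = 0
        for j in range(1, t + 1):
            acc ^= gf_mul(lam[j], E[(i - j) % n])   # Lambda_j * E_(i-j)
        E[i] = acc
    return E


# Worked example: Lambda(x) = a^11 x^3 + a^3 x^2 + a^9 x + 1.
lam = [1, EXP[9], EXP[3], EXP[11]]
syndromes = [EXP[13], EXP[9], EXP[14], 0, EXP[1], 0]
E = recursive_extension(lam, syndromes)

# Received spectrum R of the worked example; C = R + E (XOR in characteristic 2).
R = [EXP[14], 0, EXP[13], EXP[8], EXP[5], EXP[9], 0, EXP[4], 0] + syndromes
C = [a ^ b for a, b in zip(R, E)]
info = C[:9]                                 # truncate the n-k zero symbols
```

The first iteration only reads the last t syndrome symbols, and each later iteration reads symbols produced by the preceding ones, which is why the stages can be chained as a pipeline.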

While the BA and RE circuits 132 and 133, constituting an error signal generating means, calculate the error vector E, the output of the F⁴T circuit 131 (vector R) is temporarily stored in a delay random access memory (RAM) 134. Upon completion of the error vector E calculation, RE circuit 133 supplies error vector E to a comparison means, which in this example comprises a summing or adder circuit for adding vectors E and R. The resulting sum is the codeword vector C. Alternatively, if appropriate, a negated vector E can be subtracted from vector R to obtain

codeword vector C. The zero symbols attached to vector C are then truncated, thereby leaving

the k symbols of information data.

Decoder 130 including circuits 131, 132 and 133 will next be described in greater detail

with reference to Figs. 3-5.

Fourier transform circuit 131 will first be described. Fourier transform circuit 131 is

realized using a parallel hardware realization of the Cooley-Tukey fast Fourier transform (FFT)

algorithm. The Cooley-Tukey algorithm itself is described in M. A. Neifeld et al., "Parallel Error

Correction For Optical Memories", Optical Memory and Neural Networks, vol. 3, no. 2, pp. 87-98, 1994, for example, and its implementation is described in greater detail below.

Finite field Fourier transforms (F³Ts) are operable over a particular mathematical set

referred to as a finite field and, therefore, differ from the conventional discrete Fourier transform

(DFT). Preferably, the DFT is modified for operation over the finite field and becomes:

$$V_j = \sum_{i=0}^{n-1} \alpha^{ij} v_i, \qquad [6]$$

where α is the nth root of unity over the finite field, v_i is the ith element of the space domain vector v and V_j is the jth element of the Fourier transform vector V. Implementation of the F³T

using the above formula directly, requires approximately n² multiplications and this exhaustive

technique can impose severe constraints on computational resources, particularly when n is large.

Several efficient FFT algorithms available for computing the conventional DFT can be modified

and made applicable to the finite field domain yielding finite-field-fast-Fourier-transforms

(F⁴Ts). The most popular F⁴T available involves the so-called butterfly technique; however, this

approach is useful only for the case when n is a power of 2. In the case of RS codes, where n = 2^m - 1, the butterfly technique is not typically employed. There are, however, several other algorithms available for the case when n is composite. One such algorithm, the Cooley-Tukey algorithm, is preferred.
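For reference, the direct evaluation of equation 6, with its roughly n² multiplications, can be sketched as follows. The sketch assumes n = 15 over GF(16) with x⁴ = x + 1 and uses the error pattern of the worked example given later in this description as input; the function name `f3t_direct` and the explicit multiplication counter are illustrative assumptions.

```python
# Illustrative sketch only: GF(16) from x^4 = x + 1, elements as 4-bit integers.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]


def f3t_direct(v):
    """V_j = sum_i alpha^(i*j) v_i, evaluated term by term.
    Returns the transform and the number of field multiplications used."""
    n = len(v)
    mults = 0
    V = []
    for j in range(n):
        acc = 0
        for i in range(n):
            acc ^= gf_mul(EXP[(i * j) % 15], v[i])   # one multiply per term
            mults += 1
        V.append(acc)
    return V, mults


# Error pattern of the worked example: non-zero symbols at positions 6, 7, 13.
e = [0] * 15
e[6], e[7], e[13] = EXP[14], EXP[7], EXP[0]
E, mults = f3t_direct(e)
```

With n = 15 the counter reaches 225, which is the n² figure that the decomposition described next is intended to reduce.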

The Cooley-Tukey F⁴T algorithm is applicable for cases when n is a composite number,

i.e., a number which can be expressed as a product. More specifically, if n can be written as n = n'n", introducing a new set of indices defined as

$$i = i' + n'\,i'', \quad i' = 0, \ldots, n'-1; \; i'' = 0, \ldots, n''-1,$$

$$j = n''\,j' + j'', \quad j' = 0, \ldots, n'-1; \; j'' = 0, \ldots, n''-1, \qquad [7]$$

the F⁴T equation can be rewritten as

$$V_{n''j'+j''} = \sum_{i''=0}^{n''-1} \sum_{i'=0}^{n'-1} \alpha^{(i'+n'i'')(n''j'+j'')}\, v_{i'+n'i''}. \qquad [8]$$

Expanding the products in the exponent, setting α^{n'} = γ and α^{n"} = β, and dropping the term α^{n'n"i"j'}, which equals one since α is of order n'n", gives

$$V_{j',j''} = \sum_{i'=0}^{n'-1} \beta^{i'j'} \left[ \alpha^{i'j''} \sum_{i''=0}^{n''-1} \gamma^{i''j''}\, v_{i',i''} \right]. \qquad [9]$$

Thus, the input and output data vectors have been mapped into two-dimensional arrays and an address re-shuffling has taken place. The F⁴T written in this new form requires at most n(n' + n") multiplications, as compared with n² multiplications in the conventional technique. The inner and outer sums in the Cooley-Tukey F⁴T formula each receive n numbers, so that the inner sum is an n"-point F³T for each value of i' and the outer sum is an n'-point F³T for each value of j". If n' or n" is a composite number, then they in turn can be simplified by another application of the F⁴T. In this way, if n has factors {n_i}, the F⁴T can be decomposed to a form requiring roughly n Σ_i n_i multiplications.
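The index mapping and the two nested sums can be illustrated for n = 15 with n' = 3 and n" = 5. The following is a sketch under those assumptions over GF(16) with x⁴ = x + 1; the function names are hypothetical, the multiplications by the factors α^{i'j"} are counted separately from the multiplications inside the two sums, and the direct transform is included only to check the result.

```python
# Illustrative sketch only: GF(16) from x^4 = x + 1, elements as 4-bit integers.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]


def f3t_direct(v):
    """Reference: V_j = sum_i alpha^(i*j) v_i."""
    out = []
    for j in range(15):
        acc = 0
        for i in range(15):
            acc ^= gf_mul(EXP[(i * j) % 15], v[i])
        out.append(acc)
    return out


def cooley_tukey_15(v, n1=3, n2=5):
    """n1 = n', n2 = n''. gamma = alpha^n1, beta = alpha^n2.
    Returns (V, multiplications in the two sums, twiddle multiplications)."""
    sum_mults = twiddle_mults = 0
    # Inner sum: an n''-point transform over i'' for each i', indexed by j''.
    inner = [[0] * n2 for _ in range(n1)]
    for i1 in range(n1):
        for j2 in range(n2):
            acc = 0
            for i2 in range(n2):
                acc ^= gf_mul(EXP[(n1 * i2 * j2) % 15], v[i1 + n1 * i2])
                sum_mults += 1
            inner[i1][j2] = gf_mul(EXP[(i1 * j2) % 15], acc)   # alpha^(i'j'')
            twiddle_mults += 1
    # Outer sum: an n'-point transform over i' for each j''.
    V = [0] * (n1 * n2)
    for j1 in range(n1):
        for j2 in range(n2):
            acc = 0
            for i1 in range(n1):
                acc ^= gf_mul(EXP[(n2 * i1 * j1) % 15], inner[i1][j2])
                sum_mults += 1
            V[n2 * j1 + j2] = acc
    return V, sum_mults, twiddle_mults


e = [0] * 15
e[6], e[7], e[13] = EXP[14], EXP[7], EXP[0]
V, sum_mults, twiddle_mults = cooley_tukey_15(e)
```

In this sketch the two sums use n(n' + n") = 120 multiplications, compared with 225 for the direct form, with a further 15 multiplications for the α^{i'j"} factors.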

An illustration of the parallel implementation of the Cooley-Tukey algorithm for the case of a codeword of length n=15 is shown in Fig. 3. Upper circuit blocks 310-1, 310-2 and 310-3 in Fig. 3 receive 15 symbols in parallel. These 15 symbols constitute the input space domain vector v_i, i.e., the received vector r. A first group of five symbols is supplied in parallel to block 310-1, and second and third groups of five symbols each are similarly fed to blocks 310-2 and 310-3, respectively. It is understood that blocks 310-2 and 310-3 have a similar construction as block 310-1.

As further shown in Fig. 3, five input symbols are supplied on respective input lines 311-1 to 311-5 coupled to block 310-1. These input lines are coupled to columns of finite field multiplication stages 312-1 to 312-5. Each column includes five finite field multiplication stages (including multipliers m1 to m5, which carry out the multiplications required by equation 9 above, and corresponding m bit XOR gates, which facilitate the additions required by equation 9), and each multiplier receives a corresponding input symbol. Further, each multiplier multiplies an input symbol with a known or fixed symbol, and the outputs of two adjacent multipliers within each column are exclusive-ORed by m bit XOR gates 313, which are also arranged in columns 313-1 to 313-5 adjacent each of multiplier columns 312-1, 312-2 and 312-3, respectively. The output of each of the XOR gates within a given column is supplied to a successive XOR gate within the column, which in turn exclusive-ORs the previous output with the product generated by an adjacent multiplier. Thus, for example, the outputs of multipliers m1 and m2 in column 312-1 are supplied to m bit XOR gate 313-1'. The output of XOR gate 313-1' is supplied to a successive m bit XOR gate, which exclusive-ORs the output of gate 313-1' with the product generated by multiplier m3 (not shown) in column 312-1. The outputs of the XOR gates in each of columns 313-1 to 313-5 are output from one XOR gate to the next in a cascaded manner, and the last m bit XOR gate in each of columns 313-1 to 313-5 outputs a corresponding one of the 15 intermediate symbols supplied to lower blocks 320-1 to 320-3. Accordingly, for example, m bit XOR gate 313-1" outputs one of the 15 intermediate symbols. Preferably, the 15 intermediate symbols are generated in a single clock cycle.

Moreover, blocks 310-2 to 310-3 respectively output groups of five intermediate symbols each. Thus, the upper blocks constitute a 3 point F³T, and since each of blocks 310-1 to 310-3 outputs five intermediate symbols, the total number of intermediate symbols is fifteen.

Each of lower blocks 320-1 to 320-3 receives the 15 intermediate symbols in parallel, and each output three groups of five symbols each. The lower blocks thus constitute a five point F³T, and together the upper and lower blocks output symbols V_j (i.e., spectral domain vector R), preferably in a second clock cycle. Lower blocks 320-1 to 320-3 have a similar construction and generate vector V in a similar fashion as the upper blocks output the intermediate symbols. However, instead of the multipliers being arranged in columns, the multipliers in the lower blocks are arranged in rows, e.g., row 321 of lower block 320-1. As further shown in Fig. 3, the outputs of groups of three multipliers, instead of five in the upper blocks, are supplied to m bit XOR gates 322.

The upper and lower blocks constitute an F⁴T and operate to multiply an unknown finite

field symbol with a known finite field symbol. As a result, less integrated circuit area is required

than a generic multiplier which operates on two unknown symbols.

Upon completion of the F⁴T, the last 2t symbols of the resulting vector are passed to the

first stage of the BA circuit 132. Fig. 4 shows a detailed block diagram of an arbitrary stage of

BA circuit 132 shown in Fig. 2.

As noted above, the initial values of parameters Λ⁽⁰⁾(x) and B⁽⁰⁾(x) are set to a constant, 1, and loaded into latches 420 and 430, respectively. In addition, the syndrome symbols are loaded into latches 410 at the top of the structure shown in Fig. 4. Latches 410, 420 and 430 output the syndrome symbols, Λ⁽⁰⁾(x) and B⁽⁰⁾(x) to finite field multiplication stage 435, which includes generic multipliers M.

Multipliers M operate simultaneously and in parallel, and the outputs of each of the multipliers are supplied to cascaded m bit XOR gates 436 to thereby compute intermediate symbol Δ_r (recursive index r equaling 1 in Stage 1 of BA circuit 132 and being incremented for each successive stage) using equation 3 above. Typically, due to the parallel construction of BA circuit 132, symbols Δ_r and Δ_r⁻¹ are obtained and Λ⁽ʳ⁾(x) and B⁽ʳ⁾(x) are output in a single clock cycle.

Symbol Δ₁ is next supplied to Symbol Inverse block 437, which outputs symbol Δ₁⁻¹, the finite field inverse of Δ₁, using a look-up table (not shown). The next two blocks 440 and 445 then compute the coefficients of the Λ⁽¹⁾(x) and B⁽¹⁾(x) polynomials based on Δ₁, Δ₁⁻¹, Λ⁽⁰⁾(x) and B⁽⁰⁾(x). Block 440 includes an array 441 of finite field multiplication stages M, which operate simultaneously and in parallel on parameters Δ₁, Λ⁽⁰⁾(x) and B⁽⁰⁾(x) to generate outputs which feed corresponding m bit XOR gates 442. The outputs of m bit XOR gates 442 constitute the coefficients of polynomial Λ⁽¹⁾(x). Similarly, array 446 of finite field multiplication stages M operates simultaneously and in parallel on Δ₁⁻¹, Λ⁽⁰⁾(x) and B⁽⁰⁾(x) to output the coefficients of polynomial B⁽¹⁾(x). The syndrome symbols stored in latches 410, and the coefficients of polynomials Λ⁽¹⁾(x) and B⁽¹⁾(x), are passed to Stage 2 of Fig. 2. Each stage of BA circuit 132 similarly outputs polynomial coefficients and syndrome symbols to successive stages, such that at the end of 2t stages, the desired error locator polynomial Λ(x) is obtained.

Once the coefficients of the error locator polynomial are obtained, the next stage in the

decoding process involves computing the error vector using recursive extension (RE). RE circuit

133 will now be described with reference to Fig. 5.

Each of Stages 1 through k of RE circuit 133 outputs a respective symbol of error vector E. For example, Stage 1 (block 510) outputs E₀, while stages 2, 3, . . ., k output symbols E₁, E₂, . . ., E_{k-1}, respectively. Block 510 illustrates Stage 1 in greater detail. It is understood that Stages 2-k have a similar construction as Stage 1.

Block 510 outputs symbol E₀ based on the last t syndrome symbols of vector R, represented by E_{k+t}, E_{k+t+1}, . . ., E_{n-1} in Fig. 5, and the last t coefficients of the error locator polynomial (Λ₁, . . ., Λ_r, . . ., Λ_t) in accordance with equation 5 above. As further shown in Fig. 5, the syndrome symbols and error locator polynomial coefficients are supplied to array 512 of generic multipliers M via latches 511. The outputs of multipliers M are next exclusive ORed in a cascaded manner by m bit XOR gates 513. Following this first stage, output E₀, the coefficients of the error locator polynomial, and the t-1 necessary symbols of the syndrome are provided as inputs to the next RE stage to compute E₁. This procedure is carried out for k stages at the end of

which, all the symbols of the error vector are obtained. Comparing the error vector E, by either

subtraction or addition, with vector R yields codeword C. As noted above, the original k

information symbols are thus readily obtained by truncating the n-k zero symbols appended at

the end of the codeword.

An example of the operation of decoder 130 will be presented below.

Assume the spectral domain codeword C for a (15,9) RS code, including the appended 0's, is:

C = (α⁹, α¹, α⁰, α⁵, α¹¹, α⁹, α¹³, α⁰, α⁸, 0, 0, 0, 0, 0, 0),

where α is the 15th root of unity over a mathematical set known as a finite field, which is obtained using the generator polynomial x⁴ = x + 1. Performing the inverse F⁴T yields

c = (α¹, α⁸, α¹¹, α², α², α⁵, α⁹, α⁵, α⁶, α⁸, α⁸, α⁹, α¹⁴, α⁷, α¹³),

which is the output from F⁴T⁻¹ circuit 112 shown in Fig. 1.

The vector c is then transmitted over channel 120, as discussed above. Assuming the

occurrence of a three symbol error in the form

e = (0, 0, 0, 0, 0, 0, α¹⁴, α⁷, 0, 0, 0, 0, 0, α°, 0),

with the corresponding F⁴T error vector E of the form

E = (α⁴, α¹, α⁶, α⁴, α³, 0, α¹³, α¹, α⁸, α¹³, α⁹, α¹⁴, 0, α¹, 0),

results in the vector r received at receiver 130

r = (α¹, α⁸, α¹¹, α², α², α⁵, α⁴, α¹³, α⁶, α⁸, α⁸, α⁹, α¹⁴, α⁹, α¹³),

whose F⁴T, i.e., the output of circuit 131, yields

R = (α¹⁴, 0, α¹³, α⁸, α⁵, α⁹, 0, α⁴, 0, α¹³, α⁹, α¹⁴, 0, α¹, 0).

The last 2t = 6 symbols comprise the syndrome symbols

S₁ = α¹³, S₂ = α⁹, S₃ = α¹⁴, S₄ = 0, S₅ = α¹, S₆ = 0.

These six syndrome symbols are provided as inputs to Stage 1 of BA circuit 132. The various

outputs from Stage 1 are as follows:

Λ(x) = α¹³x + 1;

B(x) = α²; and

L₁ = 1.

The outputs after Stage 2 of BA circuit 132 are:

Λ(x) = α¹¹x + 1;

B(x) = α²x; and

L₂ = 1.

The outputs after the third stage of BA circuit 132 are:

Λ(x) = α¹⁴x² + α¹¹x + 1;

B(x) = α¹⁴x + α³; and

L₃ = 2.

The outputs of the fourth stage of BA circuit 132 are:

Λ(x) = α³x² + α¹³x + 1;

B(x) = α¹⁴x² + α³x; and

L₄ = 2.

The outputs of the fifth stage of BA circuit 132 are:

Λ(x) = α⁴x³ + α¹³x² + α¹³x + 1;

B(x) = α¹³x² + α⁸x + α¹⁰; and

L₅ = 3.

The outputs after the 2t-th (in this case sixth) stage are:

Λ(x) = α¹¹x³ + α³x² + α⁹x + 1;

B(x) = α¹³x³ + α⁸x² + α¹⁰x; and

L₆ = 3.

The desired error locator polynomial is therefore:

Λ(x) = α¹¹x³ + α³x² + α⁹x + 1.

As noted above, the coefficients of the error locator polynomial are provided as inputs to

RE circuit 133 along with t symbols of the error vector E. Accordingly, in the present example,

the following parameters are input to RE circuit 133:

Λ₃ = α¹¹; Λ₂ = α³; Λ₁ = α⁹; S₄ = E₁₂ = 0; S₅ = E₁₃ = α¹; and S₆ = E₁₄ = 0.

The output of the first stage of RE circuit 133 is given by

E₀ = Λ₁E₁₄ + Λ₂E₁₃ + Λ₃E₁₂.

Substituting the appropriate values in the above equation gives

E₀ = α⁴.

Similarly, subsequent stages of RE circuit 133 give the other symbols of the error vector E in accordance with equation 5 above, namely

E₁ = α¹, E₂ = α⁶, E₃ = α⁴, E₄ = α³, E₅ = 0, E₆ = α¹³, E₇ = α¹, E₈ = α⁸.

The error vector E thus obtained is:

E = (α⁴, α¹, α⁶, α⁴, α³, 0, α¹³, α¹, α⁸, α¹³, α⁹, α¹⁴, 0, α¹, 0).

Subtracting this error vector from the vector R output from F⁴T circuit 131 yields the codeword C:

C = (α⁹, α¹, α⁰, α⁵, α¹¹, α⁹, α¹³, α⁰, α⁸, 0, 0, 0, 0, 0, 0).

Based on the above description, the various computational units of an example of the

present invention including a (15,9) Reed-Solomon spectral decoder were designed and laid out

using a conventional VLSI layout tool. The decoder was fabricated in a 6mm x 4mm MOSIS

SMALL chip using a 2μm CMOS process. From the layout of the (15,9) spectral decoder,

scaling laws for VLSI area, electrical power dissipation and information rate were derived for

codes of different block sizes and error correction capability. These scaling laws were expressed

in terms of the code size (n), symbol size (m), error correction capability (t) and the process size

(λ). A summary of these scaling laws is presented below and comparisons are drawn with serial

decoder designs.

In a serial implementation, one symbol of the received vector is provided as input to the

decoder during each clock cycle. This can act as a bottleneck in high data rate applications. One

way to increase decoding speed within the serial paradigm is to utilize an array of serial decoders

operating in parallel. In particular, as shown in Fig. 6(a), codewords C1, C2, . . ., CP are supplied serially in a symbol-wise fashion to serial F⁴T, BA and RE blocks 610-1 . . . 610-P, 620-1 . . . 620-P, and 630-1 . . . 630-P, respectively. Each decoder, e.g., blocks 610-1, 620-1 and 630-1, will then

operate independently on separate code words. If there are P serial decoders in such an array, the

aggregate decoding data rate achieved is P times that of a single serial decoder.

In accordance with the present invention, however, time domain decoding is unfolded to

produce a parallel pipeline decoder, as shown in Fig. 6(b). This decoder receives an entire code

word at each clock cycle. That is, blocks 131, 132 and 133 operate on all the received symbols

corresponding to an entire codeword in parallel and at the same time. Such a decoding paradigm

provides the decoder simultaneous access to all code word symbols, and therefore can yield

significant savings in implementational resources as compared with the array of serial decoders

shown in Fig. 6(a).

Fig. 7 illustrates very large scale integrated circuit (VLSI) area as a function of code rate

for a serial decoder and a parallel decoder in accordance with the present invention. As can be

seen from this figure, the parallel decoder in accordance with the present invention requires less

VLSI area than an array of serial decoders for codes of all block sizes considered. The

performance of the parallel decoder in accordance with the present invention is significantly

improved at relatively high code rates. Many practical code-rates are in the range of 0.75 to 0.80,

and in this range, parallel decoders in accordance with the present invention offer a much better implementation efficiency than the array of serial decoders, as represented by more than an order

of magnitude improvement in area shown in Fig. 7.

In a typical application environment it is common to be provided with a specification of

raw input bit error rate (BER) and desired output BER. Such a system specification defines the

code rate required for each RS block size n. In generating the plot shown in Fig. 7, an input BER

of 10⁻⁴ and a required output BER of 10⁻¹² were assumed. Given these BER requirements, the

required code rate was computed for each RS block size considered and these operating points

are represented by the solid circles in Fig. 7. Further, it should be noted that within a typical chip

area of 1 cm², a parallel decoder for decoding codewords having lengths of n=63 can be easily

realized using a 1 μm CMOS process.
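The way a BER specification fixes the code rate for each block size can be illustrated with a minimal Python sketch. The error model is an assumption and is not taken from this disclosure: it supposes independent bit errors, a decoder that corrects up to t symbol errors with k = n - 2t, and it uses the post-decoding symbol error rate as a conservative upper bound on the output BER. The function names are hypothetical.

```python
from math import comb

def output_symbol_error_rate(n, t, p_sym):
    # Assumed approximation: a block with i > t symbol errors is left
    # with i erroneous symbols after a failed decoding attempt.
    return sum(
        i * comb(n, i) * p_sym**i * (1.0 - p_sym) ** (n - i)
        for i in range(t + 1, n + 1)
    ) / n

def required_code_rate(n, m, ber_in=1e-4, ber_out=1e-12):
    # Probability that an m-bit symbol contains at least one bit error.
    p_sym = 1.0 - (1.0 - ber_in) ** m
    # Smallest t that meets the target; k = n - 2t must stay positive.
    for t in range(1, (n - 1) // 2 + 1):
        if output_symbol_error_rate(n, t, p_sym) <= ber_out:
            k = n - 2 * t
            return t, k / n
    return None  # target not reachable at this block size

# Block sizes n = 2^m - 1 for m-bit symbols.
for m in (4, 5, 6):
    n = 2**m - 1
    print(n, required_code_rate(n, m))
```

Under this assumed model the longer code meets the same BER target at a higher code rate than the shorter code. The operating points actually plotted in Fig. 7 may have been computed with a different model.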

Additional advantages will become apparent in light of the description of Figs. 8 and 9

below. In particular, comparisons between the serial decoders and parallel decoders in

accordance with the present invention will be drawn as a function of block size assuming

operation at the input/output BER goals indicated above.

Scaling laws were also derived for another important implementational characteristic: the

power dissipation from a chip on which such a decoder has been fabricated. The power

dissipation is directly linked to the number of transistors on the chip and the rate at which the

chip is operated. If P_c is the total electrical power dissipated from the chip, C the gate

capacitance of a transistor, N the total number of transistors on the chip and f the clock rate, the

total power dissipation can be written as P_c = ½CV²(N/2)f, where it has been assumed that at

any given time, half of the transistors on the chip are active. A typical value of V is 5 V, and a

typical value of C for a 1 μm CMOS P-well process is 6.9 × 10⁻¹⁵ F. Current CMOS technology offers clock rates of 100 MHz, and hence both types of decoders were operated at 100 MHz. The

corresponding total power dissipations were computed for a fixed input/output BER and are

shown in Fig. 8. As can be seen from this figure, the array of serial decoders dissipates more

power than the parallel decoder in accordance with the present invention. More importantly, for

codes of block sizes up to 63, the total power dissipated by the parallel decoder is less than 1 W.

Another parameter which can be considered is the power density on the chip, since this

correlates directly with system cooling requirements. This metric is obtained by dividing the

total power (Fig. 8) by the total area (Fig. 7). It has been determined that serial decoders

dissipate approximately 1 W/cm², whereas the parallel decoder in accordance with the present

invention, owing to its reduced area requirements, can dissipate up to a maximum of 2 W/cm²

depending on the block size chosen. Accordingly, although the power density of the parallel

implementation is higher than that of the serial array, the power density levels associated with the

present example of the invention are relatively low and are not expected to limit system

performance.
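The power model and the power density metric described above can be written out as a short Python sketch. The default capacitance and supply voltage are the typical values quoted in the text. The transistor count and chip area in the example are hypothetical placeholders, not figures from Figs. 7 and 8, and the function names are illustrative only.

```python
def chip_power(num_transistors, clock_hz,
               gate_capacitance=6.9e-15, supply_voltage=5.0):
    """Total power in watts, P_c = 1/2 * C * V^2 * (N/2) * f.

    The factor N/2 reflects the stated assumption that half of the
    transistors are active at any given time.
    """
    switching_energy = 0.5 * gate_capacitance * supply_voltage**2
    return switching_energy * (num_transistors / 2) * clock_hz

def power_density(power_w, area_cm2):
    """Power density in W/cm^2: total power divided by total chip area."""
    return power_w / area_cm2

# Hypothetical example: 200,000 transistors on 1 cm^2, clocked at 100 MHz.
p = chip_power(200_000, 100e6)
print(p, power_density(p, 1.0))
```

Because the model is linear in both N and f, halving either the transistor count or the clock rate halves the dissipated power.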

Information rate is an additional performance metric to be considered. In the present

example, the total chip area is assumed to be fixed at 1 cm², and the clock rate is assumed to be

100 MHz. Fixing the VLSI area in turn fixes the total number of decoders that can be realized in

a chip and hence the parallelism is also fixed. The information rate can now be defined as

Information rate = Parallelism × Clock × Code-rate = mn × (number of decoders) × Clock × k/n. Fig. 9

illustrates a plot of information rate as a function of block size for the parallel decoder in

accordance with the present invention and the array of serial decoders. Both decoders offer

higher information rates as they operate at smaller block sizes. However, for the same

input/output BER requirements, smaller block sizes require lower code rates. Accordingly, the overhead or the redundancy necessary to achieve the same BER performance increases for

smaller block sizes. In the case of the parallel decoder in accordance with the present invention,

there is a peak at a block size corresponding to a codeword length of n = 63. The resulting

maximum information rate which can be achieved is 77.4 Gbps. There is not much difference in

the information rate when using codes of block sizes ranging from n = 15 to n = 63; however,

operating at n = 63 yields improved channel efficiency (i.e., higher code rate) and larger burst

error correction capability.
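The information rate definition above can be checked numerically with a small Python sketch. The function name is hypothetical. The example uses the (15,9) code, for which the symbol size is m = 4 bits because n = 2^m - 1, with a single decoder and the 41.1 MHz clock reported for the fabricated design.

```python
def information_rate(m, n, k, num_decoders, clock_hz):
    """Information rate in bits per second.

    Parallelism is the number of bits accepted per clock cycle,
    m * n per decoder; k/n is the code rate.
    """
    parallelism = m * n * num_decoders
    return parallelism * clock_hz * (k / n)

# (15,9) code, 4-bit symbols, one parallel decoder at 41.1 MHz.
raw_rate = 4 * 15 * 1 * 41.1e6           # aggregate rate, all received bits
info_rate = information_rate(4, 15, 9, 1, 41.1e6)
print(raw_rate / 1e9, info_rate / 1e9)   # both in Gbit/s
```

The raw figure, about 2.47 Gbit/s, agrees with the aggregate data rate reported for the (15,9) design (2.46 Gbps). Multiplying by the code rate 9/15 gives the user information rate carried by that stream.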

Extensive SPICE simulations were carried out to test the different functional modules of

the (15,9) spectral decoder. This design used the Cooley-Tukey F⁴T, 6 stages of the Berlekamp

algorithm and 9 stages of the RE. The total transistor count of the spectral decoder was 33,000.

The chip was fabricated using a 2μm CMOS process and SPICE simulation for this process

resulted in a maximum clock of 41.1 MHz at an aggregate data rate of 2.46 Gbps.
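The spectral property on which a (15,9) decoder of this kind relies can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration: the primitive polynomial x^4 + x + 1 for GF(16), the placement of the six zero symbols in the last six spectral positions, and a naive O(n²) transform in place of the Cooley-Tukey F⁴T. The Berlekamp and recursive extension stages are not shown.

```python
N = 15                 # code length n = 2^4 - 1
K = 9                  # information symbols
PRIM_POLY = 0b10011    # x^4 + x + 1 (assumed primitive polynomial)

# Antilog/log tables for GF(16), built from powers of alpha = x.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0b10000:
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    """Multiply two GF(16) elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_fourier(v, inverse=False):
    """Finite field Fourier transform, V_j = sum_i alpha^(i*j) * v_i.

    The inverse uses alpha^(-i*j); the 1/n factor equals 1 in GF(2^m)
    because n is odd.
    """
    sign = -1 if inverse else 1
    out = []
    for j in range(N):
        acc = 0
        for i, vi in enumerate(v):
            if vi:
                acc ^= EXP[(LOG[vi] + sign * i * j) % N]
        out.append(acc)
    return out

# Encoder: append n - k zeros in the spectral domain, then inverse transform.
info = [1, 7, 0, 12, 3, 9, 5, 0, 14]
codeword = gf_fourier(info + [0] * (N - K), inverse=True)

# Decoder front end: the forward transform recovers the information symbols
# and zeros when the codeword is received without error.
spectrum = gf_fourier(codeword)

# A single symbol error of value 6 at position 4 makes the zero positions
# nonzero; those components are the syndromes.
received = list(codeword)
received[4] ^= 6
syndromes = gf_fourier(received)[K:]
print(spectrum, syndromes)
```

In this sketch the nonzero syndromes are exactly the error spectrum at the zero positions, which is the data that later stages would extend to the full error spectrum before it is combined with the received spectrum.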

Parallel decoding in the spectral domain will now be compared with conventional parallel

decoding in the space domain, as described in Neifeld et al., supra. In the space domain parallel

decoder, the Euclidean algorithm is used to determine the error locator polynomial, followed by a

Chien search to complete the error correction. The syndrome, Euclidean algorithm and the Chien

search functional modules have parallel pipeline processing and each of the functional modules

is locally connected. As seen in Fig. 10, which compares the VLSI area associated with the

space domain decoder and the decoder in accordance with the present invention, the spectral

decoder in accordance with the present invention requires much less VLSI area than the parallel

space decoder. It is noted, however, that since the analysis of the spectral decoder did not include the F⁴T circuit, the syndrome area was not included in the space decoder to ensure a fair

comparison.

Moreover, the parallel spectral decoder in accordance with the present invention can

process more bits in a given clock cycle than the conventional parallel space decoder. Further,

the parallel spectral decoder has improved power dissipation, and has higher page rates and

information rate throughputs than the conventional parallel space decoder.

The present invention can be used in applications involving a high degree of parallelism

or very high aggregate data rates. For naturally parallel data, the decoder input format is well

suited. In the case of high speed serial data, an input shift register can be used to facilitate high

speed serial input and parallel decoding at a lower clock rate. This is an attractive alternative to

serial decoding since the VLSI area necessary to implement the parallel spectral decoder is

significantly lower than that required for an array of serial decoders offering identical

performance. For a fixed VLSI area of 1 cm², a process size of 1 μm and a clock rate of 100

MHz, the n = 63 parallel spectral RS code offers a maximum information rate of 77.4 Gbps.

While the foregoing invention has been described in terms of the embodiments discussed

above, numerous variations are possible. For example, the present invention can be used to

encode and decode other RS codes, such as RS product codes. Accordingly, modifications and

changes such as those suggested above, but not limited thereto, are considered to be within the

scope of the following claims.

Claims

What is claimed is:

1. An encoding/decoding system, comprising:

an encoder circuit configured to convert user information data into space domain encoded

data in accordance with a Reed-Solomon code, said encoder further supplying said encoded data

to a medium; and

a decoder circuit coupled to said medium, said decoder being configured to decode said

encoded data in a spectral domain and in a parallel format to thereby obtain said user information

data.

2. An encoding/decoding system in accordance with claim 1, wherein said encoder

circuit comprises:

a space domain transforming circuit acting on said user information data in parallel to

transform said user information data into space domain data to thereby generate said encoded

data for transmission through said transmission medium.

3. An encoding/decoding system in accordance with claim 2, wherein said space

domain transforming circuit comprises an inverse finite field fast Fourier transform circuit.

4. An encoding/decoding system in accordance with claim 2, wherein said user

information data comprises a plurality of information symbols, said encoder further comprising:

an appending circuit for attaching a plurality of error correcting symbols to k information

symbols where k is an integer, each said error correcting symbols being substantially the same,

said appending circuit outputting a codeword to said space domain transforming circuit, said

codeword being n symbols in length, where n is an integer greater than k, said codeword

including said k information symbols and n-k error correcting symbols.

5. An encoding/decoding system in accordance with claim 4, wherein each said

plurality of error correcting symbols comprises zeros.

6. An encoding/decoding system in accordance with claim 1, wherein said decoder circuit comprises:

a spectral domain transforming circuit transforming said encoded data retrieved from said

medium into spectral domain data, said spectral domain data comprising a plurality of spectral domain symbols;

an error signal generating circuit coupled to said inverse spectral domain transforming

circuit, said error signal generating circuit outputting a plurality of error symbols in response to

said spectral domain data; and

a comparison circuit comparing said error symbols with said spectral domain symbols to

obtain said user information data as a result of said comparison.

7. An encoding/decoding system in accordance with claim 6, wherein said spectral

domain transforming circuit comprises a finite field fast Fourier transform circuit.

8. An encoding/decoding system in accordance with claim 6, wherein said error

signal generating circuit comprises:

a Berlekamp algorithm circuit outputting a plurality of error coefficients in response to

first selected symbols of said spectral domain symbols; and

a recursive extension circuit outputting said error symbols in response to second selected

symbols of said spectral domain data and said plurality of error coefficients.

9. An encoding/decoding system in accordance with claim 6, wherein said

comparison circuit comprises a summing circuit for adding said error bits to said spectral domain

bits to thereby obtain said user information data.

10. An encoding/decoding system in accordance with claim 6, wherein said

comparison circuit comprises a difference circuit for subtracting said error bits from said spectral

domain bits to thereby obtain said user information data.

11. An encoding/decoding system in accordance with claim 6, further comprising a

memory for storing said spectral domain symbols and supplying said spectral domain data to said

comparison circuit.

12. A decoder correcting errors present in Reed-Solomon encoded data and decoding

said encoded data to output user information data, said decoder comprising:

a spectral domain transforming circuit transforming said Reed-Solomon encoded data

into parallel spectral domain data, said parallel spectral domain data comprising a plurality of

spectral domain symbols;

an error signal generating circuit coupled to said inverse spectral domain transforming

said parallel spectral domain symbols; and

obtain said user information data as a result of said comparison.

13. A decoder in accordance with claim 12, wherein said spectral domain

transforming circuit comprises a finite field fast Fourier transform circuit.

14. A decoder in accordance with claim 12, wherein said error signal generating

circuit comprises:

a Berlekamp algorithm circuit acting on parallel first selected symbols of said spectral

domain symbols to output a plurality of error coefficients in parallel; and

a recursive extension circuit acting on said plurality of parallel error coefficients and

second parallel selected symbols of said spectral domain data to output said error symbols.

15. A decoder in accordance with claim 12, wherein said comparison circuit

comprises a summing circuit for adding said error symbols to said spectral domain symbols to

thereby obtain said user information data.

16. A decoder in accordance with claim 12, wherein said comparison circuit

comprises a difference circuit for subtracting said error bits from said spectral domain bits to

thereby obtain said user information data.

17. A decoder in accordance with claim 12, further comprising a memory for storing

said spectral domain symbols and supplying said spectral domain symbols to said comparison

circuit.

18. A decoder for decoding Reed-Solomon encoded data, comprising:

spectral domain conversion means coupled to a medium for performing a finite field fast

Fourier transform on said Reed-Solomon encoded data received from said medium, said spectral

domain conversion means outputting spectral domain data in parallel;

error signal generating means coupled to said spectral domain conversion means for

outputting error data in response to said parallel spectral data domain data; and

comparison means for comparing said spectral domain data with said error data to obtain

corrected information data.

19. A method for decoding Reed-Solomon encoded data, comprising the steps of:

receiving said Reed-Solomon encoded data from a medium;

performing a finite field fast Fourier transform on said Reed-Solomon encoded data to

generate parallel spectral domain data;

generating error data in response to said parallel spectral data domain data; and

comparing said spectral domain data with said error data to obtain corrected user

information data.