CA2310186A1 - Method and system for decoding - Google Patents

Method and system for decoding

Info

Publication number
CA2310186A1
CA2310186A1
Authority
CA
Canada
Prior art keywords
codes
sum
algorithm
performance
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA 2310186
Other languages
French (fr)
Inventor
Jeffrey P. Castura
Frank Kschischang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Soma Networks Inc
Original Assignee
Jeffrey P. Castura
Frank Kschischang
Soma Networks, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jeffrey P. Castura, Frank Kschischang, Soma Networks, Inc.
Priority to CA 2310186 priority Critical patent/CA2310186A1/en
Priority to PCT/CA2001/000805 priority patent/WO2001093432A1/en
Priority to AU2001273752A priority patent/AU2001273752A1/en
Priority to EP01940036A priority patent/EP1287619A1/en
Priority to CA002413890A priority patent/CA2413890A1/en
Publication of CA2310186A1 publication Critical patent/CA2310186A1/en
Abandoned


Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/61Aspects and characteristics of methods and arrangements for error correction or error detection, not provided for otherwise
    • H03M13/613Use of the dual code
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105Decoding
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105Decoding
    • H03M13/1111Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105Decoding
    • H03M13/1111Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms
    • H03M13/1117Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms using approximations for check node processing, e.g. an outgoing message is depending on the signs and the minimum over the magnitudes of all incoming messages according to the min-sum rule
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105Decoding
    • H03M13/1111Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms
    • H03M13/1117Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms using approximations for check node processing, e.g. an outgoing message is depending on the signs and the minimum over the magnitudes of all incoming messages according to the min-sum rule
    • H03M13/112Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms using approximations for check node processing, e.g. an outgoing message is depending on the signs and the minimum over the magnitudes of all incoming messages according to the min-sum rule with correction functions for the min-sum rule, e.g. using an offset or a scaling factor
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105Decoding
    • H03M13/1111Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms
    • H03M13/1125Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms using different domains for check node and bit node processing, wherein the different domains include probabilities, likelihood ratios, likelihood differences, log-likelihood ratios or log-likelihood difference pairs
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957Turbo codes and decoding
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65Purpose and implementation aspects
    • H03M13/6502Reduction of hardware complexity or efficient processing

Landscapes

  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)

Abstract

Reduced complexity decoding algorithms for Low Density Parity Check codes are presented. The performance of these algorithms is optimized using the concept of density evolution and they are shown to perform well in practical decoding situations.
The codes are examined from a performance vs. complexity point of view, and it is shown that there is an optimal complexity for practical decoders beyond which performance will suffer.
The idea of practical decoding is used to develop the sum-transform-sum algorithm, which is very well suited to a fixed-point hardware implementation. The performance of this decoder approaches that of the sum-product decoder, while being much less complex.

Description

Method and System for Decoding

Contents

List of Acronyms

1 Introduction
1.1 History
1.2 An Introduction to LDPC Codes
1.3 Generalizations
1.4 Organization of this work

2 LDPC Code Fundamentals
2.1 Construction of LDPC Codes
2.2 Properties of Low Density Parity Check Codes
2.2.1 Goodness properties
2.2.2 Distance Properties
2.2.3 Encoding
2.2.4 Decoding
2.3 Factor Graphs and Message-Passing Algorithms
2.4 Density Evolution
2.4.1 Fixed Points

3 Optimization of LDPC Decoders
3.1 Genetic Algorithms
3.1.1 Optimization Approach
3.2 Optimization of the Sum-Product Algorithm
3.3 Optimizing the Min-Sum Algorithm
3.4 The Modified Min-Sum Algorithm
3.4.1 Comments on Good Degree Sequences
3.5 Finite Effects
3.6 Performance vs. Complexity

4 The Sum-Transform-Sum Algorithm
4.1 Algorithm Description
4.1.1 Decoding
4.2 Optimization and Performance
4.3 Complexity
4.4 Implementation Architecture

5 Conclusions and Future Work
5.1 Suggestions for Future Research

A The Forward-Backward Algorithm

List of Figures

1.1 Factor graph representation of a LDPC code
2.1 Message passing on a factor graph
2.2 Generalized message passing at variable and check nodes
2.3 Tree representation of a section of Figure 1.1
2.4 Fixed points in density evolution
3.1 Sum-Product algorithm performance
3.2 Sum-Product algorithm performance vs. iterations
3.3 Check node output vs. input
3.4 Optimized min-sum performance
3.5 Optimized modified min-sum performance
3.6 Min-Sum performance vs. block length
3.7 Modified min-sum performance vs. block length
3.8 Performance vs. maximum iterations
3.9 Performance at high SNR
3.10 Performance vs. Complexity for BER of 10^-3
3.11 Performance vs. Complexity for BER of 10^-5
3.12 Performance vs. Complexity for BER of 10^-5, 25 iterations
3.13 Performance vs. Complexity for BER of 10^-5, n = 100,000
3.14 Performance vs. Complexity for BER of 10^-5, n = 1000
4.1 Bit allocation for the SXS algorithm
4.2 SXS performance for regular 3-6 code, length 10,000
4.3 SXS performance for regular 3-6 code, length 1000
4.4 Comparison of SXS and sum-product performance
4.5 Performance of optimized SXS codes
4.6 Performance-Complexity comparison of the SXS algorithm
4.7 Block diagram of SXS architecture
A.1 Trellis diagram of check with sites x1, x2 and x3
A.2 FBA α, β computation

List of Tables

2.1 Possible binary message representations for iterative decoding algorithms
3.1 Optimized min-sum degree sequences
3.2 Comparison between sum-product and min-sum algorithm thresholds
3.3 Optimized modified min-sum degree sequences
4.1 Threshold performance for SXS codes
4.2 Complexity of SXS algorithm

List of Acronyms

APP: A Posteriori Probability
AWGN: Additive White Gaussian Noise
BCJR: Forward-Backward Algorithm (named after the initials of its inventors)
BER: Bit Error Rate
BSC: Binary Symmetric Channel
CDMA: Code Division Multiple Access
FBA: Forward-Backward Algorithm
FEC: Forward Error Correction
GA: Genetic Algorithm
LDPC: Low Density Parity Check
LLR: Log Likelihood Ratio
LLDP: Log Likelihood Difference Pair
MAP: Maximum A Posteriori
SNR: Signal to Noise Ratio
SXS: Sum-Transform-Sum Algorithm

Chapter 1 Introduction

1.1 History

Communication systems of all types have to deal with the presence of noise over the communications channel, and it is the goal of system designers to mitigate the effect of noise in the system as efficiently as possible. To this end, many communications devices utilize some sort of forward error correction (FEC) code, which entails the addition of controlled redundancy to a transmitted signal in the hope that this structure will allow errors caused by the noisy channel to be detected and corrected.
Shannon put bounds on the signal-to-noise ratio (SNR) that is required to communicate reliably over an additive white Gaussian noise (AWGN) channel; for a certain rate R of transmission, this Shannon limit is [22]

\frac{E_b}{N_0} > \frac{2^{2R} - 1}{2R}, \qquad (1.1)

where R is defined as the ratio of information bits to transmitted symbols.
Traditionally, codes such as Hamming codes, BCH codes, and Reed-Solomon codes with algebraic decoding algorithms, and convolutional codes with trellis-based decoding algorithms, have been employed to achieve gains in overall system performance. In recent years, however, there has been a great deal of interest in iterative decoding schemes. Based on the idea of using very powerful and complex codes together with sub-optimal decoding methods, these schemes perform much better than any other known class of codes. In fact, codes based on iterative decoding can perform very well at SNRs that are within a fraction of a decibel of the Shannon limit.
There are two common types of iteratively decoded codes: low density parity check codes and turbo codes. Low density parity check codes (LDPC codes) were first introduced by Gallager in his PhD thesis in 1960 [6], and the performance of these codes was better than most other codes known at the time. Unfortunately, it was too difficult to implement the decoding algorithm for practical applications given the technology of the day. Indeed, Gallager was only able to simulate LDPC codes for short block lengths and relatively high probability of bit error on the IBM 7090 computer he had available to him.
Thus, due to the relative complexity, and the excitement at the time around convolutional codes, Gallager's LDPC codes remained relatively unmentioned for the next thirty years [28].
Berrou et al. [2] astounded the coding community in 1993 with their discovery of turbo codes, also known as parallel concatenated convolutional codes. The performance of these codes was shown to come to within less than 1 dB of the Shannon limit when using iterative decoding. This remarkable improvement over the previously best known codes renewed interest in the search for other Shannon-limit-approaching codes and iterative decoding schemes.
MacKay was the first to show, in 1996, that LDPC codes of long block length are very good and that their performance is close to, though not quite as good as, turbo codes [14]. Since then, a great deal of interest has been shown in these codes, and a number of improvements and advancements have pushed their performance closer to the Shannon limit.
Some of these advancements include LDPC codes over GF(2^m) [4], irregular LDPC codes [12], and techniques to analyze the behaviour of different LDPC decoding algorithms [19, 18].
Recently, results have been presented showing that some constructions of LDPC codes actually outperform turbo codes. For example, Spielman gave experimental results for a rate 1/2 irregular LDPC code that achieves a bit error probability of 3 x 10^-5 at an E_b/N_0 of 0.7 dB [26]. Richardson and Urbanke have also reported similar results [18]. In fact, they have reported a construction for highly irregular LDPC codes of extremely long block length that achieves an asymptotically good performance to within 0.06 dB of the Shannon limit for the AWGN channel. This is a remarkable result and an important milestone in the development of error control codes.

1.2 An Introduction to LDPC Codes

LDPC codes, despite the long name, are simply binary linear block codes based on a sparse parity check matrix. A code may be defined as an LDPC code if the average weight of rows and columns in the parity check matrix is small. As linear block codes, they have well defined generator and parity check matrices G and H respectively, and as for other binary linear block codes, the parity check matrix is used to decode a noise-corrupted codeword.
As Shannon's Channel Coding Theorem shows, good codes can be found by randomly choosing a codebook and using a very large block length [22]. These very good codes can achieve an arbitrarily low probability of bit error for any rate below capacity. Unfortunately, Shannon did not give any constructive and practical methods for encoding and decoding these codes, and this is the problem that coding theorists have since attempted to resolve [3].
Gallager developed LDPC codes with the benefits of randomness while maintaining a practical encoding and decoding scheme. His LDPC codes are created by randomly generating a large sparse parity check matrix H with uniform row and column weight, both of which are small to keep H sparse. By keeping the matrix sparse, the complexity of decoding is kept within reasonable limits.
In the realm of graph theory, this corresponds to a random bipartite graph that has a constant small number of edges terminating on both variable nodes and check nodes. This constraint of uniform variable and check node degree gives rise to what are known as regular LDPC codes, which is the original construction Gallager outlined.
It is a simple matter to convert a parity check matrix to a bipartite graph. Each row in H is represented by a parity check node, each column is represented by a variable node, and these two types of nodes are placed across from each other on the graph. For every element in the binary matrix H that is a 1, an edge is drawn connecting the corresponding check node (row) and variable node (column). Let d_v be the degree of the variable nodes (the weight of the columns of H), d_c the degree of the check nodes (the weight of the rows of H), and n the length of a codeword (the number of columns of H). A regular LDPC code C can then be described as an {n, d_v, d_c} LDPC code.
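As an aside not found in the original text, the correspondence between H and the bipartite graph can be made concrete with a short sketch; the helper name below is illustrative only, and the example matrix is the (7,4) Hamming parity check matrix that reappears in Section 2.3.

```python
# Sketch (not from the original text): building the bipartite (factor) graph of
# a code from its parity check matrix H.  Rows of H become check nodes, columns
# become variable nodes, and each 1 becomes an edge.

def parity_check_to_graph(H):
    """Return adjacency lists (check_neighbours, var_neighbours) for H."""
    r = len(H)          # number of check nodes (rows)
    n = len(H[0])       # number of variable nodes (columns) = codeword length
    check_neighbours = [[j for j in range(n) if H[i][j]] for i in range(r)]
    var_neighbours = [[i for i in range(r) if H[i][j]] for j in range(n)]
    return check_neighbours, var_neighbours

if __name__ == "__main__":
    # (7,4) Hamming parity check matrix used in Section 2.3, eq. (2.6)
    H = [[0, 0, 0, 1, 1, 1, 1],
         [0, 1, 1, 0, 0, 1, 1],
         [1, 0, 1, 0, 1, 0, 1]]
    checks, variables = parity_check_to_graph(H)
    print(checks)      # check 0 connects to variables [3, 4, 5, 6]
    print(variables)   # variable 6 participates in all three checks
```

For a regular {n, d_v, d_c} code, every entry of var_neighbours has length d_v and every entry of check_neighbours has length d_c.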

Figure 1.1: Factor graph representation of a LDPC code

It is also possible to create irregular LDPC codes. These codes differ from regular codes in that the weights of the columns and rows of H are not equal for all columns or rows. For example, some columns may have weight 3, while others may have weight 4 or 5. The same holds true for the weights of the rows. Well chosen irregular codes have been shown to perform better than their regular counterparts [14]. The degrees that arise in the graph can be described by the degree sequence polynomials λ(x) and ρ(x) respectively, where

\lambda(x) = \sum_i \lambda_i x^{i-1} \quad \text{and} \quad \rho(x) = \sum_i \rho_i x^{i-1}, \qquad (1.2)

in which λ_i (resp. ρ_i) represents the fraction of edges which are incident on a variable (resp. check) node of degree i. It follows that λ(1) = ρ(1) = 1. These polynomials sufficiently describe an ensemble of codes that may be generated from a given λ(x) and ρ(x); they do not describe a single construction. Note that throughout the remainder of this paper, the above polynomials are used in place of their integer counterparts d_v and d_c when discussing irregular LDPC codes.
Example: For the graph in Figure 1.1,

\lambda(x) = \frac{1}{3}\left(2x + x^2\right) \quad \text{and} \quad \rho(x) = \frac{1}{9}\left(x + 6x^2 + 2x^3\right). \qquad (1.3)

A lower bound on the rate R of an LDPC code can be determined from the degree sequence polynomials that describe it. Let e be the number of edges in the graph. The number of variable nodes of degree i is eλ_i/i. Therefore the total number of variable nodes n is the summation over all degrees,

n = \sum_i \frac{e\lambda_i}{i} = e \int_0^1 \lambda(x)\,dx. \qquad (1.4)

Similarly, the total number of check nodes is

r = \sum_i \frac{e\rho_i}{i} = e \int_0^1 \rho(x)\,dx. \qquad (1.5)

Since the dimension of the code is at least k ≥ n − r, with strict inequality if some of the r check nodes are linearly dependent, we find

R = \frac{k}{n} \geq 1 - \frac{r}{n} = 1 - \frac{\int_0^1 \rho(x)\,dx}{\int_0^1 \lambda(x)\,dx}. \qquad (1.6)
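The bound in (1.6) is easy to check numerically. The following sketch is not part of the original text; it simply evaluates (1.4)-(1.6) for the example degree sequences of (1.3), and the coefficient lists and function name are illustrative choices.

```python
# Sketch (not from the original text): numerical check of the rate bound (1.6)
# for the example degree sequences of (1.3): lambda(x) = (2x + x^2)/3,
# rho(x) = (x + 6x^2 + 2x^3)/9.

def integral_0_1(coeffs):
    """Integrate sum_i c_i x^(i-1) over [0, 1]; coeffs[i] is the degree-(i+1) fraction."""
    return sum(c / (i + 1) for i, c in enumerate(coeffs))

lam = [0, 2/3, 1/3]          # lambda_1 = 0, lambda_2 = 2/3, lambda_3 = 1/3
rho = [0, 1/9, 6/9, 2/9]     # rho_2 = 1/9, rho_3 = 6/9, rho_4 = 2/9

int_lam = integral_0_1(lam)  # = 4/9, so n = e * 4/9   (eq. 1.4)
int_rho = integral_0_1(rho)  # = 1/3, so r = e * 1/3   (eq. 1.5)
print(1 - int_rho / int_lam) # 0.25, i.e. R >= 1/4     (eq. 1.6)
```

For these example polynomials the bound evaluates to R ≥ 1/4.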
1.3 Generalizations

There are many areas of interest to researchers studying LDPC codes. Since LDPC codes are linear block codes, it is possible to define them over a larger field, GF(q) for q > 2. LDPC codes defined over larger fields tend to perform better than similar codes taken over GF(2). In fact, as q increases, the performance improves. MacKay showed in [4] a regular LDPC code defined over GF(16) with block length on the order of 24k bits that performs better by about 0.6 dB than a comparable code over GF(2). The same trend is seen when dealing with irregular LDPC codes. MacKay's results show an improvement of about 0.5 dB when taking a code over GF(8) instead of GF(2).
Decoding complexity is an issue with larger fields. The decoding complexity of the sum-product algorithm as used by MacKay for his tests scales as q^2, so decoding over GF(8) instead of GF(2) results in a complexity increase by a factor of 16. Work is being done by MacKay and others in an attempt to find simpler methods to decode over larger fields in an optimal or near-optimal manner [4]. If the complexity can be reduced, even in a sub-optimal decoding scheme, the resulting performance improvement due to the larger field may be able to offset the loss in performance of a sub-optimal decoder for close to the same amount of computational complexity.
One other area of interest for LDPC codes is their use in high rate or short block length applications, and this too has been examined by MacKay and Davey [15]. Their paper shows the performance of a rate 8/9 code of short block length n = 1998 bits to be better than either comparable BCH codes or Reed-Solomon codes using hard-decision decoding by more than 1 dB.
Despite their relatively poor performance with short block lengths, LDPC codes have been proposed as an FEC scheme for CDMA applications. Work on short block length codes by Sorokine [23] showed a class of irregular LDPC codes of block lengths on the order of 1000 and less to be particularly effective in fading channels. These codes appear to be well suited for carrying voice information on cellular networks, and allow for a greater user capacity per cell.
1.4 Organization of this work

The aim of this work is to investigate reduced complexity decoding methods for LDPC codes.
Chapter 2 introduces the concepts and principles of LDPC codes, and outlines some of the algorithms that may be used for decoding. Density evolution [19], a technique which allows a decoding algorithm's performance to be determined numerically and aids in an overall understanding of the iterative decoding process, is also described in Chapter 2. Density evolution is applied in Chapter 3 to optimize reduced-complexity LDPC decoders. Trade-offs between performance and complexity are examined and presented. In Chapter 4, an alternate decoding approach is developed that exploits the duality of the LDPC factor graph and is well suited to fixed-point applications. Performance and complexity issues are treated, and possible implementation issues are discussed. Finally, Chapter 5 provides some conclusions and suggestions for further research.
Chapter 2 LDPC Code Fundamentals

The defining feature of LDPC codes is their low weight parity check matrices.
It is important to understand the properties that this gives LDPC codes, and we will examine some here.
2.1 Construction of LDPC Codes

There are numerous methods for generating parity-check matrices that satisfy LDPC constraints, and a few commonly used techniques will be outlined here. While either a structured or a pseudo-random approach may be used for construction, it has been found that a higher degree of randomness generally results in codes that perform better than those that are highly structured.
Gallager originally proposed a method of partitioning the parity check matrix H into blocks and performing random permutation operations. Set the first row of H to d_c ones followed by (n − d_c) zeroes [6]. Create the next (n/d_c) − 1 rows (assuming this is an integer) by shifting the previous row by d_c positions. This sub-section of the matrix is copied and randomly permuted a number of times to obtain the desired dimension for H. An example of this construction is given in (2.1). This construction generates regular codes, and is sure to have low probability of repeated rows. However, since the sum of the rows of the top block and the rows of any other block is zero, this method results in linearly dependent rows and, as such, the rate of the code is higher than might first be assumed.
H = [ top block ; random permutation of top block ; another random permutation of top block ]    (2.1)
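The following sketch, not taken from the original text, illustrates the block-and-permutation construction just described; the function name and parameter values are hypothetical.

```python
# Sketch (not from the original text): Gallager-style construction of a regular
# LDPC parity check matrix.  A top block with row weight dc is built, then
# copied dv times with random column permutations and stacked.

import random

def gallager_H(n, dv, dc, seed=0):
    """Return a dv*(n//dc) x n parity check matrix as a list of rows."""
    assert n % dc == 0, "n must be a multiple of dc for this construction"
    rng = random.Random(seed)
    rows_per_block = n // dc
    # Top block: row i has ones in columns [i*dc, (i+1)*dc).
    top = [[1 if i * dc <= j < (i + 1) * dc else 0 for j in range(n)]
           for i in range(rows_per_block)]
    H = [row[:] for row in top]
    for _ in range(dv - 1):
        perm = list(range(n))
        rng.shuffle(perm)
        H.extend([[row[perm[j]] for j in range(n)] for row in top])
    return H

H = gallager_H(n=12, dv=3, dc=4)
# Every column has weight dv and every row has weight dc, as required for a
# regular {n, dv, dc} code.
assert all(sum(col) == 3 for col in zip(*H))
assert all(sum(row) == 4 for row in H)
```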
An alternate method was presented by MacKay that allows a nearly uniform parity check matrix to be generated [16], with no strict guarantee of uniformity. The construction operates by assigning a fixed number of ones, j, to each column of the parity check matrix. A bit in a particular column is assigned to a row with uniform probability, with no regard to whether that particular location is already occupied by a one. It is possible then to have a column with lower weight than j, though it is not possible to have a column with greater weight, resulting in an irregular code. MacKay has also suggested that any rows which share multiple ones in the same locations should be avoided. This prevents short cycles from appearing in the graph, which have been shown to reduce the performance of LDPC codes.
A more general method of LDPC parity check matrix construction is simply to assign ones randomly with low probability to a matrix of the desired dimensions. This will almost surely result in an irregular code, with average row and column weights that can easily be fixed. For example, the matrix H of (2.2) was obtained in this way. Care must be taken to ensure that there are no identical rows, and that linear dependence among the rows is kept relatively small.
Although the matrix in (2.2) is not likely to be considered sparse, it is a good demonstration of the generation of an irregular LDPC parity check matrix. The polynomials describing the distribution of the degrees of the check and variable nodes are given in (1.3), and the bipartite graph representing this code is shown in Figure 1.1.
MacKay has also outlined a modification to his construction technique that allows the resulting LDPC code to be encoded much more efficiently, based on the work of Spielman [16, 13, 24]. This will be discussed in detail in Section 2.2.3.
2.2 Properties of Low Density Parity Check Codes

It is useful to first outline some terminology. In defining the ability of a family of codes to achieve capacity, at least asymptotically, the terms very good, good and bad are used by MacKay to describe this attribute and are adopted here [14]. Very good codes are those that can asymptotically achieve an arbitrarily low probability of bit error for any rate below capacity using some decoding algorithm (e.g., maximum likelihood decoding).
Good codes are codes that can achieve this same performance, but only up to some rate which is bounded away from the channel capacity. A bad code is any code that has some negligible probability of bit error only for a vanishingly small rate. It is also useful to define a practical code as one which can be implemented effectively with limited computational complexity, i.e., encoded and decoded in a useful amount of time, depending on the application.
2.2.1 Goodness properties

In Gallager's original paper on the topic, he proves that LDPC codes are good with a simple decoder [6]. This is based on a binary symmetric channel (BSC), and uses a bit-flipping iterative decoding algorithm he devised, which will be discussed in Section 2.2.4. The LDPC
code is limited to achieving arbitrarily low probability of bit error only for channel errors occurring with a frequency less than that which corresponds to capacity of the BSC.
MacKay shows in [14] that not only are these codes good, but they are very good when certain criteria are met using a sub-optimal typical set decoder. First define a satisfactory matrix as:
A satisfactory (M, R, t, ε) matrix A for the distribution P(x) is a matrix having M rows and L ≥ RM columns with weight t or less per column, with the following property: if x is generated from P(x), the optimal decoder achieves a probability of bit error less than ε. [14]
MacKay's theorem states that given some distribution P(x) of the probabilities of inputs to the decoder with mean entropy H(X) < 1 bits, and a desired code rate R < 1/H(X), there exists some integer t(H(X), R) ≥ 3 with the following implication: there exists some row dimension of H above which there is a satisfactory (M, R, t, ε) matrix A that can achieve a block error probability of ε > 0. In this case, t represents the column weight of H, or d_v. Thus, as d_v is allowed to grow large, capacity may be achieved. Unfortunately, the complexity of most LDPC decoding algorithms grows as d_v gets large.
2.2.2 Distance Properties

Regular LDPC codes have the strict constraint on their parity check matrix that each row have some fixed weight and each column also have some fixed weight. With just these constraints, Gallager was able to give bounds on the performance of this ensemble of random codes, bound the distance functions, and provide a powerful yet simple decoding technique, which can be generalized to belief propagation.
The LDPC codes Gallager examined exhibit good distance properties for d_v > 2, and he showed that the minimum distance of these codes grows linearly with block length n for most codes in the ensemble. MacKay showed in [14] that the Gilbert-Varshamov minimum distance bound can be obtained as d_v → ∞. For a parity check matrix H with parameters d_v and d_c as defined above, the Shannon limiting bound on entropy for the codeword x is H(x) < n/m, where m is the size of the information block. It can then be said that for d_v ≥ 3 and a fraction δ < 0.5, some H can be found such that the minimum distance of the code is at least δn.
2.2.3 Encoding

One of the obstacles preventing LDPC codes from being widely adopted in practical applications is the complexity involved in the encoding process, which can be many times greater than for other codes such as convolutional codes or turbo codes. A great deal of attention has been paid to the performance of LDPC codes and efficient optimal decoding methods, but the encoding problem has often been overlooked. Using standard matrix multiplication to generate codewords c from a generator matrix and message word m, i.e. c = mG, encoding complexity grows with the second power of the block length n, where one would hope for a linear relationship between encoding complexity and block length.
Spielman was one of the first to examine how sparse matrix techniques may be used to efficiently encode LDPC codes [24]. His codes, which are a subset of codes called expander codes, are linear time encodable. These codes however do not perform as well as other LDPC
codes and are not good codes in the technical sense.
MacKay took the work of Spielman in [25] and developed a nearly lower-triangular reduced matrix form for H that is almost linear time encodable, as mentioned above in Section 2.1 [16, 15]. The structure of H is lower-triangular except for the last M_c rows, which allows all but these last M_c rows to be encoded using sparse matrix techniques, a linear time effort. It is not possible to express an LDPC code fully in lower-triangular form, as this would violate the LDPC constraints, but it can be made very close. In fact, if the total number of rows in H is M, and if M_c is much smaller than M, then MacKay has suggested that the total code can be linear time encodable, though he has not expressly proven that this is possible.
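As a hedged illustration of why a (nearly) lower-triangular H permits cheap encoding, the sketch below, which is not from the original text, handles the idealized case in which the parity part of H is exactly lower triangular with a unit diagonal; the few dense rows of the construction described above are deliberately ignored, and all names and the tiny example are illustrative.

```python
# Sketch (not from the original text): encoding in linear time when the parity
# part of H is lower triangular.  Write H = [A | T] with T square, lower
# triangular and with ones on its diagonal; a codeword c = (s, p) satisfies
# H c^T = 0, so the parity bits p solve T p = A s (mod 2) by forward
# substitution.  For sparse A and T the cost is proportional to the number of
# ones in H.

def encode_lower_triangular(A, T, s):
    """A: m x k sparse part, T: m x m lower triangular (unit diagonal), s: k message bits."""
    m = len(T)
    b = [sum(A[i][j] & s[j] for j in range(len(s))) % 2 for i in range(m)]
    p = [0] * m
    for i in range(m):                 # forward substitution over GF(2)
        acc = b[i]
        for j in range(i):             # fold in the parity bits already known
            acc ^= T[i][j] & p[j]
        p[i] = acc                     # T[i][i] == 1 by assumption
    return s + p                       # systematic codeword (s, p)

# Tiny illustrative example (k = 2 message bits, m = 3 parity bits).
A = [[1, 0],
     [1, 1],
     [0, 1]]
T = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
print(encode_lower_triangular(A, T, [1, 0]))   # [1, 0, 1, 0, 0]
```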

2.2.4 Decoding

An LDPC decoder employs iterative decoding, and its goal is to compute the a posteriori probability (APP) of each of the received symbols, taking the symbol with maximum probability as the decision. There are many ways this can be done, one of which is the sum-product algorithm, a practical decoding algorithm. This approach will compute the exact APP for each symbol in a cycle-free graph, and in the case of a graph with cycles, will generally compute close approximations of the APP, though this behaviour is not clearly understood.
We will examine a basic instantiation of the MAP decoder to explore its properties.
Gallager devised an algorithm that is similar to sum-product decoding and developed an analysis of iterative decoding for the binary symmetric channel (BSC) using this algorithm. This attempt at a clear analysis of iterative decoding was the first step towards a deeper understanding of this decoding approach, and towards the concept of density evolution, a powerful analytical technique for iterative decoding.
A basic bit-flipping algorithm known as Gallager's Decoding Algorithm A [6] is described here, and other algorithms will be examined in Chapter 3. The procedure is a graph based iterative decoding method using message passing as described in Section 2.3, and assumes a BSC. Bits are read off the binary channel, and are loaded into the variable nodes of the graph. These bits are the messages which are passed along the edges of the graph and are used for decoding. Messages are successively passed from variable nodes to check nodes, and from check nodes to variables, with new messages being computed at the nodes.
At a check node, parity is computed as follows. For each edge incident on a given check node, the check computes and returns along that edge the parity, using modulo-2 addition, of all the bits incident on the check, excluding the edge for which parity is being computed. These results are passed back along the edges to the variable bits. At a variable node, the value of the bit sent out along an edge is flipped only if all of the other edges incident on that node return equal and opposite values from the current value. So a 1 along an edge will be flipped to a 0 only if all of the other edges incident on that node return the value 0. The procedure then repeats until some termination criterion is satisfied.
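A minimal sketch of the bit-flipping procedure just described is given below; it is not taken from the original text, and details such as the hard-decision and termination rules are illustrative assumptions.

```python
# Sketch (not from the original text): Gallager's Algorithm A on a BSC,
# following the description above.  Messages are single bits.  Check nodes
# return the XOR of the other incoming bits on each edge; a variable node flips
# the bit it sends on an edge only when every other incident check disagrees
# with its received value.

def gallager_a(H, received, max_iters=50):
    r, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(r)]
    variables = [[i for i in range(r) if H[i][j]] for j in range(n)]
    # v2c[(i, j)] is the bit sent from variable j to check i.
    v2c = {(i, j): received[j] for j in range(n) for i in variables[j]}
    for _ in range(max_iters):
        # Check-to-variable: parity of the other bits on the check.
        c2v = {(i, j): sum(v2c[(i, k)] for k in checks[i] if k != j) % 2
               for i in range(r) for j in checks[i]}
        # Tentative hard decision: flip only if every incident check disagrees.
        decision = [1 - received[j]
                    if all(c2v[(i, j)] != received[j] for i in variables[j])
                    else received[j] for j in range(n)]
        if all(sum(decision[j] for j in checks[i]) % 2 == 0 for i in range(r)):
            return decision                    # all checks satisfied
        # Variable-to-check: flip only if all *other* checks disagree.
        v2c = {(i, j): (1 - received[j]
                        if all(c2v[(k, j)] != received[j]
                               for k in variables[j] if k != i)
                        else received[j])
               for j in range(n) for i in variables[j]}
    return decision                            # best effort after max_iters
```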
Given that the received bits are 1's and 0's, let p_i be the probability that a bit is in error after the ith round of decoding. For some bit in error, a parity check involving that bit will be unsatisfied only if an even number of the other bits in the check are in error. An error will be corrected only if all of the checks participating with that bit are unsatisfied, and this occurs with probability

\left[\frac{1 + (1 - 2p_{i-1})^{d_c - 1}}{2}\right]^{d_v}. \qquad (2.3)

Similarly, it is possible for a correct bit to be mistakenly flipped to an incorrect state, and that probability is

\left[\frac{1 - (1 - 2p_{i-1})^{d_c - 1}}{2}\right]^{d_v}. \qquad (2.4)

Applying these results, a recursive relation can be found for the next round's overall probability of bit error:

p_i = p_0 - p_0\left[\frac{1 + (1 - 2p_{i-1})^{d_c - 1}}{2}\right]^{d_v} + (1 - p_0)\left[\frac{1 - (1 - 2p_{i-1})^{d_c - 1}}{2}\right]^{d_v}. \qquad (2.5)

This result is an important step towards an understanding of iterative decoding. Gallager showed that there exists some maximum value p_0^* of p_0 for which p_i goes to zero as i gets large for all p_0 less than p_0^*. In other words, if the initial probability of error created by the channel is below some threshold, there is a very high probability that the received word will be properly decoded. This shows that even for the simple decoding algorithm that Gallager used, LDPC codes can achieve an arbitrarily low probability of bit error for some rate that is bounded away from capacity.
Clearly, there are more powerful ways to decode LDPC codes. Gallager's Algorithm A
does not use soft decisions, and assumes a BSC. We shall now explore message passing on graphs to see how more powerful algorithms may be exploited.
2.3 Factor Graphs and Message-Passing Algorithms

Factor graphs are bipartite graphs that describe how a function of many variables is composed of its constituent factors [10]. Figure 1.1, showing the random LDPC
code construction in a graphical form, is an example of a factor graph. In graphical notation, variables are shown as circles, and function nodes are squares, sometimes with an operator in them to describe the function that is being performed at that node. For example, the factor graphs for LDPC codes often have a '+' sign representing the constraint that neighbouring variables have zero sum modulo-2.
We can use this representation to provide a framework for algorithm execution, and in this case we are interested in investigating decoding algorithms. To assist in understanding message passing, an example for the (7, 4) Hamming code will be investigated.
A parity check matrix for the Hamming code is given by

H = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}. \qquad (2.6)

As a linear block code, we are interested in finding the a posteriori probabilities (APP) for each received symbol,

f(x_i \mid \mathbf{y}) = \sum_{\mathbf{x} :\, x_i \text{ fixed}} f(x_1, x_2, x_3, x_4, x_5, x_6, x_7 \mid \mathbf{y}), \qquad (2.7)
where i indexes the ith digit in the code and the function f(·) represents the conditional joint probability mass of (x_1, ..., x_7) given the received channel output y. The goal is to calculate each of these probabilities based on the constraints of the code. In order to properly see this, the first digit x_1 will be examined, noting that the analysis proceeds similarly for all other bits.
We know from Bayes' rule that the probability of having sent a codeword x upon receiving the word y from the noisy channel is

f(\mathbf{x} \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})}{f(\mathbf{y})}, \qquad (2.8)

where f(y) can be viewed as a constant since y is known upon reception of the word, and p(x) depends on the code used. If we assume independence of the channel symbols, i.e. a memoryless channel, and assuming that codewords are selected with uniform probability, then we can factor (2.8) into

f(\mathbf{x} \mid \mathbf{y}) = K(\mathbf{y})\,[\mathbf{x} \in C] \prod_i f(y_i \mid x_i), \qquad (2.9)

where K is a constant factor that depends only on y, and the brackets [·]
follow Iverson's convention. The above probability can be further factored into each of the constraints of the code,

f(\mathbf{x} \mid \mathbf{y}) = K \prod_{j} [\text{check } j \text{ is satisfied by } \mathbf{x}] \prod_i f(y_i \mid x_i). \qquad (2.10)

For the example of the Hamming code, this factorization can be expressed as

f(\mathbf{x} \mid \mathbf{y}) = K\,[x_4 \oplus x_5 \oplus x_6 \oplus x_7 = 0]\,[x_2 \oplus x_3 \oplus x_6 \oplus x_7 = 0]\,[x_1 \oplus x_3 \oplus x_5 \oplus x_7 = 0] \prod_i f(y_i \mid x_i), \qquad (2.11)

where the three Boolean propositions are derived from the three rows of (2.6) representing the constraints of the code. Since we are interested in probabilities for each symbol, we compute marginals for x_i, which for the case of x_1 is

f(x_1 \mid \mathbf{y}) = f(x_1 \mid y_1) \sum_{x_7} f(x_7 \mid y_7) \sum_{x_5} f(x_5 \mid y_5) \sum_{x_3} f(x_3 \mid y_3)\,[x_1 \oplus x_3 \oplus x_5 \oplus x_7 = 0] \sum_{x_6} f(x_6 \mid y_6) \sum_{x_2} f(x_2 \mid y_2)\,[x_2 \oplus x_3 \oplus x_6 \oplus x_7 = 0] \sum_{x_4} f(x_4 \mid y_4)\,[x_4 \oplus x_5 \oplus x_6 \oplus x_7 = 0]. \qquad (2.12)

We see then that the problem of computing the marginals has been factored into many smaller problems, and this is naturally expressed through the use of factor graphs and a message passing algorithm that passes these marginals as messages. Unfortunately this factor graph has cycles, so the marginals computed are not exact. This example is used however, since virtually all codes have factor graph representations with cycles.
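For a code this small, the marginal (2.12) can be checked by brute force. The sketch below is not from the original text; it enumerates all 2^7 words for an assumed BSC with crossover probability eps, and the received word is hypothetical.

```python
# Sketch (not from the original text): brute-force APP marginal for the (7,4)
# Hamming code over a BSC, by summing the likelihoods of all codewords.

from itertools import product

H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]          # eq. (2.6)

def app_marginal(y, i, eps=0.1):
    """Return (P(x_i = 0 | y), P(x_i = 1 | y)), normalized."""
    post = [0.0, 0.0]
    for x in product([0, 1], repeat=7):
        if any(sum(h * b for h, b in zip(row, x)) % 2 for row in H):
            continue                                  # [x in C] = 0
        like = 1.0
        for xi, yi in zip(x, y):
            like *= (1 - eps) if xi == yi else eps    # prod_i f(y_i | x_i)
        post[x[i]] += like
    total = post[0] + post[1]
    return post[0] / total, post[1] / total

y = [1, 0, 1, 1, 0, 1, 0]            # hypothetical received word
print(app_marginal(y, i=0))          # APP of the first digit x_1
```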
The goal of MAP decoding is to find the maximal a posteriori symbols of a noise corrupted codeword. In general, this can be factored into the problem of finding the most likely symbol x_i given that the symbol y_i has been received and that the transmitted word x is subject to the constraints of the code. The sum-product algorithm applied to the factor graph for the code of interest calculates the most likely transmitted symbols in this manner.
Message passing on a factor graph describes how the calculated probabilities on the graph will behave over time given a certain set of initial conditions. There are numerous algorithms that can be used to perform message passing, but in particular the sum-product algorithm will be described, as it can be used to decode LDPC codes quite powerfully in many cases.

(Iverson's convention takes a Boolean proposition and returns a '1' if that proposition is true; otherwise a '0' is returned. For example, given a proposition P, [P] = 1 if P is true and [P] = 0 otherwise.)

Figure 2.1: (a) Factor graph representation of a (7,4) Hamming code. (b) Initial message passing to a check node. (c) Return message from a check node. (d) Return message from a variable node.
The BCJR algorithm is actually a specific implementation of the sum-product algorithm, and belief propagation and turbo decoding also turn out to be variations of the sum-product algorithm. These types of message passing are iterative decoding algorithms.
The message passing algorithm commences by initializing each of the variable nodes with the corresponding received symbol probability. Some message passing schedule is used which sequentially passes all the information from each node to each adjacent node. There are many schedules which can be used, and it is up to the decoder designer to determine a suitable one. For instance, a sequential schedule in which each message is updated in turn such that no message is updated twice before any other message is updated may be used.
This schedule is used in all trials performed in this work. Varying the message passing schedule may have some impact on the performance of the decoder, but the schedule chosen was found to result in good performance.
Messages are represented by the symbol μ, and represent some measure of the reliability of the information being passed.

Figure 2.2: Generalized message passing at variable and check nodes

The rule determining the update of the information at each node is:
Sum-Product Update Rule [10]: The message sent from a node v on an edge e is the product of the local function at v (or the unit function if v is a variable node) with all messages received at v on edges other than e, summarized for the variable associated with e.
The summarization operator refers to the computation of a marginal, meaning that the outgoing value is summed over all possible configurations of all variables except the one of interest. Figure 2.2 provides more detail. Within each node, it is assumed that calculations are performed using an efficient method, so in all cases the forward-backward algorithm is used [11], as described in Appendix A.
There are many probability metrics that may be used for messages, and a few are outlined in Table 2.1. Depending on what domains are chosen for representing messages, the form that the sum-product update rules take may vary drastically. Two representations of particular interest are the log likelihood ratio (LLR) and the log likelihood difference pair (LLDP). Using log likelihood ratios, we can write the variable node update rule as

\mu_{x \to f} = \mu_{ch} + \sum_{h \in n(x) \setminus f} \mu_{h \to x}, \qquad (2.13)

where x is the variable node of interest and f is the check node whose edge is attached to x along which we are sending the output. The value μ_ch is the message from the channel corresponding to the variable node x being updated, and n(x) is the set of edges connecting to x.

Table 2.1: Possible binary message representations for iterative decoding algorithms

Name | Representation | Domain
Basic | p[0], p[1] = 1 - p[0] | (0, 1)
Likelihood Ratio | p[0]/p[1] | (0, ∞)
Likelihood Difference | p[0] - p[1] | (-1, 1)
Log-likelihood Ratio (LLR) | log(p[0]/p[1]) | (-∞, ∞)
Log-likelihood Difference Pair (LLDP) | (sgn(p[0] - p[1]), -log|p[0] - p[1]|) | ({±1}, (0, ∞))

The check node update equation using log-likelihood ratios is

\mu_{f \to x} = \log \frac{\prod_{y \in n(f) \setminus x}\left(e^{\mu_{y \to f}} + 1\right) + \prod_{y \in n(f) \setminus x}\left(e^{\mu_{y \to f}} - 1\right)}{\prod_{y \in n(f) \setminus x}\left(e^{\mu_{y \to f}} + 1\right) - \prod_{y \in n(f) \setminus x}\left(e^{\mu_{y \to f}} - 1\right)}, \qquad (2.14)

where d_c is the number of edges connected to f, the check node being updated. This rule is actually quite complicated, and is one of the drawbacks of the sum-product algorithm with this representation. It is interesting to note that the update rule for the check nodes using the log-likelihood difference pair is very similar to (2.13), and may be written as

\mu_{f \to x}(s, r) = \sum_{h \in n(f) \setminus x} \mu_{h \to f}(s, r), \qquad (2.15)

where s ∈ W_2 = {±1}, r ∈ R^+, and the addition of the signs in {±1} is real multiplication.
Later, we will exploit this representation in an attempt to reduce the decoding complexity.
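The two LLR-domain rules can be written compactly in code. The sketch below is not from the original text; the variable node rule follows (2.13) directly, while the check node rule uses the tanh form, which is algebraically equivalent to (2.14), and the edge labels in the example are hypothetical.

```python
# Sketch (not from the original text): LLR-domain node updates.

import math

def variable_update(mu_ch, incoming, exclude):
    """mu_ch: channel LLR; incoming: dict edge -> LLR; exclude: outgoing edge.  Eq. (2.13)."""
    return mu_ch + sum(m for e, m in incoming.items() if e != exclude)

def check_update(incoming, exclude):
    """Product of tanh(LLR/2) over the other edges, mapped back with 2*atanh (equivalent to 2.14)."""
    prod = 1.0
    for e, m in incoming.items():
        if e != exclude:
            prod *= math.tanh(m / 2.0)
    prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)   # numerical guard
    return 2.0 * math.atanh(prod)

# Example: a degree-3 check sending a message back to edge "x4".
print(check_update({"x2": 1.5, "x3": -0.8, "x4": 2.0}, exclude="x4"))
```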
Thus, returning to the example of the (7, 4) Hamming code, we can follow the algorithm through. As mentioned previously, the seven variable nodes are initialized to the corresponding probabilities representing the likelihood of the received bits being either a 0 or 1. Each of the seven variable nodes sends its likelihood message along each edge to the corresponding check nodes. The message that is returned to each of the variables is the result of the check node function applied to all message inputs excluding the input message along the edge on which the output message is passed. Figure 2.1 (c) depicts the return message to x_4 based on the inputs of x_5, x_6, and x_7 to the parity check node. Each variable node is updated following the same rule, such that the output along an edge is dependent on all of the inputs to the node excluding the one along the output edge.
At this stage, one full iteration through the graph has been completed and all of the variable nodes have an updated estimate as to their probability mass function.
This iterative procedure continues until some termination criterion is met. This could possibly be a convergence measure on the variable node output messages, a threshold on the maximum number of iterations, or a check node 'satisfaction'. Once the algorithm is complete, a decision is made based on each bit's probability mass as to whether it is more likely a 0 or a 1, and this hard decision is output as the decoded word. It is also possible to output soft decisions, meaning that the reliability measure is also output along with the decision.
This approach provides a relatively straightforward method to decode LDPC codes. One problem that has not been mentioned is the case of cycles in graphs, which are unavoidable in LDPC codes and most other codes of finite length. For the cycle-free case, the sum-product algorithm exactly computes the desired factors, but this is not the case for graphs with cycles. Despite this drawback, the sum-product algorithm still works extremely well on graphs with cycles, as the APP estimates are 'good enough'. Many different simulations have shown that for codes without small cycles, the sum-product algorithm works very well.
2.4 Density Evolution

Since the rediscovery of LDPC codes in the mid-1990s, a great deal of work has been done to further develop the performance and understanding of LDPC codes and iterative decoding schemes in general. A first extension of Gallager's analysis for decoding outlined in Section 2.2.4 was presented by Luby, Spielman, et al., who used the same bit-flipping algorithm applied to irregular LDPC codes [12].
In late 1998, Richardson and Urbanke applied the bit-flipping algorithm to a number of channels including the Gaussian channel, and found the maximum allowable parameters for these channels, verifying Gallager's results and presenting new ones [19]. They were also able to successfully analyze the capacity of LDPC codes using soft-message-passing algorithms, one of which is the sum-product algorithm described in Section 2.3. Their approach will be detailed here, though expanded to include irregular codes.

Figure 2.3: Tree representation of a section of Figure 1.1 (depth l and depth l + 1)
First it is useful to describe how a factor graph can be unrolled. Message passing algorithms work on graphs, and it is insightful to be able to view how the messages move through the graph as a function of time. Time in this case is assumed to be some discrete counter that advances at a rate that is a function of the message passing algorithm.
For example one iteration of a decoding algorithm may be considered to be one time step.
It is possible to move backward in time from a certain node n in the graph and recursively view all of the nodes and edges which have contributed to the message being output from n.
If we draw the node n and all of the connected edges and nodes which participate with node n back t time steps, we have an unrolled tree as in Figure 2.3. This is an unrolled graph of depth 2. The resulting graph is a tree, or is tree-like, if and only if all of the nodes in the tree are distinct. This is not the case in Figure 2.3, as the variable x5 is repeated, but it serves its purpose as an example. A message passing algorithm used on the graph sends messages up the tree, and we are able to understand how the distribution of messages at each level l in the tree affects the message output at depth l + 1. Thus we see that this represents the time evolution of messages on the graph. We would like to find out how the distributions of the messages evolve as they proceed up the tree in order to understand how iterative decoding methods work. First, some assumptions need to be made about the system.
The all-zeros codeword is taken to be the transmitted codeword. If symmetry conditions of the channel and of both variable and check nodes are satisfied, then the probability of a decoding error is independent of the transmitted codeword, validating the use of this word; this is proved in [19]. The all-zeros codeword also has the advantage of having a simple distribution of bits, namely a delta function, and this will assist in the simplicity of the analysis.
The symmetry conditions that are needed for the use of the all-zeros word are symmetry in the channel, and symmetry at both check and variable node update functions. Symmetry in the channel implies zero-mean noise for the AWGN channel, or an equal likelihood of flipping a bit on the BSC. Symmetry at the nodes means that given a set of input message reliabilities, i.e. message magnitudes, to the function, the output reliability should be independent of the signs of the inputs. These conditions prevent the message distributions from drifting erroneously.
We will assume that the channel is memoryless and all received symbols are i.i.d. Each of these symbols is converted to a probability metric and is used to initialize the 0th level of the tree. The distribution of these probabilities can be denoted as P_0.
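Before the formal treatment that follows, the evolution of these message densities can be approximated crudely by Monte Carlo sampling. The sketch below is not from the original text: message populations stand in for the densities, the tanh-form check update replaces (2.14), and the channel model, code parameters and population size are all illustrative assumptions.

```python
# Sketch (not from the original text): Monte Carlo approximation of density
# evolution for an assumed (3,6)-regular code on a BPSK/AWGN channel, in the
# LLR domain.  Independent sampling plays the role of the tree-like assumption.

import math, random

def mc_density_evolution(sigma, dv=3, dc=6, pop=10000, rounds=50, seed=1):
    rng = random.Random(seed)
    # Channel LLRs for the all-zeros codeword: mean 2/sigma^2, std 2/sigma.
    ch = [2.0 / sigma**2 + (2.0 / sigma) * rng.gauss(0, 1) for _ in range(pop)]
    var_msgs = ch[:]                          # round 0: P_0
    for _ in range(rounds):
        # Check node update (tanh rule) on dc-1 independently sampled inputs.
        chk_msgs = []
        for _ in range(pop):
            prod = 1.0
            for _ in range(dc - 1):
                prod *= math.tanh(rng.choice(var_msgs) / 2.0)
            prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)
            chk_msgs.append(2.0 * math.atanh(prod))
        # Variable node update: channel LLR plus dv-1 sampled check messages.
        var_msgs = [rng.choice(ch) + sum(rng.choice(chk_msgs) for _ in range(dv - 1))
                    for _ in range(pop)]
    return sum(m < 0 for m in var_msgs) / pop  # residual error probability

print(mc_density_evolution(sigma=0.8))         # approaches 0 below threshold
```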
Working under the assumption that the code is tree-like to a depth l, where l, is the decoding round, we can develop an exact analysis of the densities at the lth level of the tree.
Let the messages sent from variable nodes to check nodes be represented by LLRs, and let the messages sent from check nodes to variable nodes be represented by LLDPs. Let P_l be the probability distribution (density) of messages sent from the set of all variable nodes, and let R_l be the density of messages passed from the set of all check nodes, both at round l in the decoding process (or depth l on the tree). Also, define

    λ(f) = Σ_i λ_i f^⊗(i−1)   and   ρ(f) = Σ_i ρ_i f^⊗(i−1)    (2.16)

where f is an input distribution, and the notation f^⊗x represents a repeated convolution of the distribution f, x times. Both λ_i and ρ_i represent the same degree sequence statistics as in (1.2). Using the LLR representation for messages, we have an expression for the output of a variable node given in (2.13). The density of these messages over all variable nodes is the convolution of all of the participating input densities. This can be written as

    P_l = P0 ⊗ λ(R_{l−1})    (2.17)

where ⊗ denotes convolution. Similarly, representing the messages being passed to a check node as an LLDP, (2.15) can be used to determine the output message. Again, the density of these messages is the convolution of all of the participating input densities, and can be written

    R_{l−1} = ρ(P_{l−1})    (2.18)

In order to develop a relationship between the densities from one round of decoding to the next, we require a transformation function which can convert densities from the LLR domain to the LLDP domain and back; these are derived in [18] and given here.
To change a probability distribution from either representation to the other, one may apply the following change of variable

    y → ln coth(y/2),   y > 0    (2.19)

which, for probability distributions, may be written as the functional

    γ[f](y) = f(ln coth(y/2)) / sinh(y)    (2.20)

This operator maps distributions over the reals, f(y), to distributions over the reals:

    γ : ℝ^(ℝ⁺) → ℝ^(ℝ⁺)    (2.21)

Note that

    ln coth( (ln coth(y/2)) / 2 ) = y    (2.22)

so it follows that γ[γ[f]](y) = f(y). Thus we have a method of converting positive distributions of probabilities from one domain to the other.
For some density g, let g = g⁺ + g⁻, where g⁺ and g⁻ represent the positive and negative portions of the density respectively, defined by

    g⁻(x) = g(x) for x < 0,   g⁺(x) = g(x) for x > 0.    (2.23)

Also define a reflection-in-the-y-axis operator, R, defined as [Rg⁻](x) = g⁻(−x). Finally, define the change-of-variable operator that takes as its input an LLR distribution over ℝ and returns an LLDP distribution over W2 × ℝ⁺,

    Γ : ℝ^ℝ → ℝ^(W2 × ℝ⁺)    (2.24)

where W2 is the set {±1}. This operator can then be defined as

    Γ[f](s, r) = (1, [γ f⁺](r)) + (−1, [γ R f⁻](r))    (2.25)

and its inverse, which takes an LLDP input distribution and returns the equivalent LLR distribution, is defined as

    Γ⁻¹[f](x) = [γ f⁺](x) + [R γ f⁻](x)    (2.26)

Given the previous equations and operators, we can combine them to produce the heart of Richardson and Urbanke's work, their theorem of density evolution for continuous message-passing algorithms. If the distribution of messages from check nodes is initialized to erasures, R0 = δ(0), i.e. p[0] = p[1] for all messages, then we can write the theorem of density evolution as:

Density Evolution [18]: Let P0 denote the initial message distribution, under the assumption that the all-zeros codeword was transmitted, of an LDPC code specified by edge degree sequences λ(x) and ρ(x). If R_l is the distribution of messages passed from check nodes to message nodes at round l of sum-product decoding, then

    R_l = Γ⁻¹(ρ(Γ(P0 ⊗ λ(R_{l−1}))))    (2.27)

This powerful result provides a method to find good LDPC codes that perform very well without the need to simulate codes on a trial-and-error basis. Degree sequence polynomials ρ(x) and λ(x) can be found which maximize a particular channel parameter, such as the threshold SNR, thus providing near-optimal performance. Since the all-zeros codeword is sent, we know that the algorithm has converged to the correct codeword when the density approaches a delta function where p[0] = 1, which is at positive infinity for the LLR representation.
It should be noted that we cannot actually converge to a delta function at infinity, but it can be approached. Thus, the resulting LDPC codes used with the optimized degree sequences may not asymptotically perform as well as expected since the probability of bit error is not zero. It is vanishingly small however, and empirical evidence supports the validity of asymptotic performance.
Hence, we can try to find, using some directed search, degree sequences that maximize the channel parameter P0 for which the density evolution converges to the all-zeros codeword distribution, and this maximum is labelled P0*. This threshold is the supremum of all channel parameters for which the probability of error approaches zero for some number of iterations on the graph.
For example, for the AWGN channel the noise variance σ² can be maximized such that the BER of the decoder remains vanishingly small.
One beneficial feature that stems from this form of analysis is that an additional constraint on the number of iterations used to decode can be enforced. Since level l of the tree corresponds to the lth decoding iteration, we can attempt to find degree sequence polynomials for which P0* is maximized within a constrained number of iterations. This is very useful when searching for good codes that are also practical.
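A minimal sketch of such a threshold search is given below. It assumes a hypothetical routine de_converges(sigma, lam, rho, max_iter) that runs density evolution for the given degree sequences and reports whether the error probability is driven to (near) zero within the allowed number of iterations; the bisection itself is simply the standard way to locate the supremum described above.

```python
def find_threshold(de_converges, lam, rho, max_iter=200,
                   sigma_lo=0.5, sigma_hi=1.5, tol=1e-3):
    """Bisection search for the largest AWGN noise level sigma* at which
    density evolution still converges for the degree sequences lam, rho."""
    while sigma_hi - sigma_lo > tol:
        sigma = 0.5 * (sigma_lo + sigma_hi)
        if de_converges(sigma, lam, rho, max_iter):
            sigma_lo = sigma      # converged: the threshold is at least this noise level
        else:
            sigma_hi = sigma      # failed: the threshold lies below this noise level
    return sigma_lo
```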
Using the density evolution methods described, Richardson and Urbanke were able to develop what they term highly irregular LDPC codes that asymptotically perform to within 0.06 dB of the Shannon limit [18]. This is a significant improvement over the previously best known performing code. In the subsequent chapters, we make use of this approach to optimize LDPC codes for reduced-complexity decoding algorithms.
2.4.1 Fixed Points It is useful to examine how the distribution of messages passed during the decoding process evolves over time. To this end we define a statistic P_l^E which represents the proportion of messages being passed at round l that, if a hard decision were made as to their sign, would be in error.
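This statistic can be computed directly from a discretized LLR density, as in the small helper below (an illustration, not code from the original text); under the all-zeros assumption, correct messages are positive, so the error mass is everything below zero plus half of any mass sitting exactly at zero.

```python
import numpy as np

def hard_decision_error(grid, density):
    """Proportion of LLR-domain message mass that a hard decision would get wrong."""
    grid = np.asarray(grid, dtype=float)
    density = np.asarray(density, dtype=float)
    mass = density / density.sum()            # treat the quantized bins as probabilities
    return mass[grid < 0].sum() + 0.5 * mass[grid == 0].sum()
```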
Using this statistic, the time evolution of errors through the message-passing algorithm may be plotted, and two examples are given in Figure 2.4. The curve to the right represents how the distribution of messages approaches the correct codeword (P_l^E = 0) as the decoding round gets very large, in this case about 1600. The inner curve gives the same evolution for density evolution constrained to maximum decoding iterations of less than 200.
Figure 2.4: Time evolution of hard-decision errors in message passing (proportion of messages in error vs. iteration). The outer curve represents the evolution of a code optimized with no iteration constraint; the inner curve is for a code optimized with a constraint to a maximum of 200 iterations.

Note that
the curves are monotonically decreasing but their slope fluctuates. The regions where the slope approaches 0 are called fixed points, and are locally near-stable regions of the graph. At these fixed points, the incremental improvement in P_l^E is very small, such that for realizable codes, i.e., codes of finite length with cycles, it is possible that the evolution will become 'stuck' at these points.
In fact, this is the most common cause of decoder failure. Recall that density evolution allows us to find highly optimized LDPC codes, and these codes are very sensitive to any deviations in their structure. Codes of shorter block length tend to deviate further from the optimal degree sequences due to finite effects. These deviations make it possible for the slope at a fixed point to actually reach 0, depending on the actual received probabilities, such that the code will not improve any further and a decoding error results.
Techniques to mitigate this effect are discussed in [18].

Chapter 3 Optimization of LDPC Decoders In Chapter 2, we developed the concepts of density evolution, and presented a relationship describing the evolution of densities over time through the unrolled tree. We will expand on these ideas in this chapter and develop a similar approach to analyzing decoding algorithms that have a reduced complexity compared to the sum-product algorithm.
The ability to find the threshold performance for an LDPC code leads to the problem of how one can find good degree sequence polynomials that maximize these thresholds. This becomes a non-linear optimization problem with a huge parameter space. We can utilize a powerful, global optimization technique called genetic algorithms to provide the means to solve this problem.
3.1 Genetic Algorithms A genetic algorithm (GA) is a global optimization technique which takes the concepts of biological evolution and Darwinian survival and applies them to finding optimal solutions to some generally hard problem. The basic operation of a GA is conceptually simple: One creates a population of solutions to some problem, selects and combines the better solutions in some manner, and uses their recombination to replace poorer solutions. The combination of selection pressure and innovation generally leads to improved solutions, often approaching the globally optimal solution.

The problem to be solved must be expressed through artificial chromosomes, each of which represents one possible solution to the problem. A chromosome may be a string of bits, a list of parameters, or a more complex representation, such as a tree of procedures, but it must represent some embodiment of a solution to the problem. For example, suppose the problem is to find the point furthest away from the origin that is within the bounds of some closed region of a 2-dimensional space. One simple method to encode the problem is to use a chromosome that contains two real numbers (x, y) representing a location on the 2-D surface.
The GA requires a method to evaluate how good or bad a particular solution, or chromosome, is. This is usually some function that takes as its input the chromosome and outputs a fitness measure, and the fitness is what the GA uses to score the potential solution. Using our previous example, the GA's fitness function may take the chromosome, which is an (x, y) coordinate, measure its Euclidean distance from the origin, and determine whether or not it is within the bounds of the closed region in question. If it is, it may return a fitness equal to the distance from the origin; otherwise it may return some non-valid measure, such as -1.
Having encoded the problem onto a chromosome, and having an evaluation procedure, the GA will evolve solutions to the problem by creating a population of chromosomes. A
population is generally randomly created, and for each individual in the population, its fitness is determined. The GA will then select those individuals with the highest fitness, and perform some genetic operations on them to create a new generation of individuals for the population, which replace the individuals in the population with generally the worst fitness.
The effect, hopefully, is an overall improvement in fitness.
Genetic operators include selection, recombination, and mutation, as well as other less common ones. Selection is the basic idea that a GA will prefer better solutions to worse solutions. This is Darwin's 'survival of the fittest' in action. There are many ways selection can be accomplished, from choosing the individuals with the highest fitness, to random chance.
Recombination combines various parts of the chromosomes from two or more parents and produces a new child chromosome that contains traits from its parents, just as in biological systems. Mutation makes some sort of alteration to a chromosome, effecting some change in the traits of that chromosome, such that its fitness may be altered.

Using these operators, the GA repeatedly replaces members of the population and, over a number of generations, the average fitness of the population will hopefully improve as worse individuals are replaced. Usually, the population will converge to some steady-state distribution, meaning an optimum has been found. It may be a local optimum rather than a global one, and it is often difficult to know which it is. GA practitioners make sure that the initial population is large enough and random enough that the entire search space is adequately covered, and they control the genetic operators in such a way as to keep the rate of evolution from converging too quickly on a local optimum before a better optimum has been found.
For the work in this paper, the GA package GALib was used [27]. This is a library of C++ classes that allows the user to easily set up chromosomes and fitness functions, and to define genetic operators. It is available for use from MIT at the URL http://lancet.mit.edu/ga/, and contains a full description of how the package works.
3.1.1 Optimization Approach The potential search space for this problem is enormous, being all possible degree sequences defining an LDPC code, and it is useful to constrain it in order to reduce the overall complexity. To this end, we constrain the rate of the code to be a fixed value, and for all of the cases presented, this value was R = 1/2. We also constrain the maximum degree that any check or variable node can take. The goal of the optimization is to maximize the threshold performance of an LDPC code through the use of density evolution. As we see in [18], the threshold performance increases as the maximal degrees increase, so we should be aware of the impact on threshold performance of limiting the maximal degrees for λ(x) and ρ(x).
Let the maximal degree allowed for λ(x) be d_v, and for ρ(x) be d_c. Define the chromosome for the genetic algorithm to be a list of real numbers q ∈ (0, 1] such that

    chromosome = (q_2, q_3, ..., q_{d_v}, q̃_2, q̃_3, ..., q̃_{d_c−1})    (3.1)

where q_i and q̃_i correspond to the variable and check coefficients of degree i respectively. In order for the polynomials λ(x) and ρ(x) to meet the coefficient constraints Σ_i λ_i = 1 and Σ_i ρ_i = 1, we normalize both sections of the chromosome in the fitness function. Note that we do not include an element in the chromosome for the check node of highest degree.

This is to allow the rate constraint to be met, as the check node variables are normalized so that the desired rate of the resulting code is achieved.
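The sketch below shows one way this mapping from chromosome to degree sequences could be implemented; the specific algebra used to pick the highest check coefficient is an assumption made for illustration, chosen so that both the normalization and the rate constraint R = 1 − (Σ_i ρ_i/i)/(Σ_i λ_i/i) hold.

```python
import numpy as np

def chromosome_to_degree_sequences(q_var, q_chk, d_c, rate=0.5):
    """Turn raw GA genes into edge degree sequences (lambda_i, rho_i).

    q_var: genes for lambda_2 .. lambda_dv
    q_chk: genes for rho_2 .. rho_(d_c - 1); rho_(d_c) is left free so the
           rate constraint can be satisfied exactly.
    """
    q_var = np.asarray(q_var, dtype=float)
    q_chk = np.asarray(q_chk, dtype=float)
    var_deg = np.arange(2, 2 + len(q_var))
    chk_deg = np.arange(2, 2 + len(q_chk))

    lam = q_var / q_var.sum()                     # enforce sum_i lambda_i = 1
    int_lambda = np.sum(lam / var_deg)            # = integral of lambda(x) from 0 to 1

    target = (1.0 - rate) * int_lambda            # required integral of rho(x)
    s, t = q_chk.sum(), np.sum(q_chk / chk_deg)
    alpha = (target - 1.0 / d_c) / (t - s / d_c)  # common scale for rho_2 .. rho_(dc-1)
    rho_dc = 1.0 - alpha * s                      # highest-degree check coefficient
    if alpha < 0 or rho_dc < 0:
        return None                               # infeasible chromosome
    return lam, np.append(alpha * q_chk, rho_dc)
```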
The initial population of size 50 individuals is randomly generated, and the GA commences evaluation and evolution. The fitness function takes as its input the chromosome, and converts it into the degree sequence polynomials such that the desired code rate is correct. The function performs density evolution to determine the threshold performance for the ensemble of codes described by the polynomials, and the threshold is returned by the function as the fitness score. The number of iterations allowed can also be fixed to limit the search to more practical codes. Trials performed had the maximum number of iterations limited to 25, 50, 100, and 200.
Those individuals with the highest fitness are chosen for reproduction, and their offspring are used to replace the individuals with the worst performance. A genetic-like crossover is used for recombination, in which two parent chromosomes are randomly 'spliced' and recombined to create an offspring chromosome. Over the course of many evolutionary steps, the thresholds generally converge on what is likely the maximum threshold achievable given the construction of the decoder and the constraints of the code. The best single individual is selected and its chromosome is used to create the degree sequence polynomials for the top-performing code ensemble.
The equations used for density evolution analysis contain many convolutions that need to be computed, as seen in (2.27). An efficient method for computing convolutions is to take Fourier transforms and multiply. This is how all convolutions are calculated in this work, and a few notes should be made about the effects of the Fourier transform. Fourier transforms are efficiently computed using N-point DFTs, where N is the level of quantization in the probability distribution, and the quantization error that results can affect the analysis. The value N must be large enough that this error is small; it was chosen to be 2048 for these trials, which was found to be sufficient through multiple attempts with different values of N. The C library FFTW was used to perform all Fourier transforms in this work [5].
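A short sketch of this FFT-based convolution of quantized densities is given below (the zero-padding and renormalization choices are illustrative, not prescribed by the text):

```python
import numpy as np

def convolve_densities(p, q):
    """Convolve two quantized probability densities via the FFT.

    Both densities are assumed to be sampled on the same uniform grid and are
    zero-padded so that a linear (not circular) convolution is computed.
    """
    n = len(p) + len(q) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two, e.g. 2048
    out = np.fft.irfft(np.fft.rfft(p, nfft) * np.fft.rfft(q, nfft), nfft)[:n]
    out = np.clip(out, 0.0, None)             # clear tiny negative FFT round-off
    return out / out.sum()                    # keep total probability mass at 1
```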
With the density evolution tools in place, and a global search mechanism available to us, we are ready to optimize the different decoders, beginning with the sum-product decoder.

3.2 Optimization of the Sum-Product Algorithm Using the GA setup discussed, optimally performing codes can be generated for the sum-product algorithm. Density evolution for this algorithm follows the procedure presented in [18], and the reader is guided to this paper for details. The performance of these codes is very good, and for larger block lengths they approach their thresholds. This is the only decoding algorithm examined that consistently improves in performance as the maximal degree increases, so long as the number of iterations allowed is sufficiently large, and this behaviour is predicted by MacKay in [14]. On average, a large number of iterations is required to decode these highly optimized codes. The average number of iterations required rises to as high as several hundred for low SNR conditions. Again, the reader is directed to [18] for performance evaluation with unlimited iterations.
When a restriction on the maximum number of iterations is enforced, the performance of the sum-product decoder degrades, and Figure 3.1 provides more detail.
These codes are derived from density evolution optimization with a constraint on the maximum number of iterations set to 200 and a block length of 10,000. For codes with an unlimited number of iterations, the performance of the sum-product algorithm improves as the maximal degree is increased. However, as can be seen from these results, this is not necessarily the case for constrained iterations.
There appears to be an error floor introduced by limiting the number of iterations which gets more severe as the maximal degree is increased, such that the performance of lower-degree codes is better than that of larger-degree codes at higher SNR conditions. For low SNR conditions, the higher-degree codes outperform the inherently weaker lower-degree codes, though still not nearly as well as comparable codes that are allowed a very large number of iterations. As the maximal degree is increased, variables on average participate in a greater number of parity-check constraints, meaning that the decoding procedure requires more iterations for all of the received information to have a significant effect. As the number of iterations is allowed to grow for these codes, the error floor disappears and the standard 'waterfall' curve is seen, as in Figure 3.2. The performance of these codes when allowed an unlimited number of iterations, however, is worse than that of the codes that were optimized to operate with a very large number of iterations.

Figure 3.1: Sum-product algorithm performance for rate 1/2 optimized codes of block length 10,000, maximum number of iterations = 200. The numbers 05-20 indicate the maximal variable degree for each code used.
The error floor problem becomes worse as the code's block length is shortened.
It may be useful to couple these codes with a second high-rate code to remove the error floor while still reducing the overall number of computations required for successful decoding due to a decreased overall number of iterations. Alternatively it would be nice to be able to identify those sections of the graph that are still in error and concentrate the decoding effort on those sub-graphs only.
When the inherent power of the sum-product is limited by practical constraints, there is a significant degradation in the codes' performance, though overall, the performance is still excellent compared with traditional codes. We shall see in the next section that the min-sum algorithm can also be optimized for practical situations, and the reduced complexity decoder can perform nearly as well as the powerful sum-product algorithm.

Figure 3.2: Effect of iterations on sum-product algorithm performance for optimized rate 1/2 codes of block length 10,000 and a maximal degree of 20. Iterations range from 200 to 1200 in increments of 200.
3.3 Optimizing the Min-Sum Algorithm Most of the complexity in the sum-product algorithm arises from the need to calculate complex products for the check node update rule in (2.14). If this equation is closely examined, certain simplifications may be made to greatly reduce the complexity. Figure 3.3 (a) shows the input-output characteristics of the check node equation when all inputs are equal, and (b) shows the situation when one input is much smaller than the rest. It is easy to see that for the case of one input being much smaller than the rest, output values are about equal to the input values, and even for all inputs being equal as in (a), the difference from the input = output line is not too great, and is bounded.
Figure 3.3: Check node output vs. input curves. (a) Input-output characteristics of the check node for equal inputs. (b) Input-output characteristics of the check node for one message much smaller than all others.

Using this knowledge, we can reduce the computational complexity of the check node update rule through the use of an approximating function; the result will be called the min-sum algorithm. The check node function for the min-sum algorithm returns a value computed as follows: let the magnitude of an output message be the minimum magnitude of all inputs excluding the edge whose output is being calculated. Let the sign of the output message be
the modulo-2 addition of the signs of all the inputs excluding the edge whose output is being calculated. If the messages are represented by the log-likelihood ratios given in Table 2.1, this rule can be written as

    μ_{f→x}(x) = ( ∏_{y∈n(f)\x} sgn(μ_{y→f}(y)) ) · min_{y∈n(f)\x} |μ_{y→f}(y)|    (3.2)

where f represents the check node in question, and x the variable which connects to f through the particular edge we are examining. The update rule for the variable side remains the same as for the sum-product algorithm in (2.13).
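A direct implementation of the rule in (3.2) might look like the following sketch; the function name and the dense array input format are illustrative assumptions rather than anything specified in the text.

```python
import numpy as np

def min_sum_check_update(incoming_llrs):
    """Min-sum check node update (3.2): for each edge, the output sign is the
    product of the other edges' signs and the output magnitude is the minimum
    of the other edges' magnitudes."""
    m = np.asarray(incoming_llrs, dtype=float)
    signs = np.where(m < 0, -1.0, 1.0)
    mags = np.abs(m)
    outputs = np.empty_like(m)
    for k in range(len(m)):
        others = np.delete(np.arange(len(m)), k)
        outputs[k] = np.prod(signs[others]) * mags[others].min()
    return outputs

# Example: min_sum_check_update([+1.3, -0.4, +2.2]) -> [-0.4, 1.3, -0.4]
```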
This simplification is a useful step, but we would also like to determine optimal threshold performances for this algorithm using density evolution concepts. In order to do this, the distribution of messages must be determined at the output of the check nodes.
We already know how the density evolves at the variable nodes, and with the described update rules for the min-sum algorithm, the output cumulative distribution of messages from check nodes can be written

    F(y) = (1/2) [1 − F_X(|y|) + F_X(−|y|)]^N [1 + (1 − 2F_X(0))^N]    (3.3)

where F_X(y) is the cumulative distribution function of the log-likelihood-ratio messages on all of the edges being input to check nodes. This function enables us to model the evolution of probabilities on the graph over time. We may now proceed in finding degree sequence polynomials which have optimized performance thresholds.
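Because this closed form depends on the exact input distribution, a simple Monte-Carlo cross-check of the check-output distribution can be useful. The sketch below assumes N = d_c − 1 participating inputs and a caller-supplied sampler for the incoming message density; both are assumptions of the illustration.

```python
import numpy as np

def min_sum_check_output_samples(sample_input_llr, d_c, n_samples=100_000, seed=0):
    """Empirical samples of the min-sum check-output message: draw N = d_c - 1
    i.i.d. inputs, return (product of signs) * (minimum magnitude) for each trial.

    `sample_input_llr(size, rng)` is a hypothetical sampler for the input density,
    e.g. lambda size, rng: rng.normal(2.58, 2.27, size) for an AWGN-like start.
    """
    rng = np.random.default_rng(seed)
    x = sample_input_llr((n_samples, d_c - 1), rng)
    signs = np.prod(np.where(x < 0, -1.0, 1.0), axis=1)
    mags = np.abs(x).min(axis=1)
    return signs * mags       # sort these samples to obtain an empirical CDF
```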
Families of optimized codes were generated using a genetic algorithm. Degree sequences were limited to maximal degrees of 4, 5, 7, 9, 11, 13, 15, 20, and 25. The maximum number of iterations allowed was set to 25, 50, 100, and 200. For each case, an optimized degree sequence was returned by the GA. Table 3.1 shows the resulting degree sequences and thresholds for the cases in which iterations were limited to 100. Similar results were found when iterations were allowed to be greater or fewer, and in general, the thresholds found grow larger as the number of iterations is allowed to increase. Note that the thresholds given in Table 3.1 grow smaller as the maximal degree increases, as is seen with the sum-product algorithm as well.
The min-sum algorithm, as expected, does not perform as well as the sum-product algorithm. The thresholds found using density evolution optimization are greater than those of the sum-product algorithm, though the trend of decreasing thresholds with increasing maximal degree is still seen. Table 3.2 gives a comparison between the thresholds found for each algorithm when iterations are unlimited and when limited to 100 iterations. Clearly, from a threshold perspective, the sum-product algorithm is superior; however, it is useful to compare their performance in practical situations.
First, an examination of the performance curves for the min-sum algorithm reveals that for practical cases, larger maximal degree codes perform worse than smaller maximal degree codes at all SNRs. As an extreme case, one can see in Figure 3.4 that the code with a maximal degree of 25 performs much worse at all SNRs investigated than the code of maximal degree 11, which was found to be the best-performing code of the group. Since the min-sum algorithm does not compute exact APP values, errors which occur from the approximation are propagated through the graph. Check nodes with high degrees have a much greater chance of sending a larger error to many more nodes in the graph, and due to this inherent problem, the performance of these codes suffers.
There is an error floor that can be seen with these codes and, as with decoding by the sum-product algorithm, it is a result of the iteration constraint put on the codes.
Coupled with the errors introduced by the approximation of the APP values, this results in an error floor that is severe for few iterations. Like the sum-product algorithm, the error floor becomes less severe and disappears as the number of iterations increases.

d_v        4         5         7         9         11        13        15        20        25
λ2       0.369966  0.355097  0.316601  0.334492  0.302308  0.309376  0.297488  0.284456  0.267085
λ3       0.411707  0.340715  0.400099  0.375263  0.321379  0.344362  0.265682  0.291418  0.312483
λ4       0.218327  0.163818  0.076003  0.071285  0.034657  0.065113  0.089733  0         0
λ5       0         0.140370  0         0.024016  0.021179  0.017904  0.089733  0         0
λ6       0         0         0         0         0         0         0         0.051250  0.051886
λ7       0         0         0.207297  0         0         0         0.065060  0.110688  0.046961
λ8       0         0         0         0         0         0         0         0.041201  0.040485
λ9       0         0         0         0.194945  0         0.146449  0         0.051250  0.023379
λ10      0         0         0         0         0.174584  0.051684  0         0.035519  0.026411
λ11      0         0         0         0         0.140277  0         0         0         0
λ13      0         0         0         0         0         0.065113  0         0         0
λ14      0         0         0         0         0         0         0.117540  0         0
λ15      0         0         0         0         0         0         0.074765  0         0
λ(?)     0         0         0         0         0         0         0         0.023530  0
λ20      0         0         0         0         0         0         0         0.110688  0
λ(?)     0         0         0         0         0         0         0         0         0.126607
λ25      0         0         0         0         0         0         0         0         0.104703
ρ5       0.652006  0.510473  0.108624  0.330566  0.028971  0.157893  0         0.182163  0.050973
ρ6       0.347994  0.337985  0.885216  0.346133  0.057793  0.348033  0.415099  0.295964  0.176702
ρ7       0         0.151542  0.006161  0.229296  0.083734  0.380856  0.441478  0.045857  0.183145
ρ8       0         0         0         0.094006  0.829502  0.095024  0.143423  0.033483  0.381413
ρ9       0         0         0         0         0         0.018193  0         0.155725  0.203786
ρ10      0         0         0         0         0         0         0         0.286808  0.003980
(Eb/N0)* 1.221     1.213     1.157     1.089     1.040     1.016     0.998     0.967     0.914
σ*       0.8688    0.8696    0.8752    0.8821    0.8871    0.8896    0.8914    0.8946    0.9001

Table 3.1: Good rate 1/2 degree sequences for the min-sum algorithm over the AWGN channel, optimized for 100 iterations. The values σ* and (Eb/N0)* are the threshold noise power and SNR for the given degree sequence.

Iterations   Maximal Degree   Sum-Product SNR*   Min-Sum SNR*
100          4                0.926              1.221
100          5                0.916              1.213
100          7                0.865              1.157
100          9                0.812              1.089
100          15               0.746              0.998
∞            4                0.782              0.967
∞            5                0.723              0.931
∞            7                0.502              0.787
∞            9                0.409              0.692
∞            15               0.335              0.651

Table 3.2: Comparison between sum-product and min-sum algorithm thresholds.
The performance of the min-sum codes with respect to their threshold value also degrades faster for smaller block lengths than the sum-product results do.
The reason for the degradation in performance is the effect of finite block lengths. Recall that the threshold performance is the expected asymptotic performance as the code's length approaches infinity.
The min-sum algorithm is actually an approximation of the sum-product algorithm, and as such, these finite effects are much more pronounced, resulting in a rapid degradation in performance as block lengths decrease.
From the above discussion, we see that though the complexity of the min-sum algorithm is lower than the sum-product, the performance suffers significantly, and it is up to the code designer to decide which may be of more value. It is interesting to ask whether there is a better approximation function than (3.2) that is still computationally efficient. We will attempt to answer this in the next section.
3.4 The Modified Min-Sum Algorithm The simplification of the check node update rule for the min-sum algorithm introduces an error into the output messages, which certainly impacts the performance of the decoder.
This message error, Δμ, defined as the difference between the actual output from a factor graph update rule, μ_actual, and the output from the min-sum update rule, μ_min-sum, is

    Δμ = μ_min-sum − μ_actual    (3.4)

and is bounded by

    0 ≤ Δμ ≤ log d_c    (3.5)

where d_c is the degree of the particular check node. It is easy to check that no error will occur when all message magnitudes but the minimum are infinite, and that the maximal error occurs for the case where all input messages are of the same magnitude.

Figure 3.4: Optimized min-sum algorithm performance for rate 1/2 codes of length 10,000 with 200 maximum iterations.
Figure 3.5: Performance of the modified min-sum decoder for rate 1/2 codes with block lengths of 10,000 and iterations limited to 200, for varying maximal degrees.

Knowing this bound, we can attempt to improve the min-sum decoding performance by introducing an additive correction term in the check node update rule. This approach has been used for other codes, and is very similar to the Max-Log-MAP algorithm [17, 1, 20]. The goal is to approximate the transfer function of the check node as closely and as simply as possible, and one effective approach is to use a piecewise-linear approximation. A number of different limits and variations were tried for the approximation, and the best found is given here. Let μ_initial be equal to the result in (3.2), and let the resulting message output from a check node in the modified decoder be

    μ_{f→x} = μ_initial − K   if μ_initial > K
              μ_initial + K   if μ_initial < −K    (3.6)
              μ_initial       otherwise

where K is a constant value that is some fraction of log d_c. The actual value of K used depends on the average d_c for that code, and was found to be effective when chosen to be (1/3) log d_c. There are many alterations that can be made to this function by altering K or the limits. One could do a least-squares fit to the transfer function, and the resulting approximating function used would then depend on complexity and performance considerations. For the remainder of this discussion, only (3.6) will be used.
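A sketch of the correction in (3.6) is given below; the fraction of log d_c used for K is an assumption made for illustration and would be tuned as described above.

```python
import numpy as np

def modified_min_sum_output(mu_initial, d_c, frac=1.0 / 3.0):
    """Apply the piecewise correction of (3.6) to min-sum check outputs.

    K is taken as a fraction of log(d_c); outputs with |mu| > K are pulled
    toward zero by K, and the remaining outputs are left unchanged.
    """
    K = frac * np.log(d_c)
    mu = np.asarray(mu_initial, dtype=float)
    return np.where(mu > K, mu - K, np.where(mu < -K, mu + K, mu))
```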
Once again, the ideas of density evolution may be applied to this modified algorithm to determine optimized degree sequences. The addition of the constant K
complicates the expression for the probability density and numerical methods were used to calculate the required densities. It should be noted that the effect of the additional constant is to 'shift' those portions of the probability distribution towards the y-axis.

d" 4 :i 7 9 13 15 ~a 0.3699660.3684360.3436570.2998280.2722850.278550 a3 0.4117070.3984930.296163(1.2800350.2808720.268489 a4 0.2183270.1255190.2046970.0721210.0048040.059741 a5 0 0.1075520 0.0679800.0765610.041022 afi o 0 o a o a a7 0 0 0.1554830 0 0.044403 a8 0 0 0 0 0 0 ay 0 0 0 (1.2800350.0049160 ale 0 0 0 0 0.1165790 al 0 0 0 0 0 0 ais 0 0 0 0 0.2440420 a74 0 0 0 0 0 0.058201 ~ls 0 0 0 0 0 0.249594 p5 0.6520060.6134660.1917500.3677160.153(1800 p~ 0,3479940.2964020.7624450.0080950.1994900.319192 p7 0 0.0901310.0458060.0047160.0043290.068649 pn 0 0 0 0.6194730.1934560.612159 pa 0 0 0 0 0.4496460 F6 1.2211.197 1.128 0.793 0.7350.709 NO
~

0.8f>880.87120.87820.91270.91890.9216 Table 3.3: Good degree sequences for the modified min-sum algorithm with rate 1~2 codes over the AWGN, optimized for 100 iterations. The values Q* and ( N ) * are the threshold noise power and SNR for the given degree sequence.

Performance curves for the modified min-sum algorithm are given in Figure 3.5, and Table 3.3 shows the optimized degree sequences for the case of iterations limited to 100. The curves look similar to those of the regular min-sum algorithm, with improved overall performance in lower SNR scenarios. The selected update equation has the effect of introducing many very low-valued messages, and these messages are essentially considered erasures by the decoder.
For larger maximal degrees, these small-valued messages will be propagated through certain sections of the graph, resulting in some sub-sections of the graph becoming stuck in a non-convergent, inconclusive state. This is seen in the very prominent error floor in these codes.
As with the other algorithms examined, this error floor gets much worse for larger maximal degrees, but a greater number of iterations mitigates the effect. Very high maximal degree codes were not investigated as it was assumed that they would have poor performance, and it appears that this is the case, as the maximal degree 9 code is already worse than the maximal degree 7 code for all SNRs investigated.
The modifications to the min-sum algorithm do give the decoder better performance, but the error floor becomes more pronounced. We see again that there are trade-offs involved with the selection of a decoding algorithm. Depending on the target BER that is to be achieved, the modified min-sum may or may not be a good choice. We will examine some other factors affecting iterative decoder performance in the next section.
3.4.1 Comments on Good Degree Sequences There are similarities between all of the optimized degree sequences, and it is worthwhile to examine them. The most striking commonality between all the optimized sequences, which is true for all of the algorithms, is the maximization of degree-2 variables.
In all cases, the proportion of degree-2 variables is very close to the limiting value such that every check node has two degree-2 variable nodes participating with it. This means that the number of degree-2 variables is less than, but close to, the number of check nodes. If the number of degree-2 variables were greater than the number of checks, it would be impossible for a code to be constructed without a cycle of length 2. It seems, then, that the small-degree variable nodes, counter-intuitively, are an important factor in powerful codes, as there are generally a great deal of degree-3 variable nodes in these codes as well.
Besides this concentration, it also appears that good performance is achieved by having a large proportion of edges associated with the highest variable degrees.
Besides the high number of edges participating with degree 2 and 3 variable nodes, a majority of the remaining edges tend to participate with the highest degree variables, especially in the case of the sum-product algorithm. These edges, however, seem to be the cause of the finite effects that degrade the practical performance of these codes, and there are generally fewer edges participating with these high degree nodes when they are optimized with iteration constraints.
3.5 Finite Effects Fundamentally, there are two factors preventing the codes presented in the previous sections from achieving their threshold performances. The first is the finite block length of the codes, and the second is the finite number of maximum iterations allowed.
Both the min-sum and modified min-sum algorithms share certain commonalities which will be discussed. First, as with other iterative decoding schemes including turbo-codes, performance is highly dependent on the block length of the code. Figure 3.6 shows this relationship for the min-sum algorithm given a maximum of 200 iterations, and Figure 3.7 gives the same relationship for the modified min-sum algorithm.
In addition to the block length, both the Min-Sum and modified Min-Sum algorithms perform much better when given a greater number of iterations to decode.
Figure 3.8 gives results for the Min-Sum algorithm, and similar results hold for the other decoding methods examined.
Figure 3.6: Performance of the min-sum decoding scheme for varying block lengths. All algorithms are highly dependent on the block length used. The first number is the maximal variable degree and the second is the block length. Iterations are limited to 200, and the rate is 1/2.

Figure 3.7: Performance of the modified min-sum decoder for maximum iterations of 200, for rate 1/2 codes of varying maximal degrees and block lengths.

Figure 3.8: Performance of the min-sum decoder for a block length of 10,000, for varying maximum numbers of iterations.

Figure 3.9: Performance of the three decoding schemes at high SNR for rate 1/2 codes, showing the drop-off of the error floor.

Finally, it is useful to investigate the nature of the error floor found in the various decoders. Figure 3.9 shows the error-floor drop-off for the three decoding schemes discussed at higher SNRs. An examination of Figure 3.9 shows that all three algorithms operate in three 'modes'. The first, at the lowest SNR, is the standard performance seen for these curves, where performance is limited by the noise conditions. As the SNR increases, these codes enter an 'iteration-limited' region of performance in which performance improvement with increasing
SNR is smaller than expected. Performance is limited by the fixed number of iterations allowed. Finally, in high-SNR regions, performance improvement begins to increase as expected.
3.6 Performance vs. Complexity We have discussed in previous sections some of the impacts that practical considerations, such as the maximum number of iterations allowed and block length, have on performance.
It would be useful to quantify these trade-offs in a meaningful way to try to get a clear picture of the issues, and we accomplish this by plotting the performance of these decoding algorithms against the decoding cost in terms of their computational complexity.
A code's true complexity is not simple to calculate and is highly dependent on how it is implemented. The complexity calculated and used in these trials is based on the direct computational effort required only within the decoding algorithm, that is, the computations required at each node. All basic computations are considered to be of equal effort, so an addition is the same complexity as a compare or a multiplication. Also, overhead computational requirements such as loading and unloading of buffers are not included. The intention is to provide a rough measure that serves as a relative comparison rather than an absolute description.
Based on efficient calculations (i.e., using the FBA), we can derive expressions for the complexity of a given decoding procedure. At a node of degree D, the number of basic computations required for the FBA is 3(D − 2). For the sum-product algorithm, 5 operations are required per basic FBA operation at a check node, such that the complexity per bit of the algorithm, C, can be written as

    C_S-P = I (variable operations + check operations)    (3.7)
          = I [ (3 − 5 ∫₀¹ λ(x) dx) + 5 (3 − 6 ∫₀¹ ρ(x) dx) ]    (3.8)

which simplifies to

    C_S-P = I ( 18 − 5 ∫₀¹ λ(x) dx − 30 ∫₀¹ ρ(x) dx )    (3.9)

where I is the number of iterations used and was set to be one of 25, 50, 100 or 200, depending on the maximum number of iterations allowed. The min-sum algorithm requires an additional select operation at each check node beyond the basic computations, so its complexity per bit can be written

    C_M-S = I ( 6 − 5 ∫₀¹ λ(x) dx − 4 ∫₀¹ ρ(x) dx )    (3.10)

and similarly for the modified min-sum, which requires an additional compare and add per bit at each check node,

    C_MM-S = I ( 8 − 5 ∫₀¹ λ(x) dx − 4 ∫₀¹ ρ(x) dx )    (3.11)
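These expressions can be evaluated directly from a degree distribution, as in the sketch below; the constants follow the reconstructed forms of (3.9)-(3.11) above and should be treated as approximate.

```python
def decoder_complexity_per_bit(lam, rho, iterations, algorithm="sum-product"):
    """Evaluate the per-bit complexity expressions above.

    lam and rho are dicts mapping degree -> edge fraction (lambda_i, rho_i).
    """
    int_lambda = sum(f / d for d, f in lam.items())   # = integral of lambda(x)
    int_rho = sum(f / d for d, f in rho.items())      # = integral of rho(x)
    constants = {"sum-product": (18, 5, 30),
                 "min-sum": (6, 5, 4),
                 "modified-min-sum": (8, 5, 4)}
    a, b, c = constants[algorithm]
    return iterations * (a - b * int_lambda - c * int_rho)

# Example: a regular (3, 6) code with 200 iterations of min-sum decoding
# decoder_complexity_per_bit({3: 1.0}, {6: 1.0}, 200, "min-sum")
```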
targets of 10-3 and lOw' respectively. The curves represent codes of block lc,ngth 10,000 rate 1/2, and maximum iterations limited to 2(l0. For all of the cases examined, there is an optimal operating point, meaning that for some added complexity in the form of higher maximal degrees, decoding performance actually degrades. This is an important result for practical situations. What seems to be happening in general is that the highest degree nodes require a very large number of iterations to be decoded correctly, and for situations where there is a practical limit on these, the code actually performs worse.
For a target BER, of 1.0-3, one c:an see that there is no obvious 'win' in terms of the performance complexity trade-ofF between the three decoding schemes, although it seem one can achieve higher performance with lower complexity for some eases with the modified min-surer over the standard min-surrr algorithm. For a target BER of 10-5, we see a clear win with the modified min-sum algorithm over the standard min-sum, and it even outperforms the sum-product algorithm in certain instances.
Figure 3.12 shows the performance complexity trade-off for a target BER of 10-5 with iterations constrained to 25. The performance is worse than for the case of maximum itera-dons set to 200, but the general pattern remains. Note that the lower maximal degree codes perform relatively better with fewer iterations. For example, the best performing code for the modified min-sum algorithm with 25 iterations has a maximal degree of 5, whereas it is 9 for 200 iterations.

Figure 3.10: Performance-complexity trade-off for a target BER of 10⁻³ for the sum-product, min-sum and modified min-sum algorithms.
Figure 3.11: Performance-complexity trade-off for a target BER of 10⁻⁵ for the sum-product, min-sum and modified min-sum algorithms.

Figure 3.12: Performance-complexity trade-off for a target BER of 10⁻⁵ for the sum-product, min-sum and modified min-sum algorithms with maximum iterations set to 25.
It is useful to understand the relationship with varying block lengths for these algorithms, so two plots are given in Figures 3.13 and 3.14 for comparison, having block lengths of 100,000 and 1000 respectively. As the block length increases, the codes become more sensitive to the maximal degree parameter. The difference between the performance of the optimal code versus one with a smaller or larger maximal degree is much greater for block lengths of 100,000 than 1000, especially for the min-sum and modified min-sum decoders.
We have seen that for practical cases, reduced complexity decoders such as the min-sum and modified min-sum provide positive alternatives to the sum-product algorithm in terms of decoding power per unit operation. Overall performance does suffer however, and we would like to see ultimately a practical, realizable decoder that can perform as well as the sum-product without the complexity. The answer, perhaps, may be found by examining the actual implementation of the decoder and exploiting the power of the platform.

Figure 3.13: Performance-complexity trade-off for a target BER of 10⁻⁵ for the sum-product, min-sum and modified min-sum algorithms for a block length of 100,000.

Figure 3.14: Performance-complexity trade-off for a target BER of 10⁻⁵ for the sum-product, min-sum and modified min-sum algorithms for a block length of 1000.

Chapter 4 The Sum-Transform-Sum Algorithm Applications that require very powerful FEC schemes can greatly benefit from LDPC codes, as the performance is beyond most other coding choices. One difficulty with the LDPC codes presented so far is that they are not necessarily well suited for implementation in devices such as modems. These codes have been developed with decoding performance in mind, with no thought to the application.
A different approach can be used to find a good decoding algorithm: first understand the implementation, and then use the features inherent in the implementation to find powerful decoding algorithms. Applied to a hardware instantiation, this approach leads to the sum-transform-sum (SXS) algorithm.
4.1 Algorithm Description The SXS algorithm is a hardware architecture and algorithm for a soft-input, soft-output iterative decoder for LDPC codes, and is similar to the standard sum-product or min-sum algorithms. The inputs to the algorithm are soft representations of the probabilities for the possible transmitted symbols, and for a hardware implementation it is natural to represent these values as fixed-point numbers. The precision with which these numbers will be represented is a parameter of the system and will depend on, among other things, memory constraints and hardware complexity.

Unlike the other iterative decoding schemes described, the SXS algorithm utilizes two separate probability representations for messages and requires a transformation algorithm to convert between the two. The two representations used for messages are the LLR and the LLDP, both of which have been previously described and defined. Variable nodes work in the LLR domain while check nodes operate in the LLDP domain. Using these representations, both the variable node update equation and the check node update equation are simple to calculate, being the summations of (2.13) and (2.15) respectively.
It is useful to note that these two representations are duals, meaning that decoding in one domain is equivalent to decoding the dual code in the other domain. The idea of using the dual domain to decode linear codes was proposed originally by Hartmann and Rudolph [7].
Forney has suggested the use of dual codes and a Fourier transform method to convert between them [9]. The use of the dual code can reduce the complexity involved in the decoding procedure without any performance loss; however, the conversion between domains may require intensive computations.
With two separate probability representations, the SXS algorithm requires a transform function Γ(x) = (s, r) to convert messages between them. Let μ_ratio be a message in the log-likelihood ratio domain, and μ_diff be a message in the log-likelihood difference-pair domain. The function Γ takes as its argument a message of the form μ_ratio and returns a message of the form μ_diff. The returned message carries the equivalent probability as the input message, only expressed in a different manner. Similarly, the inverse transform function Γ⁻¹(s, r) takes as its input a message of the form μ_diff and returns the equivalent probability as a message of the form μ_ratio. The function Γ may be written as

    Γ(x) = (s, r) = ( sgn[ 2eˣ/(1 + eˣ) − 1 ],  log| 2eˣ/(eˣ + 1) − 1 | )    (4.1)

where x is a message of the form μ_ratio. Likewise, the inverse transform function may be written as

    Γ⁻¹(s, r) = log( (1 + s·eʳ) / (1 − s·eʳ) )    (4.2)

where (s, r) is the sign and magnitude of a message of the form μ_diff.
The function in (4.1) and its inverse, (4.2), may be implemented very simply using a pair of look-up tables in memory. If, for example, messages are stored as 8-bit (N-bit in general) values, then two look-up tables, each of size 2⁸ = 256 bytes (2^N N-bit values in general), are required to completely describe the forward and inverse functions. One need only calculate the value of the transform for each possible input once, and store the results in memory for future use. Alternatively, the function and its inverse may be used themselves to calculate the necessary transforms on an as-needed basis.
It may also be possible to implement the transformation through a combination of logic and memory. For example, it may be a simple matter to use some logic to determine the most significant bits of the transform output, and to determine the least significant bits with a reduced look-up table. This flexibility allows the algorithm to be used in an optimal manner depending on the platform that it is implemented on. If processing speed is a constraint, it would make sense to take advantage of the look-up table as much as possible, as opposed to calculating multiple transforms.
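The sketch below illustrates both options: computing (4.1) and (4.2) directly, and tabulating the forward transform for every fixed-point input code. The quantization step and LLR range are assumptions made for illustration, not values specified in the text.

```python
import numpy as np

def llr_to_lldp(x):
    """Forward transform (4.1): LLR x -> (sign, log-magnitude) of 2e^x/(1+e^x) - 1."""
    t = 2.0 * np.exp(x) / (1.0 + np.exp(x)) - 1.0
    return np.sign(t), np.log(np.abs(t) + 1e-300)   # x = 0 maps to a near-erasure

def lldp_to_llr(s, r):
    """Inverse transform (4.2)."""
    return np.log((1.0 + s * np.exp(r)) / (1.0 - s * np.exp(r)))

def build_lookup_table(n_bits=8, llr_range=8.0):
    """Tabulate the forward transform for every fixed-point input code, as a
    stand-in for the 2^N-entry memory table described above."""
    codes = np.arange(2 ** n_bits)
    llrs = (codes - 2 ** (n_bits - 1)) * (2 * llr_range / 2 ** n_bits)
    return [llr_to_lldp(x) for x in llrs]
```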
One difficulty in using fixed-point precision for this algorithm is the many-to-one, one-to-many problem, which occurs as a result of the non-linear nature of the transformation in a fixed-point system. Multiple values in one representation may map to the same value in the other and vice versa, making the choice of ranges and precision for these values critical.
For example, the values 7, 8, and 9 may all map to the value 1 in the other domain, and the problem is that information is lost in mapping back from 1 to the original domain.
Another issue with fixed-point implementations is the range of values that may be taken on by the precision used for the system. If too few bits are used to represent the values, performance may be very poor. In fact, this is exactly the case: when 6 bits are used to represent the data, the decoder works very poorly indeed, but as the precision is increased beyond 12 bits, performance is in general limited by the power of the code itself, not the fixed-point precision.
The two message representations, μ_ratio and μ_diff, have different domains, as shown in Table 2.1, but since a sign bit is naturally represented in the fixed-point world of hardware, the domains become the same, and this is finitely limited by the choice of how decimal bits and whole bits are allocated within the overall bits available. For effective performance, μ_ratio messages may need to take on very large values, whereas μ_diff messages may have very small decimal values. It is advantageous to allocate more whole bits to μ_ratio messages, and more decimal bits to μ_diff messages. Figure 4.1 shows how bits are allocated for varying precision for the trials performed with these codes.

Precision   μ_ratio allocation   μ_diff allocation
6-bit       1-3-2                1-1-4
8-bit       1-3-4                1-1-6
10-bit      1-4-5                1-1-8
12-bit      1-5-6                1-2-9
16-bit      1-6-9                1-3-12

Figure 4.1: Bit allocation for messages in the SXS algorithm. The first digit is the number of sign bits, then the number of whole bits, then the number of decimal bits.
4.1.1 Decoding The decoding algorithm commences by updating the check node output messages.
This is accomplished using the following equation:

    μ_ratio = Γ⁻¹( ∏_{x∈n(i)} s_x ,  Σ_{x∈n(i)} r_x )    (4.3)

where i is the ith edge on the graph and n(i) is the set of edges connecting to the check node associated with i, excluding i itself. The variables s and r represent the sign and magnitude of the pair (s, r), which is how μ_diff messages are represented. This equation is used a number of times, based on the structure of the LDPC code, to update all of the elements of the variable input array. In order to increase the efficiency of this equation, the FBA may be used [11], or an efficient one-pass algorithm may also be used [8]. Once completed, the variable node edges are then updated. The rule for updating these messages is governed by the equation

    μ_diff = Γ( μ_init + Σ_{x∈n(j)} μ_ratio,x )    (4.4)

where j is the jth edge in the graph and n(j) is the set of edges connecting to the variable node associated with j, excluding j itself. The order in which the variable and check node input arrays are updated may vary depending on the actual code used. These messages are iteratively updated until some convergence criterion is met, as with the other algorithms explored in this paper.

Code                       Iterations   Threshold
Regular (3, 6)             ∞            1.13 dB
Optimized, max. degree —   200          1.04 dB
Optimized, max. degree —   50           1.26 dB
Optimized, max. degree —   200          0.97 dB
Optimized, max. degree —   50           1.16 dB

Table 4.1: Threshold performance for SXS codes
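A floating-point sketch of the two update rules is given below. For compactness it uses the identities 2eˣ/(1+eˣ) − 1 = tanh(x/2) and log((1+t)/(1−t)) = 2 artanh(t), which are algebraically equivalent to (4.1) and (4.2); the fixed-point quantization described above is deliberately omitted from this illustration.

```python
import numpy as np

def sxs_check_update(s_diff, r_diff):
    """Check node rule (4.3): for each edge, combine the other edges' LLDP
    messages (product of signs, sum of log-magnitudes), then map the result
    back to an LLR via the inverse transform, written here as 2*atanh(s*e^r)."""
    s = np.asarray(s_diff, dtype=float)
    r = np.asarray(r_diff, dtype=float)
    out = np.empty_like(r)
    for i in range(len(r)):
        keep = np.arange(len(r)) != i
        out[i] = 2.0 * np.arctanh(np.prod(s[keep]) * np.exp(np.sum(r[keep])))
    return out                                    # the mu_ratio messages

def sxs_variable_update(mu_init, incoming_ratio):
    """Variable node rule (4.4): channel LLR plus the other edges' LLR
    messages, mapped into the LLDP domain as (sgn t, log|t|), t = tanh(x/2)."""
    mu = np.asarray(incoming_ratio, dtype=float)
    outs = []
    for i in range(len(mu)):
        x = mu_init + mu.sum() - mu[i]            # exclude the edge being updated
        t = np.tanh(0.5 * x)
        outs.append((np.sign(t), np.log(np.abs(t) + 1e-300)))
    return outs                                   # list of (sign, log-magnitude) pairs
```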
4.2 Optimization and Performance Density evolution can be used to find optimized codes for this algorithm, and the equations involved are actually identical to those used for the sum-product algorithm.
The sum-product algorithm, however, uses floating-point values, whereas the SXS
algorithm operates in the fixed point domain. As such, density evolution needs to also operate in fixed point.
Fixed-point number precision adds another dimension into the search space for good codes.
Trials were performed at 10-bit and 12-bit precision and all results presented here are from the 12-bit case.
Like the cases for the other decoders optimized in this paper, the SXS
algorithm was optimized for block lengths of 1000 and 10,000, maximal degrees of 4 and 5, and iterations of 50 and 200. For a comparison reference, a regular {n, 3, 6} LDPC code is also presented. The threshold for the regular code under sum-product decoding is 1.13 dB.
Thresholds for the optimized codes are presented in Table 4.1.
Figure 4.2: Performance of the SXS algorithm for a regular {10000, 3, 6} code with maximum iterations set to 200. The effect of varying precision is also shown.

The performance for a regular {10000, 3, 6} code is presented in Figure 4.2. Given that
the theoretical threshold for this regular code is 1.13dB, the performance of the decoder is very good, especially considering the limitation on the number of iterations.
Note that the performance of the decoder degrades rapidly for fixed-point precisions of less than 10-bits.
The performance for 8-bit precision is poor, and the 6-bit performance would not even be visible on this graph. Once a limit is reached, at about 12 bits, there appears to be little additional gain by greater precision.
Figure 4.3 shows the performance of this decoder for a regular {1000, 3, 6} code. The performance is worse than that of the longer code presented in Figure 4.2, as is expected. For comparison with the sum-product algorithm, Figure 4.4 shows both the SXS decoder performance and the sum-product decoder performance for the same regular code.
We see from the plot that the sum-product algorithm performs better, though not by much.
Genetic algorithms were used to find good degree sequences for the case of 200 maximum iterations, with maximal variable degrees of 4 and 5, these two being chosen to keep overall complexity to a minimum.

Figure 4.3: Performance of the SXS algorithm for a regular {1000, 3, 6} code versus varying maximum iterations. The effect of varying precision is also shown. The curves, from worst to best, are: 10-bit, 50 iterations; 10-bit, 200 iterations; 12-bit, 50 iterations; 12-bit, 200 iterations.

Figure 4.4: Comparison between SXS and sum-product performance for regular (3, 6) codes. The two left curves are for block lengths of 10,000, the two right curves for block lengths of 1000. In both cases the sum-product decoder is more powerful.

Figure 4.5: Performance of optimized SXS codes with maximum iterations set to 200, for lengths 1000 and 10,000. REG refers to the regular (3, 6) code; OPT5 and SXS-OPT5 refer to the optimized codes of maximum variable degree 5 for the sum-product optimization and the SXS optimization respectively.

The performance for these two codes and the optimized sum-
product codes of Section 3.2 are seen in Figure 4.5. Surprisingly, the optimized sum-product codes perform very poorly when decoded using the SXS algorithm. Even in the higher SNR
regions, most codewords did not decode correctly. Instead, two or three bits got 'stuck' in an incorrect state. The reason for this is the quantization of values in the SXS
algorithm. The highly optimized sum-product codes get stuck at fixed points since the incremental gains made in these regions are quantized away. The optimized SXS codes, however, perform well, and their performance could possibly be improved further with a more advanced transformation technique.
Solving the transformation mapping problem is not straightforward, and a few approaches were attempted besides simply increasing the precision of the fixed-point values. A 'pessimistic', a 'realistic', and an 'optimistic' mapping scheme were all used. The pessimistic approach assumed that, in the case of a one-to-many mapping, the transformed value's magnitude was the smallest magnitude that would be mapped to that value. Similarly, the realistic scheme used an average value, while the optimistic scheme used the largest value. None of the attempts yielded a significant improvement in performance, and the realistic scheme was selected for the trials.
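To make the three schemes concrete, the sketch below builds one direction of the fixed-point transform table and lets a policy argument pick the representative whenever several exact outputs fall into the same quantized entry. It is an illustration only, not the table generator used in the trials; the clipping range, table size and sampling density are assumed values.

```python
import math

def diff_to_ratio_mag(r):
    """|log-ratio| corresponding to a log-magnitude-difference value r > 0."""
    d = math.exp(-r)                                   # |P(0) - P(1)|
    return math.log((1.0 + d) / (1.0 - d + 1e-12))     # guard the r -> 0 end

def build_lut(bits=12, r_max=16.0, policy="realistic", samples=32):
    """One-direction fixed-point transform table (difference -> ratio domain).

    Each of the 2**bits input levels stands for a small interval of real r
    values, so several exact outputs compete for a single table entry (the
    one-to-many mapping).  The policy chooses the representative:
    'pessimistic' keeps the smallest output magnitude, 'realistic' the
    average, and 'optimistic' the largest.
    """
    levels = 1 << bits
    step = r_max / levels
    lut = []
    for q in range(levels):
        lo = q * step                                  # interval covered by level q
        cand = [diff_to_ratio_mag(lo + (k + 0.5) * step / samples)
                for k in range(samples)]
        if policy == "pessimistic":
            val = min(cand)
        elif policy == "optimistic":
            val = max(cand)
        else:                                          # 'realistic'
            val = sum(cand) / len(cand)
        lut.append(val)                                # rounded to 'bits' bits in hardware
    return lut

table = build_lut(policy="realistic")
```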
This mapping problem is similar to the dynamic range problem associated with linear amplifiers, and a useful solution in that realm is companding. It may be possible to further mitigate the effects of this problem by borrowing the companding technique. Companding allocates the bits in a non-linear manner so that small values are not affected too much by quantization error while large values do not extend beyond the range of the fixed-point number. This is an interesting area for future research for this algorithm.
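A minimal sketch of what a companded message quantizer could look like, using a mu-law style compressor; the parameters MU and R_MAX are illustrative assumptions rather than values from this work, and the point is only that small magnitudes keep much finer resolution than a uniform quantizer of the same width would allow.

```python
import math

MU = 255.0      # companding parameter (illustrative)
R_MAX = 16.0    # assumed dynamic range of the magnitude values

def compress(r):
    """mu-law style compression: fine resolution near zero, coarse near R_MAX."""
    return math.copysign(R_MAX * math.log1p(MU * abs(r) / R_MAX) / math.log1p(MU), r)

def expand(c):
    """Inverse of compress()."""
    return math.copysign(R_MAX * math.expm1(abs(c) * math.log1p(MU) / R_MAX) / MU, c)

def quantize(r, bits=8):
    """Compand, then quantize uniformly to a signed fixed-point level."""
    levels = 1 << (bits - 1)
    step = R_MAX / levels
    q = int(round(compress(r) / step))
    return max(-levels, min(levels - 1, q))            # saturate at the range limits

def dequantize(q, bits=8):
    levels = 1 << (bits - 1)
    step = R_MAX / levels
    return expand(q * step)

for r in (0.05, 0.5, 8.0):
    print(r, dequantize(quantize(r)))   # small values survive 8-bit quantization well
```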
The presented figures provide a good indication of the performance of this decoding algorithm. Given that the SXS algorithm was developed with implementation and simplicity in mind first, and performance second, it is a positive sign that the code performs well in general. We shall now examine the complexity of this algorithm.
4.3 Complexity
Having been designed with reduced complexity in mind, and having seen that the performance of these codes can be quite good, it is important to understand the overall complexity involved in a practical implementation of the SXS algorithm. To this end, we will calculate the computational power required per bit decoded.
At each computational node, we assume that the FBA is used. It is also assumed that the hardware on which the algorithm is implemented is able to handle the bit-depths required with one operation, e.g., 12-bit memory look-ups require one operation, as do 10-bit additions.
Taking these assumptions into account, the number of computations required per bit is

C_{SXS} = I \left( 8 - 5 \int_0^1 \lambda(x)\,dx - 6 \int_0^1 \rho(x)\,dx \right) \quad (4.5)

It is difficult to do a straight comparison with the decoders discussed in Chapter 3 due to the fixed vs. floating point computation differences. The computations required per bit for the SXS algorithm are presented in Table 4.2. If we assume a one-to-one complexity between fixed and floating point computations, we can compare the optimized SXS decoder with an equal-complexity min-sum or modified min-sum decoder. Plotting the performance-complexity point in Figure 4.6, we see that the overall performance is superior to that of the min-sum decoder of similar complexity, and is not quite as good as that of the optimal modified min-sum decoder, although the decoding complexity is lower. This situation remains true at lower target BER as well. Given this pessimistic comparison between fixed-point and floating-point complexity, it is encouraging to see that, as a worst case, the SXS algorithm performs quite well for its complexity. A more realistic comparison will only show an overall better outcome in favor of the SXS decoder.

Code                      Complexity   SNR for 10^-3 BER   SNR for 10^-5 BER
Regular (3,6)             5.33         1.27 dB             1.41 dB
Optimized Max Degree 4    5.06         1.18 dB             1.33 dB
Optimized Max Degree 5    5.55         1.14 dB             1.28 dB

Table 4.2: Complexity and target SNR for SXS algorithm codes of length 10,000. Complexity is presented as computations per bit per iteration.
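As a quick check on these numbers, the snippet below evaluates (4.5) for edge-perspective degree distributions given as {degree: fraction of edges} dictionaries; the helper names are ours, and this is a sketch under the assumption that (4.5) has the form written above. The regular (3,6) case reproduces the 5.33 computations per bit per iteration of Table 4.2.

```python
def poly_integral(dist):
    """Integral from 0 to 1 of an edge-perspective degree polynomial,
    i.e. sum of (fraction of edges on degree-d nodes) / d."""
    return sum(frac / d for d, frac in dist.items())

def sxs_computations_per_bit(lam, rho, iterations=1):
    """Computations per bit over 'iterations' iterations, per (4.5) above."""
    return iterations * (8 - 5 * poly_integral(lam) - 6 * poly_integral(rho))

# Regular (3,6) code: lambda(x) = x^2, rho(x) = x^5.
lam = {3: 1.0}
rho = {6: 1.0}
print(sxs_computations_per_bit(lam, rho))   # -> 5.333..., matching Table 4.2
# With 25 iterations and rate 1/2: 25 * 5.33 / 0.5 gives the ~266.5
# computations per information bit used later in (4.7).
```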
4.4 Implementation Architecture
The SXS algorithm has been designed with the intention of being easily implemented in a fixed-point hardware environment, and we have seen that with this goal in mind good performance is still possible. The use of a non-optimized, regular code avoids the problem of error floors, and the constant degree of both check and variable nodes will simplify the structure of the hardware required to implement the algorithm. We now provide an example of a hardware implementation.
Let the code be a regular {1000, 3, 6} code; the number of edges on the graph is therefore 3000. Figure 4.7 shows a block diagram for a possible implementation, and is the one used for this example. Assume that there is a method to convert channel information into fixed-point probability messages (μ_ratio messages), and store the received word in an array called Input.
There will be two sets of messages that will be needed; the memory requirements for these are each of size 3000, and they form the check and variable message memory blocks.
There is also the need for two look-up tables to perform the necessary transformations from one message domain to the other, called the check and variable node transform blocks. The memory requirements for these arrays are entirely dependent on the fixed-point precision used, and if we assume 8-bit precision, we require a memory of size 2 x 2^8 = 512 bytes.
Figure 4.6: Performance-complexity comparison of the SXS algorithm with floating-point algorithms (SNR versus computations/bit/iteration). Target BER performance is 10^-3.
The decoder must know which edges participate at each of the nodes, and this is achieved through the use of the check and variable node descriptors, both of length 3000. Each of these arrays contains a list of how each edge is connected to its neighbour, and provides the information required to perform the update rules. Note that there is only a need for one of these arrays, as the information from one can be determined from the other, but the format in which the information is stored in both allows for a simple approach to updating both check and variable nodes.
Once the input array is initialized, the variable message memory block is set to erasures, and the decoding process may commence. The variable node descriptor is stepped through and, in groups of three, the FBA is used to produce three output messages which are transformed and written to the proper locations in the check message memory block. A similar process is used to then write new variable messages. The process repeats until some termination criterion is satisfied, which is not shown in Figure 4.7. The criterion may simply be a counter tracking the iterations, or may be a test to determine whether a valid codeword has been reached. After the decoder has terminated, the decoded word is output. This may be a soft decision or a hard decision, but some decision is made and output. The word may be used from that point in future processing.
Figure 4.7: Block diagram of the SXS algorithm architecture, showing the check and variable node descriptors, the check and variable node transform blocks, the check and variable node update function blocks, the check and variable message memory blocks, the input array (from the channel), and the decoded word. Square blocks represent memory locations, rounded boxes are computation nodes.
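A rough sketch of how the memory blocks and descriptors of Figure 4.7 could be laid out and stepped through for the regular {1000, 3, 6} example; the descriptor contents, the table names and the dummy FBA passed in at the end are placeholders, not the actual hardware mapping.

```python
# Sizes for the regular {1000, 3, 6} example of Figure 4.7: 3000 edges, 8-bit messages.
N_VAR, DV, DC, BITS = 1000, 3, 6, 8
N_EDGES = N_VAR * DV                               # 3000

variable_msg_mem = [0] * N_EDGES                   # messages waiting at variable nodes
check_msg_mem    = [0] * N_EDGES                   # messages waiting at check nodes
input_array      = [0] * N_VAR                     # channel word as fixed-point ratio values
to_diff_domain   = [0] * (1 << BITS)               # the two transform blocks of Figure 4.7,
to_ratio_domain  = [0] * (1 << BITS)               # realised as 256-entry look-up tables

# Each descriptor lists, edge by edge, where that edge lives in the other memory block.
variable_node_descriptor = list(range(N_EDGES))    # placeholder permutation
check_node_descriptor    = list(range(N_EDGES))    # placeholder permutation

def variable_half_iteration(fba):
    """Step the variable node descriptor in groups of DV = 3: run the FBA on each
    group (plus the channel value), transform the results and scatter them into
    the check message memory, as described in the text."""
    for v in range(N_VAR):
        group = variable_node_descriptor[v * DV:(v + 1) * DV]
        inputs = [variable_msg_mem[e] for e in group] + [input_array[v]]
        outputs = fba(inputs)                      # one output per input position
        for e, out in zip(group, outputs):         # keep the DV edge outputs
            check_msg_mem[e] = to_diff_domain[out]

variable_half_iteration(lambda msgs: [0] * len(msgs))   # wiring check with a dummy FBA

# Memory tally: 2*3000 message entries + 2*3000 descriptor entries
#               + 2*256 table entries + 1000 input entries = 13,512.
```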
This relatively straightforward implementation requires an overall memory of size 13,512, or about 13k. In general, the amount of memory M needed for this implementation for a code of length n and a bit precision of b is

M = \frac{4n}{\int_0^1 \lambda(x)\,dx} + 2^{b+1} + n \quad (4.6)

It is also useful to look at the throughput capabilities of this architecture. The number of computations per bit per iteration is given in (4.5). Given a rate R = 1/2 code, and limiting the number of iterations to 25, the number of computations required per information bit is

C_I = 25 \times 5.33 \div \tfrac{1}{2} = 266.5 \quad (4.7)

In a completely serial implementation such as the one presented, if the processor were capable of 40 MIPS, then the throughput would be approximately 150 kbps. As a very rough comparison, convolutional codes can achieve about the same throughput on a similar system, though with fewer memory requirements [21]. However, these codes do not necessarily perform as well.
One of the benefits of LDPC codes is their inherent parallel structure. This structure can be taken advantage of in a hardware implementation to greatly enhance the throughput of the system. It would not be difficult to introduce parallel update function blocks to speed up the decoder, and although doubling the number of function blocks wouldn't necessarily double the throughput, it would increase it dramatically as most of the computations required take place in these blocks.

Chapter 5
Conclusions and Future Work
Low Density Parity Check codes were shown to be the most powerful codes found to date, approaching within a fraction of a decibel of the Shannon limit. We investigated the properties of LDPC codes and examined the notion of density evolution. This powerful analytical technique allows the performance of LDPC codes to be predicted and optimized under certain assumptions, and empirical evidence has supported these claims.
Reduced complexity LDPC decoders were presented. Three decoding algorithms, the sum-product, min-sum, and modified min-sum algorithms, were described, and a technique for optimizing their performance was developed. Density evolution was presented as a tool to maximize the performance of a given decoding scheme and was applied successfully to the three algorithms discussed. This approach may be used for any iterative decoding scheme that may be represented on a graph. We explored some of the issues affecting the performance of these codes, including maximal degree, the number of maximum iterations allowed, and the block length of the codes. The sum-product algorithm was found to be the most powerful decoder, but also the most computationally intensive.
Performance-complexity issues were examined, and the trade-offs clearly presented. For all the decoders, there was a limit, at least in practical decoding scenarios, where increasing the computational effort no longer increased the performance of the code. This important result illustrates that an optimum operating point can be found for these decoders. While the sum-product algorithm provided the best performance, the modified min-sum algorithm outperforms it in certain situations, and with many fewer computations required per bit.
The min-sum algorithm, the least complex of the decoders presented, performed nearly as well as the modified min-sum algorithm and may be an advantageous scheme for applications requiring low complexity.
The sum-transform-sum algorithm was described as an attempt to build a powerful yet simple LDPC decoder by exploiting the efficiencies found in the implementation platform, in this case hardware. The fixed-point representations and access to memory provided the means for the success of this approach. The performance of the SXS algorithm was good, but very dependent on the fixed-point precision and transformation look-up tables. This algorithm is a very viable solution for the implementation of an LDPC decoder in hardware, and a description of an actual implementation was detailed.
5.1 Suggestions for Future Research
LDPC research is at an exciting and vibrant stage; there are a great number of advancements being made, and thoughts are turning more towards the application of these powerful codes.
The use of sparse matrix techniques has improved the encoding problem for these codes, but it would be nice to find a truly linear time encoding procedure. This problem may be solved by exploiting irregular codes and their tendency to have many low weight variables. This would allow the parity check matrix to be constructed in a manner to be encoded in linear time.
We have seen the effective use of practical decoding systems and have presented an algorithm capable of being easily implemented in hardware. Many applications could take advantage of this system, including the use of LDPC codes for wireless data transmission, satellite communications and storage media such as disk-drives. Third generation (3G) cellular systems have discovered the power of iterative decoding schemes and are writing turbo-codes into their specifications. LDPC codes could also be used with the same performance, and have the added advantage of being patent free, thus no royalties are required for their use.
As an extension to the work presented in this paper, it would be useful to examine a few areas in more detail. There may be greater gains in performance to be made by tweaking the modified min-sum algorithm. If a simple yet accurate representation can be found for the check node update rule, the algorithm may be made more powerful. Perhaps it is possible to approach the problem from a stochastic viewpoint and find a probabilistic approximation to the transfer function. It would also be useful to investigate the error floor that results from optimizing the codes and to find a balance between performance and the severity of the error floor.
The effect of cycles in LDPC codes and efficient LDPC code construction techniques are also a useful area for further work. There is still no analysis available to determine the effect that cycles have on codes, though we know that it is detrimental to performance. There are likely certain construction techniques that may be used to mitigate the existence of cycles without affecting the probabilistic balance required for optimal decoding power.
Finally, further investigation into some of the negative aspects of the SXS
algorithm would be instructive. The performance of these codes could possibly be much better if there were a practical solution to the mapping problem caused by the domain transformation that the algorithm requires. The use of companding to extend the dynamic range of the fixed-point numbers may provide answers. It may also be possible to adjust the computations at each of the nodes in such a way that their output is already in the transformed state, and this non-trivial advancement could further simplify the decoder.
Ultimately, the next major step with LDPC codes will be their adoption and use in widespread applications. The power of these codes, and the advances in decoding and encoding performance, have put them into the 'practical and useful' realm. The potential for exploiting the parallel structure of these codes should provide a throughput boost that will also enable their use for high bit rate applications in the near future.

Appendix A
The Forward-Backward Algorithm
The Forward-Backward algorithm (FBA, also known as BCJR, after its authors) was developed by Bahl, Cocke, Jelinek and Raviv in 1974, and represented a major step forward in computational efficiency for decoding on trellises. The algorithm is used throughout the course of this work and is presented here.
Consider a parity check equation f1(x1, x2, x3), a function of three input variables. Figure A.1 shows a trellis diagram associated with f1(). The check having a degree of three, we want to determine the three messages to be output along each edge, based on the three input messages. A simple but computationally intensive method is to calculate each output one by one. This requires a number of repetitive calculations, and the FBA removes the need to repeat these, resulting in a more efficient procedure.
Each stage in this trellis represents one of the three variables that participate in the particular parity check, and each node represents a point where the parity function is satisfied. Edges represent the possible values that each variable can take. Each edge also has a corresponding input message, μ_{x_i→f}(z), represented by P_{x_i}(z) in the figure. Any possible path between the nodes represents a possible combination of (x1, x2, x3) which satisfies parity.
Figure A.1: Trellis diagram of a check with sites x1, x2 and x3.
The FBA is aptly named, as it first computes a set of forward values, known as the α values, and then a set of backward values, known as the β values. There are two forward values, α_0[n] and α_1[n], for each stage n of the trellis. Moving forward through the trellis we calculate the values of α; α_0[1] and α_1[1] are initialized to 1 and 0 respectively. Figure A.2 shows how α is computed at each stage, using the equations

\alpha_0[n] = \alpha_0[n-1]\,\mu_{x_{n-1}\to f_1}(0) + \alpha_1[n-1]\,\mu_{x_{n-1}\to f_1}(1)

and

\alpha_1[n] = \alpha_0[n-1]\,\mu_{x_{n-1}\to f_1}(1) + \alpha_1[n-1]\,\mu_{x_{n-1}\to f_1}(0).

Once all of the α values are determined, there is a backward step, which sweeps through the trellis in the opposite direction. Two values, β_0 and β_1, are computed for each stage of the trellis in the backward step. The first values, β_0[3] and β_1[3] in our example, are set to 1 and 0 respectively. The remaining values are computed using the equations

\beta_0[n] = \beta_0[n+1]\,\mu_{x_{n+1}\to f_1}(0) + \beta_1[n+1]\,\mu_{x_{n+1}\to f_1}(1)

and

\beta_1[n] = \beta_0[n+1]\,\mu_{x_{n+1}\to f_1}(1) + \beta_1[n+1]\,\mu_{x_{n+1}\to f_1}(0).

After all of the α and β values have been determined, the output messages from each of the check node edges may easily be determined. These are calculated using the equations

\mu_{f_1\to x_n}(0) = \alpha_0[n]\,\beta_0[n] + \alpha_1[n]\,\beta_1[n]

and

\mu_{f_1\to x_n}(1) = \alpha_0[n]\,\beta_1[n] + \alpha_1[n]\,\beta_0[n].
At this point, all of the output messages from the node have been determined.
This process is repeated at all nodes to update all of the messages on one side of the graph, and the algorithm can be applied to both the check and variable nodes. The algorithm works well regardless of the actual computation required to determine an α or β; whether it is as simple as an addition or a complex product, the overall computational power needed can be greatly reduced, especially as the trellis grows in length.
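A small floating-point sketch of the recursions above for a single parity-check node, using 0-based indexing (alpha[0] and beta[d] play the role of the initial values α_0[1], α_1[1] and β_0[3], β_1[3]); messages are (P(0), P(1)) pairs and the example input is purely illustrative.

```python
def fba_check_node(in_msgs):
    """Forward-backward update at a parity-check node.

    in_msgs[i] = (P_xi(0), P_xi(1)) is the incoming message on edge i.
    Returns one outgoing message per edge, from the alpha (forward) and
    beta (backward) recursions, excluding each edge's own input.
    """
    d = len(in_msgs)
    # alpha[n]: probability that the partial parity of edges 0..n-1 is (0, 1)
    alpha = [(1.0, 0.0)] * (d + 1)
    for n in range(1, d + 1):
        p0, p1 = in_msgs[n - 1]
        a0, a1 = alpha[n - 1]
        alpha[n] = (a0 * p0 + a1 * p1, a0 * p1 + a1 * p0)
    # beta[n]: same quantity for edges n..d-1, swept backwards
    beta = [(1.0, 0.0)] * (d + 1)
    for n in range(d - 1, -1, -1):
        p0, p1 = in_msgs[n]
        b0, b1 = beta[n + 1]
        beta[n] = (b0 * p0 + b1 * p1, b0 * p1 + b1 * p0)
    out = []
    for n in range(d):
        a0, a1 = alpha[n]
        b0, b1 = beta[n + 1]
        out.append((a0 * b0 + a1 * b1, a0 * b1 + a1 * b0))
    return out

# Example: the output on the last edge is P(parity of the other two is even).
print(fba_check_node([(0.9, 0.1), (0.8, 0.2), (0.5, 0.5)]))  # last edge -> (0.74, 0.26)
```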

Figure A.2: FBA α, β computation.

Bibliography

[1] S. Crozier, A. Hunt, and D. Falconer. Hyper-codes: High-performance low-complexity error-correcting codes. In Proceedings of the 19th Biennial Symposium on Communications, Queen's University, Kingston, Ontario, pages 263-270, 1998.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. In Proceedings of the 1993 IEEE International Conference on Communications, pages 1064-1070, Geneva, Switzerland, 1993.
[3] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, NY, NY, 1991.
[4] M. C. Davey and D. MacKay. Low density parity check codes over GF(q). IEEE Communications Letters, Jun 1998.
[5] M. Frigo and S. Johnson. FFTW: Fastest Fourier transform in the west v.2.1.3. http://www.fftw.org.
[6] R. G. Gallager. Low Density Parity Check Codes. Number 21 in Research Monograph Series. MIT Press, Cambridge, Mass., 1963.
[7] C. Hartmann and L. Rudolph. An optimum symbol-by-symbol decoding rule for linear codes. IEEE Transactions on Information Theory, IT-22(5), Sept. 1976.
[8] R. Johannesson and K. Zigangirov. Fundamentals of Convolutional Coding. IEEE Press, New York, New York, 1999.

[9] G. David Forney Jr. Codes on graphs: Generalized state realizations. Submitted to IEEE Transactions on Information Theory, 1999.
[10] F. Kschischang, B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. Submitted to IEEE Transactions on Information Theory, Jul 1998.
[11] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, Mar 1974.
[12] M. Luby, M. Mitzenmacher, M. Shokrollahi, and D. Spielman. Analysis of low density codes and improved designs using irregular graphs. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing, pages 249-258, 1998.
[13] D. MacKay. Decoding times of irregular Gallager codes. In progress, Oct 1998.
[14] D. MacKay. Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory, 45(2), Mar 1999.
[15] D. MacKay and M. Davey. Evaluation of Gallager codes for short block length and high rate applications. In progress, Jan 1999.
[16] D. MacKay, S. Wilson, and M. Davey. Comparison of constructions of irregular Gallager codes. In 1998 Allerton Conference on Communication, Control, and Computing, 1998.
[17] P. Hoeher, P. Robertson, and E. Villebrun. Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding. IEEE Communications Theory, 8(2):119-120, March 1997.
[18] T. Richardson, A. Shokrollahi, and R. Urbanke. Design of provably good low-density parity check codes. Submitted to IEEE Transactions on Information Theory, Mar 1999.
[19] T. Richardson and R. Urbanke. The capacity of low-density parity check codes under message-passing decoding. Submitted to IEEE Transactions on Information Theory, Nov 1998.

[20] K. Gracie, S. Crozier, A. Hunt, and J. Lodge. Performance and complexity comparison of block-like turbo codes, hyper-codes, and tail-biting convolutional codes. In Proceedings of the 19th Biennial Symposium on Communications, Queen's University, Kingston, Ontario, pages 84-88, 1998.
[21] S. Crozier, J. Lodge, P. Sauve, and A. Hunt. SHARC DSP in flushed and tail-biting Viterbi decoders. The DSP Applications Newsletter, 41, September 1999.
[22] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 1948.
[23] Vladislav Sorokine. Gallager Codes for CDMA Applications. PhD thesis, University of Toronto, Canada, 1998.
[24] D. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory, 42, Nov 1996.
[25] D. Spielman. The complexity of error correcting codes. Lecture Notes in Computer Science, 1279:67-84, 1998.
[26] D. Spielman. Finding good LDPC codes. In 36th Annual Allerton Conference on Communication, Control, and Computing, 1998.
[27] Michael Wall. GAlib: A C++ library of genetic algorithm components. http://lancet.mit.edu/ga/.
[28] S. Wicker. Error Control Systems for Digital Communication and Storage. Prentice Hall, Upper Saddle River, New Jersey, 1995.

We present the Sum-X-Sum algorithm, a representation and method for computation of error control decoding metrics, and provide an example of a hardware-implementable low density parity check decoder which performs well with low complexity. Performance and complexity are compared with the sum-product algorithm, and it is found that while the Sum-X-Sum algorithm performs nearly as well, it is much less computationally complex.
1 Introduction
One of the major drawbacks in the implementation of iterative decoding schemes has been the relative complexity of the decoding procedure. For example, low density parity check (LDPC) codes that perform well generally require too many iterations or require block lengths that are too long for practical applications.
Conversely, those codes that may be practical do not stand up to comparisons with other decoding methods such as Turbo-codes [1].
It has been a challenge to find a method to reduce the complexity of the iterative decoding process without significantly reducing the performance [2]. We make use of the duality found in code graphs to reduce the complexity without an effect on code performance. The use of a dual code for the purpose of decoding is not new [3, 4], but the efficient transformation technique from one state to another has not been previously used.
Iterative decoding operation on graphs generally has two steps: updating the messages sent to check nodes, and updating the messages sent to variable nodes. With the Sum-Product algorithm, a complicated yet powerful decoding technique, it is simple to compute the 'Sum' messages, but quite complicated to compute the 'Product' messages [5].
We solve this difficulty by taking the messages needed for the 'Product' computation and transforming them into their dual representation, so that the 'Product' computation actually becomes a 'Sum' computation. The added complexity of the transformation is generally much less than the savings from the dual message calculation. The algorithm may be implemented in conventional manners, both in hardware and software.
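A few lines of arithmetic illustrate the identity being exploited, which is made precise in Section 2 below: at a parity check, the output difference P(0) - P(1) is the product of the incoming differences, and in the dual (sign, log-magnitude) form that product collapses to a sign product and a plain sum. The probabilities used here are made-up numbers.

```python
import math

# Incoming probabilities P(0) on three edges of a parity check (illustrative).
p0 = [0.9, 0.8, 0.6]
diffs = [2 * p - 1 for p in p0]                 # P(0) - P(1) per edge

# 'Product' domain: the check output difference is the product of the inputs.
prod = math.prod(diffs)                         # 0.8 * 0.6 * 0.2 = 0.096

# Dual domain: the same quantity as a sign product plus a plain sum of logs.
sign = math.prod(1 if d >= 0 else -1 for d in diffs)
log_mag = sum(math.log(abs(d)) for d in diffs)
print(prod, sign * math.exp(log_mag))           # both 0.096
```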
2 Algorithm Description
Iterative decoding schemes such as the sum-product algorithm are generally described in a graphical form.
Figure 1 gives an example of a graphical description of a parity check code.
The graph is bipartite, with variable nodes being represented by the circles at the top, check nodes represented by squares at the bottom, and edges showing the interconnection between the two. Iterative decoders operate by passing messages back and forth along the edges, computing new messages at both the variable and check nodes.

Figure 1: Example of a graphical representation of an LDPC code. The circles represent variables, and the squares are parity check nodes.
Figure 2: Message passing and updating at variable and check nodes, showing which message representations are used for each update equation.
The Sum-Transform-Sum algorithm (Sum-X-Sum algorithm) is an instantiation of a soft-input, soft-output iterative decoding algorithm, and is similar to the standard sum-product or min-sum algorithms. The inputs to the algorithm are soft representations of the probabilities for the possible transmitted symbols.
For example, over the binary input channel, these possible states are 0 and 1. The probabilities may be represented by a floating-point value, a fixed-point value or some other proportional means, depending on the actual implementation.
Unlike other iterative decoding schemes, the Sum-X-Sum algorithm utilizes two separate probability representations for messages and requires a transformation algorithm to convert between the two. The method with which the probabilities are transformed may be implemented using logic, functions, memory, or any combination of these.
The two representations for messages used are the log-ratio and the log-magnitude-difference, both of which have been used in other decoding algorithms [2], [6]. The log-ratio representation can be written

\mu_{ratio} = \log \frac{P(0)}{P(1)} \quad (1)

The log-magnitude-difference representation is a pair of values (s, r), the first defined over the set {-1, 1} representing the sign bit, and the second over the set of positive real numbers, such that

\mu_{diff} = \left( \mathrm{sgn}\big(P(0) - P(1)\big),\ \big|\log\left|P(0) - P(1)\right|\big| \right) \quad (2)

With two separate probability representations, the Sum-X-Sum algorithm requires a function \tau(x) \to (s, r) to transform messages between them. Let \mu_{ratio} be a message in the log-likelihood ratio domain, and \mu_{diff} be a message in the log-likelihood difference pair domain. The function \tau(x) \to (s, r) takes as its argument a message of the form \mu_{ratio} and returns a message of the form \mu_{diff}. The returned message carries the equivalent probability as the input message, only expressed in a different manner. Similarly, the inverse transform function \tau^{-1}(s, r) \to (x) takes as its input a message of the form \mu_{diff} and returns the equivalent probability in a message of the form \mu_{ratio}. The function \tau(x) \to (s, r) may be written as

\tau(x) = \left( \mathrm{sgn}\left( \frac{2e^{x}}{1 + e^{x}} - 1 \right),\ \left|\log\left| \frac{2e^{x}}{e^{x} + 1} - 1 \right|\right| \right) \quad (3)

where x is a message of the form \mu_{ratio}. Likewise, the inverse transform function \tau^{-1}(s, r) \to (x) may be written as

\tau^{-1}(s, r) = \log\left( \frac{1 + s e^{-r}}{1 - s e^{-r}} \right) \quad (4)

where (s, r) is the sign and magnitude of a message of the form \mu_{diff}.
The function in (3) and its inverse, (4), may be implemented very simply using a pair of look-up tables in memory. If, for example, messages are stored as 8-bit (N-bit in general) values, then two look-up tables each of size 2^8 = 256 bytes (2^N N-bit values in general) are required to completely describe the forward and inverse functions. One need only calculate the value of the transform for each possible input once, and store the results in memory for future use. Alternatively, the function and its inverse may be used themselves to calculate the necessary transforms on an as-needed basis.
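The sketch below is one way the transform pair of (3) and (4) could be written before being tabulated; it keeps the (sign, positive magnitude) convention of (2), and the final comment indicates where the 2 x 2^N look-up tables would come in. It is an illustration, not a normative implementation.

```python
import math

def tau(x):
    """Equation (3): log-ratio message x -> (sign, magnitude) pair."""
    p0 = math.exp(x) / (1.0 + math.exp(x))        # P(0) recovered from x = log(P(0)/P(1))
    d = 2.0 * p0 - 1.0                            # P(0) - P(1)
    s = 1 if d >= 0 else -1
    return s, abs(math.log(abs(d) + 1e-300))      # r = |log|P(0) - P(1)||, a positive value

def tau_inv(s, r):
    """Equation (4): (sign, magnitude) pair -> log-ratio message."""
    d = s * math.exp(-r)                          # recover P(0) - P(1)
    return math.log((1.0 + d) / (1.0 - d))

# Round trip: the pair carries the same probability, only in a different form.
x = 1.7
print(tau(x))              # (1, |log|P(0) - P(1)||)
print(tau_inv(*tau(x)))    # ~1.7 again

# For an N-bit fixed-point decoder the two directions would instead be tabulated
# once over all 2**N quantized inputs, giving the 2 x 2**N table entries above.
```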
It may also be possible to implement the transformation through the use of a combination of logic and memory. For example, it may be a simple matter to use some logic to determine the most significant bits of the transform output, and determine the least significant bits in a reduced look-up table. This flexibility allows the algorithm to be used in an optimal manner depending on the platform that it is implemented on. If processing speed is a constraint, it would make sense to take advantage of the look-up table as much as possible as opposed to calculating multiple transforms.
The algorithm passes and updates two sets of messages back and forth iteratively, one set such that the parity constraints are met, using the μ_diff representation as inputs, and the other such that certain variable constraints are met, using the μ_ratio representation. An examination of Figure 2 shows how messages are input into a node and an output is determined. For each message input into a node, a corresponding message which has been transformed is used to update the message going in the other direction. This will be used as an input for the other side of the graph. For example, the output of a check node will be transformed from a μ_diff to a μ_ratio message.
Two arrays may be used to store the sets of messages being passed in both directions on the graph, one for check node input (and hence variable node output), and the other for variable node input. The message updating rule for a variable node will take as its inputs a number of elements from the variable input array, and output, in the transformed representation, the same number of elements to possibly different elements of the check node input array. This is done for all variable nodes in the graph, so all elements in the variable input array are possibly used multiple times.
Likewise, the message updating rule for check nodes takes as inputs a number of elements from the check input array, and outputs in the transformed representation the same number of elements to possibly different elements of the variable input array.
2.1 Algorithm Initialization
Information is sampled from the channel and converted into probability messages of the form μ_ratio, and these are stored in an array as the messages μ_init. This set of probabilities is used to determine whether a valid codeword has been received. If a valid codeword is detected, no decoding needs to be done, and the codeword is output from the algorithm. If a valid codeword is not detected, the received messages are transformed into μ_diff messages and are used to seed the check node input array.

The algorithm requires a useful description of the code construction, which may be a table listing the connectivity between the check nodes and variable nodes. The actual table may be randomly determined using permutations or some other technique, or the code may be constructed manually. This table is used to determine which messages participate in a given check node or variable node update equation, which are the sets f(i) and f(j) in Figure 2.
2.2 Decoding Procedure
Once information has been sampled off the channel and a valid codeword has not been detected, the decoding algorithm commences by updating the variable input array messages. This is accomplished using the following equation:

\mu_{ratio,i} = \tau^{-1}\left( \prod_{x \in f(i)} s_x,\ \sum_{x \in f(i)} r_x \right) \quad (5)

where i is the ith element of the variable input array and f(i) is the set of messages connecting to the node associated with i, excluding i itself. The variables s and r represent the sign and magnitude of the pair (s, r), which is how μ_diff messages are represented. This equation is used a number of times, based on the structure of the LDPC code, to update all of the elements of the variable input array. In order to increase the efficiency of this equation, the forward-backward algorithm may be used [7], a one-pass algorithm may also be used [8], or some other efficient technique may be utilized. These techniques also apply to equation 6 when updating the check input array.
Once completed, the check node input message array is then updated. The rule for updating these messages is governed by the equation

\mu_{diff,j} = \tau\left( \mu_{init} + \sum_{x \in f(j)} \mu_{ratio,x} \right) \quad (6)

where j is the jth element of the check node input array and f(j) is the set of messages connecting to the node associated with j, excluding j itself. The order in which the variable and check node input arrays are updated may vary depending on the actual code used. The updating may be performed in a serial manner, or parallelism may be exploited to increase the overall throughput of the decoder. The actual update schedule may vary; it is important only that all messages are updated in a timely manner.
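A floating-point sketch of the two update rules, reusing tau and tau_inv from the previous sketch; f(i) and f(j) are passed in explicitly as lists of the other edges at the node, and the function names are ours. In a fixed-point decoder the calls to tau and tau_inv would be replaced by the look-up tables.

```python
def check_to_variable(other_edges, check_input):
    """Equation (5): the new log-ratio message for one edge of a check node.
    other_edges = f(i); check_input holds (s, r) pairs."""
    s_prod, r_sum = 1, 0.0
    for e in other_edges:
        s, r = check_input[e]
        s_prod *= s          # signs multiply ...
        r_sum += r           # ... log-magnitudes add: the 'Product' has become a 'Sum'
    return tau_inv(s_prod, r_sum)

def variable_to_check(other_edges, variable_input, mu_init_v):
    """Equation (6): the new (s, r) message for one edge of a variable node.
    other_edges = f(j); mu_init_v is that variable's channel message."""
    total = mu_init_v + sum(variable_input[e] for e in other_edges)
    return tau(total)
```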
After all of the messages have been updated once, one iteration of the decoding algorithm has been completed. The decoding algorithm may check to determine whether the current probabilities in both the variable input array and check node input array correspond to a valid codeword. If a valid codeword is detected, the algorithm stops and the current probabilities are output. The algorithm may also continue the decoding procedure by repeating each of the two update steps until some criterion is reached, such as a certain fixed number of iterations, or until a valid codeword is detected.
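Tying the pieces together, one possible control flow for the whole procedure is sketched below, reusing tau, check_to_variable and variable_to_check from the sketches above. The data layout (edge lists per node) and the two stopping rules, a valid codeword or a maximum iteration count, follow the description; everything else is an assumption made for illustration.

```python
def decode(mu_init, var_nodes, check_nodes, max_iters=200):
    """Floating-point sketch of the Sum-X-Sum decoding loop.

    mu_init[v]     : channel log-ratio for variable node v
    var_nodes[v]   : edge indices attached to variable node v
    check_nodes[c] : edge indices attached to check node c
    """
    n_edges = sum(len(e) for e in var_nodes)
    edge_var = [0] * n_edges                    # which variable node owns each edge
    for v, edges in enumerate(var_nodes):
        for e in edges:
            edge_var[e] = v

    variable_input = [0.0] * n_edges            # log-ratio messages into variable nodes
    check_input = [tau(mu_init[edge_var[e]]) for e in range(n_edges)]  # seeded from channel

    bits = [0 if m >= 0 else 1 for m in mu_init]
    for _ in range(max_iters):
        for edges in check_nodes:               # equation (5) for every edge of every check
            for e in edges:
                variable_input[e] = check_to_variable(
                    [x for x in edges if x != e], check_input)
        for v, edges in enumerate(var_nodes):   # equation (6) for every edge of every variable
            for e in edges:
                check_input[e] = variable_to_check(
                    [x for x in edges if x != e], variable_input, mu_init[v])
        # Hard decision, then stop early if every parity check is satisfied.
        bits = [0 if mu_init[v] + sum(variable_input[e] for e in edges) >= 0 else 1
                for v, edges in enumerate(var_nodes)]
        if all(sum(bits[edge_var[e]] for e in edges) % 2 == 0 for edges in check_nodes):
            break
    return bits
```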
If the decoding procedure fails to produce a valid codeword, a number of actions may be taken. The errors may simply be ignored and the current estimate of the codeword output. A second outer code may be used to attempt to correct the remaining errors. Some type of retransmission request may be made, or the output may be flagged as being in error.
3 Performance and Complexity
Figure 3 shows the bit-error performance of the Sum-X-Sum and sum-product algorithms for a regular (3,6) LDPC code. The Sum-X-Sum algorithm performance approaches that of the sum-product algorithm, and for large enough fixed-point precision, the performance actually equals that of the sum-product algorithm.
Figure 3: Comparison between Sum-X-Sum and sum-product performance for regular (3, 6) codes (BER versus SNR (Eb/No)). The two left curves are block lengths of 10,000, the two right curves are block lengths of 1000. In both cases the sum-product decoder is more powerful.
The performance of both these algorithms may be increased significantly by using optimized irregular LDPC
codes, but for comparative purposes we shall focus on regular LDPC codes.
Having been designed with reduced complexity in mind, and having seen that the performance of these codes can be quite good, it is important to understand the overall complexity involved in a practical implementation of the Sum-X-Sum algorithm. To this end, we calculate the computational power required per bit decoded.
At each computational node, we assume that the Forward-Backward algorithm [7] is used. It is also assumed that the hardware that the algorithm is implemented on is able to handle the bit-depths required with one operation, e.g., 12-bit memory look-ups require one operation, as do 10-bit additions. Taking these assumptions into account, the number of computations required per bit is

C_{Sum-X-Sum} = I \left( 8 - 5 \int_0^1 \lambda(x)\,dx - 6 \int_0^1 \rho(x)\,dx \right) \quad (7)

where I is the number of iterations. It is difficult to do a straight complexity comparison with the sum-product algorithm due to the fixed vs. floating point computation differences.
The computations required per bit for the Sum-X-Sum and sum-product algorithms are presented in Figure 4, assuming a one-to-one complexity between fixed and floating point computations. The Sum-X-Sum algorithm requires a significantly lower number of computations than the sum-product, so clearly there is an advantage in its use. Given the pessimistic comparison between fixed-point and floating-point complexity, it is encouraging to see that, as a worst case, the Sum-X-Sum algorithm performs quite well for its complexity. A more realistic comparison will only show an overall better outcome in favor of the Sum-X-Sum decoder.

Figure 4: Complexity and target SNR for Sum-X-Sum Algorithm codes of length 10,000 vs. similar codes using the sum-product algorithm. Complexity is presented as computations per bit per iteration.
4 Implementation Architecture
The Sum-X-Sum algorithm has been designed with the intention of being easily implemented in a fixed-point hardware environment, and we have seen that with this goal in mind good performance is still possible. The use of a non-optimized, regular code simplifies the structure of the hardware required to implement the algorithm and will be used to describe the hardware. In general, however, any iterative code may use this algorithm. Let the code be a regular {1000, 3, 6} code; the number of edges on the graph is therefore 3,000. Figure 5 shows a block diagram for a possible implementation, and is the one used for this example. Assume that there is a method to convert channel samples into fixed-point probability messages (μ_ratio messages), and store the received word in an array called Input. There will be two sets of messages that will be needed; the memory requirements for these are each of size 3,000, and they form the check and variable message memory blocks.
There is also the need for two look-up tables to perform the necessary transformations from one message domain to the other, called the check and variable node transform blocks. The memory requirements for these arrays are entirely dependent on the fixed-point precision used, and if we assume 8-bit precision, we require a memory of size 2 x 2^8 = 512 bytes.
The decoder must know which edges participate at each of the nodes, and this is achieved through the use of the check and variable node descriptors, both of length 3,000. Each of these arrays contains a list of how each edge is connected to its neighbour, and provides the information required to perform the update rules. Note that there is only a need for one of these arrays, as the information from one can be determined from the other, but the format in which the information is stored in both allows for a simple approach to updating both check and variable nodes.
Once the input array is initialized, the variable message memory block is set to erasures, and the decoding process may commence. The variable node descriptor is stepped through and, in groups of three, the FBA is used to produce three output messages which are transformed and written to the proper locations in the check message memory block. A similar process is used to then write new variable messages. The process repeats until some termination criterion is satisfied, which is not shown in Figure 5. The criterion may simply be a counter tracking the iterations, or may be a test to determine whether a valid codeword has been reached. After the decoder has terminated, the decoded word is output.
This may be a soft decision or a hard decision, but some decision is made and is output. The word may be used from that point in future processing.
This relatively straightforward implementation requires an overall memory of size 13,512, or about 13k.
In general, the amount of memory M needed for this implementation for a code of length n and a bit precision of b is

M = \frac{4n}{\int_0^1 \lambda(x)\,dx} + 2^{b+1} + n \quad (8)

It is also useful to look at the throughput capabilities of this architecture. The number of computations per bit per iteration is given in equation 7. Given a rate R = 1/2 code, and limiting the number of iterations to 25, the number of computations required per information bit is

C_I = 25 \times 5.33 \div \tfrac{1}{2} = 266.5 \quad (9)
Memory Memory Decoded Word Block ~ ~ 1 Block Variable Node Variable Node Transform Update Block ~ Function Block Variable Node Input ~ From Channel Descriptor 1' I~llre J: BIOCIC dlr'l~f8IT1 of Sllnl-X-Sllm al~Orltrlnl a,rC~llteCtllre.
S(~llare blOfICS repreSPIlt memory IOCat10I1S~
rounded boxes are computation nodes.

In a completely serial implementation such as the one presented, if the processor was capable of 40 MIPS, then the throughput would be approximately 150 kbps. As a very rough comparison, convolutional codes can achieve about the same throughput on a similar system, though with fewer memory requirements [9].
However, these codes do not necessarily perform as well.
One of the benefits of LDPC codes is their inherent parallel structure. This structure can be taken advantage of in a hardware implementation to greatly enhance the throughput of the system. It would not be difficult to introduce parallel update function blocks to speed up the decoder, and although doubling the number of function blocks wouldn't necessarily double the throughput, it would increase it dramatically as most of the computations required take place in these blocks.
Conclusions
We presented a novel iterative decoding representation and method which is well suited to practical implementations. The use of the dual domain can greatly simplify the decoding procedure. It is a viable solution for a wide variety of codes, and can perform asymptotically as well as the sum-product algorithm. The overall complexity can be made very low through the use of look-up tables or other simple transformation techniques to perform some of the required calculations.
References
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proceedings of the 1993 IEEE International Conference on Communications, (Geneva, Switzerland), pp. 1064-1070, 1993.
[2] D. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, pp. 399-431, Mar 1999.
[3] C. Hartmann and L. Rudolph, "An optimum symbol-by-symbol decoding rule for linear codes," IEEE Transactions on Information Theory, vol. IT-22, pp. 514-517, Sept. 1976.
[4] G. D. Forney, "Codes on graphs: Generalized state realizations," Submitted to IEEE Transactions on Information Theory, 1999.
[5] F. Kschischang, B. Frey, and H. Loeliger, "Factor graphs and the sum-product algorithm," Submitted to IEEE Transactions on Information Theory, Jul 1998.
[6] T. Richardson, A. Shokrollahi, and R. Urbanke, "Design of provably good low-density parity check codes," Submitted to IEEE Transactions on Information Theory, Mar 1999.
[7] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, pp. 284-287, Mar 1974.
[8] R. Johannesson and K. Zigangirov, Fundamentals of Convolutional Coding. New York, New York: IEEE Press, 1999.
[9] S. Crozier, J. Lodge, P. Sauve, and A. Hunt, "SHARC DSP in flushed and tail-biting Viterbi decoders," The DSP Applications Newsletter, vol. 41, September 1999.

Claims (4)

1. A method for decoding comprising the steps of:
i) receiving a set of probability messages in a first format;
ii) updating said set of probability messages based on an operation that compares said set of probability messages with a first-set of code constraints;
iii) transforming said set updated at step ii) into a second format;
iv) updating said set transformed at step iii) based on an operation that compares said set of probability messages with a second-set of code constraints complementary to said first-set;
v) transforming said set updated at step iv) back into said first format;
vi) repeating steps ii)-v) on said set of probability messages transformed at v) until a desired level of decoding is reached; and, vii) outputting said set of probability messages determined at step vi).
2. The method according to claim 1 wherein said first-set is a set of variable node descriptors and said second-set is a set of check-node descriptors.
3. The method according to claim 1 wherein said desired level of decoding is reached when step vi) is performed a predetermined number of times.
4. The method according to claim 1 wherein said desired level of decoding is reached when step vi) is performed until said set of probability messages matches a known type of valid code.
CA 2310186 2000-06-02 2000-06-02 Method and system for decoding Abandoned CA2310186A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CA 2310186 CA2310186A1 (en) 2000-06-02 2000-06-02 Method and system for decoding
PCT/CA2001/000805 WO2001093432A1 (en) 2000-06-02 2001-06-01 Method and system for decoding
AU2001273752A AU2001273752A1 (en) 2000-06-02 2001-06-01 Method and system for decoding
EP01940036A EP1287619A1 (en) 2000-06-02 2001-06-01 Method and system for decoding
CA002413890A CA2413890A1 (en) 2000-06-02 2001-06-01 Method and system for decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA 2310186 CA2310186A1 (en) 2000-06-02 2000-06-02 Method and system for decoding

Publications (1)

Publication Number Publication Date
CA2310186A1 true CA2310186A1 (en) 2001-12-02

Family

ID=4166316

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2310186 Abandoned CA2310186A1 (en) 2000-06-02 2000-06-02 Method and system for decoding

Country Status (1)

Country Link
CA (1) CA2310186A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1379001A2 (en) * 2002-07-03 2004-01-07 Hughes Electronics Corporation Method and system for decoding low density parity check (LDPC) codes
EP1387496A2 (en) * 2002-07-26 2004-02-04 Hughes Electronics Corporation Satellite communication system utilizing low density parity check (LDPC) codes
WO2004062112A2 (en) * 2002-12-31 2004-07-22 Intel Corporation Method and apparatus to encode linear block codes
EP1536568A1 (en) * 2003-11-26 2005-06-01 Matsushita Electric Industrial Co., Ltd. Belief propagation decoder cancelling the exchange of unreliable messages
EP1601108A1 (en) * 2004-05-03 2005-11-30 Broadcom Corporation Decoding LDPC (low density parity check) codes using multiplications (or additions in log-domain) for check and bit node processing
US7020829B2 (en) 2002-07-03 2006-03-28 Hughes Electronics Corporation Method and system for decoding low density parity check (LDPC) codes
US7864869B2 (en) 2002-07-26 2011-01-04 Dtvg Licensing, Inc. Satellite communication system utilizing low density parity check codes
US8095854B2 (en) 2002-07-26 2012-01-10 Dtvg Licensing, Inc. Method and system for generating low density parity check codes
EP2722995A3 (en) * 2003-10-06 2018-01-17 Digital Fountain, Inc. Soft-decision decoding of multi-stage chain reaction codes
CN112425078A (en) * 2019-06-21 2021-02-26 华为技术有限公司 Polar code decoding method, device, chip, storage medium and program product
CN113595563A (en) * 2021-08-02 2021-11-02 上海金卓科技有限公司 LDPC decoding method, device, equipment and storage medium

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7020829B2 (en) 2002-07-03 2006-03-28 Hughes Electronics Corporation Method and system for decoding low density parity check (LDPC) codes
EP1379001A2 (en) * 2002-07-03 2004-01-07 Hughes Electronics Corporation Method and system for decoding low density parity check (LDPC) codes
US7398455B2 (en) 2002-07-03 2008-07-08 The Directv Group, Inc. Method and system for decoding low density parity check (LDPC) codes
US8095854B2 (en) 2002-07-26 2012-01-10 Dtvg Licensing, Inc. Method and system for generating low density parity check codes
EP1387496A3 (en) * 2002-07-26 2004-03-17 Hughes Electronics Corporation Satellite communication system utilizing low density parity check (LDPC) codes
EP1387496A2 (en) * 2002-07-26 2004-02-04 Hughes Electronics Corporation Satellite communication system utilizing low density parity check (LDPC) codes
US7864869B2 (en) 2002-07-26 2011-01-04 Dtvg Licensing, Inc. Satellite communication system utilizing low density parity check codes
WO2004062112A3 (en) * 2002-12-31 2004-09-02 Intel Corp Method and apparatus to encode linear block codes
WO2004062112A2 (en) * 2002-12-31 2004-07-22 Intel Corporation Method and apparatus to encode linear block codes
EP2722995A3 (en) * 2003-10-06 2018-01-17 Digital Fountain, Inc. Soft-decision decoding of multi-stage chain reaction codes
WO2005053164A1 (en) * 2003-11-26 2005-06-09 Matsushita Electric Industrial Co.,Ltd. Belief propagation decoder cancelling the exchange of unreliable messages
EP1536568A1 (en) * 2003-11-26 2005-06-01 Matsushita Electric Industrial Co., Ltd. Belief propagation decoder cancelling the exchange of unreliable messages
US7243287B2 (en) 2004-05-03 2007-07-10 Broadcom Corporation Decoding LDPC (Low Density Parity Check) code and graphs using multiplication (or addition in log-domain) on both sides of bipartite graph
EP1601108A1 (en) * 2004-05-03 2005-11-30 Broadcom Corporation Decoding LDPC (low density parity check) codes using multiplications (or additions in log-domain) for check and bit node processing
CN112425078A (en) * 2019-06-21 2021-02-26 华为技术有限公司 Polar code decoding method, device, chip, storage medium and program product
CN112425078B (en) * 2019-06-21 2024-04-09 华为技术有限公司 Polarization code decoding method, device, chip, storage medium and program product
CN113595563A (en) * 2021-08-02 2021-11-02 上海金卓科技有限公司 LDPC decoding method, device, equipment and storage medium
CN113595563B (en) * 2021-08-02 2024-03-29 上海金卓科技有限公司 LDPC decoding method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Andrews et al. The development of turbo and LDPC codes for deep-space applications
Fayyaz et al. Low-complexity soft-output decoding of polar codes
Han et al. Low-floor decoders for LDPC codes
JP3808769B2 (en) LDPC code check matrix generation method
US7174495B2 (en) LDPC decoder, corresponding method, system and computer program
JP3807484B2 (en) Method and apparatus for decoding general purpose codes in probability dependence graphs
EP1835625B1 (en) Encoding and decoding of accumulate convolutional codes
US20050229087A1 (en) Decoding apparatus for low-density parity-check codes using sequential decoding, and method thereof
US20050193320A1 (en) Methods and apparatus for improving performance of information coding schemes
Zimmermann et al. Reduced complexity LDPC decoding using forced convergence
JP2008035524A (en) Apparatus and method for decoding block of symbols using iterative belief propagation
JPWO2003073621A1 (en) LDPC code parity check matrix generation method and parity check matrix generation apparatus
WO2004006444A1 (en) Check matrix generation method and check matrix generation device
CA2310186A1 (en) Method and system for decoding
CN111034055A (en) Simplified check node processing in non-binary LDPC decoders
Zhang et al. Protograph-based low-density parity-check Hadamard codes
Lentmaier et al. Exact erasure channel density evolution for protograph-based generalized LDPC codes
Zikry et al. Performance Analysis of LDPC Decoding Techniques
Urman et al. Efficient belief propagation list ordered statistics decoding of polar codes
Bellorado et al. Soft-input, iterative, Reed-Solomon decoding using redundant parity-check equations
Tahir Construction and performance of polar codes for transmission over the awgn channel
Castura Performance analysis and optimization of reduced complexity Low Density Parity Check decoding algorithms
Francis et al. Protograph-Based Low-Density Parity-Check Hadamard Codes
Healy Short-length low-density parity-check codes: construction and decoding algorithms
Zheng et al. A Universal List Decoding Algorithm with Application to Decoding of Polar Codes

Legal Events

Date Code Title Description
FZDE Dead