CN110971244A

CN110971244A - Forward error correction decoding decoder based on burst error detection

Info

Publication number: CN110971244A
Application number: CN201910994927.4A
Authority: CN
Inventors: 张为; 王佳琪; 陆薇
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-10-18
Filing date: 2019-10-18
Publication date: 2020-04-07

Abstract

The invention belongs to the field of error control coding in channel coding. It aims to shorten the critical-path delay of a decoder and improve decoding throughput while preserving decoding performance. To this end, the invention provides a forward error correction decoder based on burst error detection, comprising a syndrome computation (SC) module, a key equation solving (KES) module, and a Chien search and error evaluation (CSEE) module. The syndromes computed by the SC module are output to the KES module, and the error location polynomial Λ(X) and error evaluation polynomial Ω(X) computed by the KES module are output to the CSEE module. The invention is mainly applicable to the design and manufacture of decoders.

Description

Forward error correction decoding decoder based on burst error detection

Technical Field

The invention belongs to the field of error control coding in channel coding and relates to Reed-Solomon (RS) encoding and decoding, pipelining, and retiming techniques. In particular, it relates to a forward error correction decoder architecture for the RS(255,239) code, suitable for optical communication systems operating at 100 Gb/s and above.

Background

In recent years, the development of communication technology has made the transmission of digital information ever more frequent. Digital signals are carried through time and space over wired or wireless media or storage devices, but because transmission channels are not ideal, the signals are disturbed by noise to varying degrees and errors occur. Error control coding uses encoding and decoding techniques to correct the errors that arise while digital information is transmitted. As an important error control coding scheme, the RS code is widely used in wireless communication, data storage, deep space exploration, digital video broadcasting, and other fields, owing to its strong error correction capability, simple structure, and efficient decoding algorithms, and to the development of large-scale integrated circuits.

In 2009, owing to the high practical value of RS codes, the ITU-T (the International Telecommunication Union's Telecommunication Standardization Sector) defined the RS(255,239) code as a transmission standard for submarine optical fiber systems, high-speed optical fiber systems, gigabit passive optical networks, and similar systems. With the rapid development of optical communication systems, very high transmission rates and very long transmission distances make errors in the transmitted data increasingly severe, to the point of limiting further development of these systems.

RS decoding methods fall into two main categories: soft-decision and hard-decision decoding algorithms. Soft-decision decoding makes full use of the channel soft information in the received signal and therefore achieves higher coding gain and error correction capability than hard-decision decoding, but it consumes more hardware resources, which hinders its practical application. Hard-decision algorithms and their hardware architectures are simple and have a clear advantage in current practical decoders. A hard-decision RS decoder mainly comprises three modules: Syndrome Computation (SC), Key Equation Solving (KES), and Chien Search and Error Evaluation (CSEE). The hardware architecture of the RiBM (Reformulated inversionless Berlekamp-Massey) algorithm, the most classical RS decoding algorithm, contains 3t+1 identical processing elements (PEs) and needs 2t clock cycles to compute the error location polynomial and the error evaluation polynomial; throughout this specification, t denotes the error correction capability of the RS code. The mCS-RiBM (modified Compensated Simplified Reformulated inversionless Berlekamp-Massey) algorithm, derived from the RiBM algorithm, removes a number of redundant processing units and significantly reduces hardware resource consumption.
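The RiBM recurrence mentioned above can be sketched in software as follows. This is a minimal illustration of the textbook RiBM update with 3t+1 cells and 2t iterations, not the mCS-RiBM variant of this invention. The primitive polynomial 0x11D, the syndrome indexing from 0, and all function names are assumptions made for the example.

```python
# GF(2^8) log/antilog tables. The primitive polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1) is an assumption, not taken from the patent.
EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-order coefficient first) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def ribm(synd, t):
    """Run 2t RiBM iterations over 3t+1 cells; return (locator, evaluator) cells."""
    delta = list(synd) + [0] * t + [1]   # cells 0..3t
    theta = list(delta)
    gamma, k = 1, 0
    for _ in range(2 * t):
        d0 = delta[0]
        shifted = delta[1:] + [0]        # delta_{i+1}(r), with delta_{3t+1} = 0
        new_delta = [gf_mul(gamma, shifted[i]) ^ gf_mul(d0, theta[i])
                     for i in range(3 * t + 1)]
        if d0 != 0 and k >= 0:
            theta, gamma, k = shifted, d0, -k - 1
        else:
            k += 1
        delta = new_delta
    return delta[t:2 * t + 1], delta[:t]


# Hypothetical error pattern: position -> error value (all-zero code word assumed).
T = 8
errors = {3: 0x55, 100: 0x01, 200: 0xAA}
synd = [0] * (2 * T)
for i in range(2 * T):
    for pos, val in errors.items():
        synd[i] ^= gf_mul(val, EXP[(i * pos) % 255])

lam, omega = ribm(synd, T)
# A position p is in error when the locator vanishes at alpha^(-p).
found = [p for p in range(255) if poly_eval(lam, EXP[(255 - p) % 255]) == 0]
print(found)
```

Each list comprehension over the 3t+1 cells corresponds to what the hardware PEs would do in a single clock cycle, which is why the loop runs exactly 2t times.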

Note that in existing hard-decision decoder architectures, the latency of the syndrome computation module and of the Chien search and error evaluation module is n clock cycles each (n being the RS code block length), whereas the minimum latency of the KES module is 2t-1 clock cycles. The KES module is therefore idle most of the time, which wastes hardware resources; a directly implemented decoder has a large area, low throughput, and low decoding efficiency. A design method for a hard-decision RS decoder architecture that effectively improves decoding speed therefore needs further study.
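As a rough illustration of this idle time, take the RS(255,239) parameters used later in this document (n = 255, t = 8) and assume, purely for illustration, that one KES module serves a single SC/CSEE pipeline:

$$\frac{2t-1}{n} = \frac{15}{255} \approx 5.9\%$$

Under that assumption the KES module would be busy for only about 6% of each n-cycle block period.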

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a parallel decoder architecture based on the mCS-RiBM algorithm, currently the most advanced hard-decision RS decoding algorithm. The architecture shortens the critical-path delay of the decoder while preserving decoding performance, and thereby improves decoding throughput. To this end, the invention provides a forward error correction decoder based on burst error detection, comprising a syndrome computation (SC) module, a key equation solving (KES) module, and a Chien search and error evaluation (CSEE) module. The syndromes computed by the SC module are output to the KES module; the error location polynomial Λ(X) and error evaluation polynomial Ω(X) computed by the KES module are output to the CSEE module; and the CSEE module computes all error positions and error values. The SC module computes the syndromes using the following formula:

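$$s_i = \Bigl(\cdots\bigl(r_{n-1}\alpha^{i(q-1)} + r_{n-2}\alpha^{i(q-2)} + \cdots + r_{n-q+1}\alpha^{i} + r_{n-q}\bigr)\alpha^{iq} + \cdots + r_q\Bigr)\alpha^{iq} + r_{q-1}\alpha^{i(q-1)} + r_{q-2}\alpha^{i(q-2)} + \cdots + r_1\alpha^{i} + r_0 \tag{2}$$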

where α^i is a root of the code generator polynomial; S_i, 0 ≤ i ≤ 2t-1, are the 2t syndromes; n is the code block length; r_0, r_1, …, r_{n-1} are the coefficients of the received code word polynomial; and q is the parallelism factor.

In the SC module, computing all 2t syndromes requires a total of 2t(q+1) constant multipliers, 2t adders, and 2t registers. The initial value of each register is set to 0. In the first clock cycle, the register value 0 multiplied by α^{iq} is still 0; adding it to the remaining terms gives r_{q-1}α^{i(q-1)} + … + r_1α^i + r_0, which is stored in the register. In the second clock cycle, the products are summed to give (r_{q-1}α^{i(q-1)} + … + r_1α^i + r_0)α^{iq} + r_{2q-1}α^{i(q-1)} + … + r_{q+1}α^i + r_q, and this new sum of products is again stored in the register. Continuing in this way, the symbols of each path are input sequentially from high order to low order; each iteration takes one clock cycle, and q symbols are processed in parallel per clock cycle. After n/q cycles, the value in the register is the final syndrome. The critical path of the SC module is 3Txor, where Txor is the delay of one XOR gate. The computed syndromes are sent to the KES module.
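The q-parallel accumulation described above can be checked against the direct sum with a short software model. This is only a sketch: the primitive polynomial 0x11D, the zero padding of the block to a multiple of q on the high-order side, and the function names are assumptions, and the model feeds the highest-order symbols first as in equation (2).

```python
import random

# GF(2^8) antilog/log tables; primitive polynomial 0x11D is an assumption.
EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def alpha_pow(e):
    """Return alpha^e for any non-negative exponent e."""
    return EXP[e % 255]


def syndrome_direct(r, i):
    """s_i as the plain sum of r_j * alpha^(i*j); r[j] is the coefficient of x^j."""
    s = 0
    for j, rj in enumerate(r):
        s ^= gf_mul(rj, alpha_pow(i * j))
    return s


def syndrome_parallel(r, i, q=4):
    """Accumulate s_i with q symbols per clock cycle; return (value, cycles)."""
    pad = (-len(r)) % q
    # Highest-order symbol first; zero-pad at the high-order end (assumption).
    stream = [0] * pad + list(reversed(r))
    reg, cycles = 0, 0                      # register starts at 0
    for base in range(0, len(stream), q):
        acc = gf_mul(reg, alpha_pow(i * q))  # feedback term: register * alpha^(iq)
        for m, sym in enumerate(stream[base:base + q]):
            acc ^= gf_mul(sym, alpha_pow(i * (q - 1 - m)))
        reg = acc
        cycles += 1
    return reg, cycles


random.seed(1)
received = [random.randrange(256) for _ in range(255)]
value, cycles = syndrome_parallel(received, 3, q=4)
print(value == syndrome_direct(received, 3), cycles)
```

With n = 255 and q = 4 the model needs 64 passes through the loop, one per clock cycle of the hardware.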

The KES architecture comprises a controller, t+1 PE1 processing units (numbered 0 to t), t PE4 processing units (numbered t+1 to 2t), and a compensation unit CS (compensation stack). The t+1 PE1 processing units, PE1_0 to PE1_t, are arranged in ascending order of their numbers; PE1_i outputs the iteration data signal δ_i(r) to PE1_{i-1}, where i = 1, 2, …, t+1. The t PE4 processing units, PE4_{t+1} to PE4_{2t}, are likewise arranged in ascending order; PE4_i outputs the iteration data signal δ_i(r) to PE4_{i-1}, and PE4_{i-1} outputs the intermediate variable signal θ_{i-1}(r) to PE4_i, where i = t+1, t+2, …, 2t. PE1_0 outputs its iteration data signal δ_i(r) to the controller, PE1_t receives the iteration data δ_{i+1}(r) output by PE4_{t+1} as its input, and the iteration data input of PE4_{2t} is 0.

Each PE contains 2 pipelined Galois field multipliers, 1 Galois field adder, 2-to-1 selectors, and 15 latches; PE4 has 3 more selectors than PE1, as shown in FIG. 2. The upper and lower rows of latches in each PE store the coefficients of the polynomials Δ(x) and Θ(x), respectively; in the latch names, the subscript k is the number of the PE and the superscript is the channel number. One latch receives the result of the adder, buffers it, and sends it out as the iteration data signal δ_i(r) output by the PE; at the same time, δ_i(r) is fed back to the data selector controlled by the data selection signal MC2_{s,i}(r). Another latch receives the output of the data selector controlled by the selection signal MC1_{s,i}(r), buffers it, and sends it out as the intermediate variable signal θ_{i-1}(r) output by the PE; at the same time, the intermediate variable signal is fed back to the data selector.

The controller controls the selectors in PE4. The data selection signal MC2_{s,i}(r) marks the processing units that store coefficients of Λ(r, z), and MC2_{s,i}(r) is stored in an internal register block. The controller receives the iteration output signal δ_i(r) provided by processing unit PE1 and at the same time sends δ_i(r) out as an output signal. The data selection signal MC1_{s,i}(r) output by the controller, the iteration data signal δ_i(r), and the intermediate variable signal θ_{i-1}(r) together serve as the input signals of processing units PE1_0 to PE1_t; the data selection signals MC2_{s,i}(r) and MC3_{s,i}(r) output by the controller serve as the input signals of PE4_{t+1} to PE4_{2t}. The iteration counter r starts from 1 and is incremented by 1 in each iteration until 2t iterations have been completed. When r = 2t, that is, after 2t iterations, PE1_0 to PE1_t output the coefficients of the error value polynomial and PE4_{t+1} to PE4_{2t} output the coefficients of the error location polynomial.

The overflow coefficients of the mCS-RiBM algorithm are stored in the CS unit, which uses the relation

$$\Lambda(r+1,z)=\gamma(r)\cdots\gamma(r'+1)\cdot\delta_0(r)\cdot B(r',z)$$

to calculate the compensation coefficient δ_c and pass it to the PE units at the appropriate time, where γ(r) represents the error probability, r' represents the higher-order portion of r, δ_0(r) represents the 0th iteration data signal, and B(r', z) represents the intermediate polynomial.

The CS unit consists of five 2-to-1 selectors, one multiplier, and seven registers. The three selectors M1, M2, and M3 are cascaded; the register output is buffered by three register stages and then either sent to register D4 or, after passing through the multiplier, used as one input of selector M4. In each clock cycle, the data in register D3, D4, or Dm can be selected by the selectors and sent to the leftmost register; that is, the CS architecture contains three feedback loops, named L3, L4, and Lm. Three update modes are used to cyclically generate and transfer the polynomial coefficients involved. Mode 1 applies when k(r) ≥ 0 and flag(r) ≥ 0, where k(r) and flag(r) denote, respectively, the initial value in the r-th iteration and the position of the first coefficient of Λ(r, z); its purpose is to shift the current coefficients one position to the left and receive a new overflow coefficient once the current iteration is complete. Mode 2 applies when k(r) < 0; its purpose is to multiply all current coefficients by the error probability γ(r) and shift them one position to the right. Mode 3 applies when k(r) ≥ 0 and flag(r) < 0; its effect is that all coefficients remain in their original positions after the iteration. That is, the current coefficients pass through the L4 feedback loop twice in this iteration, while the value of D5 is held unchanged.

The CSEE module consists of 8 units, 1 register, one 2-to-1 data selector, and 32 adders. The 8 units are used to compute the error positions and error values; each unit contains 1 selector, 1 register, and 1 multiplier. λ_j, j ∈ [1,8], denotes the coefficient of the degree-j term of the error location polynomial. In the first clock cycle, the select signals of the selectors in all 8 units are set to 1, so that λ_j, j ∈ [1,8], is selected as the output and sent to the constant multipliers, which perform the corresponding multiplications. The highest-power product in each unit is stored in the register and sent to the selector in the next clock cycle. In the second clock cycle, the selectors in all 8 units are set to 0, and the highest-power product from the previous clock cycle is accumulated in the register. Continuing in this way, the value in the register is raised to successively higher powers in each iteration, so that all positions can be tested. The Forney algorithm module is obtained by removing unit C8 from this structure; after performing the same operations, it outputs the error value at each error position, which is added to the corresponding transmitted code word.

The invention has the characteristics and beneficial effects that:

To address the complex structure and long decoding time of RS decoders, the invention combines the retiming technique and a pipelined parallel architecture with the mCS-RiBM algorithm and provides a 16-channel four-way parallel forward error correction (RS-FEC) decoder architecture. The architecture comprises four 4-channel sub-decoders, and the parallelism of the SC module and the CSEE module in each sub-decoder is 4. The critical-path delay is shortened, which reduces the RS decoding time and greatly improves the throughput of the decoder.

Description of the drawings:

FIG. 1: Four-way parallel syndrome computation (SC) module, modified from the conventional SC unit

FIG. 2: 4-channel key equation solving (KES) module architecture

FIG. 3: Four-way parallel PE1 and PE4 processing units: (1) PE1; (2) PE4

FIG. 4: Four-way parallel CS circuit diagram

FIG. 5: Architecture of the four-way parallel pipelined Chien search module

FIG. 6: 16-channel four-way parallel FEC architecture based on RS codes

Detailed Description

A four-way parallel forward error correction (RS-FEC) decoder architecture based on the mCS-RiBM RS decoding algorithm, whose improvements include the following aspects:

(1) The SC module is implemented with a parallel structure. To shorten the critical path, each syndrome is split into an odd part and an even part that are computed separately and finally added to obtain the 2t syndromes. The parallelism factor of the syndrome computation circuit is 4: the symbols of each path are input sequentially from high order to low order, the module processes 4 symbols per clock cycle, and after n/4 clock cycles the computed syndromes are sent to the KES module;

(2) Because each sub-decoder has 4 channels, the single pair of registers in each processing unit of the original KES module is increased to 4 pairs. Since this increases the number of register stages, the retiming technique is used to replace the ordinary multipliers with pipelined multipliers;

(3) After the KES module has finished, the computed error location polynomial Λ(X) and error evaluation polynomial Ω(X) are sent to the Chien search and error evaluation (CSEE) module. The CSEE module is also designed as a four-way parallel architecture: the circuit processes 4 symbols per clock cycle, so the CSEE module needs n/4 clock cycles to compute all error positions and error values.

The invention targets the mCS-RiBM algorithm, currently the most advanced hard-decision RS decoding algorithm, and designs a parallel decoder architecture for it. Module reuse and a pipelined parallel design raise the utilization of the key modules, yielding a hard-decision decoder architecture with low hardware cost and high throughput. The invention is described in detail below with reference to the drawings and embodiments. For convenience, the most widely used code, RS(255,239) (n = 255, t = 8), is taken as the example.

(1) The first step of decoding is to compute the 2t syndromes S_i, 0 ≤ i ≤ 2t-1. If all 2t syndromes are 0, no error has occurred; otherwise, errors occurred during transmission. The basic syndrome formula is:

$$s_i = R(\alpha^i) = \sum_{j=0}^{n-1} r_j\,\alpha^{ij} \tag{1}$$

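To improve the speed and throughput of the decoder, the invention implements the SC module with a parallel structure. Equation (1) is rewritten in the following form:

$$s_i = \Bigl(\cdots\bigl(r_{n-1}\alpha^{i(q-1)} + r_{n-2}\alpha^{i(q-2)} + \cdots + r_{n-q+1}\alpha^{i} + r_{n-q}\bigr)\alpha^{iq} + \cdots + r_q\Bigr)\alpha^{iq} + r_{q-1}\alpha^{i(q-1)} + r_{q-2}\alpha^{i(q-2)} + \cdots + r_1\alpha^{i} + r_0 \tag{2}$$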

The parallelism factor of the circuit is q (q = 4 in the invention). A conventional multi-way parallel structure, however, inevitably lengthens the critical path of the module. To solve this problem, the syndrome is split into an odd part R_odd(α^i) and an even part R_even(α^i), which are computed separately and finally added. That is:

$$s_i = R(\alpha^i) = R_{odd}(\alpha^i) + R_{even}(\alpha^i) \tag{3}$$
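One plausible reading of the split in equation (3), stated here as an assumption because the text does not say which terms fall into which part, is a split by the parity of the coefficient index:

$$R_{even}(\alpha^i) = \sum_{j\ \mathrm{even}} r_j\,\alpha^{ij}, \qquad R_{odd}(\alpha^i) = \sum_{j\ \mathrm{odd}} r_j\,\alpha^{ij}$$

Because addition in GF(2^m) is a bitwise XOR, the two partial sums can be accumulated independently and combined with a single final addition, so the split does not change the value of s_i.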

FIG. 1 shows the four-way parallel module obtained by modifying the conventional parallel SC unit. A single SC circuit requires 1 adder, q+1 constant multipliers, and 1 register; in the four-way parallel SC architecture, computing all 2t syndromes requires a total of 2t(q+1) constant multipliers, 2t adders, and 2t registers. The initial value of each register is set to 0. In the first clock cycle, the register value 0 multiplied by α^{iq} is still 0; adding it to the remaining terms gives r_{q-1}α^{i(q-1)} + … + r_1α^i + r_0, which is stored in the register. In the second clock cycle, the products are summed to give (r_{q-1}α^{i(q-1)} + … + r_1α^i + r_0)α^{iq} + r_{2q-1}α^{i(q-1)} + … + r_{q+1}α^i + r_q, and this new sum of products is again stored in the register. Continuing in this way, the symbols of each path are input sequentially from high order to low order; each iteration takes one clock cycle, and the SC circuit processes q symbols in parallel per clock cycle. After n/q cycles, the value in the register is the required final syndrome. The critical path of the SC module is then 3Txor. In the four-way parallel structure, only 64 clock cycles are needed to complete the syndrome computation. The computed syndromes are sent to the KES module.

(2) 4-channel KES architecture. Because each sub-decoder has 4 channels, it is only necessary to increase the single pair of registers in each processing unit of the original KES module to 4 pairs. The overall architecture of the 4-channel KES module is shown in FIG. 2. The complete KES architecture comprises a controller, t+1 PE1 units, t PE4 units, and a compensation stack (CS).

The t+1 PE1 processing units, PE1_0 to PE1_t, are arranged in ascending order of their numbers; PE1_i outputs the iteration data signal δ_i(r) to PE1_{i-1}, where i = 1, 2, …, t+1. The t PE4 processing units, PE4_{t+1} to PE4_{2t}, are likewise arranged in ascending order; PE4_i outputs the iteration data signal δ_i(r) to PE4_{i-1}, and PE4_{i-1} outputs the intermediate variable signal θ_{i-1}(r) to PE4_i, where i = t+1, t+2, …, 2t. PE1_0 outputs its iteration data signal δ_i(r) to the controller, PE1_t receives the iteration data δ_{i+1}(r) output by PE4_{t+1} as its input, and the iteration data input of PE4_{2t} is 0. Because the number of register stages has increased, the retiming technique can be used to replace the ordinary multipliers with pipelined multipliers.

The four-way parallel PE processing units are shown in FIG. 3. Each PE contains 2 pipelined Galois field multipliers, 1 Galois field adder, 2-to-1 selectors, and 15 latches. PE4 differs slightly from PE1 in having 3 more selectors. The upper and lower rows of latches in each PE store the coefficients of the polynomials Δ(x) and Θ(x), respectively; in the latch names, the subscript k is the number of the PE and the superscript is the channel number. One latch receives the result of the adder, buffers it, and sends it out as the iteration data signal δ_i(r) output by the PE; at the same time, δ_i(r) is fed back to the data selector controlled by MC2_{s,i}(r). Another latch receives the output of the data selector controlled by MC1_{s,i}(r), buffers it, and sends it out as the intermediate variable signal θ_{i-1}(r) output by the PE; at the same time, the intermediate variable signal is fed back to the data selector.

The controller controls the selectors in PE4. MC2_{s,i}(r) marks the processing units that store coefficients of Λ(r, z), and MC2_{s,i}(r) is stored in an internal register block. The controller receives the iteration output signal δ_i(r) provided by processing unit PE1 and at the same time sends δ_i(r) out as an output signal. The data selection signal MC1_{s,i}(r) output by the controller, the iteration data signal δ_i(r), and the intermediate variable signal θ_{i-1}(r) together serve as the input signals of processing units PE1_0 to PE1_t; the data selection signals MC2_{s,i}(r) and MC3_{s,i}(r) output by the controller serve as the input signals of PE4_{t+1} to PE4_{2t}. The iteration counter r starts from 1 and is incremented by 1 in each iteration until 2t iterations have been completed. When r = 2t, that is, after 2t iterations, PE1_0 to PE1_t output the coefficients of the error value polynomial and PE4_{t+1} to PE4_{2t} output the coefficients of the error location polynomial.
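The two-multiplier, one-adder datapath of a processing unit can be modelled in a few lines. The sketch below follows the textbook RiBM cell update, δ_out = γ·δ_in + δ_0·θ, with a single select bit standing in for the MC-controlled selectors; the exact selector wiring of PE1 and PE4, the names `pe_step` and `swap`, and the primitive polynomial 0x11D are assumptions rather than details taken from the figures.

```python
def gf_mul(a, b, poly=0x11D):
    """Shift-and-add multiplication in GF(2^8); poly 0x11D is an assumption."""
    product = 0
    while b:
        if b & 1:
            product ^= a       # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= poly          # reduce modulo the primitive polynomial
        b >>= 1
    return product


def pe_step(delta_in, theta, gamma, delta0, swap):
    """One clock of a RiBM-style cell: two multiplies, one add, one selector.

    delta_in : iteration data arriving from the neighbouring cell
    theta    : intermediate variable currently latched in this cell
    gamma    : scaling factor broadcast by the controller
    delta0   : discrepancy broadcast by the controller
    swap     : select bit (hypothetical stand-in for the MC selection signals)
    """
    delta_out = gf_mul(gamma, delta_in) ^ gf_mul(delta0, theta)
    theta_out = delta_in if swap else theta
    return delta_out, theta_out


print(pe_step(delta_in=3, theta=1, gamma=3, delta0=0, swap=True))
```

In a four-channel unit the same datapath would be shared by four sets of latches, one per channel, which is where the additional register stages for retiming come from.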

The CS unit uses the relation

$$\Lambda(r+1,z)=\gamma(r)\cdots\gamma(r'+1)\cdot\delta_0(r)\cdot B(r',z)$$

to calculate the compensation coefficient δ_c, which is passed to the PE units at the appropriate time. FIG. 4 shows the four-way parallel CS circuit, which consists of five 2-to-1 selectors, one multiplier, and seven registers. The three selectors M1, M2, and M3 are cascaded; the register output is buffered by three register stages and then either sent to D4 or, after passing through the multiplier, used as one input of selector M4. In each clock cycle, the data in D3, D4, or Dm can be selected by the selectors and sent to the leftmost register; that is, the CS architecture contains three feedback loops, L3, L4, and Lm. Because the channel parallelism of this architecture is 4, each iteration takes 8 clock cycles. The proposed CS unit has three update modes for cyclically generating and transferring the polynomial coefficients involved.

Mode 1 applies when k(r) ≥ 0 and flag(r) ≥ 0. Its purpose is to shift the current coefficients one position to the left and receive a new overflow coefficient once the iteration is complete. Specifically, the current coefficients first pass once through the 4-stage L4 feedback loop, so that they return to their original positions after 4 cycles; from the 5th clock cycle they pass through the 3-stage L3 feedback loop while the values of D5 and Dm are held unchanged; and at the 8th clock cycle the new overflow coefficient θ_f is loaded into the rightmost register.

Mode 2 applies when k(r) < 0. Its purpose is to multiply all current coefficients by γ(r) and shift them one position to the right. Specifically, the current coefficients pass through the 5-stage Lm feedback loop during the first 4 clock cycles, and from the 5th cycle all data pass through the L4 feedback loop.

Mode 3 applies when k(r) ≥ 0 and flag(r) < 0. Its effect is that all coefficients remain in their original positions after the iteration. That is, the current coefficients pass through the L4 feedback loop twice in this iteration, while the value of D5 is held unchanged.
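The mode selection and the behaviour of Mode 3 can be illustrated with a small software model. The conditions on k(r) and flag(r) come from the text above; modelling a feedback loop as a simple ring of registers that advances one stage per clock cycle, and the function names, are assumptions made for the sketch.

```python
def cs_mode(k, flag):
    """Choose the CS update mode from k(r) and flag(r)."""
    if k < 0:
        return 2                  # scale by gamma(r) and shift right
    return 1 if flag >= 0 else 3  # shift left / keep positions


def rotate_loop(stages, cycles):
    """Circulate values around a feedback loop, one stage per clock cycle."""
    stages = list(stages)
    for _ in range(cycles):
        stages = [stages[-1]] + stages[:-1]
    return stages


# Mode 3: two passes through a 4-stage loop in one 8-cycle iteration
# leave every coefficient where it started.
coefficients = [0x11, 0x22, 0x33, 0x44]   # hypothetical register contents
print(cs_mode(2, -1), rotate_loop(coefficients, 8) == coefficients)
```

The same ring model shows why one pass through the 4-stage L4 loop (4 cycles) also restores the original order, which is the first half of Mode 1.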

In the four-way parallel architecture, because each iteration processes the 4 channels serially, the coefficients of the product Θ(x) of the intermediate polynomial and the syndrome for the first channel are output at the 60th clock cycle after the start of each iteration, and the output for all 4 channels is complete at the 63rd clock cycle.

(3) After the key equation has been solved, the resulting error location polynomial and error value polynomial are sent to the CSEE module, in which the Chien search module finds the roots of the error location polynomial and the Forney algorithm module computes each error value. To suit a high-speed RS decoder, the CSEE module must also be pipelined; because of the feedback loop, the coefficients of α must be adjusted in the pipelined CSEE architecture to align the timing. FIG. 5 shows the architecture of the four-way parallel pipelined Chien search module. In the figure, λ_j, j ∈ [1,8], denotes the coefficient of the degree-j term of the error location polynomial. In the first clock cycle, the select signals of the selectors in all 8 units are set to 1, so that λ_j, j ∈ [1,8], is selected as the output and sent to the constant multipliers, which perform the corresponding multiplications. The highest-power product in each unit is stored in the register and sent to the selector in the next clock cycle. In the second clock cycle, the selectors in all 8 units are set to 0, and the highest-power product from the previous clock cycle is accumulated in the register. Continuing in this way, the value in the register is raised to successively higher powers in each iteration, so that all positions can be tested. The Forney algorithm module differs only in that unit C8 is removed; the remaining parts are essentially the same. The critical path of the CSEE module is 3Txor + Tmux, where Txor and Tmux are the delays of one XOR gate and one multiplexer, respectively. There is a delay of 3 cycles before the first received code word is output, and 64 clock cycles are needed to compute all error values.
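A q-parallel Chien search can be modelled in software as follows. This sketch keeps one register per locator coefficient, tests q candidate positions per cycle, and then multiplies register j by α^{jq}; it does not reproduce the selector timing or pipeline stages of FIG. 5. The primitive polynomial 0x11D, the search order (ascending exponents of α), and all names are assumptions.

```python
# GF(2^8) antilog/log tables; primitive polynomial 0x11D is an assumption.
EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_parallel(lam, n=255, q=4):
    """Find exponents m with Lambda(alpha^m) = 0, testing q values per cycle."""
    regs = list(lam)        # regs[j] holds lambda_j * alpha^(j * q * cycle)
    roots, cycles = [], 0
    for base in range(0, n, q):
        for p in range(q):                      # q evaluations in parallel
            if base + p >= n:
                break
            value = 0
            for j, rj in enumerate(regs):
                value ^= gf_mul(rj, EXP[(j * p) % 255])
            if value == 0:
                roots.append(base + p)
        # Advance every register by q positions for the next cycle.
        regs = [gf_mul(rj, EXP[(j * q) % 255]) for j, rj in enumerate(regs)]
        cycles += 1
    return roots, cycles


# Build a locator with known roots: Lambda(z) = prod (1 + alpha^pos * z).
error_positions = [3, 100, 200]          # hypothetical error positions
lam = [1]
for pos in error_positions:
    X = EXP[pos]
    lam = [a ^ gf_mul(b, X) for a, b in zip(lam + [0], [0] + lam)]

roots, cycles = chien_parallel(lam)
# A root at alpha^m corresponds to error position (255 - m) mod 255.
print(sorted((255 - m) % 255 for m in roots), cycles)
```

With n = 255 and q = 4 the search finishes in 64 cycles, matching the cycle count quoted for the CSEE module.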

FIG. 6 shows the proposed 16-channel four-way parallel RS-FEC architecture, which comprises four 4-channel multi-way parallel RS decoders. The parallelism of the SC module and the CSEE module in each sub-decoder is 4. The syndrome computation takes 64 clock cycles. The KES architecture begins to output the coefficients of the error location polynomial and the error evaluation polynomial of the first channel at the 124th clock cycle and sends them to the CSEE module; the output of all error location and error evaluation polynomial coefficients is complete after 127 clock cycles.
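The cycle counts quoted above can be related to the code parameters with a little arithmetic. The sketch below is one possible reading, not a statement of the actual schedule: it assumes cycles are numbered from 0 and that the four channels' coefficients emerge on the last four cycles of the final KES iteration. The 8 cycles per iteration and the figures 64, 124, and 127 come from the text; the variable names are illustrative.

```python
import math

N, T, Q, CHANNELS = 255, 8, 4, 4       # RS(255,239), four-way parallel, 4 channels
CYCLES_PER_KES_ITERATION = 8           # stated for the 4-channel CS/KES unit

sc_cycles = math.ceil(N / Q)           # syndrome computation
csee_cycles = math.ceil(N / Q)         # Chien search and error evaluation
kes_cycles = 2 * T * CYCLES_PER_KES_ITERATION

# Assumed: one channel's coefficients appear on each of the last four cycles.
kes_output_window = list(range(kes_cycles - CHANNELS, kes_cycles))

print(sc_cycles, csee_cycles, kes_cycles, kes_output_window)
```

Under these assumptions the KES stage occupies 128 cycles for four channels, and its output window of cycles 124 to 127 lines up with the figures given for FIG. 6.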

Claims

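1. A forward error correction decoder based on burst error detection, characterized by comprising a syndrome computation (SC) module, a key equation solving (KES) module, and a Chien search and error evaluation (CSEE) module, wherein the syndromes computed by the SC module are output to the KES module, the error location polynomial Λ(X) and the error evaluation polynomial Ω(X) computed by the KES module are output to the CSEE module, and the CSEE module computes all error positions and error values; the SC module computes the syndromes using the following formula:

$$s_i = \Bigl(\cdots\bigl(r_{n-1}\alpha^{i(q-1)} + r_{n-2}\alpha^{i(q-2)} + \cdots + r_{n-q+1}\alpha^{i} + r_{n-q}\bigr)\alpha^{iq} + \cdots + r_q\Bigr)\alpha^{iq} + r_{q-1}\alpha^{i(q-1)} + r_{q-2}\alpha^{i(q-2)} + \cdots + r_1\alpha^{i} + r_0 \tag{2}$$

where α^i is a root of the code generator polynomial; S_i, 0 ≤ i ≤ 2t-1, are the 2t syndromes; n is the code block length; r_0, r_1, …, r_{n-1} are the coefficients of the received code word polynomial; and q is the parallelism factor.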

2. The forward error correction decoder based on burst error detection as claimed in claim 1, wherein, in the SC module, computing all 2t syndromes requires a total of 2t(q+1) constant multipliers, 2t adders, and 2t registers; the initial value of each register is set to 0; in the first clock cycle, the register value 0 multiplied by α^{iq} is still 0, and adding it to the remaining terms gives r_{q-1}α^{i(q-1)} + … + r_1α^i + r_0, which is stored in the register; in the second clock cycle, the products are summed to give (r_{q-1}α^{i(q-1)} + … + r_1α^i + r_0)α^{iq} + r_{2q-1}α^{i(q-1)} + … + r_{q+1}α^i + r_q, and this new sum of products is again stored in the register; continuing in this way, the symbols of each path are input sequentially from high order to low order, each iteration takes one clock cycle, and q symbols are processed in parallel per clock cycle; after n/q cycles, the value in the register is the final syndrome; the critical path of the SC module is 3Txor, where Txor is the delay of one XOR gate; and the computed syndromes are sent to the KES module.

3. The forward error correction decoder of claim 1, wherein the KES architecture comprises a controller, t+1 PE1 processing units (numbered 0 to t), t PE4 processing units (numbered t+1 to 2t), and a compensation unit CS (compensation stack); the t+1 PE1 processing units, PE1_0 to PE1_t, are arranged in ascending order of their numbers, and PE1_i outputs the iteration data signal δ_i(r) to PE1_{i-1}, where i = 1, 2, …, t+1; the t PE4 processing units, PE4_{t+1} to PE4_{2t}, are arranged in ascending order of their numbers, PE4_i outputs the iteration data signal δ_i(r) to PE4_{i-1}, and PE4_{i-1} outputs the intermediate variable signal θ_{i-1}(r) to PE4_i, where i = t+1, t+2, …, 2t; PE1_0 outputs its iteration data signal δ_i(r) to the controller, PE1_t receives the iteration data δ_{i+1}(r) output by PE4_{t+1} as its input, and the iteration data input of PE4_{2t} is 0;

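each PE contains 2 pipelined Galois field multipliers, 1 Galois field adder, 2-to-1 selectors, and 15 latches, and PE4 has 3 more selectors than PE1, as shown in FIG. 2; the upper and lower rows of latches in each PE store the coefficients of the polynomials Δ(x) and Θ(x), respectively, where, in the latch names, the subscript k is the number of the PE and the superscript is the channel number; one latch receives the result of the adder, buffers it, and sends it out as the iteration data signal δ_i(r) output by the PE, δ_i(r) also being fed back to the data selector controlled by the data selection signal MC2_{s,i}(r); another latch receives the output of the data selector controlled by the selection signal MC1_{s,i}(r), buffers it, and sends it out as the intermediate variable signal θ_{i-1}(r) output by the PE, the intermediate variable signal also being fed back to the data selector;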

the controller controls the selectors in PE4; the data selection signal MC2_{s,i}(r) marks the processing units that store coefficients of Λ(r, z), and MC2_{s,i}(r) is stored in an internal register block; the controller receives the iteration output signal δ_i(r) provided by processing unit PE1 and at the same time sends δ_i(r) out as an output signal; the data selection signal MC1_{s,i}(r) output by the controller, the iteration data signal δ_i(r), and the intermediate variable signal θ_{i-1}(r) together serve as the input signals of processing units PE1_0 to PE1_t; the data selection signals MC2_{s,i}(r) and MC3_{s,i}(r) output by the controller serve as the input signals of PE4_{t+1} to PE4_{2t}; the iteration counter r starts from 1 and is incremented by 1 in each iteration until 2t iterations have been completed; when r = 2t, that is, after 2t iterations, PE1_0 to PE1_t output the coefficients of the error value polynomial and PE4_{t+1} to PE4_{2t} output the coefficients of the error location polynomial;

the CS unit uses the relation

$$\Lambda(r+1,z)=\gamma(r)\cdots\gamma(r'+1)\cdot\delta_0(r)\cdot B(r',z)$$

to calculate the compensation coefficient δ_c and pass it to the PE units at the appropriate time, where γ(r) represents the error probability, r' represents the higher-order portion of r, δ_0(r) represents the 0th iteration data signal, and B(r', z) represents the intermediate polynomial;

the CS unit consists of five 2-to-1 selectors, one multiplier, and seven registers, wherein the three selectors M1, M2, and M3 are cascaded, and the register output is buffered by three register stages and then either sent to register D4 or, after passing through the multiplier, used as one input of selector M4. In each clock cycle, the data in register D3, D4, or Dm is sent to the leftmost register through the selectors; that is, the CS architecture contains three feedback loops, named L3, L4, and Lm. Three update modes are used to cyclically generate and transfer the polynomial coefficients involved. Mode 1 applies when k(r) ≥ 0 and flag(r) ≥ 0, where k(r) and flag(r) denote, respectively, the initial value in the r-th iteration and the position of the first coefficient of Λ(r, z); its purpose is to shift the current coefficients one position to the left and receive a new overflow coefficient once the current iteration is complete. Mode 2 applies when k(r) < 0; its purpose is to multiply all current coefficients by the error probability γ(r) and shift them one position to the right. Mode 3 applies when k(r) ≥ 0 and flag(r) < 0; its effect is that all coefficients remain in their original positions after the iteration. That is, the current coefficients pass through the L4 feedback loop twice in this iteration, while the value of D5 is held unchanged.

4. The forward error correction decoder based on burst error detection as claimed in claim 1, wherein the CSEE module consists of 8 units, 1 register, one 2-to-1 data selector, and 32 adders; the 8 units are used to compute the error positions and error values, and each unit contains 1 selector, 1 register, and 1 multiplier; λ_j, j ∈ [1,8], denotes the coefficient of the degree-j term of the error location polynomial; in the first clock cycle, the select signals of the selectors in all 8 units are set to 1, so that λ_j, j ∈ [1,8], is selected as the output and sent to the constant multipliers, which perform the corresponding multiplications; then, on the one hand, the products obtained in each unit are added to determine whether any of the 4 positions is an error position, and, on the other hand, the highest-power product in each unit is stored in the register to be sent to the selector in the next clock cycle; in the second clock cycle, the selectors in all 8 units are set to 0, and the highest-power product from the previous clock cycle is accumulated in the register; continuing in this way, the value in the register is raised to successively higher powers in each iteration, so that all positions can be tested; the Forney algorithm module is obtained by removing unit C8 from this structure and, after performing the same operations, outputs the error value at each error position, which is added to the corresponding transmitted code word.