CN104617959A - Universal processor-based LDPC (Low Density Parity Check) encoding and decoding method - Google Patents


Publication number
CN104617959A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510026526.1A
Other languages
Chinese (zh)
Other versions
CN104617959B (en)
Inventor
牛凯
贺志强
张竟意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications


Abstract

The invention discloses an LDPC (Low Density Parity Check) encoding method. The method comprises determining the vectors p_1 and p_2 and obtaining the encoding result vector. Whenever a matrix is multiplied by a vector while p_1 and p_2 are being determined, each row of the matrix is handled by one thread, which multiplies that row by the vector; the products of all rows together form the result vector. To multiply row i of the matrix by the vector, the method determines the vector starting position corresponding to each element j of row i, shifts the Z − A_{i,j} data items that begin at the starting position to the left using single-instruction multiple-data (SIMD) operations, moves the A_{i,j} data items that precede the starting position into the space behind the left-shifted data to obtain the vector shift result for element j, and then adds the vector shift results of all elements. By combining multithreading with SIMD processing, the method increases encoding speed on a general-purpose processor.

Description

An LDPC encoding and decoding method based on a general-purpose processor
Technical field
The application relates to LDPC encoding and decoding technology, and in particular to an LDPC encoding and decoding method based on a general-purpose processor.
Background
An LDPC code is a linear block code with a relatively long code length. Its check matrix is correspondingly large, but the matrix contains few nonzero elements, that is, few "1" entries, hence the name low-density.
Implementing the IEEE 802.11n wireless local area network (WLAN) transmission protocol requires LDPC encoding and decoding. According to the protocol, an LDPC PPDU (PLCP Protocol Data Unit) is generated as follows (see Fig. 1):
(1) Calculate the shortening bits
(1a) Calculate the number of available bits N_avbits:
N_pld = length × 8 + 16,
N_avbits = N_CBPS × m_STBC × ⌈N_pld / (N_CBPS × R × m_STBC)⌉ (as specified in IEEE 802.11n)
where the flag m_STBC is 2 if STBC (space-time block coding) is used and 1 otherwise; N_CBPS is the number of coded bits per symbol; length is the number of bytes in the PSDU (PLCP Service Data Unit), i.e. the number of bytes of information bits; N_pld is the total number of bits in the PSDU and the SERVICE field; and R is the coding rate.
(1b) Calculate the number of LDPC codewords N_CW and the codeword length L_LDPC:
When N_avbits ≤ 648: N_CW = 1; L_LDPC = 1296 if N_avbits ≥ N_pld + 912 × (1−R), otherwise L_LDPC = 648.
When 648 < N_avbits ≤ 1296: N_CW = 1; L_LDPC = 1944 if N_avbits ≥ N_pld + 1464 × (1−R), otherwise L_LDPC = 1296.
When 1296 < N_avbits ≤ 1944: N_CW = 1; L_LDPC = 1944.
When 1944 < N_avbits ≤ 2592: N_CW = 2; L_LDPC = 1944 if N_avbits ≥ N_pld + 2916 × (1−R), otherwise L_LDPC = 1296.
When N_avbits > 2592: N_CW = ⌈N_pld / (1944 × R)⌉ (as specified in IEEE 802.11n); L_LDPC = 1944.
(1c) Calculate the number of shortening bits N_shrt. The shortening bits are padded into the information bits before LDPC encoding:
N_shrt = max(0, (N_CW × L_LDPC × R) − N_pld)
When N_shrt = 0, no zero padding is performed. When N_shrt > 0, the shortening bits are distributed evenly over all N_CW codewords, so each codeword receives ⌊N_shrt / N_CW⌋ shortening bits. If N_shrt mod N_CW ≠ 0, where mod denotes the remainder of N_shrt divided by N_CW, the first N_shrt mod N_CW codewords each carry one more shortening bit than the others.
(2) Perform LDPC encoding to obtain the parity bits.
(3) Discard the shortening bits.
(4) Calculate the number of punctured bits and discard them. The number of bits N_punc to puncture after LDPC encoding is:
N_punc = max(0, (N_CW × L_LDPC) − N_avbits − N_shrt)
If ((N_punc > 0.1 × N_CW × L_LDPC × (1−R)) and (N_shrt < 1.2 × N_punc × R / (1−R))) or (N_punc > 0.3 × N_CW × L_LDPC × (1−R)), where the first clause is the one specified in IEEE 802.11n, N_avbits is increased and N_punc is recalculated:
N'_avbits = N_avbits + N_CBPS × m_STBC, N_punc = max(0, (N_CW × L_LDPC) − N'_avbits − N_shrt)
The punctured bits are distributed evenly over all N_CW codewords, so each codeword has ⌊N_punc / N_CW⌋ punctured bits. If N_punc mod N_CW ≠ 0, where mod denotes the remainder of N_punc divided by N_CW, the first N_punc mod N_CW codewords each have one more punctured bit than the others.
(5) Calculate the repetition bits. The number of repetition bits N_rep is:
N_rep = max(0, N'_avbits − N_CW × L_LDPC × (1−R) − N_pld)
The repetition bits are distributed evenly over all N_CW codewords, so each codeword has ⌊N_rep / N_CW⌋ repetition bits. If N_rep mod N_CW ≠ 0, where mod denotes the remainder of N_rep divided by N_CW, the first N_rep mod N_CW codewords each have one more repetition bit than the others. The repetition bits are chosen in order starting from the first information bit until the required length is reached, and they are copied from the codeword after the shortening bits have been removed. The selected repetition bits are appended in order after the parity bits. When puncturing is required, no repetition is needed, and vice versa.
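The parameter calculation in steps (1) to (5) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name and dictionary keys are hypothetical, and the formula for N_avbits, the formula for N_CW in the last case, and the two-clause puncturing test are taken from IEEE 802.11n and should be checked against the standard. The rate is passed as a `Fraction` so that the comparisons are exact.

```python
import math
from fractions import Fraction


def ldpc_ppdu_params(length, n_cbps, rate, m_stbc=1):
    """Hypothetical helper: PPDU encoding parameters for one PSDU.

    length: PSDU length in bytes; n_cbps: coded bits per symbol;
    rate: coding rate R as a Fraction; m_stbc: 2 with STBC, else 1.
    """
    rate = Fraction(rate)
    n_pld = length * 8 + 16
    sym_bits = n_cbps * m_stbc
    # Assumed from IEEE 802.11n: round the payload up to whole symbols.
    n_avbits = sym_bits * math.ceil(n_pld / (sym_bits * rate))

    # Step (1b): number of codewords and codeword length.
    if n_avbits <= 648:
        n_cw = 1
        l_ldpc = 1296 if n_avbits >= n_pld + 912 * (1 - rate) else 648
    elif n_avbits <= 1296:
        n_cw = 1
        l_ldpc = 1944 if n_avbits >= n_pld + 1464 * (1 - rate) else 1296
    elif n_avbits <= 1944:
        n_cw, l_ldpc = 1, 1944
    elif n_avbits <= 2592:
        n_cw = 2
        l_ldpc = 1944 if n_avbits >= n_pld + 2916 * (1 - rate) else 1296
    else:
        n_cw = math.ceil(n_pld / (1944 * rate))  # assumed from IEEE 802.11n
        l_ldpc = 1944

    parity_bits = int(n_cw * l_ldpc * (1 - rate))
    # Step (1c): shortening bits.
    n_shrt = max(0, int(n_cw * l_ldpc * rate) - n_pld)
    # Step (4): punctured bits, with the optional increase of N_avbits.
    n_punc = max(0, n_cw * l_ldpc - n_avbits - n_shrt)
    heavy = n_punc > Fraction(3, 10) * parity_bits
    moderate = (n_punc > Fraction(1, 10) * parity_bits
                and n_shrt < Fraction(6, 5) * n_punc * rate / (1 - rate))
    if heavy or moderate:
        n_avbits += sym_bits
        n_punc = max(0, n_cw * l_ldpc - n_avbits - n_shrt)
    # Step (5): repetition bits.
    n_rep = max(0, n_avbits - parity_bits - n_pld)
    return {"n_pld": n_pld, "n_avbits": n_avbits, "n_cw": n_cw,
            "l_ldpc": l_ldpc, "n_shrt": n_shrt, "n_punc": n_punc,
            "n_rep": n_rep}
```

For example, `ldpc_ppdu_params(100, 52, Fraction(1, 2))` selects a single 1944-bit codeword with 156 shortening bits, 124 punctured bits and no repetition bits, which illustrates that puncturing and repetition are mutually exclusive.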
Within LDPC PPDU generation, the LDPC encoding step is the most important. The codeword vector output by LDPC encoding is denoted c = (S, p_1, p_2), where S is the information vector and p_1 and p_2 are the parity vectors. Because the check matrix H of an LDPC code is large, the computation during encoding is laborious. In the representation given in the protocol, the check matrix has 24 columns and 24 × (1−R) rows for every code rate R. According to the structure of the check matrix H, it is partitioned as
$H = \begin{bmatrix} A & B & T \\ F & D & E \end{bmatrix}$
into the six submatrices A, B, D, E, T and F. The submatrices B, D, E and T have a special structure: B = (1 - - … 0 - …)^T, D = (1), E = (- … - 0). The structure of A and F is irregular; see the 802.11n protocol. In addition, because H is large, it is represented compactly: each element of the representation stands for one submatrix of the actual check matrix. Specifically, in the representation of H and of the blocks A, B, D, E, T and F, "-" denotes a zero submatrix, "0" denotes an identity submatrix, and a constant C denotes the identity matrix cyclically shifted right C times. Each submatrix has dimension Z × Z, where Z is determined in advance from the code length. This representation greatly reduces the size of the check matrix and of each block.
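The compact representation can be made concrete with a short Python sketch that expands a prototype matrix into the full binary check matrix. The function name and the use of `None` for the "-" entry are assumptions made for this illustration only.

```python
def expand_prototype(proto, z):
    """Expand a prototype matrix into a binary matrix of Z x Z blocks.

    Each entry of `proto` is None (the "-" entry, a zero block) or an
    integer c (the identity matrix cyclically shifted right c times;
    c = 0 gives the identity itself).
    """
    rows = len(proto) * z
    cols = len(proto[0]) * z
    h = [[0] * cols for _ in range(rows)]
    for bi, proto_row in enumerate(proto):
        for bj, shift in enumerate(proto_row):
            if shift is None:
                continue  # zero block, nothing to set
            for r in range(z):
                # Row r of the shifted identity has its 1 in column (r + c) mod Z.
                h[bi * z + r][bj * z + (r + shift) % z] = 1
    return h
```

A 2 × 2 prototype with Z = 3, for instance `[[0, None], [1, 2]]`, expands to a 6 × 6 binary matrix, while only four small integers need to be stored.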
Given the check matrix H and the information vector S, the codeword vector c is determined as follows. The check equation $H c^T = 0^T$ can be split into the two equations
$A S^T + B p_1^T + T p_2^T = 0$
$(-E T^{-1} A + F) S^T + (-E T^{-1} B + D) p_1^T = 0$
which, after simplification, give
$p_1^T = (E T^{-1} A + F) S^T$
$p_2^T = T^{-1} (A S^T + B p_1^T)$
Once p_1 and p_2 have been obtained, the codeword vector c = (S, p_1, p_2) follows.
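The step from the two split equations to the closed forms for p_1 and p_2 can be spelled out as below. This is a sketch under two assumptions: arithmetic is over GF(2), where addition and subtraction coincide, and $E T^{-1} B + D$ equals the identity matrix. The text does not state the second assumption explicitly, but the simplified form for $p_1^T$ requires it.

```latex
\begin{align*}
\text{Block rows of } H c^T = 0^T:\quad
  & A S^T + B p_1^T + T p_2^T = 0, \\
  & F S^T + D p_1^T + E p_2^T = 0. \\
\text{Multiply the first row by } E T^{-1} \text{ and add it to the second:}\quad
  & (E T^{-1} A + F) S^T + (E T^{-1} B + D) p_1^T = 0. \\
\text{With } E T^{-1} B + D = I \text{ (assumed):}\quad
  & p_1^T = (E T^{-1} A + F) S^T. \\
\text{Solve the first row for } p_2^T:\quad
  & p_2^T = T^{-1} (A S^T + B p_1^T).
\end{align*}
```

The term in $p_2^T$ cancels in the third line because $E T^{-1} T p_2^T + E p_2^T = 2 E p_2^T = 0$ over GF(2).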
The encoder that implements the above LDPC encoding process is shown in Fig. 2. It contains four kinds of functional modules: precoding matrix generators, matrix multipliers, matrix adders and an LDPC codeword synthesizer.
The encoder has six precoding matrix generators. The input of each is one of the six submatrices A, B, D, E, T and F, and the output is the processed matrix. Each generator stores its input matrix in compressed form, keeping only the nonzero elements, and outputs the converted matrix.
The encoder has six multipliers, each with two inputs and one output. The two inputs are any two of the following: the information vector S, the matrix A, a matrix processed by a precoding matrix generator, the result of another multiplier, or the result of an adder. The output is the product of the two inputs.
The encoder has two matrix adders. The inputs of each are two matrices output by matrix multipliers in the encoder, and the output is their matrix sum.
The encoder has one LDPC codeword synthesizer. Its inputs are three vectors: the information vector S and the parity vectors p_1 and p_2. It combines them into the codeword vector c = (S, p_1, p_2) and outputs c.
The above describes the existing LDPC encoding method and the corresponding encoder. At the receiver, the received LDPC codeword must also be decoded to recover the information vector. The main steps of the existing LDPC decoding technique are as follows:
(1) The M check nodes are divided into M_b layers, each containing T check nodes. Decoding then proceeds layer by layer. In the first layer, the information of the check nodes and variable nodes is calculated; after the first layer finishes, the second layer is initialized with the variable node information obtained from the first layer, and so on;
(2) Initialization: the variable node values are initialized with the LLRs (log-likelihood ratios) and all check node information is set to 0. The decoding algorithm runs for I iterations, and each iteration proceeds row by row. In the min-sum algorithm, n ∈ N_m denotes the columns of the prototype check matrix H_b for which [H_b]_{m,n} ≠ '-';
(3) Minimization: the variable node vector q_n is cyclically shifted right (with shift count S(m,n) = [H_b]_{m,n}), the check node information is subtracted from it, and the result is stored in the vector t_n. Owing to the value-reuse property of OMS (offset min-sum), only the minimum and the second minimum of the elements of the vector need to be computed;
(4) Minimum selection: for n ∈ N_m, q_n and the corresponding check node information are calculated and updated.
To implement this decoding method, the existing decoder (see Fig. 3) consists of four parts: an initialization decoding unit, a minimum and second-minimum selection unit, a data truncation unit and a cyclic shift unit.
The input of the initialization decoding unit is an LDPC check matrix, and its output is the check matrix after processing. It converts the storage format of the check matrix to meet the decoder's input requirements and outputs the result.
The minimum and second-minimum selection unit takes two matrices as input: the check matrix processed by the initialization decoding unit, and the LDPC codeword matrix c received after transmission over the wireless channel. It computes the minimum and second minimum of the difference between the cyclically right-shifted variable nodes and the check nodes, and passes the resulting matrix to the data truncation unit and the cyclic shift unit.
The data truncation unit takes as input the matrix output by the minimum and second-minimum selection unit. To prevent the check node information from overflowing, it truncates the data and passes the resulting matrix to the cyclic shift unit.
The cyclic shift unit takes two matrices as input: the matrix output by the data truncation unit and the matrix output by the minimum and second-minimum selection unit. It computes the variable node matrix by bitwise modulo-2 addition of the minimum-value matrix and the check node matrix, and outputs the variable node matrix.
As described above, the theory of LDPC encoding and decoding is fairly mature. However, because an LDPC code is a linear block code with a long code length and a large check matrix, the algorithmic complexity is very high. Conventional LDPC encoding and decoding cannot readily meet the throughput requirement of IEEE 802.11n systems, which significantly limits system performance. In existing high-speed wireless access systems, LDPC codes are mostly implemented on FPGA (Field-Programmable Gate Array) chips and DSP (Digital Signal Processor) chips. Although these approaches can meet the processing and latency requirements of modern high-speed WLAN protocols, FPGA programming and specialized DSP development are both complex and lack rich programming environments and debugging tools, so their applicability is limited.
Summary of the invention
The application provides an LDPC encoding and decoding method based on a general-purpose processor, which can implement LDPC encoding and decoding efficiently on a general-purpose processor.
To achieve the above object, the application adopts the following technical solution:
An LDPC encoding method based on a general-purpose processor comprises: obtaining the signal vector S to be encoded by signal acquisition or reception; determining and storing the check matrix H and the block matrices A, B, D, E, F and T; determining the vectors p_1 and p_2 according to
$p_1^T = (E T^{-1} A + F) S^T$
$p_2^T = T^{-1} (A S^T + B p_1^T)$
and obtaining the LDPC encoding result vector c = (S, p_1, p_2); wherein each multiplication of a matrix by a vector performed while determining p_1 and p_2 comprises:
handling each row of the matrix in one thread, which multiplies that row of the matrix by the vector, and combining the products of all rows into the result vector;
wherein multiplying a row of the matrix by the vector comprises: for each element j of the current row i of the matrix, determining the corresponding vector starting position = starting position of the vector + A_{i,j} + (j−1) × Z; shifting the Z − A_{i,j} data items that begin at the starting position to the left using single-instruction multiple-data (SIMD) operations, and moving the A_{i,j} data items that precede the starting position so that they follow the left-shifted data, which yields the vector shift result for element j; and then adding the vector shift results of all elements to obtain the product of the row and the vector;
in the SIMD operations, the Z − A_{i,j} data items beginning at the starting position are divided into ⌊(Z − A_{i,j}) / W⌋ segments of length W, the left shift is applied to each segment as one parallel operation, and the remaining (Z − A_{i,j}) mod W data items are then shifted;
Z is the size of the submatrix represented by one element of the check matrix.
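The shift-based row multiplication described above can be sketched in Python. This is an illustration only: plain list slicing stands in for the SIMD copies, `None` stands for a "-" element, the column index `j` is 0-based (so the starting position is `j * z + a` rather than (j−1) × Z + A_{i,j}), and the function names are hypothetical. Addition is taken modulo 2, so it is written as XOR.

```python
def shift_block(vec, block_start, a, z):
    """Multiply a Z x Z identity shifted right `a` times by one Z-long block.

    The Z - a items starting at block_start + a move to the front (the
    left shift), and the a items before that position are appended.
    """
    start = block_start + a
    head = vec[start:block_start + z]   # Z - a items, shifted left
    tail = vec[block_start:start]       # a items, moved behind them
    return head + tail


def row_times_vector(row, vec, z):
    """Multiply one prototype row by a bit vector; returns Z result bits."""
    acc = [0] * z
    for j, a in enumerate(row):
        if a is None:
            continue  # zero block contributes nothing
        shifted = shift_block(vec, j * z, a, z)
        acc = [x ^ y for x, y in zip(acc, shifted)]  # addition modulo 2
    return acc
```

No multiplication is ever performed: because each nonzero block is a shifted identity, its product with a vector block is just a rotation of that block, which is why the method reduces to copies and additions.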
Preferably, when the matrix is T^{-1}, in multiplying each row of T^{-1} by the corresponding vector, only the elements of T^{-1} whose value is 0 are multiplied by the corresponding vector, giving the vector shift results for those elements, and the vector shift results for all other elements are set to the zero vector; the vector shift results of all elements are then added to obtain the product of the row and the vector.
Preferably, after the left shift has been applied to the segments of length W, the first Z data items are taken as the valid data.
Preferably, adding the vector shift results of all elements comprises: dividing the vector shift result of each element into ⌊(Z − A_{i,j}) / W⌋ segments of length W, adding the segments in parallel using SIMD, and then adding the remaining (Z − A_{i,j}) mod W data items.
Preferably, the matrices A, B, D, E, F and T^{-1} are stored in linear lookup tables.
An LDPC decoding method based on a general-purpose processor comprises: receiving the encoded LDPC codeword signal c and determining the check matrix H; and calculating the variable node vector q, which serves as the decoding result, by multiple iterations. In each iteration, the temporary variable vector t = q − r is calculated from the current variable node vector q and the check node vector r, the check node vector r is updated from the temporary variable vector t, and the variable node vector is then updated as q = t + r from the check node vector r and the temporary variable vector t. In the first iteration, the codeword signal c is used as the variable node vector q and the check node vector r is set to 0; wherein,
When each iterative computation temporary variable vector t, check-node vector r and variable node vector q, carry out computing and renewal using every a line of check matrix as a thread, obtain with often to go in corresponding vectorial t, q and r call number from arrive subvector; Wherein, i is the line index of check matrix, when the i-th row of corresponding described check matrix calculates temporary variable vector t, check-node vector r and variable node vector subvector corresponding to q, according to each non-"-" element H of this row of check matrix i,jwith element H in corresponding compute vector t, q and r i,jcorresponding call number from arrive subvector, then carry out successively connecting and obtain and often go corresponding subvector, during i=1, order
Calculate and H i,jthe mode of corresponding temporary variable vector t subvector is: determine H i,jcorresponding vectorial original position Z* (n-1)+H i,n, original position described in vectorial q subvector corresponding for the i-th row is played length is or the data of 6 are copied to and H by the mode of SIMD i,jthe beginning of corresponding temporary variable vector t subvector; At H i,n≠ 0, H i,n≠ '-' and (Z-H i,n) modW ≠ 0 time, determine matrix M ldpcAssemble1in with check matrix element H i,jthe value of each element in corresponding row and will with element H i,jin the subvector of corresponding current vectorial q, call number is each element copy to successively and H i,jin the current location of corresponding temporary variable vector t subvector; Determine each element H again i,jcorresponding secondary vector original position M ldpcOffset2, described secondary vector original position is played length is data copied to and H by the mode of SIMD i,jin the current location of corresponding temporary variable vector t subvector; Get and H i,jfront Z position in corresponding temporary variable vector t subvector and take absolute value as with H i,jeffective subvector of corresponding temporary variable vector t;
Work as H i,n≠ 0, H i,n≠ '-' and (Z-H i,n) modW ≠ 0 time, work as H i,n=0 or H i,n='-' or (Z-H i,n) modW=0 time, (M ldpcOffset2) i,j=Z* (n-1); k is that general processor once can deal with data amount size, and k is the fundamental unit size of SIMD process; Code length L lDPCwhen=648, LdpcRemain=11; As code length L lDPCwhen=1296, LdpcRemain=6; As code length L lDPCwhen=1944, LdpcRemain=1; J is the index of each non-"-" element in this row all non-"-" element in the i-th row, and n is the i-th row jth column index of non-"-" element in check matrix.
Preferably, the check node vector r subvector corresponding to each row of the check matrix is calculated as follows:
Write as V by with the check matrix temporary variable vector t subvector that often row is corresponding ldpcRowLength(v) row and the matrix T of row v, wherein, described matrix T veach behavior described in temporary variable vector t subvector with element H i,jcorresponding subvector, carries out cover when columns is inadequate;
Extremum assignment is applied to the matrix T_v, giving the matrix M_v of the extremum variable vector m subvector;
The matrix S_v of the intermediate variable vector s subvector is calculated from the matrix T_v;
According to described matrix M vwith described matrix S vthe element that middle index value is identical, determines an intermediary matrix R v' in the element value of respective index value; Wherein, if matrix S vin arbitrary element be less than 0, then get the complement of this arbitrary element and be added with this arbitrary element, using addition result as matrix R v' in be worth the value of identical element with described arbitrary element index; If matrix S vin arbitrary element equal 0, then this arbitrary element is added with 0, using addition result as matrix R vin be worth the value of identical element with described arbitrary element index; If matrix S vin arbitrary element >0, then in matrix M vin get and be worth identical element with described arbitrary element index and be added with described arbitrary element, using addition result as matrix R vin be worth the value of identical element with described arbitrary element index; Described operation of comparing and be added is undertaken by the mode of SIMD;
Using SIMD, the elements of the matrices R_v' and T_v that have the same index are subtracted, and the result becomes the element with that index in the matrix R_v of the check node vector r subvector; the first Z elements of each row of R_v are then read in row-major order to form the check node vector r subvector.
Preferably, the extremum assignment comprises:
The minimum and second minimum of each column of the matrix T_v, and the row index of the minimum, are determined using SIMD. The minimum and second minimum are then corrected by subtracting a preset correction value β from each; if a corrected minimum or second minimum is less than 0 it is set to 0, otherwise it is left unchanged;
From the current minimum, second minimum and row index of the minimum of each column of T_v, the column with the same index in the matrix M_v of the extremum variable vector m subvector is constructed; in any column of M_v, the element at the row index of the current minimum is set to the determined minimum and all other elements are set to the second minimum.
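The first part of the extremum assignment, finding the minimum, the second minimum and the row index of the minimum and then applying the offset correction, can be sketched for a single column as follows. This is a scalar illustration with a hypothetical function name; the method in the text performs the same comparisons on W columns at once using SIMD, and the construction of M_v is not shown here.

```python
def column_min_submin(column, beta):
    """Return (minimum, second minimum, row index of minimum) for one column.

    Both values are reduced by the correction value beta and clamped at 0.
    The column is assumed to hold at least two non-negative magnitudes.
    """
    min_val = sub_val = float("inf")
    min_idx = -1
    for idx, value in enumerate(column):
        if value < min_val:
            sub_val = min_val            # old minimum becomes second minimum
            min_val, min_idx = value, idx
        elif value < sub_val:
            sub_val = value
    # Offset correction: subtract beta, never go below zero.
    min_val = max(min_val - beta, 0)
    sub_val = max(sub_val - beta, 0)
    return min_val, sub_val, min_idx
```

Tracking only two values and one index per column is what makes the offset min-sum update cheap: every element of the corresponding column of M_v can be filled from these three numbers without revisiting T_v.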
Preferably, determining the minimum, second minimum and corresponding row index of each column using SIMD comprises:
The elements of each row of the matrix T_v are divided into sub-blocks of W basic units each; when the elements of any two rows of T_v are compared, W basic units are compared at once using SIMD.
Preferably, calculating the matrix S_v of the intermediate variable vector s subvector comprises:
For each column of the matrix T_v, all elements of the column are XORed together; the result is XORed with the element in row i' of the column and then ORed with 0x7f, and the result of the OR becomes the element in row i' of the same column of the intermediate matrix S_v. The elements of each row of T_v are divided into sub-blocks of W basic units each, and each XOR or OR operation is performed on W basic units at once using SIMD.
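The XOR and OR sequence above can be illustrated for one column in Python. The sketch assumes that each element is a signed 8-bit value in two's complement, so that bit 7 carries the sign; that storage format is an assumption, suggested by the constant 0x7f, and the function name is hypothetical.

```python
def sign_column(column):
    """Compute one column of S_v from one column of T_v.

    For each row, the result is 127 (0x7f) if the product of the signs of
    all OTHER elements in the column is positive, and -1 (0xff) if it is
    negative.
    """
    total = 0
    for value in column:
        total ^= value & 0xFF            # XOR of all elements as raw bytes
    result = []
    for value in column:
        # Removing this element's own contribution leaves the sign parity
        # of the remaining elements in bit 7; OR with 0x7f fills bits 0-6.
        byte = ((total ^ (value & 0xFF)) | 0x7F) & 0xFF
        result.append(byte - 256 if byte >= 128 else byte)
    return result
```

The trick works because XORing sign bits computes the parity of the number of negative values, and XORing an element in a second time cancels its own contribution, so a single pass over the column serves every row.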
Preferably, calculating the variable node vector q subvector corresponding to H_{i,j} comprises:
Determine H i,jcorresponding vectorial original position Z* (n-1)+H i,n, by SIMD mode by H i,jcorresponding temporary variable vector t subvector and H i,jcorresponding check-node vector r subvector is added, and original position described in result vector is played length is or the data of 5 are copied to and H by the mode of SIMD i,jthe beginning of corresponding variable node vector q subvector; At H i,n≠ 0, H i,n≠ '-' and (Z-H i,n) modW ≠ 0 time, determine matrix M ldpcAssemble1in with check matrix element H i,jthe value of each element in corresponding row and will with element H i,jin the subvector of corresponding current vectorial q, call number is each element copy to successively and H i,jin the current location of corresponding variable node vector q subvector;
Determine each element H i,jcorresponding secondary vector original position M ldpcOffset2, described secondary vector original position is risen length be 0 or data copied to and H by the mode of SIMD i,jin the current location of corresponding variable node vector q subvector;
According to the cover number of LdpcRemain instruction, according to M ldpcAssemble1in with check matrix element H i,jin corresponding row, the value of element carries out cover.
Preferably, the following are calculated in advance and stored: the vector starting position Z × (n−1) + H_{i,n} and the second vector starting position M_ldpcOffset2 corresponding to each element H_{i,j}, the matrix M_ldpcAssemble1, the vector V_ldpcRowLength formed from the number of non-"-" elements in each row of the check matrix, and LdpcRemain.
As the above technical solution shows, the LDPC encoding and decoding method of the application increases encoding and decoding speed through SIMD instructions, multithreading and precomputed stored tables.
Brief description of the drawings
Fig. 1 is a schematic diagram of the LDPC PPDU generation process;
Fig. 2 is a schematic diagram of the composition of the encoder for the LDPC encoding process;
Fig. 3 is a schematic diagram of the existing LDPC decoder;
Fig. 4 is the overall flowchart of the encoding method of the application;
Fig. 5 is a schematic diagram of the computation of the parity vector p_1 in the LDPC encoding of the application;
Fig. 6 is a schematic diagram of the computation of the parity vector p_2 in the LDPC encoding of the application;
Fig. 7 is a schematic structural diagram of optimized multiplier 1;
Fig. 8 is a schematic structural diagram of optimized multiplier 2;
Fig. 9 is a schematic structural diagram of the optimized adder;
Fig. 10 is a schematic diagram of multiplying one element of the matrix A by the transpose of the vector S;
Fig. 11 is a schematic diagram of step 5 of the LDPC encoding method of the application;
Fig. 12 is a schematic diagram of step 5 of the LDPC decoding method of the application;
Fig. 13 is a detailed flowchart of the LDPC decoding method of the application;
Fig. 14 is a flowchart of the extremum assignment performed on one sub-block in the LDPC decoding method of the application;
Fig. 15 is a schematic structural diagram of one extremum assignment operation in the LDPC decoding method;
Fig. 16 is a flowchart of calculating the intermediate variable vector for one sub-block in the LDPC decoding method;
Fig. 17 is a schematic structural diagram of one intermediate vector calculation in the LDPC decoding method.
Detailed description
To make the object, technical means and advantages of the application clearer, the application is described in further detail below with reference to the drawings.
The application provides an LDPC encoding method and an LDPC decoding method suited to implementation on a general-purpose processor. Both are described in detail below.
According to the LDPC PPDU generation method of the IEEE 802.11n protocol, the encoded codeword vector is denoted c = (S, p_1, p_2), where S is the information vector and p_1 and p_2 are the parity vectors. The check matrix H is partitioned into six parts,
$H = \begin{bmatrix} A & B & T \\ F & D & E \end{bmatrix}$
namely the submatrices A, B, D, E, F and T. Before encoding, the code length L_LDPC, the coding rate R and the information vector S must be supplied externally. The application optimizes the LDPC encoding method for the architecture of general-purpose processor (GPP) chips as follows:
1. SIMD (Single Instruction Multiple Data) operations are used to optimize the encoding method. The basic idea is to process several data items within one CPU clock cycle and so obtain the effect of parallel processing, unlike ordinary usage, in which each clock cycle performs only one data processing operation. This involves the amount of data the general-purpose processor can process at once: if that amount is K bits and the basic unit handled by SIMD is k bits, then one operation processes W = K / k data items.
2. Linear lookup tables are used to optimize storage of the check matrix information. The check matrix H is partitioned into six parts,
$H = \begin{bmatrix} A & B & T \\ F & D & E \end{bmatrix}$
namely the submatrices A, B, D, E, F and T, and six linear lookup tables are generated from these six submatrices, which reduces computational complexity.
3. Multithreading is used, with the number of threads equal to the number of rows of the check matrix H; that is, each row of data in H is processed by one thread. Several threads run at the same time, which raises the overall processing performance of the system.
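The combination of W-wide processing and one thread per matrix row can be sketched in Python as follows. This only illustrates the control structure: the function names are hypothetical, each slice operation stands in for one SIMD instruction, W = 16 would correspond to an assumed K = 128 bits with k = 8, and Python threads in CPython do not run bytecode in parallel, so no speed-up should be expected from this sketch itself.

```python
from concurrent.futures import ThreadPoolExecutor


def xor_in_chunks(a, b, w):
    """Add two equal-length bit vectors modulo 2, W items at a time."""
    out = []
    full = (len(a) // w) * w
    for pos in range(0, full, w):
        # One iteration models one SIMD operation on W basic units.
        out.extend(x ^ y for x, y in zip(a[pos:pos + w], b[pos:pos + w]))
    # The remaining len(a) mod W items are handled separately.
    out.extend(x ^ y for x, y in zip(a[full:], b[full:]))
    return out


def process_rows(rows, worker):
    """Run `worker` on every row, using one thread per row."""
    with ThreadPoolExecutor(max_workers=max(1, len(rows))) as pool:
        return list(pool.map(worker, rows))
```

In a native implementation the chunk loop would map onto vector intrinsics and the thread pool onto operating-system threads pinned to cores; the split into ⌊n / W⌋ full chunks plus a remainder is the same in either case.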
Fig. 4 is the overall flow chart of the encoding method in this application. The method rests on the same algorithmic principle as existing LDPC encoding methods; the difference lies in how the encoding is implemented on a general-purpose processor. The flow is as follows:
1, Under the 802.11n standard, each combination of code length $L_{LDPC}$ and code rate R corresponds to a different check matrix H. First, select the check matrix H for the given $L_{LDPC}$ and R, and initialize the following parameters:
1.1 Generate the six submatrices A, B, D, E, F and $T^{-1}$, where $(\cdot)^{-1}$ denotes the matrix inverse, and store them one after another in linear lookup tables. The location of each required item is identified by an offset, which trades memory for computational complexity and speeds up the data processing of the LDPC encoding method.
1.2 Determine Z, the size of the submatrix represented by each element of the selected check matrix H. For $L_{LDPC} = 648$, Z = 27; for $L_{LDPC} = 1296$, Z = 54; for $L_{LDPC} = 1944$, Z = 81.
2, To suit the GPP architecture, the encoding method is multithreaded, with the number of threads equal to the number of rows of the check matrix H. Each row of H is handled by one thread that executes steps 3 and 4, and several threads execute steps 3 and 4 at the same time, which raises the overall processing performance of the system. Steps 3 and 4 below describe the processing of a single thread.
3, Compute the parity vector $p_1$. All multiplications and additions are SIMD-optimized; see Fig. 5 for the operation diagram. The flow is as follows:
3.1 Multiply matrix A by the transpose of vector S. The result is a vector whose length equals the number of rows of A.
3.2 Multiply matrix $T^{-1}$ by the transpose of the result vector of step 3.1. The result is a vector whose length equals the number of rows of $T^{-1}$.
3.3 Multiply matrix E by the transpose of the result vector of step 3.2. The result is a vector whose length equals the number of rows of E.
3.4 Multiply matrix F by the transpose of vector S. The result is a vector whose length equals the number of rows of F.
3.5 Add the result vectors of steps 3.3 and 3.4. The sum is the parity vector $p_1$.
4, Compute the parity vector $p_2$. All multiplications and additions are SIMD-optimized; see Fig. 6 for the operation diagram. The flow is as follows:
4.1 Multiply matrix B by the transpose of vector $p_1$. The result is a vector whose length equals the number of rows of B.
4.2 Add the result vectors of steps 3.1 and 4.1. The result is a vector.
4.3 Multiply matrix $T^{-1}$ by the transpose of the result vector of step 4.2. The result is the parity vector $p_2$.
5, Assemble the LDPC codeword vector c:
Store the vectors in the order S, $p_1$, $p_2$ to obtain the LDPC codeword vector $c = (S, p_1, p_2)$.
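Steps 3 to 5 can be traced with a small sketch over GF(2). It is an illustration under stated assumptions rather than the optimized implementation: the matrices are tiny dense 0/1 lists chosen for the example (not 802.11n matrices), `T_inv` is supplied precomputed, and the helper names `mat_vec_gf2`, `add_gf2` and `encode` are hypothetical.

```python
def mat_vec_gf2(M, v):
    """Multiply a 0/1 matrix by a 0/1 vector over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]


def add_gf2(u, v):
    """Add two 0/1 vectors over GF(2), i.e. element-wise XOR."""
    return [a ^ b for a, b in zip(u, v)]


def encode(A, B, T_inv, E, F, s):
    As = mat_vec_gf2(A, s)                       # step 3.1
    t = mat_vec_gf2(T_inv, As)                   # step 3.2
    e = mat_vec_gf2(E, t)                        # step 3.3
    f = mat_vec_gf2(F, s)                        # step 3.4
    p1 = add_gf2(e, f)                           # step 3.5
    Bp1 = mat_vec_gf2(B, p1)                     # step 4.1
    p2 = mat_vec_gf2(T_inv, add_gf2(As, Bp1))    # steps 4.2 and 4.3
    return s + p1 + p2                           # step 5: c = (S, p1, p2)


# Toy matrices (assumed for illustration only).
A = [[1, 0], [1, 1]]
B = [[1], [0]]
T = [[1, 0], [1, 1]]
T_inv = [[1, 0], [1, 1]]   # T is its own inverse over GF(2)
E = [[0, 1]]
F = [[1, 1]]
c = encode(A, B, T_inv, E, F, [1, 0])
```

Because $p_2 = T^{-1}(A S^T + B p_1^T)$, the upper block row of the check equation, $A S^T + B p_1^T + T p_2^T = 0$, holds by construction; the sketch does not verify the lower block row, which depends on the structure of the actual check matrix.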
The encoding method above uses two kinds of optimized multiplier and one optimized adder, referred to as optimized multiplier 1, optimized multiplier 2 and the optimized adder. In line with the GPP architecture, the parts of each that can run in parallel are optimized with SIMD operations. They are described in turn below.
Optimized multiplier 1 has two inputs and one output; see Fig. 7 for its diagram. The matrix-vector multiplications in steps 3.1, 3.3, 3.4 and 4.1 of the optimized encoding method all use optimized multiplier 1. Taking step 3.1 as an example, the inputs of optimized multiplier 1 are matrix A and the transpose of vector S, and the flow is as follows:
1, Check whether the last row of matrix A has been reached. If so, the operation is complete; if not, go to step 2.
2, Multiply the Z×Z submatrix represented by the first element of matrix A by the transpose of vector S. This submatrix is the Z×Z identity matrix cyclically shifted left $A_{1,1}$ times ($A_{1,1}$ denotes the element in the first row and first column of A), so multiplying it by the transpose of S is equivalent to applying $A_{1,1}$ cyclic shifts to S. This operation can be SIMD-optimized; see Fig. 10. The steps are as follows:
2.1 Compute the length of the second data part, $(Z - A_{1,1}) \bmod W$ (that is, $Z - A_{1,1}$ modulo W), and the start position of the required data, which is the start position of information vector S plus $A_{1,1}$.
2.2 Copy the first part of the data, beginning at the start position of the required data, to the start of the intermediate data;
2.3 Shift-copy the remaining $(Z - A_{1,1}) \bmod W$ data items, and copy the "remainder" and "padding" parts shown in Fig. 10 into the output. To suit SIMD operation, the input vector has length Z and the output vector is larger, but only the first Z elements of the output vector are valid data. The result is stored in the result vector register.
3, Check whether the last column of matrix A has been reached. If so, return to step 1; if not, go to step 4.
4, Multiply the Z×Z submatrix represented by the next element of matrix A by the transpose of vector S, following steps 2.1, 2.2 and 2.3, except that the product is not stored in the result vector register; then perform step 5.
5, Add the result of step 4 to the contents of the result vector register. Adding two binary numbers is equivalent to XORing them, so this operation can be SIMD-optimized and one operation yields W results. See Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the result of step 4, $(b_1, b_2, \dots, b_W)$ is the content of the result vector register, the rectangle denotes the XOR operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 \oplus b_1, a_2 \oplus b_2, \dots, a_W \oplus b_W)$. Store the result in the result vector register and return to step 3.
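The core of optimized multiplier 1, replacing each Z×Z block product with a cyclic shift and accumulating the shifted segments by XOR, can be sketched as follows. This is a scalar illustration only: it ignores the W-element grouping and padding of the SIMD version, uses `None` to stand for a "-" (all-zero) block, and the function names are hypothetical.

```python
def rotate_left(segment, shift):
    """Cyclic left shift: the last Z - shift items move to the front,
    and the first `shift` items wrap around to the end."""
    return segment[shift:] + segment[:shift]


def block_row_times_vector(shift_row, s, Z):
    """Multiply one row of shift values by the bit vector s over GF(2)."""
    result = [0] * Z
    for j, shift in enumerate(shift_row):
        if shift is None:          # "-" entry: all-zero block, contributes nothing
            continue
        segment = s[j * Z:(j + 1) * Z]       # the Z bits facing block column j
        shifted = rotate_left(segment, shift)
        result = [r ^ b for r, b in zip(result, shifted)]  # GF(2) addition
    return result


# One row with shifts 1, "-", 0 and Z = 4.
bits = [1, 0, 0, 0,  1, 1, 1, 1,  0, 1, 0, 0]
row_result = block_row_times_vector([1, None, 0], bits, 4)
```

No Z×Z matrix is ever materialized; only the shift value per block is needed, which is what makes the lookup-table storage described earlier sufficient.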
Optimized multiplier 2 has two inputs and one output; see Fig. 8 for its diagram. The matrix-vector multiplications in steps 3.2 and 4.3 of the optimized encoding method use optimized multiplier 2. Optimized multiplier 2 is a special case of optimized multiplier 1: one of its inputs is matrix $T^{-1}$, and for every check matrix H only the elements in the lower triangle of $T^{-1}$ are valid values. Taking step 3.2 as an example, the inputs of optimized multiplier 2 are matrix $T^{-1}$ and the transpose of the result vector of step 3.1, and the flow is as follows:
1, Check whether the last row of matrix A has been reached. If so, the operation is complete; if not, go to step 2.
2, It can be seen that the upper-triangle elements of matrix $T^{-1}$ are 0. Multiplying the Z×Z submatrix represented by an element "0" by the transpose of the result vector of step 3.1 in Fig. 4 leaves that vector unchanged. Therefore, for each element "0" in a row of $T^{-1}$, take the product with the transpose of the result vector of step 3.1 in Fig. 4, add the results as in step 5 of optimized multiplier 1, and return to step 1.
The optimized adder has two inputs and one output, and both inputs are vectors; see Fig. 12 for its diagram. The vector additions in steps 3.5 and 4.2 of the optimized encoding method use the optimized adder. Taking step 3.5 as an example, the inputs of the optimized adder are the result vectors of steps 3.3 and 3.4. Because the inputs are binary, adding two binary numbers is equivalent to XORing them, so the operation can be SIMD-optimized and one operation yields W results. See Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the result vector of step 3.3, $(b_1, b_2, \dots, b_W)$ is the result vector of step 3.4, the rectangle denotes the XOR operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 \oplus b_1, a_2 \oplus b_2, \dots, a_W \oplus b_W)$.
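The statement that one operation yields W results can be illustrated by packing bits into a single machine word and XORing the words once. This sketch uses a Python integer as a stand-in for a SIMD register; `pack_bits` and `unpack_bits` are hypothetical helpers, and the bit order (element i in bit i) is an assumption made for the example.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, element i in bit i."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


def unpack_bits(word, count):
    """Inverse of pack_bits for the first `count` bits."""
    return [(word >> i) & 1 for i in range(count)]


a = [1, 0, 1, 1]
b = [1, 1, 0, 1]
# A single XOR on the packed words adds all element pairs over GF(2) at once.
y = unpack_bits(pack_bits(a) ^ pack_bits(b), len(a))
```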
The above is the flow of the LDPC encoding method in this application. The application also optimizes the decoding method of existing decoders; see Fig. 13 for the flow chart of the optimized decoding method. Before decoding, the code length $L_{LDPC}$, the code rate R, the encoded codeword vector c, the maximum number of iterations I and the correction offset β must be supplied externally. The LDPC decoding method is optimized for the GPP architecture as follows:
1, The decoding method is optimized with SIMD operations. The basic idea is to process several data elements within one CPU clock cycle and so obtain parallelism, unlike ordinary scalar use, in which each clock cycle performs a single data operation. This depends on how much data the processor can handle at once: if that size is K bits and the basic SIMD element size is k bits, one operation processes $W = K/k$ elements.
2, Part of the flow of the optimized decoding method is multithreaded, with the number of threads equal to the number of rows of the check matrix H. Each row of H is processed by one thread, and several threads run at the same time, which raises the overall processing performance of the system.
The flow of the decoding method is described below. Its general framework is the same as that of existing decoding methods: receive the encoded LDPC codeword signal c and determine the check matrix H; then compute the variable node vector q over several iterations and take it as the decoding result. In each iteration, the temporary variable vector t is computed from the current variable node vector q and the check node vector r, the check node vector r is updated from the temporary variable vector t, and the variable node vector q is then updated from the check node vector r and the temporary variable vector t. The decoding method provided by this application differs from the prior art in its implementation on a general-purpose processor. The steps are as follows:
1, Under the 802.11n standard, each combination of code length $L_{LDPC}$ and code rate R corresponds to a different check matrix H. First, select the check matrix H for the given $L_{LDPC}$ and R, initialize the parameters below, and store them one after another in linear lookup tables. The location of each required item is identified by an offset, which trades memory for computational complexity and speeds up the data processing of the LDPC decoding method:
1.1 Z: the size of the submatrix represented by each element of the check matrix H; a scalar variable. For $L_{LDPC} = 648$, Z = 27; for $L_{LDPC} = 1296$, Z = 54; for $L_{LDPC} = 1944$, Z = 81.
1.2 LdpcRowNum: the number of rows of the selected check matrix H; a scalar variable. For code rate R = 1/2, LdpcRowNum = 12; for R = 2/3, LdpcRowNum = 8; for R = 3/4, LdpcRowNum = 6; for R = 5/6, LdpcRowNum = 4.
1.3 $V_{ldpcRowLength}$: the number of non-"-" elements in each row of the selected check matrix H; a vector of size 1×LdpcRowNum.
1.4 LdpcBufferNum: the number of registers needed to store the data of each submatrix of the selected check matrix H; a scalar variable obtained by calculation.
1.5 LdpcRemain: one of the variables needed in step 9; a scalar variable. For $L_{LDPC} = 648$, LdpcRemain = 11; for $L_{LDPC} = 1296$, LdpcRemain = 6; for $L_{LDPC} = 1944$, LdpcRemain = 1.
1.6 LdpcRoundNum: one of the variables needed in step 9; a scalar variable. For $L_{LDPC} = 648$, LdpcRoundNum = 27; for $L_{LDPC} = 1296$, LdpcRoundNum = 22; for $L_{LDPC} = 1944$, LdpcRoundNum = 17.
1.7 $V_{ldpcRowBuffer}$: the number of registers needed to store the non-"-" data of each row of the selected check matrix H; a vector of size 1×LdpcRowNum. Its formula is $V_{ldpcRowBuffer}(v) = V_{ldpcRowLength}(v) \cdot LdpcBufferNum$, where $V_{ldpcRowBuffer}(v)$ is the v-th element of the vector $V_{ldpcRowBuffer}$ and v is the row number in the selected check matrix H (likewise below).
1.8 $M_{ldpcOffset1}$: one of the cyclic offsets computed from the selected check matrix H, used in steps 4 and 9; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$, where $\max(V_{ldpcRowLength}(v))$ is the largest element of the vector $V_{ldpcRowLength}$ (likewise below). Its formula is $(M_{ldpcOffset1})_{i,j} = Z(n-1) + H_{i,n}$, where n is a column number of the selected check matrix H, and j and n are related in that the j-th non-"-" element of row i of the selected check matrix H is located at row i, column n of H (likewise below).
1.9 $M_{ldpcRound1}$: one of the loop counts computed from the selected check matrix H, used in step 4; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$. Its formula distinguishes two cases, $H_{i,n} \neq 0$ and $H_{i,n} = 0$ (with $H_{i,n} \neq$ "-" in both); in the case $H_{i,n} = 0$, $(M_{ldpcRound1})_{i,j} = 6$.
1.10 $M_{ldpcAssemble1}$: one of the padding-offset flags computed from the selected check matrix H, used in step 4; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$. Its formula is: when $H_{i,n} = 0$ and $H_{i,n} \neq$ "-", $(M_{ldpcAssemble1})_{i,j} = 0$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ "-" and $(Z - H_{i,n}) \bmod W = 0$, $(M_{ldpcAssemble1})_{i,j} = 0$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ "-" and $(Z - H_{i,n}) \bmod W \neq 0$, $(M_{ldpcAssemble1})_{i,j} = 1$.
1.11 $M_{ldpcAssembleTable1}$: the cyclic padding offsets computed from the selected check matrix H, used in steps 4 and 9; a matrix whose size depends on the sum of all elements of the vector $V_{ldpcRowLength}$.
1.12 $M_{ldpcOffset2}$: one of the cyclic offsets computed from the selected check matrix H, used in steps 4 and 9; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$. Its formula is: when $(M_{ldpcAssemble1})_{i,j} = 0$, $(M_{ldpcOffset2})_{i,j} = Z(n-1) + [W - Z + H_{i,n} + (M_{ldpcRound1})_{i,j} \cdot W]$; when $(M_{ldpcAssemble1})_{i,j} = 1$, $(M_{ldpcOffset2})_{i,j} = Z(n-1)$.
1.13 $M_{ldpcRound2}$: one of the loop counts computed from the selected check matrix H, used in step 4; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$, obtained by calculation.
1.14 $M_{ldpcRound3}$: one of the loop counts computed from the selected check matrix H, used in step 9; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$. Its formula distinguishes two cases, $H_{i,n} \neq 0$ and $H_{i,n} = 0$ (with $H_{i,n} \neq$ "-" in both); in the case $H_{i,n} = 0$, $(M_{ldpcRound3})_{i,j} = 5$.
1.15 $M_{ldpcAssemble2}$: one of the padding-offset flags computed from the selected check matrix H, used in step 9; defined in the same way as $M_{ldpcAssemble1}$.
1.16 $M_{ldpcRound4}$: one of the loop counts computed from the selected check matrix H, used in step 9; a matrix of size LdpcRowNum × $\max(V_{ldpcRowLength}(v))$. Its formula distinguishes the cases i = 0 and i ≠ 0; when i = 0, $(M_{ldpcRound4})_{i,j} = 0$.
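Several of the parameters above follow directly from the code length, the code rate and the positions of the non-"-" elements. The sketch below reproduces the listed values of Z and LdpcRowNum and builds $V_{ldpcRowLength}$ and $M_{ldpcOffset1}$ for a toy base matrix. It assumes the 24 block columns of the 802.11n base matrices, represents "-" as `None`, returns ragged rows rather than a padded LdpcRowNum × max matrix, and uses hypothetical function names.

```python
def code_parameters(code_length, rate_num, rate_den, block_columns=24):
    """Return (Z, LdpcRowNum) for a code length and a rate rate_num/rate_den."""
    Z = code_length // block_columns
    row_num = block_columns * (rate_den - rate_num) // rate_den
    return Z, row_num


def offset_table(H, Z):
    """Return (V_ldpcRowLength, M_ldpcOffset1) for a base matrix H.

    H holds shift values, with None for "-". Column numbers n start at 1,
    so the offset of an element is Z * (n - 1) + H[i][n].
    """
    row_lengths, offsets = [], []
    for row in H:
        entries = [(n, h) for n, h in enumerate(row, start=1) if h is not None]
        row_lengths.append(len(entries))
        offsets.append([Z * (n - 1) + h for n, h in entries])
    return row_lengths, offsets


# Toy 2 x 3 base matrix with Z = 4 (not an 802.11n matrix).
lengths, table = offset_table([[0, None, 3], [None, 2, 1]], 4)
```

Each offset points at the first element to read inside the concatenated variable node data: the block column selects a Z-element segment, and the shift value selects the position within it.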
2, Check whether the maximum number of iterations I has been reached. If not, go to step 3; if so, decoding ends.
3, To suit the GPP architecture, the decoding method is multithreaded, with the number of threads equal to the number of rows of the check matrix H. Each row of H is handled by one thread that executes steps 4 to 9, and several threads execute steps 4 to 9 at the same time, which raises the overall processing performance of the system. Steps 4 to 9 below describe the processing of a single thread.
4, Compute the temporary variable vector t. This vector comprises one subvector for each row of the check matrix, and each such subvector in turn comprises one subvector for each non-"-" element $H_{i,j}$. (Only the non-"-" elements $H_{i,j}$ of the check matrix have corresponding subvectors; the "-" elements have no corresponding subvector in the temporary variable vector t, the check node vector r or the variable node vector q.) Specifically, the subvector t' of t corresponding to $H_{i,j}$ is the subvector q' of the variable node vector q corresponding to $H_{i,j}$, cyclically shifted according to the value of the element in row i, column j of the check matrix H, minus the subvector r' of the check node vector r corresponding to $H_{i,j}$. In the first iteration, the variable node vector q is the LDPC-encoded codeword vector c and the check node vector r is in its initial state, so t' is simply q' cyclically shifted according to the value of the element in row i, column j of H. To suit SIMD operation, the input subvectors q' and r' have length Z and the output subvector t' is larger, but only the first Z elements of the output subvector are valid data. The computation of t loops over the non-"-" elements of each row of the selected check matrix H; for example, the j-th non-"-" element of row i determines the corresponding range of elements of t. The steps are as follows:
4.1 Look up in the matrix $M_{ldpcOffset1}$ the cyclic offset for the j-th non-"-" element of row i of the selected check matrix H, and locate the start of the required data in the variable node vector q: start position of the required data = start position of subvector q' of q + the corresponding cyclic offset.
4.2 Look up in the matrix $M_{ldpcRound1}$ the loop count for the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{ldpcRound1})_{i,j} \cdot W$ data items that follow the start position of the required data to the current position of subvector t' of t. The current position is the first position in the subvector to which no data has yet been copied.
4.3 Use the matrix $M_{ldpcAssemble1}$ to decide whether a padding operation is needed. If $(M_{ldpcAssemble1})_{i,j} = 1$, perform the padding operation according to the offsets given in the matrix $M_{ldpcAssembleTable1}$; if $(M_{ldpcAssemble1})_{i,j} = 0$, no padding is needed. The padding operation is as follows: determine the values of the elements in the row of the matrix $M_{ldpcAssemble1}$ that corresponds to check matrix element $H_{i,j}$, and copy the elements with the indicated indices from the subvector of the current vector q corresponding to $H_{i,j}$, in order, to the current position of the subvector of t corresponding to $H_{i,j}$.
4.4 Look up in the matrix $M_{ldpcOffset2}$ the cyclic offset for the j-th non-"-" element of row i of the selected check matrix H, and locate the start of the required data in the variable node vector q: start position of the required data = start position of subvector q' of q + the corresponding cyclic offset.
4.5 Look up in the matrix $M_{ldpcRound2}$ the loop count for the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{ldpcRound2})_{i,j} \cdot W$ data items that follow the start position of the required data to the current position of subvector t' of t.
4.6 If this is not the first iteration, compute t' = t' − r', where t' is the subvector of the temporary variable vector t and r' is the subvector of the check node vector r. The elements of t' and r' are divided into groups of W, so the operation can be SIMD-optimized and one operation yields W elements of t. See Fig. 11, where $(a_1, a_2, \dots, a_W)$ is one group of elements of t, $(b_1, b_2, \dots, b_W)$ is one group of elements of r, the rectangle denotes the subtraction operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 - b_1, a_2 - b_2, \dots, a_W - b_W)$. If this is the first iteration, go to step 5.
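The net effect of step 4 on one subvector, a cyclic shift of q' followed (after the first iteration) by subtraction of r', can be sketched as below. This is a simplified illustration: the two copies stand in for the table-driven copies of steps 4.1 to 4.5, the padded output length (Z rounded up to a multiple of W) is an assumption, and `temp_subvector` is a hypothetical name.

```python
def temp_subvector(q_sub, r_sub, shift, W, first_iteration):
    """Compute t' from q' and r' for one non-"-" element with the given shift."""
    Z = len(q_sub)
    padded = -(-Z // W) * W            # assumption: round Z up to a multiple of W
    out = [0] * padded                 # only the first Z entries are valid data
    out[:Z - shift] = q_sub[shift:]    # data from the offset to the end of q'
    out[Z - shift:Z] = q_sub[:shift]   # wrapped-around data from the start of q'
    if not first_iteration:
        for i in range(Z):             # step 4.6: t' = t' - r'
            out[i] -= r_sub[i]
    return out


first = temp_subvector([1, 2, 3, 4, 5], [0] * 5, 2, 4, True)
later = temp_subvector([1, 2, 3, 4, 5], [1] * 5, 2, 4, False)
```

In a SIMD implementation the subtraction loop would run over groups of W elements, and the padding entries let the last group be processed without a special case.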
5, Compute the absolute value of every element of subvector t' of the temporary variable vector t, and denote the resulting vector of absolute values |t|. The elements of t' are divided into groups of W, so the operation can be SIMD-optimized and one operation yields W elements of t'. See Fig. 12, where $(c_1, c_2, \dots, c_W)$ is one group of elements of t', the rectangle denotes the absolute-value operator, and $(y_1, y_2, \dots, y_W)$ is the vector after taking absolute values, i.e. $(y_1, y_2, \dots, y_W) = (|c_1|, |c_2|, \dots, |c_W|)$. The result is used as the updated subvector t' of the temporary variable vector t.
6, Perform the extreme-value distribution operation and store the result in the matrix M of the extreme-value variable vector m. This process loops over the rows of the selected check matrix H: each pass operates on $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ elements of the temporary variable vector t and produces a result of the same length, which is stored in the extreme-value variable vector m. See Fig. 15 for a diagram of one extreme-value distribution operation. The subvector of t corresponding to one row of the check matrix is written as a matrix $T_v$ with $V_{ldpcRowLength}(v)$ rows, in which each row is the subvector of t corresponding to one element $H_{i,j}$, padded when it has too few columns. The extreme-value distribution operation finds the minimum and the second minimum of each column of $T_v$, distributes them, and stores the result in the subvector matrix $M_v$ of m. Each row of $T_v$ has LdpcBufferNum sub-blocks, and each sub-block has W basic units. The basic units in each column are compared to obtain the minimum and the second minimum, and the row number of the minimum is recorded as the index value. The extreme-value variable vector m is then filled according to the index value: where the index value differs from the row number in m, the minimum is written; where the index value equals the row number in m, the second minimum is written. Taking the search for the minimum and second minimum of one sub-block as an example (see Fig. 14 for the flow chart), the steps are as follows:
6.1 Compare the corresponding basic units in the first sub-block of the first row and of the second row of $T_v$. Store the row number of the smaller value in the index value, record the smaller value as the minimum and the larger value as the second minimum. This can be SIMD-optimized with two operations, one taking the maximum to obtain the larger of the two values and one taking the minimum to obtain the smaller. See Fig. 11, where $(a_1, a_2, \dots, a_W)$ are the elements of the first sub-block of the first row, $(b_1, b_2, \dots, b_W)$ are the elements of the first sub-block of the second row, the rectangle denotes the maximum or minimum operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (\max(a_1, b_1), \max(a_2, b_2), \dots, \max(a_W, b_W))$ or $(y_1, y_2, \dots, y_W) = (\min(a_1, b_1), \min(a_2, b_2), \dots, \min(a_W, b_W))$.
6.2 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 6.3; if so, go to step 6.6.
6.3 Take the maximum of the first sub-block of the next row of $T_v$ and the previously recorded minimum. This operation can be SIMD-optimized in the same way as step 6.1.
6.4 Take the minimum of the result of step 6.3 and the currently recorded second minimum. This operation can be SIMD-optimized in the same way as step 6.1. Record the result as the second minimum.
6.5 Take the minimum of the result of step 6.3 and the currently recorded minimum. This operation can be SIMD-optimized in the same way as step 6.1. Record the result as the minimum, record the row number of this minimum as the index value, and return to step 6.2.
6.6 Subtract the correction value β from both the currently recorded minimum and the currently recorded second minimum. This operation can be SIMD-optimized; see Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the previously recorded minimum or second minimum, $(b_1, b_2, \dots, b_W)$ is the correction value β, which can also be written $(\beta, \beta, \dots, \beta)$, the rectangle denotes the subtraction operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 - \beta, a_2 - \beta, \dots, a_W - \beta)$. Record the result as the minimum or the second minimum.
6.7 Correct the currently recorded minimum and second minimum: when either value is less than zero, set it to zero; otherwise leave it unchanged. This operation can be SIMD-optimized; see Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the currently recorded minimum or second minimum, $(b_1, b_2, \dots, b_W)$ is the zero value, which can also be written $(0, 0, \dots, 0)$, the rectangle denotes the correction operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $y_i = \begin{cases} 0, & a_i < b_i \\ a_i, & \text{otherwise} \end{cases}$ for $i = 1, 2, \dots, W$. Record the result as the minimum or the second minimum.
6.8 Fill the extreme-value variable vector m according to the index value. Where the index value differs from the row number of the subvector matrix $M_v$ of m, write the currently recorded minimum at the same position of $M_v$; where the index value equals the row number of $M_v$, write the currently recorded second minimum at the same position of $M_v$. Reading the elements of $M_v$ in row-major order gives the subvector of the extreme-value variable vector m.
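The outcome of steps 6.1 to 6.8 for one block row can be sketched column by column: find the minimum and second minimum of the magnitudes, subtract β, clamp at zero, then give each row the minimum unless that row supplied it. The sketch is scalar and computes the two values directly instead of through the max/min instruction sequence; in particular it updates the minimum from each new row's value, which is the usual min-sum update and is an assumption about the intent of step 6.5. The name `min_distribution` is hypothetical, and at least two rows are assumed.

```python
def min_distribution(rows, beta):
    """rows: equal-length lists of magnitudes |t|, one list per non-"-" element."""
    width = len(rows[0])
    out = [[0] * width for _ in rows]
    for col in range(width):
        column = [row[col] for row in rows]
        # index value: row number holding the column minimum
        index = min(range(len(column)), key=column.__getitem__)
        smallest = column[index]
        second = min(v for i, v in enumerate(column) if i != index)
        smallest = max(smallest - beta, 0)   # steps 6.6 and 6.7
        second = max(second - beta, 0)
        for i in range(len(rows)):           # step 6.8
            out[i][col] = second if i == index else smallest
    return out


M_v = min_distribution([[3, 1], [2, 5], [4, 0]], beta=1)
```

Each row therefore receives the smallest magnitude among the other rows of its column, reduced by the offset β.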
7, Compute the intermediate variable vector s. This process loops over the rows of the selected check matrix H: each pass operates on $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ elements of the temporary variable vector t and produces a result of the same length, which is stored in the intermediate variable vector s. See Fig. 17 for a diagram of one computation of s. The temporary variable vector t is divided into $V_{ldpcRowLength}(v)$ rows, each row has LdpcBufferNum sub-blocks, and each sub-block has W basic units. Taking the computation of s for one sub-block as an example (see Fig. 16 for the flow chart), the steps are as follows:
7.1 XOR the first sub-block of the first row of the temporary variable vector t with the first sub-block of the second row. This operation can be SIMD-optimized; see Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the first sub-block of the first row, $(b_1, b_2, \dots, b_W)$ is the first sub-block of the second row, the rectangle denotes the XOR operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 \oplus b_1, a_2 \oplus b_2, \dots, a_W \oplus b_W)$.
7.2 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 7.3; if so, go to step 7.4, starting again from the first row.
7.3 XOR the first sub-block of the next row of the temporary variable vector t with the result of step 7.1. This operation can be SIMD-optimized; see Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the first sub-block of the next row, $(b_1, b_2, \dots, b_W)$ is the result of step 7.1, the rectangle denotes the XOR operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 \oplus b_1, a_2 \oplus b_2, \dots, a_W \oplus b_W)$. Return to step 7.2.
7.4 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 7.5; if so, go to step 8.
7.5 XOR the first sub-block of the current row of the temporary variable vector t with the result of step 7.3. This operation can be SIMD-optimized.
7.6 Apply an OR operation to the result of step 7.5. This operation can be SIMD-optimized; see Fig. 11, where $(a_1, a_2, \dots, a_W)$ is the result of step 7.5, $(b_1, b_2, \dots, b_W)$ is the second operand, the rectangle denotes the OR operator, and $(y_1, y_2, \dots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \dots, y_W) = (a_1 | b_1, a_2 | b_2, \dots, a_W | b_W)$. Store the result in the first sub-block of the intermediate variable vector s and return to step 7.4.
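The XOR structure of step 7 matches a common way of obtaining, for each entry, the combined sign of all other entries in the same column: XOR every row's sign bit into a running total, then XOR each row's own sign bit back out. The sketch below illustrates that idea only. It assumes the signed values of t are still available, works on explicit sign bits rather than on the raw register contents, returns +1/−1 instead of the OR-adjusted values of step 7.6, and uses the hypothetical name `extrinsic_signs`.

```python
def extrinsic_signs(rows):
    """rows: equal-length lists of signed values. Returns +1 or -1 per entry:
    the product of the signs of all other rows in the same column."""
    width = len(rows[0])
    total = [0] * width
    for row in rows:
        # first pass: XOR every row's sign bit into a running total
        total = [acc ^ (1 if v < 0 else 0) for acc, v in zip(total, row)]
    result = []
    for row in rows:
        # second pass: XOR the row's own sign bit back out of the total
        result.append([-1 if (acc ^ (1 if v < 0 else 0)) else 1
                       for acc, v in zip(total, row)])
    return result


signs = extrinsic_signs([[-3, 2], [4, -1], [-5, -6]])
```

Two passes over the rows are enough regardless of how many rows there are, which mirrors the two loops (steps 7.1 to 7.3 and 7.4 to 7.6) in the flow above.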
8, Compute the check node vector r. This vector comprises one subvector for each row of the check matrix, and each such subvector in turn comprises one subvector for each non-"-" element $H_{i,j}$. The computation of r loops over the rows of the selected check matrix H: each pass operates on $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ elements of the temporary variable vector t and produces a result of the same length, which is stored in the check node vector r. Within the subvector corresponding to one row of the check matrix, the computation proceeds one subvector at a time, each corresponding to one non-"-" element $H_{i,j}$. Taking the computation of the subvector r' of r corresponding to one non-"-" element $H_{i,j}$ as an example, the steps are as follows:
8.1 judge whether to reach maximum cycle V ldpcRowLengthv (), if do not reach, then carry out step 8.2; If reach, then carry out step 9.
8.2 will be worth variable vector m and H most i,jcorresponding subvector and intermediate variable vector s and H i,jcorresponding subvector carries out contrast operation, if intermediate variable vector s is less than zero, then result is the complement asked the value being worth most variable vector m, if intermediate variable vector s equals zero, then result is zero, if intermediate variable vector s is greater than zero, then result is to the value being worth most variable vector m.This operation can carry out SIMD optimization, see Figure 12, and wherein (a 1, a 2..., a w) represent value variable vector m, (b 1, b 2..., b w) representing intermediate variable vector s, rectangle frame represents contrast arithmetic unit, (y 1, y 2..., y w) represent the result after computing, namely y i = if b i < 0 , a i = NEG ( a i ) f b i = 0 , a i = 0 else a i = a i , i = 1,2 , . . . , W .
The result of step 8.2 is added with intermediate variable vector s by 8.3.This operation can carry out SIMD optimization, see Figure 11, and wherein (a 1, a 2..., a w) represent the result of step 8.2, (b 1, b 2..., b w) representing intermediate variable vector s, rectangle frame represents adder calculator, (y 1, y 2..., y w) represent the result after computing, i.e. (y 1, y 2..., y w)=(a 1+ b 1, a 2+ b 2..., a w+ b w).
The result of step 8.3 and temporary variable vector t subvector t' are subtracted each other by 8.4.This operation can carry out SIMD optimization, see Figure 11, and wherein (a 1, a 2..., a w) represent the result of step 8.3, (b 1, b 2..., b w) representing temporary variable vector t, rectangle frame represents adder calculator, (y 1, y 2..., y w) represent the result after computing, i.e. (y 1, y 2..., y w)=(a 1-b 1, a 2-b 2..., a w-b w), and result is stored in check-node vector r, return step 8.1.
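Steps 8.2 to 8.4 can be illustrated for one group of W elements as follows. This is a minimal sketch in Python rather than SIMD code: the function names `sign_apply` and `check_node_subvector` are hypothetical, plain lists stand in for SIMD registers, and the saturation that 8-bit fixed-point arithmetic would need is omitted. The element-wise rule of step 8.2 has the same semantics as the SSSE3 intrinsic `_mm_sign_epi8`, which is one plausible way to realize it on an Intel CPU; the patent text does not name the instruction.

```python
def sign_apply(m, s):
    """Step 8.2: negate, zero, or keep each element of m according to the sign of s."""
    return [(-a if b < 0 else 0 if b == 0 else a) for a, b in zip(m, s)]


def check_node_subvector(m, s, t_sub):
    """Steps 8.2 to 8.4 for one group: r' = sign_apply(m, s) + s - t'."""
    y = sign_apply(m, s)                          # step 8.2: apply the sign of s to m
    y = [a + b for a, b in zip(y, s)]             # step 8.3: add s
    return [a - b for a, b in zip(y, t_sub)]      # step 8.4: subtract t'


# Example with a 4-element group (a real register would hold W elements).
r_sub = check_node_subvector([3, 5, 2, 7], [-1, 0, 4, -2], [1, 1, 1, 1])
```

Each list comprehension corresponds to one lane-parallel instruction over W elements in the scheme described above.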
9. Compute the variable node vector q, a vector of length [...]. This vector contains one subvector for each row of the check matrix, covering indices [...] to [...], and each such subvector in turn contains one subvector q' for each non-"-" element $H_{i,j}$. The subvector q' of the variable node vector q is computed as [...]; that is, q' is the sum of the subvector t' of the temporary variable vector t and the subvector r' of the check-node vector r, cyclically shifted according to the value of the element in row i, column j of the check matrix H. To suit SIMD operation, the input subvectors t' and r' both have length Z, and the output subvector q' has size [...], but only the first Z elements of the output subvector are valid data. The computation of q loops over the non-"-" elements of each row of the selected check matrix H; for example, the j-th non-"-" element of row i of H computes the elements at positions [...] to [...] of the variable node vector q. The calculation steps are as follows:
9.1 Look up in the matrix $M_{ldpcOffset1}$ the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the required start position in the variable node vector q: required start position = start position of the subvector q' of q + the corresponding cyclic offset.
9.2 Add the subvector t' of the temporary variable vector t to the subvector r' of the check-node vector r. This operation can be SIMD-optimized, see Figure 11, where $(a_1, a_2, \ldots, a_W)$ represents the temporary variable vector t, $(b_1, b_2, \ldots, b_W)$ represents the check-node vector r, the rectangle represents the adder, and $(y_1, y_2, \ldots, y_W)$ represents the result of the operation, i.e. $(y_1, y_2, \ldots, y_W) = (a_1 + b_1, a_2 + b_2, \ldots, a_W + b_W)$.
9.3 Look up in the matrix $M_{ldpcRound3}$ the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{ldpcRound1})_{i,j} \times W$ data items that follow the start of the step 9.2 result to the required start position found in step 9.1.
9.4 Use the matrix $M_{ldpcAssemble2}$ to decide whether a padding operation is needed. If $(M_{ldpcAssemble2})_{i,j} = 1$, pad according to the offset indicated in the matrix $M_{ldpcAssembleTable1}$; if $(M_{ldpcAssemble2})_{i,j} = 0$, no padding is needed.
9.5 Look up in the matrix $M_{ldpcOffset2}$ the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the required start position in the variable node vector q: required start position = start position of the variable node vector q + the corresponding cyclic offset.
9.6 Look up in the matrix $M_{ldpcRound4}$ the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{ldpcRound4})_{i,j} \times W$ data items that follow the start of the step 9.2 result to the required start position found in step 9.5.
9.7 Using the required padding count indicated by LdpcRemain, pad the remaining elements of the variable node vector q according to the offset indicated in the matrix $M_{ldpcAssembleTable1}$, then return to step 2.
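The net effect of step 9 on one subvector can be sketched as a sum followed by a cyclic shift carried out as two block copies. The sketch below is an illustration under simplifying assumptions, not the patented procedure: the function name `variable_node_subvector` is hypothetical, the cyclic offset is taken to be the shift value itself instead of being read from the tables $M_{ldpcOffset1}$ and $M_{ldpcOffset2}$, and the W-aligned padding of steps 9.4 and 9.7 is left out.

```python
def variable_node_subvector(t_sub, r_sub, shift):
    """Return q' = (t' + r') written back with a cyclic offset of `shift`."""
    z = len(t_sub)
    total = [a + b for a, b in zip(t_sub, r_sub)]   # step 9.2: element-wise sum
    q_sub = [0] * z
    # First copy (steps 9.1 and 9.3): destination starts at offset `shift`.
    q_sub[shift:] = total[:z - shift]
    # Second copy (steps 9.5 and 9.6): the wrapped part goes to the beginning.
    q_sub[:shift] = total[z - shift:]
    return q_sub


q_sub = variable_node_subvector([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2)
```

Splitting the shift into two contiguous copies is what lets each copy run W elements at a time; the padding tables in the text exist to handle the lengths that are not multiples of W.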
The above is the LDPC encoding and decoding method of the present application.
The theory of LDPC encoding and decoding is fairly mature. However, because an LDPC code is a linear block code with a large code length n, its check matrix H is also large and the algorithmic complexity is high. Conventional LDPC encoding and decoding approaches cannot readily meet the throughput requirements of an IEEE 802.11n system, which significantly affects system performance. In existing high-speed wireless access systems, LDPC codes are mostly implemented on FPGA chips and DSP chips. Although these approaches can meet the processing and latency requirements of modern high-speed wireless LAN protocols, FPGA programming and specialized DSPs are both relatively complex, lack rich programming environments and debugging tools, and have limited applicability. With a GPP (general-purpose processor) chip, by contrast, developers can work on an ordinary computer, within a familiar architecture and environment, using a rich set of tools such as a C/C++ environment. The innovation of this patent is that, while retaining the original encoder and decoder for the LDPC codes of a high-speed wireless access system, the encoding and decoding methods are optimized on a GPP chip according to the characteristics of that chip. Because the LDPC codes of IEEE 802.11n are irregular, the number of non-negative entries in each row of the check matrix prototype is not necessarily the same, so the flexibility of a GPP chip gives it a considerable advantage over earlier LDPC implementation methods. In addition, given the rapid development of CPUs (Central Processing Units), the data processing capability of GPP chips will continue to improve.
First, SIMD instructions are used to process data in parallel. The SIMD instruction set used in this patent, on Intel CPUs, is also called the SSE (Streaming SIMD Extensions) instruction set. Its basic idea is to process multiple data items within one CPU clock cycle and thereby obtain parallel processing, unlike ordinary usage, in which each clock cycle performs only one data processing operation. A CPU of the Nehalem architecture has a processing width of 128 bits, and a CPU of the Sandy Bridge architecture has a processing width of 256 bits. For 8-bit fixed-point numbers, the former can process 16 data items per instruction cycle and the latter 32, so the theoretical parallelism is 16-fold and 32-fold respectively. However, simulation results show that the ideal speedup often cannot be reached in an actual running system. One reason is that the program does not consist entirely of data operations; it also contains many conditional statements, which cannot be executed in parallel. Another reason is that, with 128-bit SIMD instructions, the submatrix sizes of the IEEE 802.11n check matrices are not multiples of 16, so the last group of data in each submatrix is processed with a parallelism below 16.
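The arithmetic behind the last point can be checked with a short sketch. The function name `simd_groups` is hypothetical; the submatrix sizes Z = 27, 54 and 81 are those of the IEEE 802.11n LDPC codes of length 648, 1296 and 1944.

```python
def simd_groups(z, register_bits=128, element_bits=8):
    """Return (lanes, full groups, leftover elements, lane utilization) for one submatrix."""
    lanes = register_bits // element_bits        # W: elements per SIMD register
    full, leftover = divmod(z, lanes)            # complete groups and the partial tail
    groups = full + (1 if leftover else 0)       # the tail still costs one instruction
    utilization = z / (groups * lanes)           # fraction of lanes doing useful work
    return lanes, full, leftover, utilization


results = {z: simd_groups(z) for z in (27, 54, 81)}
```

With 128-bit registers and 8-bit values the leftovers are 11, 6 and 1 elements, which match the LdpcRemain values stated in claim 6 for the three code lengths; reading LdpcRemain as Z mod W is an inference from those numbers, not something the text states.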
Second, a lookup-table method is used: multiple known parameters are initialized in advance, and the exact location of the required data is marked by offsets. This trades memory for computational complexity and improves the data processing speed of the LDPC encoding and decoding methods. In the LDPC encoder, the block matrices needed for encoding under different code rates and code lengths can be computed in advance and stored in LUTs (Look-Up Tables); the tables only need to be read in when the program starts, and nothing has to be recomputed.
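As an illustration of the lookup-table idea, the sketch below precomputes, for one row of a check matrix prototype, the first start position $Z(n-1) + H_{i,n}$ that claim 6 associates with each non-"-" element. The function name `build_offset_table`, the use of `None` for a "-" entry, and the sample row are assumptions made for the example; the patent's actual tables (such as $M_{ldpcOffset1}$ and $M_{ldpcOffset2}$) hold more information than this.

```python
def build_offset_table(proto_row, z):
    """Precompute start positions Z*(n-1) + H[i][n] for the non-'-' entries of one row.

    proto_row lists the shift values of one prototype row, with None for '-'.
    Column numbers n are counted from 1, as in the text.
    """
    table = []
    for n, shift in enumerate(proto_row, start=1):
        if shift is None:                 # '-' entry: all-zero submatrix, nothing to store
            continue
        table.append(z * (n - 1) + shift)
    return table


# Built once at start-up, then only indexed inside the decoding loop.
offsets = build_offset_table([0, None, 5, 26], z=27)
```

Because the prototype matrix is fixed for a given code length and rate, such a table never changes at run time, which is what makes the memory-for-computation trade worthwhile.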
Finally, multithreading is used: more than one thread executes at the same time, which improves the overall processing performance of the system. In the optimized LDPC encoding and decoding methods, the operations that work on one row of the check matrix at a time are optimized with multithreading, and the number of threads equals the number of rows of the check matrix.
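The row-per-thread decomposition can be sketched for the encoder's matrix-by-vector product, in which each non-"-" entry of the block matrix rotates one Z-element block of the vector and the rotated blocks are summed. This is a sketch under stated assumptions: the names `rotate_left`, `row_times_vector` and `matrix_times_vector` are hypothetical, `None` stands for a "-" entry, and the addition is taken to be XOR because the encoder works on bits. In standard CPython builds, threads do not run this pure-Python arithmetic in parallel, so the example shows only how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor


def rotate_left(block, a):
    """Cyclic left shift by a: the first a elements move to the end."""
    return block[a:] + block[:a]


def row_times_vector(proto_row, vec, z):
    """Multiply one block row by the vector: XOR of the rotated Z-element blocks."""
    acc = [0] * z
    for j, a in enumerate(proto_row):
        if a is None:                             # '-' entry contributes a zero vector
            continue
        shifted = rotate_left(vec[j * z:(j + 1) * z], a)
        acc = [x ^ y for x, y in zip(acc, shifted)]
    return acc


def matrix_times_vector(proto, vec, z):
    """One task per block row, results concatenated in row order."""
    with ThreadPoolExecutor(max_workers=len(proto)) as pool:
        rows = list(pool.map(lambda row: row_times_vector(row, vec, z), proto))
    return [bit for row in rows for bit in row]


product = matrix_times_vector([[1, None], [0, 2]], [1, 0, 0, 0, 1, 1], z=3)
```

The rows are independent of one another, so no locking is needed; each task reads the shared input vector and writes only its own Z-element slice of the result.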
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. An LDPC encoding method based on a general-purpose processor, comprising: obtaining a signal vector S to be encoded through signal acquisition or reception; determining and saving a check matrix H and block matrices A, B, D, E, F and T; determining vectors $p_1$ and $p_2$ according to $p_1^T = (E T^{-1} A + F) S^T$ and $p_2^T = T^{-1}(A S^T + B p_1^T)$, and obtaining the LDPC encoding result vector $c = (S, p_1, p_2)$; characterized in that the multiplication of any matrix by any vector performed when determining the vectors $p_1$ and $p_2$ comprises:
taking each row of said matrix as one thread, multiplying the corresponding row of the matrix by said vector, and combining the multiplication results of all rows to form a result vector;
wherein the multiplication of each row of said matrix by said vector comprises: for each element j of the current row i of the matrix, determining the corresponding vector start position = start position of said vector + $A_{i,j}$ + $(j-1) \times Z$; shifting left, in single-instruction multiple-data (SIMD) fashion, the data of length $Z - A_{i,j}$ that starts at said start position in said vector, and moving the data of length $A_{i,j}$ that precedes said start position to after the left-shifted data, to obtain the vector shift result corresponding to said element j; and then adding the vector shift results corresponding to all elements, as the multiplication result of said row and said vector;
in said SIMD fashion, the data of length $Z - A_{i,j}$ starting at said start position is divided, in units of length W, into [...] segments; the left-shift operation is performed on these segments in parallel, and then on the remaining data of length $(Z - A_{i,j}) \bmod W$;
Z is the size of the submatrix represented by one element of said check matrix.
2. The method according to claim 1, characterized in that, when said matrix is $T^{-1}$, in the multiplication of each row of $T^{-1}$ by the corresponding vector, only the elements of $T^{-1}$ whose value is 0 are multiplied by the corresponding vector, yielding the vector shift results corresponding to those elements, and the vector shift results corresponding to the remaining elements are set to zero vectors; the vector shift results corresponding to all elements are then added, as the multiplication result of said row and said vector.
3. The method according to claim 1 or 2, characterized in that, after the left-shift operation has been performed on the segments of length W, the first Z data items are taken as valid data.
4. The method according to claim 1 or 2, characterized in that adding the vector shift results corresponding to all elements comprises: dividing the vector shift result corresponding to each element, in units of length W, into [...] segments; performing the addition on these segments in parallel by SIMD, and then on the remaining data of length $(Z - A_{i,j}) \bmod W$.
5. The method according to claim 1 or 2, characterized in that said matrices A, B, D, E, F and $T^{-1}$ are stored in linear lookup tables.
6. An LDPC decoding method based on a general-purpose processor, comprising: receiving an encoded LDPC codeword signal c and determining a check matrix H; computing a variable node vector q over multiple iterations as the decoding result, wherein in each iteration a temporary variable vector t is computed from the current variable node vector q and check-node vector r as [...], the check-node vector r is updated according to said temporary variable vector t, and the variable node vector q is then updated according to the check-node vector r and the temporary variable vector t as [...]; in the first iteration, the codeword signal c is used as the variable node vector q and the check-node vector r is set to 0; characterized in that,
in each iteration, when computing the temporary variable vector t, the check-node vector r and the variable node vector q, the computation and update are performed with each row of the check matrix as one thread, yielding for each row the corresponding subvectors of t, q and r, with indices from $[\sum_{v=1}^{i-1} V_{LdpcRowLength}(v)] \times Z$ to $[\sum_{v=1}^{i} V_{LdpcRowLength}(v)] \times Z - 1$; wherein i is the row index of the check matrix; when the subvectors of t, r and q corresponding to row i of said check matrix are computed, for each non-"-" element $H_{i,j}$ of that row the subvectors of t, q and r corresponding to $H_{i,j}$, with indices from $[j - 1 + \sum_{v=1}^{i-1} V_{LdpcRowLength}(v)] \times Z$ to $[j + \sum_{v=1}^{i-1} V_{LdpcRowLength}(v)] \times Z - 1$, are computed and then concatenated in order to obtain the subvector corresponding to the row; when i = 1, let [...];
the subvector of the temporary variable vector t corresponding to $H_{i,j}$ is computed as follows: determine the vector start position $Z \times (n-1) + H_{i,n}$ corresponding to $H_{i,j}$, and copy, in SIMD fashion, the data of length [...] that starts at said start position in the subvector of q corresponding to row i to the beginning of the subvector of t corresponding to $H_{i,j}$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ '-' and $(Z - H_{i,n}) \bmod W \neq 0$, determine the value of each element in the row of the matrix $M_{ldpcAssemble1}$ corresponding to the check matrix element $H_{i,j}$ [...], and copy the elements with indices [...] of the subvector of the current vector q corresponding to $H_{i,j}$, in order, to the current position of the subvector of t corresponding to $H_{i,j}$; then determine the second vector start position $M_{ldpcOffset2}$ corresponding to each element $H_{i,j}$, and copy, in SIMD fashion, the data of length [...] that starts at said second vector start position to the current position of the subvector of t corresponding to $H_{i,j}$; take the first Z positions of the subvector of t corresponding to $H_{i,j}$ and take their absolute values, as the effective subvector of the temporary variable vector t corresponding to $H_{i,j}$;
wherein, when $H_{i,n} \neq 0$, $H_{i,n} \neq$ '-' and $(Z - H_{i,n}) \bmod W \neq 0$, [...]; when $H_{i,n} = 0$ or $H_{i,n} =$ '-' or $(Z - H_{i,n}) \bmod W = 0$, $(M_{ldpcOffset2})_{i,j} = Z \times (n-1)$; [...], where k is the amount of data the general-purpose processor can process at once and k is the size of the basic unit of SIMD processing; when the code length $L_{LDPC} = 648$, LdpcRemain = 11; when the code length $L_{LDPC} = 1296$, LdpcRemain = 6; when the code length $L_{LDPC} = 1944$, LdpcRemain = 1; j is the index of each non-"-" element among all non-"-" elements of row i, and n is the column index in the check matrix of the j-th non-"-" element of row i.
7. The method according to claim 6, characterized in that the subvector of the check-node vector r corresponding to each row of the check matrix is computed as follows:
writing the subvector of the temporary variable vector t corresponding to each row of the check matrix as a matrix $T_v$ with $V_{ldpcRowLength}(v)$ rows and $W \times$ [...] columns, wherein each row of said matrix $T_v$ is the part of said subvector of t that corresponds to one element $H_{i,j}$, padded where there are too few columns;
performing extreme-value distribution on said matrix $T_v$ to obtain the subvector matrix $M_v$ of the extreme-value variable vector m;
computing the subvector matrix $S_v$ of the intermediate variable vector s from said matrix $T_v$;
determining, from the elements with the same index in said matrix $M_v$ and said matrix $S_v$, the value of the element with the corresponding index in an intermediate matrix $R_v'$; wherein, if an element of $S_v$ is less than 0, the negation of the element of $M_v$ with the same index is added to that element of $S_v$, and the sum is used as the value of the element of $R_v'$ with the same index; if an element of $S_v$ equals 0, that element is added to 0, and the sum is used as the value of the element of $R_v'$ with the same index; if an element of $S_v$ is greater than 0, the element of $M_v$ with the same index is added to that element of $S_v$, and the sum is used as the value of the element of $R_v'$ with the same index; said comparison and addition operations are performed in SIMD fashion;
subtracting, in SIMD fashion, the elements of the matrix $T_v$ from the elements of said matrix $R_v'$ with the same index, and using the result as the value of the element with the same index in the subvector matrix $R_v$ of the check-node vector r; and reading the first Z elements of each row of said matrix $R_v$ in row-major order to form the subvector of the check-node vector r.
8. The method according to claim 7, characterized in that said extreme-value distribution comprises:
determining, in SIMD fashion, the minimum and the second minimum of each column of said matrix $T_v$ and the row index corresponding to the minimum; correcting the minimum and the second minimum thus obtained by subtracting a preset correction value β from each; a corrected minimum or second minimum that is less than 0 is set to 0, and is otherwise left unchanged;
constructing, from the current minimum, the second minimum and the row index of the minimum of each column of said matrix $T_v$, the column with the same index in the subvector matrix $M_v$ of the extreme-value variable vector m, wherein, in any column of $M_v$, the element at the row index of the current minimum is set to the determined minimum and the remaining elements are set to the second minimum.
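The first step of the extreme-value distribution, finding the minimum, the second minimum and the row index of the minimum in one column and then applying the correction value β, can be sketched as follows. The function name `min_two_with_correction` is hypothetical, the column is assumed to hold at least two magnitudes, and the scalar loop stands in for the SIMD comparison of W columns at once; the construction of $M_v$ from these three values is not shown.

```python
def min_two_with_correction(column, beta):
    """Return (corrected minimum, corrected second minimum, row index of the minimum)."""
    min1 = min2 = None
    idx = -1
    for i, v in enumerate(column):
        if min1 is None or v < min1:
            min2, min1, idx = min1, v, i      # old minimum becomes the second minimum
        elif min2 is None or v < min2:
            min2 = v
    # Subtract the preset correction value; results below zero are set to zero.
    return max(min1 - beta, 0), max(min2 - beta, 0), idx


stats = min_two_with_correction([5, 2, 9, 3], beta=1)
```

Only three values per column need to be kept, regardless of how many rows the column has, which is why the whole column of $M_v$ can be rebuilt from them.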
9. The method according to claim 8, characterized in that determining, in SIMD fashion, the minimum and the second minimum of each column and the corresponding row index comprises:
dividing the elements of each row of said matrix $T_v$ into [...] sub-blocks, each sub-block containing W basic units; when the elements of any two rows of said matrix $T_v$ are compared, W basic units are compared at once in SIMD fashion.
10. The method according to claim 7, characterized in that computing the subvector matrix $S_v$ of the intermediate variable vector s comprises:
for each column of the matrix $T_v$, performing an XOR over all elements of the column, XORing the result with the element in row i' of the column, ORing that result with 0x7f, and using the result of the OR operation as the element in row i' of the column with the same index in the intermediate vector matrix $S_v$; wherein the elements of each row of said matrix $T_v$ are divided into [...] sub-blocks, each sub-block containing W basic units, and each XOR or OR operation is executed on W basic units at once in SIMD fashion.
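The XOR and OR construction above can be sketched for one column of 8-bit two's-complement values. The function name `sign_vector` is hypothetical and the input is assumed to carry signs. XORing all elements accumulates the parity of the sign bits, XORing with the element in row i' removes that row's own sign, and ORing with 0x7f leaves 0x7f (127) when the remaining signs multiply to a positive value and 0xff (-1) when they multiply to a negative one.

```python
def sign_vector(column):
    """For each row, return 127 or -1 according to the sign product of the other rows."""
    total = 0
    for v in column:
        total ^= v & 0xFF                       # XOR over the 8-bit patterns of the column
    out = []
    for v in column:
        byte = (total ^ (v & 0xFF)) | 0x7F      # drop own contribution, keep only the sign bit
        out.append(byte - 256 if byte >= 128 else byte)   # reinterpret as signed 8-bit
    return out


signs = sign_vector([3, -2, 5])
```

The low seven bits are forced to ones by the OR, so only the sign bit of the XOR result carries information, and the bitwise operations apply unchanged to W bytes at a time.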
11. The method according to claim 6, characterized in that computing the subvector of the variable node vector q corresponding to $H_{i,j}$ comprises:
determining the vector start position $Z \times (n-1) + H_{i,n}$ corresponding to $H_{i,j}$; adding, in SIMD fashion, the subvector of the temporary variable vector t corresponding to $H_{i,j}$ to the subvector of the check-node vector r corresponding to $H_{i,j}$; copying, in SIMD fashion, the data of length [...] that starts at said start position in the result vector to the beginning of the subvector of q corresponding to $H_{i,j}$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ '-' and $(Z - H_{i,n}) \bmod W \neq 0$, determining the value of each element in the row of the matrix $M_{ldpcAssemble1}$ corresponding to the check matrix element $H_{i,j}$ [...], and copying the elements with indices [...] of the subvector of the current vector q corresponding to $H_{i,j}$, in order, to the current position of the subvector of q corresponding to $H_{i,j}$;
determining the second vector start position $M_{ldpcOffset2}$ corresponding to each element $H_{i,j}$, and copying, in SIMD fashion, the data of length 0 or [...] that starts at said second vector start position to the current position of the subvector of q corresponding to $H_{i,j}$;
padding, by the padding count indicated by LdpcRemain, according to the values of the elements in the row of $M_{ldpcAssemble1}$ corresponding to the check matrix element $H_{i,j}$.
12. The method according to any one of claims 6 to 11, characterized in that the following are computed in advance and saved: the vector start position $Z \times (n-1) + H_{i,n}$ and the second vector start position $M_{ldpcOffset2}$ corresponding to each element $H_{i,j}$, the matrix $M_{ldpcAssemble1}$, the vector $V_{ldpcRowLength}$ formed by the number of non-"-" elements in each row of the check matrix, $M_{ldpcAssemble1}$, and LdpcRemain.
CN201510026526.1A 2015-01-20 2015-01-20 A kind of LDPC coding and decoding methods based on general processor Active CN104617959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510026526.1A CN104617959B (en) 2015-01-20 2015-01-20 A kind of LDPC coding and decoding methods based on general processor

Publications (2)

Publication Number Publication Date
CN104617959A true CN104617959A (en) 2015-05-13
CN104617959B CN104617959B (en) 2017-09-05

Family

ID=53152273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510026526.1A Active CN104617959B (en) 2015-01-20 2015-01-20 A kind of LDPC coding and decoding methods based on general processor

Country Status (1)

Country Link
CN (1) CN104617959B (en)

Also Published As

Publication number Publication date
CN104617959B (en) 2017-09-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant