CN104617959A - Universal processor-based LDPC (Low Density Parity Check) encoding and decoding method - Google Patents
- Publication number
- CN104617959A (application CN201510026526.1A)
- Authority
- CN
- China
- Prior art keywords
- vector
- matrix
- subvector
- row
- check
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses an LDPC (Low Density Parity Check) encoding method. The method determines the parity vectors $p_1$ and $p_2$ and from them obtains the encoding result vector. Whenever a matrix is multiplied by a vector while $p_1$ and $p_2$ are determined, each row of the matrix is handled as one thread: the row is multiplied by the vector, and the products of all rows are assembled into the result vector. Multiplying row i of the matrix by the vector comprises: determining the vector start position corresponding to each element j of row i; shifting left, in single-instruction multiple-data (SIMD) fashion, the $Z - A_{i,j}$ data items of the vector that begin at the start position; moving the $A_{i,j}$ data items that precede the start position into the space after the left-shifted data, which gives the vector shift result for element j; and adding the vector shift results of all elements. By combining multithreading with SIMD processing, the method increases encoding speed on a general-purpose processor.
Description
Technical field
The application relates to LDPC encoding and decoding technology, and in particular to an LDPC encoding and decoding method based on a general-purpose processor.
Background art
An LDPC code is a linear block code with a relatively long code length. Its check matrix is correspondingly large, but the matrix contains few nonzero elements, that is, few "1" entries, hence the name low density.
Implementing the IEEE 802.11n wireless local area network (WLAN) transmission protocol requires LDPC encoding and decoding. Under the protocol, an LDPC PPDU (PLCP Protocol Data Unit) is generated as follows (see Fig. 1):
(1) Calculate the shortening bits.
(1a) Calculate the number of available bits $N_{avbits}$:
$$N_{pld} = \text{length} \times 8 + 16,$$
$$N_{avbits} = N_{CBPS} \times m_{STBC} \times \left\lceil \frac{N_{pld}}{N_{CBPS} \times R \times m_{STBC}} \right\rceil,$$
where the flag $m_{STBC}$ is 2 if STBC (space-time block code) precoding is used and 1 otherwise; $N_{CBPS}$ is the number of coded bits per symbol; length is the number of bytes in the PSDU (PLCP Service Data Unit), i.e. the number of bytes of information bits; $N_{pld}$ is the total number of bits in the PSDU and the SERVICE field; and R is the coding rate.
(1b) Calculate the number of LDPC code words $N_{CW}$ and the code length $L_{LDPC}$:
- When $N_{avbits} \le 648$: $N_{CW} = 1$; $L_{LDPC} = 1296$ if $N_{avbits} \ge N_{pld} + 912 \times (1-R)$, otherwise $L_{LDPC} = 648$.
- When $648 < N_{avbits} \le 1296$: $N_{CW} = 1$; $L_{LDPC} = 1944$ if $N_{avbits} \ge N_{pld} + 1464 \times (1-R)$, otherwise $L_{LDPC} = 1296$.
- When $1296 < N_{avbits} \le 1944$: $N_{CW} = 1$ and $L_{LDPC} = 1944$.
- When $1944 < N_{avbits} \le 2592$: $N_{CW} = 2$; $L_{LDPC} = 1944$ if $N_{avbits} \ge N_{pld} + 2916 \times (1-R)$, otherwise $L_{LDPC} = 1296$.
- When $N_{avbits} > 2592$: $N_{CW} = \left\lceil \frac{N_{pld}}{1944 \times R} \right\rceil$ and $L_{LDPC} = 1944$.
(1c) Calculate the number of shortening bits $N_{shrt}$. The shortening bits are padded into the information bits before LDPC encoding:
$$N_{shrt} = \max(0, (N_{CW} \times L_{LDPC} \times R) - N_{pld})$$
When $N_{shrt} = 0$, no zero padding is performed. When $N_{shrt} > 0$, the shortening bits are distributed evenly over all $N_{CW}$ code words, so that each code word receives $\lfloor N_{shrt} / N_{CW} \rfloor$ shortening bits. If $N_{shrt} \bmod N_{CW} \ne 0$, where mod denotes the remainder of $N_{shrt}$ divided by $N_{CW}$, each of the first $N_{shrt} \bmod N_{CW}$ code words receives one more shortening bit than the other code words.
(2) Perform LDPC encoding to obtain the parity bits.
(3) Discard the shortening bits.
(4) Calculate the number of punctured bits and discard the punctured bits. The number of bits to puncture after LDPC encoding, $N_{punc}$, is
$$N_{punc} = \max(0, (N_{CW} \times L_{LDPC}) - N_{avbits} - N_{shrt})$$
If $\left(N_{punc} > 0.1 \times N_{CW} \times L_{LDPC} \times (1-R)\right)$ and $\left(N_{shrt} < 1.2 \times N_{punc} \times \frac{R}{1-R}\right)$, or if $\left(N_{punc} > 0.3 \times N_{CW} \times L_{LDPC} \times (1-R)\right)$, then $N_{avbits}$ is increased and $N_{punc}$ is recalculated:
$$N'_{avbits} = N_{avbits} + N_{CBPS} \times m_{STBC}, \qquad N_{punc} = \max(0, (N_{CW} \times L_{LDPC}) - N'_{avbits} - N_{shrt})$$
The punctured bits are distributed evenly over all $N_{CW}$ code words, so that each code word has $\lfloor N_{punc} / N_{CW} \rfloor$ punctured bits. If $N_{punc} \bmod N_{CW} \ne 0$, where mod denotes the remainder of $N_{punc}$ divided by $N_{CW}$, each of the first $N_{punc} \bmod N_{CW}$ code words has one more punctured bit than the other code words.
(5) Calculate the repetition bits. The number of repetition bits $N_{rep}$ is
$$N_{rep} = \max(0, N'_{avbits} - N_{CW} \times L_{LDPC} \times (1-R) - N_{pld})$$
The repetition bits are distributed evenly over all $N_{CW}$ code words, so that each code word has $\lfloor N_{rep} / N_{CW} \rfloor$ repetition bits. If $N_{rep} \bmod N_{CW} \ne 0$, where mod denotes the remainder of $N_{rep}$ divided by $N_{CW}$, each of the first $N_{rep} \bmod N_{CW}$ code words has one more repetition bit than the other code words. The repetition bits are chosen in order, starting from the first information bit, until the required length is reached; they are copied from the code word after the shortening bits have been removed. The selected repetition bits are appended in order after the parity bits. When puncturing is required, no repetition of the parity bits is needed, and vice versa.
Within LDPC PPDU generation, the LDPC encoding step is the most important. The code word vector output by LDPC encoding is written $c = (S, p_1, p_2)$, where S is the information vector and $p_1$ and $p_2$ are the parity vectors. Because the check matrix H of an LDPC code is large, the computation during encoding is laborious. The check matrices given in the protocol have 24 columns for every code rate R and $24 \times (1-R)$ rows. Based on this structure, H is partitioned as
$$H = \begin{pmatrix} A & B & T \\ F & D & E \end{pmatrix}$$
into the six submatrices A, B, D, E, T and F. The matrices B, D, E and T have special structure, for example $B = (1, -, -, \ldots, 0, -, \ldots)^T$, $D = (1)$ and $E = (-, \ldots, -, 0)$, while A and F are irregular (see the 802.11n protocol). Because H is large, each element of its representation stands for one submatrix of the actual check matrix. In the representation of H and of the block matrices A, B, D, E, T and F, "-" denotes a zero submatrix, "0" denotes an identity submatrix, and a constant C denotes the identity matrix cyclically shifted right by C positions. Each submatrix has dimension $Z \times Z$, where Z is determined in advance from the code length. This representation greatly reduces the size needed to express the check matrix and each block matrix.
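The compact representation can be illustrated with a short Python sketch (not part of the patent; `expand_entry`, `matvec_gf2` and `rotate_left` are hypothetical names). It expands one element of the compact matrix into its $Z \times Z$ binary submatrix and checks that multiplying a length-Z vector by that submatrix over GF(2) is the same as cyclically rotating the vector left, which is the property the shift-based multiplication in this application relies on.

```python
def expand_entry(entry, z):
    """Expand one compact entry into a Z x Z binary submatrix:
    '-' -> zero matrix, 0 -> identity, C -> identity cyclically
    shifted right by C columns."""
    if entry == "-":
        return [[0] * z for _ in range(z)]
    return [[1 if col == (row + entry) % z else 0 for col in range(z)]
            for row in range(z)]


def matvec_gf2(matrix, vector):
    """Binary matrix times binary vector, with addition modulo 2."""
    return [sum(a & b for a, b in zip(row, vector)) % 2 for row in matrix]


def rotate_left(vector, shift):
    """Cyclic left rotation: element `shift` moves to the front."""
    return vector[shift:] + vector[:shift]


z = 5
v = [1, 0, 1, 1, 0]
for c in range(z):
    # Multiplying by the shifted identity equals rotating v left by c.
    assert matvec_gf2(expand_entry(c, z), v) == rotate_left(v, c)
```

Because of this equivalence, no $Z \times Z$ submatrix ever needs to be stored or multiplied explicitly; a shift amount per element is enough.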
Given the check matrix H and the information vector S, the code word vector c is determined as follows. The check equation $H c^T = 0^T$ yields the block equations
$$A S^T + B p_1^T + T p_2^T = 0, \qquad F S^T + D p_1^T + E p_2^T = 0.$$
After optimization these become
$$p_1^T = E T^{-1} A S^T + F S^T, \qquad p_2^T = T^{-1} (A S^T + B p_1^T).$$
Once $p_1$ and $p_2$ are obtained, the code word vector $c = (S, p_1, p_2)$ follows.
The encoder corresponding to the LDPC encoding process above is shown in Fig. 2. It contains four kinds of functional modules: pre-encoding matrix generators, matrix multipliers, matrix adders and an LDPC code word synthesizer.
The encoder has 6 pre-encoding matrix generators. Each takes one matrix as input, namely one of the six submatrices A, B, D, E, T and F, and outputs the matrix after processing. Its function is to convert the input matrix into compressed storage, in which only the nonzero elements are kept.
The encoder has 6 matrix multipliers, each with two inputs and one output. The two inputs are any two of the following: the information vector S, the matrix A, a matrix processed by a pre-encoding matrix generator, the result of another multiplier, or the result of an adder. The output is the result of multiplying the two inputs.
The encoder has 2 matrix adders. Each takes as input two matrices output by matrix multipliers in the encoder and outputs their sum.
The encoder has 1 LDPC code word synthesizer. Its inputs are three vectors, the information vector S and the parity vectors $p_1$ and $p_2$, and its output is the code word vector $c = (S, p_1, p_2)$ assembled from them.
The above describes the existing LDPC encoding method and the structure of the corresponding encoder. At the receiver, the received LDPC code words must also be decoded to recover the information vector. The main steps of the existing LDPC decoding technique are as follows:
(1) The M check nodes are divided into $M_b$ layers of T check nodes each, and decoding proceeds layer by layer. While the first layer is processed, the check-node and variable-node information is computed; when the first layer finishes, the second layer is initialized with the variable-node information obtained from the first layer, and so on.
(2) Initialization: the variable-node information is initialized with the LLRs (log-likelihood ratios), and all check-node information is set to 0. The decoding algorithm runs for I iterations and iterates row by row. In the min-sum algorithm, $n \in N_m$ denotes the columns of the base check matrix $H_b$ for which $[H_b]_{m,n} \ne$ '-'.
(3) Minimization: the variable-node vector $q_n$ is cyclically shifted right (shift amount $S(m,n) = [H_b]_{m,n}$), the check-node information is subtracted from it, and the result is stored in the vector $t_n$. Because OMS (offset min-sum) reuses values, only the minimum and the second minimum of the vector elements need to be computed.
(4) Minimum selection: for $n \in N_m$, the values of $q_n$ and of the check-node information are computed and updated.
To implement this decoding method, the existing decoder (see Fig. 3) consists of four parts: an initialization decoding unit, a minimum and second-minimum selection unit, a data truncation unit and a cyclic shift unit.
The initialization decoding unit takes an LDPC check matrix as input and outputs the check matrix after processing. Its function is to convert the stored form of the check matrix to meet the decoder's input requirements.
The minimum and second-minimum selection unit takes two matrices as input: the check matrix processed by the initialization decoding unit, and the LDPC code word matrix c as received after transmission over the wireless channel. It computes the minimum and second minimum of the difference between the cyclically right-shifted variable-node values and the check-node values, and supplies the result matrix to the data truncation unit and the cyclic shift unit.
The data truncation unit takes as input the matrix output by the minimum and second-minimum selection unit. To prevent overflow of the check-node information, it truncates the data and supplies the result matrix to the cyclic shift unit.
The cyclic shift unit takes two matrices as input: the matrix output by the data truncation unit and the matrix output by the minimum and second-minimum selection unit. It computes the variable-node matrix by bitwise modulo-2 addition of the minimum-value matrix and the check-node matrix, and outputs the variable-node matrix.
As described above, the theory of LDPC encoding and decoding is fairly mature. However, because an LDPC code is a linear block code with a long code length and a large check matrix, the algorithmic complexity is very high. Conventional LDPC encoding and decoding approaches cannot readily meet the throughput requirements of an IEEE 802.11n system, which substantially limits system performance. In existing high-speed wireless access systems, LDPC codes are mostly implemented on FPGA (Field-Programmable Gate Array) chips and DSP (Digital Signal Processor) chips. Although these approaches can meet the processing and latency requirements of modern high-speed WLAN protocols, FPGA programming and specialized DSP programming are both complex, lack rich programming environments and debugging tools, and are of limited general applicability.
Summary of the invention
The application provides an LDPC encoding and decoding method based on a general-purpose processor, which implements LDPC encoding and decoding efficiently on a general-purpose processor.
To achieve this purpose, the application adopts the following technical solution:
An LDPC encoding method based on a general-purpose processor, comprising: obtaining the signal vector S to be encoded through signal acquisition or reception; determining and storing the check matrix H and its block matrices A, B, D, E, F and T; determining the vectors $p_1$ and $p_2$ according to
$$p_1^T = E T^{-1} A S^T + F S^T, \qquad p_2^T = T^{-1} (A S^T + B p_1^T)$$
and obtaining the LDPC encoding result vector $c = (S, p_1, p_2)$; wherein every multiplication of a matrix by a vector performed while determining $p_1$ and $p_2$ comprises:
taking each row of the matrix as one thread, multiplying that row by the vector, and combining the products of all rows to form the result vector;
wherein multiplying a row of the matrix by the vector comprises: for each element j of the current row i, determining the vector start position = start position of the vector $+ A_{i,j} + (j-1) \times Z$; shifting left, in single-instruction multiple-data (SIMD) fashion, the $Z - A_{i,j}$ data items of the vector that begin at the start position; and moving the $A_{i,j}$ data items that precede the start position to follow the left-shifted data, which gives the vector shift result for element j; then adding the vector shift results of all elements to obtain the product of the row and the vector;
in the SIMD operation, the $Z - A_{i,j}$ data items that begin at the start position are divided, in units of length W, into $\lfloor (Z - A_{i,j}) / W \rfloor$ segments; these segments are shifted left in parallel, and the remaining $(Z - A_{i,j}) \bmod W$ data items are then shifted left;
Z is the size of the submatrix represented by one element of the check matrix.
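The row-by-vector multiplication described above can be sketched in plain Python. This is only an illustration under stated assumptions: the names `shift_block` and `row_times_vector` are hypothetical, list slices of length W stand in for SIMD loads and stores, indices are zero-based (so the $(j-1) \times Z$ term becomes `j * z`), and the addition of shift results is taken to be bitwise XOR, as for binary data.

```python
def shift_block(vec, block_start, a, z, w):
    """Return the Z-item block starting at block_start, rotated left by a.
    Full chunks of W items are copied first, then the remainder, then the
    a items that preceded the start position."""
    start = block_start + a            # vector start position
    count = z - a                      # items shifted to the front
    full = count // w                  # number of full W-sized segments
    out = []
    for seg in range(full):            # each slice mimics one SIMD copy
        out.extend(vec[start + seg * w:start + (seg + 1) * w])
    out.extend(vec[start + full * w:start + count])   # (Z - a) mod W rest
    out.extend(vec[block_start:block_start + a])      # wrapped-around items
    return out


def row_times_vector(row, vec, z, w=16):
    """Multiply one compact-matrix row by a binary vector of len(row) * Z
    items; '-' entries contribute nothing, shift results are XORed."""
    acc = [0] * z
    for j, a in enumerate(row):
        if a == "-":
            continue
        shifted = shift_block(vec, j * z, a, z, w)
        acc = [x ^ y for x, y in zip(acc, shifted)]
    return acc
```

With `row = [1, "-", 0]`, `z = 4` and `w = 2`, the first block is rotated left by one position, the second block is skipped, and the third block is used unchanged before the two results are XORed.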
Preferably, when the matrix is $T^{-1}$, multiplying each row of $T^{-1}$ by the corresponding vector involves only the elements of $T^{-1}$ whose value is 0: the vector shift result is computed for each element whose value is 0, and the vector shift results of all other elements are set to the zero vector; the vector shift results of all elements are then added to obtain the product of the row and the vector.
Preferably, after the segments of length W have been shifted left, only the first Z data items are taken as valid data.
Preferably, adding the vector shift results of the elements comprises: dividing the vector shift result of each element into segments in units of length W; adding the full segments in parallel through SIMD; and then adding the remaining $(Z - A_{i,j}) \bmod W$ data items.
Preferably, the matrices A, B, D, E, F and $T^{-1}$ are stored in linear lookup tables.
An LDPC decoding method based on a general-purpose processor, comprising: receiving the encoded LDPC code word signal c and determining the check matrix H; and computing, over multiple iterations, the variable-node vector q that serves as the decoding result. In each iteration, the temporary variable vector t is computed from the current variable-node vector q and the check-node vector r, the check-node vector r is updated from t, and the variable-node vector q is then updated from r and t. In the first iteration, the code word signal c is used as the variable-node vector q and the check-node vector r is set to 0. Wherein:
In each iteration, t, r and q are computed and updated with each row of the check matrix as one thread, which yields the subvectors of t, q and r that correspond to that row; i denotes the row index of the check matrix. To compute the subvectors of t, r and q for row i of the check matrix, the subvectors of t, q and r that correspond to each non-'-' element $H_{i,j}$ of the row are computed and then concatenated in order to form the subvector for the row.
The subvector of the temporary variable vector t that corresponds to $H_{i,j}$ is computed as follows: determine the vector start position $Z \times (n-1) + H_{i,n}$ for $H_{i,j}$, and copy, in SIMD fashion, the data of length […] or 6 that begin at this start position in the subvector of q for row i to the beginning of the subvector of t for $H_{i,j}$. When $H_{i,n} \ne 0$, $H_{i,n} \ne$ '-' and $(Z - H_{i,n}) \bmod W \ne 0$, determine the values of the elements in the row of the matrix $M_{ldpcAssemble1}$ that corresponds to the check matrix element $H_{i,j}$, and copy the elements of the current q subvector at those indices, one by one, to the current position in the t subvector for $H_{i,j}$. Then determine the second vector start position $M_{ldpcOffset2}$ for $H_{i,j}$, and copy, in SIMD fashion, the data of length […] that begin at the second vector start position to the current position in the t subvector for $H_{i,j}$. Finally, take the first Z entries of the t subvector for $H_{i,j}$ and their absolute values as the effective subvector of t for $H_{i,j}$.
When $H_{i,n} \ne 0$, $H_{i,n} \ne$ '-' and $(Z - H_{i,n}) \bmod W \ne 0$, $(M_{ldpcOffset2})_{i,j}$ = […]; when $H_{i,n} = 0$, or $H_{i,n}$ = '-', or $(Z - H_{i,n}) \bmod W = 0$, $(M_{ldpcOffset2})_{i,j} = Z \times (n-1)$.
K is the amount of data the general-purpose processor can process at once, and k is the basic unit size of SIMD processing. When the code length $L_{LDPC} = 648$, LdpcRemain = 11; when $L_{LDPC} = 1296$, LdpcRemain = 6; when $L_{LDPC} = 1944$, LdpcRemain = 1. j is the index of a non-'-' element among all non-'-' elements of row i, and n is the column index in the check matrix of the j-th non-'-' element of row i.
Preferably, the subvector of the check-node vector r that corresponds to each row of the check matrix is computed as follows:
The subvector of the temporary variable vector t that corresponds to the row of the check matrix is written as a matrix $T_v$ with $V_{ldpcRowLength}(v)$ rows and […] columns, where each row of $T_v$ is the subvector of t that corresponds to one element $H_{i,j}$, padded when it has too few columns;
Min-value distribution is performed on $T_v$ to obtain the matrix $M_v$, the subvector matrix of the min-value variable vector m;
The matrix $S_v$, the subvector matrix of the intermediate variable vector s, is computed from $T_v$;
From the elements of $M_v$ and $S_v$ that have the same index, the element of an intermediate matrix $R_v'$ at that index is determined. If an element of $S_v$ is less than 0, its complement is added to the element, and the sum is the value of the element of $R_v'$ with the same index. If an element of $S_v$ equals 0, the element is added to 0, and the sum is the value of the element of $R_v'$ with the same index. If an element of $S_v$ is greater than 0, the element of $M_v$ with the same index is added to it, and the sum is the value of the element of $R_v'$ with the same index. The comparisons and additions are performed in SIMD fashion;
The elements of $R_v'$ and $T_v$ that have the same index are subtracted in SIMD fashion, and the result is the element with the same index in $R_v$, the subvector matrix of the check-node vector r. The first Z elements of each row of $R_v$ are read in row-major order to form the subvector of the check-node vector r.
Preferably, the min-value distribution comprises:
Determining, in SIMD fashion, the minimum and second minimum of each column of $T_v$ and the row index of the minimum; correcting the minimum and second minimum by subtracting a preset correction value β from each; and setting a corrected minimum or second minimum to 0 if it is less than 0, otherwise leaving it unchanged;
Constructing, from the current minimum, second minimum and row index of the minimum of each column of $T_v$, the column with the same index in $M_v$, the subvector matrix of the min-value variable vector m; in any column of $M_v$, the element at the row index of the current minimum is set to the determined minimum, and the remaining elements are set to the second minimum.
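The first part of the min-value distribution, finding each column's minimum, second minimum and minimum row index and then applying the offset correction, can be sketched in scalar Python. This is an illustration only: `column_minima` is a hypothetical name, the loop over columns stands in for the SIMD comparison across W columns at once, and each column is assumed to have at least two rows. The construction of $M_v$ from these values is left out.

```python
def column_minima(t_v, beta):
    """For each column of t_v return (min, second_min, row_of_min),
    with beta subtracted from both values and negatives clamped to 0."""
    results = []
    for col in zip(*t_v):                        # iterate over columns
        min_row = min(range(len(col)), key=lambda r: col[r])
        min1 = col[min_row]
        # Second minimum: smallest value among the other rows.
        min2 = min(v for r, v in enumerate(col) if r != min_row)
        results.append((max(min1 - beta, 0), max(min2 - beta, 0), min_row))
    return results
```

For a three-row matrix `[[3, 1], [5, 4], [2, 6]]` and `beta = 1`, the first column yields minimum 2 in row 2 and second minimum 3, which become 1 and 2 after correction.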
Preferably, determining in SIMD fashion the minimum, second minimum and corresponding row index of each column comprises:
Dividing the elements of each row of $T_v$ into sub-blocks of W basic units each; when the elements of any two rows of $T_v$ are compared, W basic units are compared at once in SIMD fashion.
Preferably, computing $S_v$, the subvector matrix of the intermediate variable vector s, comprises:
For each column of $T_v$, XORing all elements of the column, XORing the result with the element in row i', ORing that result with 0x7f, and taking the result of the OR as the element in row i' of the column with the same index in $S_v$; wherein the elements of each row of $T_v$ are divided into sub-blocks of W basic units each, and each XOR or OR operation is performed on W basic units at once in SIMD fashion.
Preferably, computing the subvector of the variable-node vector q that corresponds to $H_{i,j}$ comprises:
Determining the vector start position $Z \times (n-1) + H_{i,n}$ for $H_{i,j}$; adding, in SIMD fashion, the subvector of t for $H_{i,j}$ and the subvector of the check-node vector r for $H_{i,j}$; and copying, in SIMD fashion, the data of length […] or 5 that begin at the start position in the result vector to the beginning of the subvector of q for $H_{i,j}$. When $H_{i,n} \ne 0$, $H_{i,n} \ne$ '-' and $(Z - H_{i,n}) \bmod W \ne 0$, determining the values of the elements in the row of the matrix $M_{ldpcAssemble1}$ that corresponds to the check matrix element $H_{i,j}$, and copying the elements of the current q subvector at those indices, one by one, to the current position in the q subvector for $H_{i,j}$;
Determining the second vector start position $M_{ldpcOffset2}$ for each element $H_{i,j}$, and copying, in SIMD fashion, the data of length 0 or […] that begin at the second vector start position to the current position in the q subvector for $H_{i,j}$;
Padding according to the values of the elements in the row of $M_{ldpcAssemble1}$ that corresponds to the check matrix element $H_{i,j}$, with the number of padding items given by LdpcRemain.
Preferably, the following are computed in advance and stored: the vector start position $Z \times (n-1) + H_{i,n}$ and the second vector start position $M_{ldpcOffset2}$ for each element $H_{i,j}$; the matrix $M_{ldpcAssemble1}$; the vector $V_{ldpcRowLength}$ formed by the number of non-'-' elements in each row of the check matrix; and LdpcRemain.
As the technical solution above shows, the LDPC encoding and decoding methods of the application increase encoding and decoding speed through SIMD instructions, multithreading and precomputed storage.
Brief description of the drawings
Fig. 1 is a schematic diagram of the LDPC PPDU generation process;
Fig. 2 is a schematic diagram of the encoder structure for the LDPC encoding process;
Fig. 3 is a schematic diagram of the existing LDPC decoder;
Fig. 4 is the overall flow chart of the encoding method of the application;
Fig. 5 is a schematic diagram of the computation of the parity vector $p_1$ in the LDPC encoding of the application;
Fig. 6 is a schematic diagram of the computation of the parity vector $p_2$ in the LDPC encoding of the application;
Fig. 7 is a schematic structural diagram of optimized multiplier 1;
Fig. 8 is a schematic structural diagram of optimized multiplier 2;
Fig. 9 is a schematic structural diagram of the optimized adder;
Fig. 10 is a schematic diagram of multiplying one element of matrix A by the transpose of vector S;
Fig. 11 is a schematic diagram of the processing of step 5 in the LDPC encoding method of the application;
Fig. 12 is a schematic diagram of the processing of step 5 in the LDPC decoding method of the application;
Fig. 13 is the detailed flow chart of the LDPC decoding method of the application;
Fig. 14 is a schematic flow chart of the min-value distribution performed on one sub-block in the LDPC decoding method of the application;
Fig. 15 is a schematic structural diagram of one min-value distribution operation in the LDPC decoding method;
Fig. 16 is a schematic flow chart of computing the intermediate variable vector for one sub-block in the LDPC decoding method;
Fig. 17 is a schematic structural diagram of one intermediate vector computation in the LDPC decoding method.
Embodiments
To make the purpose, technical means and advantages of the application clearer, the application is described in further detail below with reference to the drawings.
The application provides an LDPC encoding method and an LDPC decoding method suited to implementation on a general-purpose processor. Both are described in detail below.
Following the LDPC PPDU generation method of the IEEE 802.11n protocol, the encoded code word vector is written $c = (S, p_1, p_2)$, where S is the information vector and $p_1$ and $p_2$ are the parity vectors, and the check matrix H is partitioned into six parts
$$H = \begin{pmatrix} A & B & T \\ F & D & E \end{pmatrix}$$
namely the submatrices A, B, D, E, F and T. Before encoding, the code length $L_{LDPC}$, the coding rate R and the information vector S must be supplied as external inputs. The application optimizes the LDPC encoding method for the architecture of general-purpose processor (GPP) chips as follows:
1. SIMD (Single Instruction Multiple Data) operations are used to optimize the encoding method. The basic idea is to process multiple data items within one CPU clock cycle and thereby obtain the effect of parallel processing, unlike ordinary usage, in which each clock cycle performs only one data processing operation. This depends on how much data the general-purpose processor can process at once: if that size is K bits and the basic unit of SIMD processing is k bits, one operation can process $W = K / k$ data items.
2. Linear lookup tables are used to optimize storage of the check matrix information. The check matrix H is split into the six submatrices A, B, D, E, F and T, and six linear lookup tables are generated from them, which reduces computational complexity.
3. Multithreading is used, with the number of threads equal to the number of rows of the check matrix H: the processing of one row of H is one thread, and multiple threads run at the same time, which raises the overall processing performance of the system.
Fig. 4 is the overall flow chart of the encoding method of the application. The method rests on the same algorithmic principle as current LDPC encoding methods; the difference lies in how the encoding is implemented on a general-purpose processor. The flow is as follows:
1. Under the 802.11n protocol, different code lengths $L_{LDPC}$ and different coding rates R correspond to different check matrices H. First, the check matrix H for the given $L_{LDPC}$ and R is retrieved, and the following parameters are initialized:
1.1 Generate the six submatrices A, B, D, E, F and $T^{-1}$, where $(\cdot)^{-1}$ denotes the matrix inverse, and store them in order in linear lookup tables. The location of each required datum is marked by an offset, which trades memory for computational complexity and increases the data processing speed of the LDPC encoding method.
1.2 Determine the size Z of the submatrix represented by each element of the selected check matrix H: Z = 27 when $L_{LDPC} = 648$; Z = 54 when $L_{LDPC} = 1296$; Z = 81 when $L_{LDPC} = 1944$.
2. In keeping with the characteristics of GPP chips, multithreading is used to optimize the encoding method, with the number of threads equal to the number of rows of the check matrix H. Each row of H is handled by one thread that executes steps 3 to 4, and multiple threads run at the same time, so that steps 3 to 4 are carried out for several rows simultaneously, which raises the overall processing performance of the system. Steps 3 and 4 below describe the flow of a single thread.
3. Compute the codeword check vector $p_1$. The multiplications and additions involved are all optimized with SIMD operations; see Fig. 5 for the computation diagram. The specific flow is as follows:
3.1 Multiply matrix A by the transpose of vector S. The result is a vector whose length equals the number of rows of matrix A.
3.2 Multiply matrix $T^{-1}$ by the transpose of the result vector of step 3.1. The result is a vector whose length equals the number of rows of matrix $T^{-1}$.
3.3 Multiply matrix E by the transpose of the result vector of step 3.2. The result is a vector whose length equals the number of rows of matrix E.
3.4 Multiply matrix F by the transpose of vector S. The result is a vector whose length equals the number of rows of matrix F.
3.5 Add the result vector of step 3.3 to the result vector of step 3.4. The result is the codeword check vector $p_1$.
4. Compute the codeword check vector $p_2$. The multiplications and additions involved are all optimized with SIMD operations; see Fig. 6 for the computation diagram. The specific flow is as follows:
4.1 Multiply matrix B by the transpose of vector $p_1$. The result is a vector whose length equals the number of rows of matrix B.
4.2 Add the result vector of step 3.1 to the result vector of step 4.1. The result is a vector.
4.3 Multiply matrix $T^{-1}$ by the transpose of the result vector of step 4.2. The result is the codeword check vector $p_2$.
5. Assemble the LDPC codeword vector c:
Store the vectors obtained above in the order S, $p_1$, $p_2$ to obtain the LDPC codeword vector $c=(S, p_1, p_2)$.
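The dataflow of steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: it uses small dense 0/1 matrices instead of the Z×Z block structure, it performs no SIMD or multithreading, and the helper names `gf2_matvec`, `gf2_add`, and `encode` as well as the toy matrices are hypothetical.

```python
def gf2_matvec(M, v):
    """Multiply a 0/1 matrix by a 0/1 vector over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) & 1 for row in M]


def gf2_add(u, v):
    """Add two 0/1 vectors over GF(2); addition is XOR."""
    return [a ^ b for a, b in zip(u, v)]


def encode(s, A, B, E, F, T_inv):
    """Follow steps 3.1 to 5 of the text with plain GF(2) arithmetic."""
    a_s = gf2_matvec(A, s)                  # step 3.1
    t_a_s = gf2_matvec(T_inv, a_s)          # step 3.2
    e_t_a_s = gf2_matvec(E, t_a_s)          # step 3.3
    f_s = gf2_matvec(F, s)                  # step 3.4
    p1 = gf2_add(e_t_a_s, f_s)              # step 3.5
    b_p1 = gf2_matvec(B, p1)                # step 4.1
    summed = gf2_add(a_s, b_p1)             # step 4.2
    p2 = gf2_matvec(T_inv, summed)          # step 4.3
    return s + p1 + p2                      # step 5: c = (S, p1, p2)


# Toy matrices chosen only to exercise the dataflow (not from 802.11n).
A = [[1, 0, 1], [0, 1, 1]]
B = [[1], [0]]
E = [[1, 1]]
F = [[1, 1, 0]]
T_INV = [[1, 0], [1, 1]]
codeword = encode([1, 0, 1], A, B, E, F, T_INV)
```

The toy matrices only check that the shapes chain together as the steps describe; they do not form a valid parity-check decomposition.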
The encoding method above involves two kinds of optimized multiplier and one kind of optimized adder, referred to as optimized multiplier 1, optimized multiplier 2, and the optimized adder. In view of the architecture of the GPP chip, the parts of optimized multiplier 1, optimized multiplier 2, and the optimized adder that can run in parallel are optimized with SIMD operations. They are described one by one below.
Optimized multiplier 1 has two inputs and one output; see Fig. 7 for its schematic diagram. The matrix-vector multiplications in steps 3.1, 3.3, 3.4, and 4.1 of the optimized encoding all use optimized multiplier 1. Taking step 3.1 as an example, the inputs of optimized multiplier 1 are matrix A and the transpose of vector S, and the implementation flow is as follows:
1. Check whether the last row of matrix A has been reached. If so, the operation is complete; otherwise, go to step 2.
2. Multiply the $Z \times Z$ submatrix represented by the first element of matrix A by the transpose of vector S. This submatrix is the $Z \times Z$ identity matrix cyclically left-shifted $A_{1,1}$ times ($A_{1,1}$ denotes the element in the first row and first column of matrix A), so multiplying it by the transpose of vector S is equivalent to cyclically shifting vector S $A_{1,1}$ times. This operation can be optimized with SIMD; see Figure 10. The specific flow is as follows:
2.1 Calculate [formula missing], the part 2 data length $=(Z-A_{1,1}) \bmod W$ (the remainder of $Z-A_{1,1}$ divided by W), and the start position of the required data = start position of information vector s $+ A_{1,1}$.
2.2 Copy the data of length [formula missing], starting from the start position of the required data, to the start of the intermediate data;
2.3 Shift and copy the remaining $(Z-A_{1,1}) \bmod W$ data items, and copy the "remainder" and "padding" parts shown in Figure 10 to the output. To suit SIMD operation, the input vector has length Z while the output vector has size [formula missing]; only the first Z elements of the output vector are valid data. The result is stored in the result vector register.
3. Check whether the last column of matrix A has been reached. If so, return to step 1; otherwise, go to step 4.
4. Multiply the $Z \times Z$ submatrix represented by the next element of matrix A by the transpose of vector S, following steps 2.1, 2.2, and 2.3, except that the product is not stored in the result vector register. Then perform step 5.
5. Add the result of step 4 to the contents of the result vector register. Adding two binary numbers is equivalent to XORing them, so this step can be optimized with SIMD, i.e. one operation produces W results; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the result of step 4, $(b_1, b_2, \ldots, b_W)$ is the contents of the result vector register, the rectangular box is the XOR operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1 \oplus b_1, a_2 \oplus b_2, \ldots, a_W \oplus b_W)$. The result is stored in the result vector register; return to step 3.
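The flow of optimized multiplier 1 can be illustrated as follows. This is a sketch under assumptions rather than the implementation of the application: the function names are hypothetical, a negative entry is assumed to mark a "-" (all-zero) submatrix, and list slices stand in for the SIMD block copies of Figure 10.

```python
def cyclic_shift_left(s, a):
    """Multiply the Z x Z identity shifted by `a` with vector s.

    The product is obtained with two block copies instead of a
    matrix multiplication, as in step 2 of optimized multiplier 1.
    """
    z = len(s)
    out = [0] * z
    out[:z - a] = s[a:]   # the Z - a elements starting at position a
    out[z - a:] = s[:a]   # the a elements that precede the start position
    return out


def block_row_times_vector(row, s, z):
    """One row of shift values times the block vector s, over GF(2)."""
    acc = [0] * z
    for n, a in enumerate(row):
        if a < 0:          # assumed marker for a "-" (all-zero) submatrix
            continue
        shifted = cyclic_shift_left(s[n * z:(n + 1) * z], a)
        acc = [x ^ y for x, y in zip(acc, shifted)]   # step 5: XOR accumulate
    return acc
```

For example, `block_row_times_vector([1, -1, 0], [1, 0, 0, 1, 1, 1, 0, 1, 1], 3)` shifts the first block by one position, skips the second block, leaves the third block unshifted, and XORs the two results.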
Optimized multiplier 2 has two inputs and one output; see Fig. 8 for its schematic diagram. The matrix-vector multiplications in steps 3.2 and 4.3 of the optimized encoding all use optimized multiplier 2. Optimized multiplier 2 is a special case of optimized multiplier 1 in which one of the inputs is matrix $T^{-1}$. For the different check matrices H, the matrix [formula missing], so only the elements in the lower triangle of $T^{-1}$ are valid values. Taking step 3.2 as an example, the inputs of optimized multiplier 2 are matrix $T^{-1}$ and the transpose of the result vector of step 3.1, and the implementation flow is as follows:
1. Check whether the last row of matrix A has been reached. If so, the operation is complete; otherwise, go to step 2.
2. From [formula missing] it can be seen that the upper-triangle elements of matrix $T^{-1}$ are 0. When the $Z \times Z$ submatrix represented by an element "0" is multiplied by the transpose of the result vector of step 3.1 in Fig. 4, the result is still that vector. Therefore, each element "0" in the current row of $T^{-1}$ is multiplied by the transpose of the result vector of step 3.1 in Fig. 4, the results are added as in step 5 of optimized multiplier 1, and the flow returns to step 1.
The optimized adder has two inputs and one output, and both inputs are vectors; see Figure 12 for its schematic diagram. The vector additions in steps 3.5 and 4.2 of the optimized encoding all use the optimized adder. Taking step 3.5 as an example, the inputs of the optimized adder are the result vector of step 3.3 and the result vector of step 3.4. Because the inputs are binary numbers, adding two of them is the same as XORing them, so the addition can be optimized with SIMD, i.e. one operation produces W results; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the result vector of step 3.3, $(b_1, b_2, \ldots, b_W)$ is the result vector of step 3.4, the rectangular box is the XOR operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1 \oplus b_1, a_2 \oplus b_2, \ldots, a_W \oplus b_W)$.
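The idea that one operation adds many binary positions at once can be imitated in Python by packing bits into a single integer, so that one `^` performs the whole addition. This is an illustrative substitute for a SIMD register, not the lane layout used by the application, and the helper names are hypothetical.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, bit i holding bits[i]."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


def unpack_bits(word, n):
    """Recover the first n bits of an integer as a list of 0/1 values."""
    return [(word >> i) & 1 for i in range(n)]


def xor_add(a_bits, b_bits):
    """Add two binary vectors over GF(2) with a single XOR on packed words."""
    n = len(a_bits)
    return unpack_bits(pack_bits(a_bits) ^ pack_bits(b_bits), n)
```

A real SIMD implementation would issue one XOR instruction per W-lane group instead of packing and unpacking; the packing here only makes the single-operation property visible.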
The above is the specific flow of the LDPC encoding method of the present application. The present application also optimizes the decoding method of existing decoders; see Figure 13 for the flow chart of the optimized decoding method. Before decoding, the following must be supplied externally: the code length $L_{LDPC}$, the coding rate R, the encoded codeword vector c, the maximum number of iterations I, and the correction offset β. The LDPC decoding method is optimized for the GPP chip architecture as follows:
1. The decoding method is optimized with SIMD operations. The basic idea is to process multiple data items within one CPU clock cycle to obtain the effect of parallel processing, rather than the ordinary mode in which each clock cycle performs only one data processing operation. This depends on how much data the general-purpose processor can handle at once: assuming this amount is K bits and the basic unit of SIMD processing is k bits, one operation can process $W = K/k$ data items.
2. Part of the optimized decoding method uses multithreading, with the number of threads equal to the number of rows of the check matrix H; that is, each row of H is processed by one thread. Multiple threads run at the same time, which improves the overall processing performance of the system.
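The two optimizations can be pictured with a short Python sketch. It assumes that the number of lanes per operation is $W = K/k$, and the function names `lanes_per_operation` and `process_rows` are hypothetical. The thread pool only illustrates the one-thread-per-row structure: CPU-bound Python threads are limited by the interpreter lock in CPython, so a real implementation would use native threads.

```python
from concurrent.futures import ThreadPoolExecutor


def lanes_per_operation(K, k):
    """Number of basic units handled by one SIMD operation (assumed W = K / k)."""
    return K // k


def process_rows(rows, row_worker):
    """Run `row_worker` on every check-matrix row, one thread per row."""
    with ThreadPoolExecutor(max_workers=len(rows)) as pool:
        # map() keeps the results in the same order as the input rows
        return list(pool.map(row_worker, rows))
```

With a 128-bit register and 8-bit basic units, `lanes_per_operation(128, 8)` gives 16 lanes per operation.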
The specific flow of the decoding method of the present application is described below. Its general framework is the same as that of existing decoding methods: receive the encoded LDPC codeword signal c and determine the check matrix H; then compute the variable node vector q as the decoding result through multiple iterations. In each iteration, the temporary variable vector t is computed from the current variable node vector q and check node vector r as [formula missing], the check node vector r is updated from the temporary variable vector t, and the variable node vector q is then updated from the check node vector r and the temporary variable vector t as [formula missing]. The decoding method provided by the present application differs from the prior art in its implementation on a general-purpose processor. The specific operation steps are as follows:
1. According to the 802.11n protocol, each combination of code length $L_{LDPC}$ and coding rate R corresponds to a different check matrix H. First, the check matrix H corresponding to the code length $L_{LDPC}$ and coding rate R is extracted and the parameters below are initialized. They are stored one after another in a linear lookup table, and the position of any required data is identified by an offset. This trades memory for computational complexity and increases the data processing speed of the LDPC decoding method:
1.1 Z: the size of the submatrix represented by each element of the check matrix H; a variable. When the code length $L_{LDPC}=648$, Z=27; when $L_{LDPC}=1296$, Z=54; when $L_{LDPC}=1944$, Z=81.
1.2 LdpcRowNum: the number of rows of the selected check matrix H; a variable. When the code rate $R=1/2$, LdpcRowNum=12; when $R=2/3$, LdpcRowNum=8; when $R=3/4$, LdpcRowNum=6; when $R=5/6$, LdpcRowNum=4.
1.3 $V_{ldpcRowLength}$: the number of non-"-" elements in each row of the selected check matrix H; a vector of size 1 × LdpcRowNum.
1.4 LdpcBufferNum: the number of registers needed to store the data of each submatrix of the selected check matrix H; a variable. It is computed as [formula missing].
1.5 LdpcRemain: one of the variables needed in step 9; a variable. When the code length $L_{LDPC}=648$, LdpcRemain=11; when $L_{LDPC}=1296$, LdpcRemain=6; when $L_{LDPC}=1944$, LdpcRemain=1.
1.6 LdpcRoundNum: one of the variables needed in step 9; a variable. When the code length $L_{LDPC}=648$, LdpcRoundNum=27; when $L_{LDPC}=1296$, LdpcRoundNum=22; when $L_{LDPC}=1944$, LdpcRoundNum=17.
1.7 $V_{ldpcRowBuffer}$: the number of registers needed to store the non-"-" data of each row of the selected check matrix H; a vector of size 1 × LdpcRowNum. It is computed as $V_{ldpcRowBuffer}(v)=V_{ldpcRowLength}(v) \cdot LdpcBufferNum$ (where $V_{ldpcRowBuffer}(v)$ denotes the v-th element of vector LdpcRowBuffer and v corresponds to the row number of the selected check matrix H; likewise below).
1.8 $M_{ldpcOffset1}$: one of the cyclic offsets computed from the selected check matrix H, used in steps 4 and 9; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$) (where max($V_{ldpcRowLength}(v)$) denotes the largest element of vector $V_{ldpcRowLength}$; likewise below). It is computed as $(M_{ldpcOffset1})_{i,j}=Z \cdot (n-1)+H_{i,n}$, where n is the column number in the selected check matrix H; j and n are related in that the j-th non-"-" element of row i of the selected check matrix H is located at row i, column n of H (likewise below).
1.9 $M_{ldpcRound1}$: one of the loop counts computed from the selected check matrix H, used in step 4; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$). It is computed as follows: when $H_{i,n} \neq 0$ and $H_{i,n} \neq$ "-", [formula missing]; when $H_{i,n}=0$ and $H_{i,n} \neq$ "-", $(M_{ldpcRound1})_{i,j}=6$.
1.10 $M_{ldpcAssemble1}$: one of the padding-offset flags computed from the selected check matrix H, used in step 4; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$). It is computed as follows: when $H_{i,n}=0$ and $H_{i,n} \neq$ "-", $(M_{ldpcAssemble1})_{i,j}=0$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ "-", and $(Z-H_{i,n}) \bmod W=0$, $(M_{ldpcAssemble1})_{i,j}=0$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ "-", and $(Z-H_{i,n}) \bmod W \neq 0$, $(M_{ldpcAssemble1})_{i,j}=1$.
1.11 $M_{ldpcAssembleTable1}$: the cyclic padding offsets computed from the selected check matrix H, used in steps 4 and 9; a matrix of size [formula missing] (where [formula missing] denotes the sum of all elements of vector LdpcRowLength).
1.12 $M_{ldpcOffset2}$: one of the cyclic offsets computed from the selected check matrix H, used in steps 4 and 9; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$). It is computed as follows: when $(M_{ldpcAssemble1})_{i,j}=0$, $(M_{ldpcOffset2})_{i,j}=Z \cdot (n-1)+[W-Z+H_{i,n}+(M_{ldpcRound1})_{i,j} \cdot W]$; when $(M_{ldpcAssemble1})_{i,j}=1$, $(M_{ldpcOffset2})_{i,j}=Z \cdot (n-1)$.
1.13 $M_{ldpcRound2}$: one of the loop counts computed from the selected check matrix H, used in step 4; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$). It is computed as [formula missing].
1.14 $M_{ldpcRound3}$: one of the loop counts computed from the selected check matrix H, used in step 9; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$). It is computed as follows: when $H_{i,n} \neq 0$ and $H_{i,n} \neq$ "-", [formula missing]; when $H_{i,n}=0$ and $H_{i,n} \neq$ "-", $(M_{ldpcRound3})_{i,j}=5$.
1.15 $M_{ldpcAssemble2}$: one of the padding-offset flags computed from the selected check matrix H, used in step 9; identical to $M_{ldpcAssemble1}$.
1.16 $M_{ldpcRound4}$: one of the loop counts computed from the selected check matrix H, used in step 9; a matrix of size LdpcRowNum × max($V_{ldpcRowLength}(v)$). It is computed as follows: when $i=0$, $(M_{ldpcRound4})_{i,j}=0$; when $i \neq 0$, [formula missing].
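A few of the lookup tables above can be precomputed as in the following sketch. It covers only the entries whose formulas survive in the text ($V_{ldpcRowLength}$, $M_{ldpcOffset1}$, and $M_{ldpcAssemble1}$). The function name `init_tables`, the use of -1 for a "-" element, the ragged lists in place of fixed-size matrices, and the toy base matrix are all assumptions made for illustration.

```python
# Z for each 802.11n code length, as listed in item 1.1.
Z_BY_CODE_LENGTH = {648: 27, 1296: 54, 1944: 81}


def init_tables(H, Z, W):
    """Precompute per-row lookup tables from a base matrix H.

    H holds shift values; -1 is an assumed marker for a "-" element.
    Returns (row_length, offset1, assemble1), one entry per non-"-" element.
    """
    row_length, offset1, assemble1 = [], [], []
    for row in H:
        off_row, asm_row = [], []
        for n, h in enumerate(row, start=1):   # n is the 1-based column number
            if h < 0:                          # "-" elements get no table entry
                continue
            off_row.append(Z * (n - 1) + h)    # item 1.8
            # item 1.10: padding is flagged only when h != 0 and
            # (Z - h) is not a multiple of W
            asm_row.append(1 if h != 0 and (Z - h) % W != 0 else 0)
        row_length.append(len(off_row))        # item 1.3
        offset1.append(off_row)
        assemble1.append(asm_row)
    return row_length, offset1, assemble1
```

For instance, with Z=27 and W=16 (an assumed lane count), a shift value of 11 needs no padding because 27-11=16 is a whole number of lanes, whereas a shift value of 5 does.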
2. Check whether the maximum number of iterations I has been reached. If not, go to step 3; if so, decoding ends.
3. In view of the characteristics of the GPP chip, the decoding method is optimized with multithreading. The number of threads equals the number of rows of the check matrix H, and each row of H is handled by one thread that executes steps 4 to 9. Multiple threads run at the same time, i.e. several instances of steps 4 to 9 are processed concurrently, which improves the overall processing performance of the system. Steps 4 to 9 below describe the flow of a single thread.
4. Compute the temporary variable vector t, a vector of length [formula missing]. This temporary variable vector comprises one subvector for each row of the check matrix, with indices from [formula missing] to [formula missing], and each such subvector in turn comprises one subvector for each non-"-" element $H_{i,j}$. (Only the non-"-" elements $H_{i,j}$ of the check matrix have corresponding subvectors; the "-" elements have no corresponding subvector in the temporary variable vector, the check node vector r, or the variable node vector q.) Specifically, the subvector t' corresponding to $H_{i,j}$ is computed as [formula missing]; that is, t' equals the subvector q' of the variable node vector q corresponding to $H_{i,j}$, cyclically shifted according to the value of the element in row i, column j of the check matrix H, minus the subvector r' of the check node vector r corresponding to $H_{i,j}$. In the first iteration, the variable node vector q is the LDPC-encoded codeword vector c and the check node vector r is in its initial state, so t' is computed as [formula missing]; that is, t' is q' cyclically shifted according to the value of the element in row i, column j of H. To suit SIMD operation, the input subvectors q' and r' have length Z, while the output subvector t' has size [formula missing]; only the first Z elements of the output subvector are valid data. The computation of t loops over the non-"-" elements of each row of the selected check matrix H; for example, the j-th non-"-" element of row i of the selected check matrix H yields the elements at positions [formula missing] to [formula missing] of the temporary variable vector t. The specific calculation steps are as follows:
4.1 According to the $M_{ldpcOffset1}$ matrix, find the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the start position of the required data in the variable node vector q: start position of the required data = start position of subvector q' of the variable node vector q + the corresponding cyclic offset.
4.2 According to the $M_{ldpcRound1}$ matrix, find the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{ldpcRound1})_{i,j} \cdot W$ data items following the start position of the required data to the current position of subvector t' of the temporary variable vector t. The current position here is the first position of the subvector to which no data has yet been copied.
4.3 According to the $M_{ldpcAssemble1}$ matrix, decide whether a padding operation is needed. If $(M_{ldpcAssemble1})_{i,j}=1$, perform the padding operation using the offsets given in the $M_{ldpcAssembleTable1}$ matrix; if $(M_{ldpcAssemble1})_{i,j}=0$, no padding operation is needed. The padding operation is as follows: determine the value [formula missing] of each element in the row of matrix $M_{ldpcAssemble1}$ corresponding to check matrix element $H_{i,j}$, and copy the elements with index [formula missing] in the subvector of the current vector q corresponding to $H_{i,j}$, one after another, to the current position of the subvector of the temporary variable vector t corresponding to $H_{i,j}$, where [formula missing].
4.4 According to the $M_{ldpcOffset2}$ matrix, find the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the start position of the required data in the variable node vector q: start position of the required data = start position of subvector q' of the variable node vector q + the corresponding cyclic offset.
4.5 According to the $M_{ldpcRound2}$ matrix, find the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{ldpcRound2})_{i,j} \cdot W$ data items following the start position of the required data to the current position of subvector t' of the temporary variable vector t.
4.6 If this is not the first iteration, compute subvector t' of the temporary variable vector t = subvector t' of the temporary variable vector t - subvector r' of the check node vector r. All elements of t' and r' are divided into groups of W elements, so the subtraction can be optimized with SIMD, i.e. one operation produces W elements of the temporary variable vector t; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is a group of elements of the temporary variable vector t, $(b_1, b_2, \ldots, b_W)$ is a group of elements of the check node vector r, the rectangular box is the subtraction operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1-b_1, a_2-b_2, \ldots, a_W-b_W)$. If this is the first iteration, go to step 5.
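Step 4 as a whole amounts to a cyclic shift followed, from the second iteration on, by a group-wise subtraction. The sketch below illustrates this for one subvector. It is a simplified picture under assumptions: it shifts with list slicing instead of the offset and padding tables of steps 4.1 to 4.5, it ignores the padded output length, and the names `shift_block` and `temp_subvector` are hypothetical.

```python
def shift_block(q_sub, h):
    """Cyclically shift a length-Z subvector left by h positions."""
    return q_sub[h:] + q_sub[:h]


def temp_subvector(q_sub, r_sub, h, first_iteration, W=16):
    """Compute t' from q' and r' for one non-"-" element with shift h."""
    t_sub = shift_block(q_sub, h)
    if first_iteration:
        return t_sub                      # r is still in its initial state
    out = []
    for start in range(0, len(t_sub), W):
        # one pass models one SIMD subtraction on a group of up to W elements
        a = t_sub[start:start + W]
        b = r_sub[start:start + W]
        out.extend(x - y for x, y in zip(a, b))
    return out
```

The inner loop is written group by group only to mirror the W-element grouping of step 4.6; in Python a single comprehension would give the same values.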
5. Compute the absolute value of every element of subvector t' of the temporary variable vector t, and denote the result as the absolute-value temporary variable vector |t|. Subvector t' is divided into groups of W elements, so this can be optimized with SIMD, i.e. one operation produces W elements of subvector t'; see Figure 12, where $(c_1, c_2, \ldots, c_W)$ is a group of elements of subvector t', the rectangular box is the absolute-value operator, and $(y_1, y_2, \ldots, y_W)$ is the absolute-value temporary variable vector |t|, i.e. $(y_1, y_2, \ldots, y_W)=(|c_1|, |c_2|, \ldots, |c_W|)$. The result is used as the updated subvector t' of the temporary variable vector t.
6. Perform the extremum distribution operation and store the result in the extremum variable vector m (matrix M). This process loops over the rows of the selected check matrix H: each pass operates on $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ elements of the temporary variable vector t and stores a result of length $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ in the extremum variable vector m. See Figure 15 for a schematic of one extremum distribution operation. The subvector t' of the temporary variable vector t corresponding to each row of the check matrix is written as a matrix $T_v$ with $V_{ldpcRowLength}(v)$ rows and [formula missing] columns, where each row of $T_v$ is the subvector of t' corresponding to one element $H_{i,j}$, padded when there are not enough columns. The extremum distribution operation computes the minimum and the second minimum of each column of $T_v$, distributes them, and stores the result in the matrix $M_v$, the subvector of the extremum variable vector m. Each row of $T_v$ has LdpcBufferNum sub-blocks, and each sub-block has W basic units. The basic units in each column are compared to find the minimum and the second minimum, and the row number of the minimum is recorded as the index value. The extremum variable vector m is then filled according to the index value: where the index value differs from the row number of m, the minimum is written; where the index value equals the row number of m, the second minimum is written. Taking the search for the minimum and second minimum of one sub-block as an example, the flow chart is shown in Figure 14 and the specific steps are as follows:
6.1 Compare the corresponding basic units in the first sub-block of the first row and the first sub-block of the second row of matrix $T_v$. Store the row number of the smaller value as the index value, record the smaller value as the minimum, and record the larger value as the second minimum. This can be optimized with SIMD using two operations: one takes the maximum, giving the larger of the two values, and one takes the minimum, giving the smaller; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the first sub-block of the first row, $(b_1, b_2, \ldots, b_W)$ is the first sub-block of the second row, the rectangular box is the maximum operator or the minimum operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(\max(a_1,b_1), \max(a_2,b_2), \ldots, \max(a_W,b_W))$ or $(y_1, y_2, \ldots, y_W)=(\min(a_1,b_1), \min(a_2,b_2), \ldots, \min(a_W,b_W))$.
6.2 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 6.3; if so, go to step 6.6.
6.3 Take the maximum of the first sub-block of the next row of matrix $T_v$ and the previously recorded minimum. This operation can be optimized with SIMD, as in step 6.1.
6.4 Take the minimum of the result of step 6.3 and the currently recorded second minimum, and record the result as the second minimum. This operation can be optimized with SIMD, as in step 6.1.
6.5 Take the minimum of the result of step 6.3 and the currently recorded minimum, record the result as the minimum, and record the row number of this minimum as the index value. This operation can be optimized with SIMD, as in step 6.1. Return to step 6.2.
6.6 Subtract the correction value β from both the currently recorded minimum and the currently recorded second minimum. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the previously recorded minimum or second minimum, $(b_1, b_2, \ldots, b_W)$ is the correction value β, which can also be written $(\beta, \beta, \ldots, \beta)$, the rectangular box is the subtraction operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1-\beta, a_2-\beta, \ldots, a_W-\beta)$. The result is recorded as the minimum or second minimum.
6.7 Correct the currently recorded minimum and second minimum: when the recorded minimum or second minimum is less than zero, set it to zero; otherwise leave it unchanged. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the currently recorded minimum or second minimum, $(b_1, b_2, \ldots, b_W)$ is the zero value, which can also be written $(0, 0, \ldots, 0)$, the rectangular box is the correction operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $y_n=0$ if $a_n<0$ and $y_n=a_n$ otherwise. The result is recorded as the minimum or second minimum.
6.8 Fill the extremum variable vector m according to the index value. Where the index value differs from the row number of matrix $M_v$ (the subvector of the extremum variable vector m), write the currently recorded minimum at the same position of $M_v$; where the index value equals the row number of $M_v$, write the currently recorded second minimum at the same position of $M_v$. Reading the elements of matrix $M_v$ in row-major order forms the subvector of the extremum variable vector m.
7. Compute the intermediate variable vector s. This process loops over the rows of the selected check matrix H: each pass operates on $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ elements of the temporary variable vector t and stores a result of length $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ in the intermediate variable vector s. See Figure 17 for a schematic of one computation of the intermediate variable vector s: the temporary variable vector t is divided into $V_{ldpcRowLength}(v)$ rows, each row has LdpcBufferNum sub-blocks, and each sub-block has W basic units. Taking the computation of the intermediate variable vector s for one sub-block as an example, the flow chart is shown in Figure 16 and the specific flow is as follows:
7.1 XOR the first sub-block of the first row of the temporary variable vector t with the first sub-block of the second row. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the first sub-block of the first row, $(b_1, b_2, \ldots, b_W)$ is the first sub-block of the second row, the rectangular box is the XOR operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1 \oplus b_1, a_2 \oplus b_2, \ldots, a_W \oplus b_W)$.
7.2 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 7.3; if so, go to step 7.4, starting again from the first row.
7.3 XOR the first sub-block of the next row of the temporary variable vector t with the result of step 7.1. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the first sub-block of the next row, $(b_1, b_2, \ldots, b_W)$ is the result of step 7.1, the rectangular box is the XOR operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1 \oplus b_1, a_2 \oplus b_2, \ldots, a_W \oplus b_W)$. Return to step 7.2.
7.4 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 7.5; if so, go to step 8.
7.5 XOR the first sub-block of the current row of the temporary variable vector t with the result of step 7.3. This operation can be optimized with SIMD.
7.6 OR the result of step 7.5 with [formula missing]. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the result of step 7.5, $(b_1, b_2, \ldots, b_W)$ is [formula missing], the rectangular box is the OR operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1 | b_1, a_2 | b_2, \ldots, a_W | b_W)$. Store the result in the first sub-block of the intermediate variable vector s and return to step 7.4.
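The XOR passes of step 7 rely on the fact that the sign bit of an XOR of signed integers is the XOR of their sign bits. The sketch below shows the two passes for one column group: a first pass XORs all rows together, and a second pass XORs each row back out so that every row ends up with the combined sign of the other rows. The constant used in the OR of step 7.6 is not reproduced in the text, so it appears here as the caller-supplied parameter `or_mask`; that parameter and the function name `sign_vector` are hypothetical.

```python
def sign_vector(T, or_mask):
    """Compute s for every row of T.

    The sign of s[r][c] is the product of the signs of column c
    over all rows except r.
    """
    # steps 7.1 to 7.3: XOR of all rows, column by column
    total = list(T[0])
    for row in T[1:]:
        total = [x ^ y for x, y in zip(total, row)]
    # steps 7.4 to 7.6: remove each row's own contribution, then OR
    return [[(x ^ y) | or_mask for x, y in zip(row, total)] for row in T]
```

With `or_mask=1` (an assumed value), no entry of the result is zero, so each entry is strictly positive or strictly negative.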
8. Compute the check node vector r, a vector of length [formula missing]. This check node vector comprises one subvector for each row of the check matrix, with indices from [formula missing] to [formula missing], and each such subvector in turn comprises one subvector for each non-"-" element $H_{i,j}$. The computation of the check node vector r loops over the rows of the selected check matrix H: each pass operates on $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ elements of the temporary variable vector t and stores a result of length $V_{ldpcRowLength}(v) \cdot W \cdot LdpcBufferNum$ in the check node vector r. The subvector corresponding to each row of the check matrix is computed one non-"-" element $H_{i,j}$ at a time, producing the elements with indices from [formula missing] to [formula missing] of the vector. Taking the computation of one subvector r' of the check node vector r corresponding to a non-"-" element $H_{i,j}$ as an example, the specific flow is as follows:
8.1 Check whether the maximum loop count $V_{ldpcRowLength}(v)$ has been reached. If not, go to step 8.2; if so, go to step 9.
8.2 Compare the subvector of the extremum variable vector m corresponding to $H_{i,j}$ with the subvector of the intermediate variable vector s corresponding to $H_{i,j}$. If the element of the intermediate variable vector s is less than zero, the result is the negation of the value of the extremum variable vector m; if it equals zero, the result is zero; if it is greater than zero, the result is the value of the extremum variable vector m. This operation can be optimized with SIMD; see Figure 12, where $(a_1, a_2, \ldots, a_W)$ is the extremum variable vector m, $(b_1, b_2, \ldots, b_W)$ is the intermediate variable vector s, the rectangular box is the comparison operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $y_n=-a_n$ if $b_n<0$, $y_n=0$ if $b_n=0$, and $y_n=a_n$ if $b_n>0$.
8.3 Add the result of step 8.2 to the intermediate variable vector s. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the result of step 8.2, $(b_1, b_2, \ldots, b_W)$ is the intermediate variable vector s, the rectangular box is the addition operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1+b_1, a_2+b_2, \ldots, a_W+b_W)$.
8.4 Subtract subvector t' of the temporary variable vector t from the result of step 8.3. This operation can be optimized with SIMD; see Figure 11, where $(a_1, a_2, \ldots, a_W)$ is the result of step 8.3, $(b_1, b_2, \ldots, b_W)$ is the temporary variable vector t, the rectangular box is the subtraction operator, and $(y_1, y_2, \ldots, y_W)$ is the result of the operation, i.e. $(y_1, y_2, \ldots, y_W)=(a_1-b_1, a_2-b_2, \ldots, a_W-b_W)$. Store the result in the check node vector r and return to step 8.1.
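Steps 8.2 to 8.4 chain three element-wise operations. The sketch below follows them literally as written in the text for one subvector; the function names `apply_sign` and `check_node_subvector` are hypothetical, and plain list comprehensions stand in for the W-lane SIMD instructions.

```python
def apply_sign(m_sub, s_sub):
    """Step 8.2: negate, zero, or keep each m value according to the sign of s."""
    return [(-a if b < 0 else 0 if b == 0 else a) for a, b in zip(m_sub, s_sub)]


def check_node_subvector(m_sub, s_sub, t_sub):
    """Compute r' from m, s, and t' following steps 8.2 to 8.4 in order."""
    signed = apply_sign(m_sub, s_sub)                  # step 8.2
    summed = [a + b for a, b in zip(signed, s_sub)]    # step 8.3
    return [a - b for a, b in zip(summed, t_sub)]      # step 8.4
```

The three-way select in `apply_sign` behaves like the packed sign instructions found on some SIMD instruction sets, which is what makes step 8.2 a single operation per group of W elements.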
9. Compute the variable node vector q, a vector of length [formula missing]. This variable node vector comprises one subvector for each row of the check matrix, with indices from [formula missing] to [formula missing], and each such subvector in turn comprises one subvector q' for each non-"-" element $H_{i,j}$. Subvector q' of the variable node vector q is computed as [formula missing]; that is, q' equals the sum of subvector t' of the temporary variable vector t and subvector r' of the check node vector r, cyclically shifted according to the value of the element in row i, column j of the check matrix H. To suit SIMD operation, the input subvectors t' and r' have length Z, while the output subvector q' has size [formula missing]; only the first Z elements of the output subvector are valid data. The computation of q loops over the non-"-" elements of each row of the selected check matrix H; for example, the j-th non-"-" element of row i of the selected check matrix H yields the elements at positions [formula missing] to [formula missing] of the variable node vector q. The specific calculation steps are as follows:
9.1 Look up in matrix $M_{\mathrm{ldpcOffset1}}$ the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the required start position in the variable node vector q: required start position = start position of subvector q' within q + the corresponding cyclic offset.
9.2 Add subvector t' of the temporary variable vector t to subvector r' of the check-node vector r. This operation can be SIMD-optimized; see Figure 11, where $(a_1, a_2, \ldots, a_w)$ represents the temporary variable vector t, $(b_1, b_2, \ldots, b_w)$ represents the check-node vector r, the rectangular box represents the adder, and $(y_1, y_2, \ldots, y_w)$ represents the result, i.e. $(y_1, y_2, \ldots, y_w) = (a_1 + b_1, a_2 + b_2, \ldots, a_w + b_w)$.
9.3 Look up in matrix $M_{\mathrm{ldpcRound3}}$ the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{\mathrm{ldpcRound3}})_{i,j} \cdot W$ data items that follow the start position of the step 9.2 result to the required start position found in step 9.1.
9.4 Use matrix $M_{\mathrm{ldpcAssemble2}}$ to decide whether a padding operation is needed. If $(M_{\mathrm{ldpcAssemble2}})_{i,j} = 1$, pad at the offset indicated in matrix $M_{\mathrm{ldpcAssembleTable1}}$; if $(M_{\mathrm{ldpcAssemble2}})_{i,j} = 0$, no padding is needed.
9.5 Look up in matrix $M_{\mathrm{ldpcOffset2}}$ the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the required start position in the variable node vector q: required start position = start position of the variable node vector q + the corresponding cyclic offset.
9.6 Look up in matrix $M_{\mathrm{ldpcRound4}}$ the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the $(M_{\mathrm{ldpcRound4}})_{i,j} \cdot W$ data items that follow the start position of the step 9.2 result to the required start position found in step 9.5.
9.7 Using the number of padding elements indicated by LdpcRemain, pad the remaining elements of the variable node vector q at the offsets indicated in matrix $M_{\mathrm{ldpcAssembleTable1}}$, then return to step 2.
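The net effect of step 9 on one subvector can be sketched in Python as follows. This is only a scalar model under stated assumptions, not the patent's implementation: the function name `update_q_subvector` is hypothetical, the copy operations of steps 9.1 to 9.6 are assumed to amount to writing element k of the sum to position (k + offset) mod Z, the output length is assumed to be Z rounded up to a multiple of W, and the table-driven padding of step 9.7 is replaced by plain zero padding.

```python
def update_q_subvector(t_sub, r_sub, offset, w=16):
    """Return a padded q' block for one non-"-" element (illustrative only).

    t_sub, r_sub: length-Z lists standing for subvectors t' and r'.
    offset: cyclic offset of this element, assumed to satisfy 0 <= offset < Z.
    w: number of SIMD lanes (assumption: 16 lanes of 8-bit data).
    """
    z = len(t_sub)
    if len(r_sub) != z:
        raise ValueError("t' and r' must both have length Z")

    # Step 9.2: lane-wise addition of t' and r'.
    total = [a + b for a, b in zip(t_sub, r_sub)]

    # Steps 9.1-9.6, assumed net effect: element k of the sum is written
    # to position (k + offset) mod Z of q'.
    q_sub = [0] * z
    for k, value in enumerate(total):
        q_sub[(k + offset) % z] = value

    # Step 9.7, simplified: zero-pad up to a multiple of w lanes.
    # Only the first Z entries are valid data.
    padded_len = -(-z // w) * w  # ceiling division, then scale
    return q_sub + [0] * (padded_len - z)


if __name__ == "__main__":
    print(update_q_subvector([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2, w=4))
```

Splitting the copy into a whole number of W-wide groups plus a padded tail, as the steps above do, is what lets the bulk of the data move with full-width SIMD loads and stores.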
The above is the LDPC encoding and decoding method of the present application.
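Steps 8.2 to 8.4 of the decoding procedure can be sketched as three lane-wise operations applied to groups of W elements. The Python below is a scalar model that follows the text literally; the names `sign_transfer` and `check_node_update` and the default W = 16 are assumptions made for illustration. The per-lane rule of step 8.2 has the same semantics as the SSSE3 instruction PSIGNB (intrinsic `_mm_sign_epi8`), although the source does not state which instruction is used.

```python
def sign_transfer(a, b):
    """Step 8.2 for one lane: -a if b < 0, 0 if b == 0, a if b > 0."""
    if b < 0:
        return -a
    if b == 0:
        return 0
    return a


def check_node_update(m, s, t, w=16):
    """Apply steps 8.2, 8.3 and 8.4 to equal-length vectors m, s and t.

    Each group of w elements stands for one SIMD register; the three list
    comprehensions stand for three SIMD instructions on that register.
    """
    if not (len(m) == len(s) == len(t)):
        raise ValueError("m, s and t must have the same length")

    r = []
    for start in range(0, len(m), w):
        m_blk = m[start:start + w]
        s_blk = s[start:start + w]
        t_blk = t[start:start + w]
        step82 = [sign_transfer(a, b) for a, b in zip(m_blk, s_blk)]  # compare
        step83 = [a + b for a, b in zip(step82, s_blk)]               # add s
        step84 = [a - b for a, b in zip(step83, t_blk)]               # subtract t
        r.extend(step84)
    return r


if __name__ == "__main__":
    print(check_node_update([3, 4, 5], [-1, 0, 2], [1, 1, 1], w=2))
```

The last group may hold fewer than W elements, which is the situation the padding steps of the method are designed to avoid on real SIMD hardware.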
The theory of LDPC encoding and decoding is fairly mature. However, because an LDPC code is a linear block code with a large code length n and a correspondingly large check matrix H, the algorithmic complexity is very high, and conventional LDPC encoding and decoding implementations cannot readily meet the throughput requirements of IEEE 802.11n systems, which significantly affects system performance. In existing high-speed wireless access systems, LDPC codes are mostly implemented on FPGA and DSP chips. Although these approaches can meet the processing and latency requirements of modern high-speed wireless LAN protocols, FPGA programming and specialized DSPs are both complex, lack rich programming environments and debugging tools, and have limited applicability. On a GPP (general-purpose processor) chip, by contrast, developers can work on an ordinary computer, with a familiar architecture and environment and a rich set of tools such as a C/C++ environment. The innovation of this patent is that, when the LDPC code of a high-speed wireless access system is implemented on a GPP chip with the original codec, the encoding and decoding methods are optimized according to the characteristics of the GPP chip. Because the IEEE 802.11n LDPC code is an irregular LDPC code, the number of non-negative entries in each row of its check matrix prototype is not necessarily the same, so compared with previous LDPC implementation methods the flexibility of a GPP chip is a considerable advantage. In addition, given the rapid development of CPUs (central processing units), the data-processing capability of GPP chips will continue to improve.
First, SIMD instructions are used to process data in parallel. The SIMD instruction set used in this patent, on Intel CPUs, is also called the SSE (Streaming SIMD Extensions) instruction set. Its basic idea is to process multiple data items within one CPU clock cycle to obtain the effect of parallel processing, unlike ordinary usage, in which each clock cycle performs only one data-processing operation. A CPU of the Nehalem architecture has a processing width of 128 bits and a CPU of the Sandy Bridge architecture a processing width of 256 bits; for 8-bit fixed-point numbers, the former can process 16 data items per instruction cycle and the latter 32, so the theoretical degrees of parallelism are 16 and 32 respectively. Simulation results from actual programs show, however, that the ideal speed-up often cannot be reached in a real system. One reason is that a program does not consist solely of data-manipulation code; it also contains many conditional statements, which cannot be executed in parallel. Another is that, with 128-bit SIMD instructions, the submatrix sizes of the IEEE 802.11n check matrices are not multiples of 16, so the last group of data in each submatrix is processed with a degree of parallelism below 16.
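As a rough worked example of the last point, the Python sketch below counts full SIMD groups and leftover lanes for one submatrix. It assumes W = 16 lanes (8-bit data in a 128-bit register) and the submatrix sizes Z = 27, 54 and 81 that correspond to the IEEE 802.11n code lengths 648, 1296 and 1944; the function name `simd_groups` is hypothetical.

```python
def simd_groups(z, w):
    """Return (full_groups, leftover_lanes, lane_utilization) for one submatrix.

    z: submatrix size Z; w: number of SIMD lanes per instruction.
    Utilization is the fraction of lanes carrying valid data when the
    leftover elements occupy one additional, partly filled group.
    """
    full, rem = divmod(z, w)
    groups = full + (1 if rem else 0)
    return full, rem, z / (groups * w)


if __name__ == "__main__":
    for z in (27, 54, 81):
        full, rem, util = simd_groups(z, 16)
        print(f"Z={z}: {full} full groups, {rem} leftover lanes, "
              f"utilization {util:.3f}")
```

For Z = 27 this gives one full group and 11 leftover lanes, for Z = 54 three full groups and 6 leftover lanes, and for Z = 81 five full groups and 1 leftover lane, which agrees with the LdpcRemain values 11, 6 and 1 quoted in the claims.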
Second, look-up tables are used: a number of known parameters are initialized in advance, and the exact location of the required data is marked by offsets, trading memory for computational complexity and raising the data-processing speed of the LDPC encoding and decoding methods. In the LDPC encoder, the block matrices needed for encoding under each code rate and code length can be computed in advance and stored in LUTs (look-up tables); the tables need only be read in when the program starts, with no repeated computation.
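The look-up-table idea can be sketched as a one-time pass over the prototype matrix. The Python below is illustrative only: the function name `build_offset_table`, the use of `None` for a "-" entry and the small 2 x 4 prototype matrix are all made up, while the start-position formula Z*(n-1) + H_{i,n} (with n the 1-based column index) is the one given in the claims.

```python
def build_offset_table(proto, z):
    """Precompute start positions for every non-"-" entry of a prototype matrix.

    proto: list of rows; each entry is a cyclic shift value or None for "-".
    z: submatrix size Z.
    Returns (offsets, row_lengths): offsets[i] lists Z*(n-1) + H[i][n] for the
    non-"-" entries of row i, and row_lengths[i] is how many there are.
    """
    offsets = []
    for row in proto:
        # enumerate() is 0-based, so col already equals n - 1.
        offsets.append([z * col + shift
                        for col, shift in enumerate(row)
                        if shift is not None])
    row_lengths = [len(row_offsets) for row_offsets in offsets]
    return offsets, row_lengths


if __name__ == "__main__":
    # Hypothetical prototype matrix, not taken from IEEE 802.11n.
    proto = [[0, None, 3, 1],
             [None, 2, None, 0]]
    print(build_offset_table(proto, z=4))
```

Once such tables exist, the inner loops only index into them, so no per-iteration arithmetic on the prototype matrix is needed.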
Finally, multithreading is used: more than one thread executes at the same time, raising the overall processing performance of the system. In the optimized LDPC encoding and decoding methods, the operations performed per row of the check matrix are parallelized with multiple threads, the number of threads being the number of rows of the check matrix.
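The one-thread-per-row structure can be sketched with Python's standard thread pool, using the encoder's matrix-vector multiplication as the per-row task. This is a model under assumptions, not the patent's code: the function names are hypothetical, a "-" entry is represented by `None`, each non-"-" entry is taken to rotate its Z-element block left by the shift value as the claims describe, and the addition of the shift results is assumed to be over GF(2), i.e. XOR.

```python
from concurrent.futures import ThreadPoolExecutor


def rotate_left(block, shift):
    """Cyclically rotate a list left by shift positions."""
    return block[shift:] + block[:shift]


def row_times_vector(row, vec, z):
    """Multiply one prototype-matrix row by vec (bits), returning Z bits."""
    acc = [0] * z
    for col, shift in enumerate(row):
        if shift is None:  # "-" entry: all-zero submatrix, contributes nothing
            continue
        block = vec[col * z:(col + 1) * z]
        rotated = rotate_left(block, shift)
        acc = [a ^ b for a, b in zip(acc, rotated)]  # GF(2) addition (assumed)
    return acc


def qc_matrix_times_vector(proto, vec, z):
    """Run one task per matrix row and concatenate the per-row results."""
    with ThreadPoolExecutor(max_workers=len(proto)) as pool:
        parts = list(pool.map(lambda row: row_times_vector(row, vec, z), proto))
    return [bit for part in parts for bit in part]


if __name__ == "__main__":
    proto = [[1, None],
             [0, 2]]
    print(qc_matrix_times_vector(proto, [1, 0, 0, 0, 1, 1], z=3))
```

In CPython the threads mainly illustrate how the work is partitioned; a native implementation would map each row to an operating-system thread to obtain real parallelism.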
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (12)
1. An LDPC encoding method based on a general-purpose processor, comprising: obtaining a signal vector S to be encoded by signal acquisition or reception; determining and storing a check matrix H and block matrices A, B, D, E, F and T; determining vectors $p_1$ and $p_2$ according to [formula missing], and obtaining the LDPC encoding result vector $c = (S, p_1, p_2)$; characterized in that the multiplication of any matrix by any vector performed when determining the vectors $p_1$ and $p_2$ comprises:
taking each row of said matrix as one thread, multiplying that row of the matrix by said vector, and combining the multiplication results of all rows into a result vector;
wherein multiplying each row of said matrix by said vector comprises: determining, for each element j of the current row i of the matrix, a vector start position = start position of said vector + $A_{i,j}$ + $(j-1) \cdot Z$; left-shifting, by means of single-instruction multiple-data (SIMD) processing, the data of length $Z - A_{i,j}$ that begins at said start position in said vector; moving the data of length $A_{i,j}$ that precedes said start position to follow the left-shifted data, thereby obtaining the vector shift result corresponding to element j; and adding the vector shift results of all elements to give the product of said row and said vector;
in said SIMD mode, the data of length $Z - A_{i,j}$ that begins at said start position is divided, in units of length W, into [formula missing] segments; the [formula missing] segments are left-shifted in parallel, and the remaining $(Z - A_{i,j}) \bmod W$ data are then left-shifted;
Z is the size of the submatrix represented by one element of said check matrix.
2. The method according to claim 1, characterized in that, when said matrix is $T^{-1}$, in multiplying each row of $T^{-1}$ by the corresponding vector, only the elements of $T^{-1}$ whose value is 0 are multiplied by the corresponding vector, giving the vector shift result for each such element, and the vector shift results of the remaining elements are set to the zero vector; the vector shift results of all elements are then added to give the product of said row and said vector.
3. method according to claim 1 and 2, is characterized in that, shift left after operation to W segment data to get a front Z data be valid data simultaneously.
4. method according to claim 1 and 2, is characterized in that, describedly vector shift results added corresponding for each element is comprised: vector shift result corresponding for each element be divided in units of length W
section, by SIMD couple
segment data is parallel carries out phase add operation, then by remaining (Z-A
i,j) data of modW length carry out phase add operation.
5. method according to claim 1 and 2, is characterized in that, described matrix A, B, D, E, F and T
-1preserved by linear search table.
6. An LDPC decoding method based on a general-purpose processor, comprising: receiving an encoded LDPC codeword signal c and determining a check matrix H; computing a variable node vector q as the decoding result through multiple iterations, wherein in each iteration a temporary variable vector t is computed from the current variable node vector q and the check-node vector r as [formula missing], the check-node vector r is updated from said temporary variable vector t, and the variable node vector q is then updated from the check-node vector r and the temporary variable vector t as [formula missing]; in the first iteration, the codeword signal c is used as the variable node vector q and the check-node vector r is set to 0; characterized in that,
in each iteration, the temporary variable vector t, the check-node vector r and the variable node vector q are computed and updated with each row of the check matrix as one thread, yielding the subvectors of t, q and r that correspond to each row, with indices from [formula missing] to [formula missing]; wherein i is the row index of the check matrix; when the subvectors of t, r and q corresponding to row i of said check matrix are computed, the subvectors of t, q and r corresponding to each non-"-" element $H_{i,j}$ of that row, with indices from [formula missing] to [formula missing], are computed and then concatenated in order to give the subvector corresponding to the row; when i = 1, let [formula missing];
the subvector of the temporary variable vector t corresponding to $H_{i,j}$ is computed as follows: determine the vector start position $Z(n-1) + H_{i,n}$ corresponding to $H_{i,j}$, and copy, by means of SIMD, the data of length [formula missing] or 6 beginning at said start position in the subvector of q corresponding to row i to the beginning of the subvector of t corresponding to $H_{i,j}$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ '-' and $(Z - H_{i,n}) \bmod W \neq 0$, determine the value [formula missing] of each element in the row of matrix $M_{\mathrm{ldpcAssemble1}}$ corresponding to check-matrix element $H_{i,j}$, and copy each element with index [formula missing] in the subvector of the current vector q corresponding to $H_{i,j}$, in order, to the current position of the subvector of t corresponding to $H_{i,j}$; then determine the second vector start position $M_{\mathrm{ldpcOffset2}}$ corresponding to each element $H_{i,j}$, and copy, by means of SIMD, the data of length [formula missing] beginning at said second vector start position to the current position of the subvector of t corresponding to $H_{i,j}$; take the first Z positions of the subvector of t corresponding to $H_{i,j}$, and take their absolute values, as the valid subvector of the temporary variable vector t corresponding to $H_{i,j}$;
when $H_{i,n} \neq 0$, $H_{i,n} \neq$ '-' and $(Z - H_{i,n}) \bmod W \neq 0$, [formula missing]; when $H_{i,n} = 0$ or $H_{i,n} =$ '-' or $(Z - H_{i,n}) \bmod W = 0$, $(M_{\mathrm{ldpcOffset2}})_{i,j} = Z(n-1)$;
k is the amount of data the general-purpose processor can process at one time, and k is the basic unit size of SIMD processing; when the code length $L_{LDPC} = 648$, LdpcRemain = 11; when the code length $L_{LDPC} = 1296$, LdpcRemain = 6; when the code length $L_{LDPC} = 1944$, LdpcRemain = 1; j is the index of each non-"-" element among all non-"-" elements of row i, and n is the column index in the check matrix of the j-th non-"-" element of row i.
7. The method according to claim 6, characterized in that the subvector of the check-node vector r corresponding to each row of the check matrix is computed as follows:
writing the subvector of the temporary variable vector t corresponding to each row of the check matrix as a matrix $T_v$ with $V_{\mathrm{ldpcRowLength}}(v)$ rows and W * [formula missing] columns, wherein each row of said matrix $T_v$ is the subvector, within said temporary variable vector t subvector, corresponding to an element $H_{i,j}$, padded when the number of columns is insufficient;
performing extreme-value assignment on said matrix $T_v$ to obtain the subvector matrix $M_v$ of the extreme-value variable vector m;
computing the subvector matrix $S_v$ of the intermediate variable vector s from said matrix $T_v$;
determining, from the elements of said matrix $M_v$ and said matrix $S_v$ that have the same index, the element at the corresponding index of an intermediate matrix $R_v'$; wherein, if an element of $S_v$ is less than 0, the negation of the element of $M_v$ with the same index is added to said element, and the sum is taken as the element of $R_v'$ with the same index; if an element of $S_v$ equals 0, said element is added to 0, and the sum is taken as the element of $R_v'$ with the same index; if an element of $S_v$ is greater than 0, the element of $M_v$ with the same index is added to said element, and the sum is taken as the element of $R_v'$ with the same index; said comparison and addition operations are performed by means of SIMD;
subtracting, by means of SIMD, the elements of said matrix $R_v'$ and matrix $T_v$ that have the same index, and taking the result as the element at the same index of the subvector matrix $R_v$ of the check-node vector r; and reading the first Z elements of each row of said matrix $R_v$ in row-major order to form the subvector of the check-node vector r.
8. The method according to claim 7, characterized in that said extreme-value assignment comprises:
determining, by means of SIMD, the minimum and second minimum of each column of said matrix $T_v$ and the row index corresponding to the minimum; correcting the minimum and second minimum obtained by subtracting a preset correction value β from each; and setting a corrected minimum or second minimum to 0 when it is less than 0, otherwise leaving it unchanged;
constructing, from the current minimum, the second minimum and the row index of the minimum of each column of said matrix $T_v$, the column with the same index in the subvector matrix $M_v$ of the extreme-value variable vector m, wherein, in any column of $M_v$, the element whose row index equals that of the current minimum is set to the determined minimum and the remaining elements are set to the second minimum.
9. The method according to claim 8, characterized in that determining, by means of SIMD, the minimum and second minimum of each column and the corresponding row index comprises:
dividing the elements of each row of said matrix $T_v$ into [formula missing] sub-blocks, each sub-block comprising W base units; and, when comparing the elements of any two rows of said matrix $T_v$, comparing W base units at once by means of SIMD.
10. The method according to claim 7, characterized in that computing the subvector matrix $S_v$ of the intermediate variable vector s comprises:
for each column of matrix $T_v$, XORing all elements of the column, XORing the result with the element in row i', ORing that value with 0x7f, and taking the OR result as the element in row i' of the same-index column of the intermediate vector matrix $S_v$; wherein the elements of each row of said matrix $T_v$ are divided into [formula missing] sub-blocks, each sub-block comprising W base units, and each XOR or OR operation is performed on W base units at once by means of SIMD.
11. The method according to claim 6, characterized in that computing the subvector of the variable node vector q corresponding to $H_{i,j}$ comprises:
determining the vector start position $Z(n-1) + H_{i,n}$ corresponding to $H_{i,j}$; adding, by means of SIMD, the subvector of the temporary variable vector t corresponding to $H_{i,j}$ to the subvector of the check-node vector r corresponding to $H_{i,j}$; copying, by means of SIMD, the data of length [formula missing] or 5 beginning at said start position in the result vector to the beginning of the subvector of the variable node vector q corresponding to $H_{i,j}$; when $H_{i,n} \neq 0$, $H_{i,n} \neq$ '-' and $(Z - H_{i,n}) \bmod W \neq 0$, determining the value [formula missing] of each element in the row of matrix $M_{\mathrm{ldpcAssemble1}}$ corresponding to check-matrix element $H_{i,j}$, and copying each element with index [formula missing] in the subvector of the current vector q corresponding to $H_{i,j}$, in order, to the current position of the subvector of the variable node vector q corresponding to $H_{i,j}$;
determining the second vector start position $M_{\mathrm{ldpcOffset2}}$ corresponding to each element $H_{i,j}$, and copying, by means of SIMD, the data of length 0 or [formula missing] beginning at said second vector start position to the current position of the subvector of the variable node vector q corresponding to $H_{i,j}$;
padding, by the number of padding elements indicated by LdpcRemain, according to the values of the elements in the row of $M_{\mathrm{ldpcAssemble1}}$ corresponding to check-matrix element $H_{i,j}$.
12. The method according to any one of claims 6 to 11, characterized in that the following are computed in advance and stored: the vector start position $Z(n-1) + H_{i,n}$ and the second vector start position $M_{\mathrm{ldpcOffset2}}$ corresponding to each element $H_{i,j}$, the matrix $M_{\mathrm{ldpcAssemble1}}$, the vector $V_{\mathrm{ldpcRowLength}}$ formed by the number of non-"-" elements in each row of the check matrix, $M_{\mathrm{ldpcAssemble1}}$, and LdpcRemain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510026526.1A CN104617959B (en) | 2015-01-20 | 2015-01-20 | A kind of LDPC coding and decoding methods based on general processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104617959A true CN104617959A (en) | 2015-05-13 |
CN104617959B CN104617959B (en) | 2017-09-05 |
Family
ID=53152273
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510026526.1A Active CN104617959B (en) | 2015-01-20 | 2015-01-20 | A kind of LDPC coding and decoding methods based on general processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104617959B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||