CN109446478A

CN109446478A - A complex covariance matrix computing system based on an iterative and reconfigurable approach

Info

Publication number: CN109446478A
Application number: CN201811284263.4A
Authority: CN
Inventors: 李丽; 陈辉; 傅玉祥; 陈沁雨; 何国强; 何书专; 李伟
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-03-08
Anticipated expiration: 2038-10-30
Also published as: CN109446478B

Abstract

The present invention relates to a complex covariance matrix computing system based on an iterative and reconfigurable approach. The system comprises an on-chip SRAM memory, an off-chip DDR memory, a reconfigurable unit, a DMA controller and an acceleration core. The acceleration core comprises: a matrix covariance computing module, which iteratively polls the source data in each region of the on-chip SRAM memory and computes the lower-triangular covariance matrix; a conjugate symmetry module, which, exploiting the conjugate symmetry of the covariance matrix, expands the lower-triangular covariance matrix into the complete complex covariance matrix through address mapping and reconfigured storage, forming the final result; and a DMA interface function module, which stores data read by DMA from the off-chip DDR memory into the on-chip SRAM memory in a partitioned manner. Beneficial effects: the invention supports covariance computation on complex matrices with any number of columns, and reduces both the source-data computation load of conventional hardware implementations and the time spent repeatedly writing result data back to DDR.

Description

A complex covariance matrix computing system based on an iterative and reconfigurable approach

Technical field

The present invention relates to the field of computer technology, and in particular to a complex covariance matrix computing system based on an iterative and reconfigurable approach.

Background art

Computing covariance matrices is a typical operation in signal processing. It is a key component of multistage estimators, spatial spectrum estimation, detection of the number of correlated sources and affine-invariant pattern recognition, and is widely used in radar, sonar, digital image processing and other fields. Covariance matrices are also widely used in image matching, image steganalysis and similar fields. However, computing a covariance matrix is fairly involved: for example, computing, for each region of an image, the covariance matrix of the multiple random variables formed by several feature vectors takes a long time, which is a major obstacle to computing image covariance matrices in real time on a general-purpose personal computer.

With the rapid development of the integrated circuit industry, high performance and real-time operation are constant goals in the embedded field. At present, most hardware implementations of complex covariance matrix computation are designed on platforms such as DSPs, GPUs and FPGAs. The present invention, based on a reconfigurable intelligent acceleration core, proposes a hardware implementation method that computes complex covariance matrices in an iterative and reconfigurable manner. Compared with conventional hardware implementations, this method achieves high resource utilization and fast execution. Since covariance computation is a typical operation in signal processing, this hardware implementation method has good reference value and broad application prospects.

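In statistics and probability theory, the covariance matrix is a matrix whose elements are the covariances between the elements of a random vector. If $X = (X_1, X_2, X_3, \ldots, X_n)^T$ is an $n$-dimensional random variable, the matrix

$$D(X) = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{pmatrix}$$

is called the covariance matrix of the $n$-dimensional random variable $X$ and is denoted $D(X)$, where $C_{ij} = \operatorname{Cov}(X_i, X_j)$, $i, j = 1, \ldots, n$, is the covariance of the components $X_i$ and $X_j$ of $X$. Because $C_{ij} = C_{ji}$, the covariance matrix is symmetric.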

As shown in Figure 1, the conventional hardware implementation works as follows. For an M × N complex matrix $A = [a_{ij}]$, its complex covariance matrix $B$ is obtained as $B = AA^H$: matrix $A$ is first transposed and then conjugated to obtain $A^H$, and each element of $B$ is

$$b_{ij} = \sum_{k=1}^{N} a_{ik}\, a_{jk}^{*}, \qquad i, j = 1, \ldots, M.$$

Because this approach is a direct matrix multiplication, computing the covariance of a complex matrix with a large number of points requires moving data in and out of memory many times, which makes memory access time excessive. In addition, it does not use the conjugate symmetry of the covariance matrix to reduce the amount of computation. Both factors make covariance matrix computation slow, and in many practical applications inefficient covariance computation becomes a major bottleneck. The present invention therefore has practical reference value and application prospects.
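For reference, the conventional element formula can be written out directly. The following is a minimal Python sketch (the function and variable names are hypothetical); like the conventional hardware, it computes all M² entries, including the upper triangle that conjugate symmetry makes redundant.

```python
def covariance_direct(a):
    """Conventional computation of B = A A^H for an M x N complex matrix a
    given as a list of rows: b[i][j] = sum_k a[i][k] * conj(a[j][k])."""
    m = len(a)
    return [[sum(x * y.conjugate() for x, y in zip(a[i], a[j])) for j in range(m)]
            for i in range(m)]


A_example = [[1 + 2j, 3 - 1j, 0.5j],
             [2 - 1j, 1 + 1j, 4 + 0j]]
B_example = covariance_direct(A_example)
# b[i][j] == conj(b[j][i]): only the lower triangle carries new information.
```

Here `B_example[0][0]` is 15.25 and `B_example[0][1]` equals the conjugate of `B_example[1][0]`, which is the property the invention exploits.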

Summary of the invention

The object of the present invention is to overcome the above shortcomings of the prior art by providing a complex covariance matrix computing system based on an iterative and reconfigurable approach that effectively reduces the amount of source-data computation, makes full use of storage resources, speeds up computation and thereby improves overall application performance. The system is realized by the following technical solution:

The complex covariance matrix computing system based on an iterative and reconfigurable approach comprises an on-chip SRAM memory, an off-chip DDR memory, a reconfigurable unit, a DMA controller and an acceleration core. The acceleration core is communicatively connected to the on-chip SRAM memory and to the reconfigurable unit; the DMA controller is communicatively connected to the reconfigurable unit; and the off-chip DDR memory is communicatively connected to the DMA controller through a bus. The acceleration core comprises:

a matrix covariance computing module, which iteratively polls the source data in each region of the on-chip SRAM memory and computes the lower-triangular covariance matrix;

a conjugate symmetry module, which, based on the conjugate symmetry of the covariance matrix, obtains the complete complex covariance matrix from the lower-triangular covariance matrix through address mapping and reconfigured storage, forming the final result;

a DMA interface function module, which stores data read by DMA from the off-chip DDR memory into the on-chip SRAM memory in a partitioned manner, and writes the final result back to the off-chip DDR memory by DMA.

In a further design of the system, the storage resources of the on-chip SRAM memory are divided into k banks, each of depth d. If m banks are allocated to store the lower-triangular covariance matrix, covariance computation on an M × N complex matrix requires M² ≤ md, with N arbitrary. If the computation parallelism is b, the complex matrix is classified as small-point when M² ≤ bd; otherwise it is classified as large-point.

In a further design, if the complex matrix is small-point, the DMA one-dimensional transfer mode is used; if it is large-point, the DMA two-dimensional transfer mode is used.

In a further design, the matrix covariance computing module stores the source data of the complex matrix column by column: each bank holds one column, the columns currently held across all banks form one region, and the data loaded into all banks in one pass form one segment. The module computes the total size of a single segment, the total number of segments, the number of regions in the final segment and the number of columns in the last region.

In a further design, if a region is not completely filled with columns, zero padding is applied; if the source data exceed the maximum number of points that can be stored at one time, they are processed in segments (batches) using ping-pong operation.
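The column-wise layout and zero padding can be illustrated with a short Python sketch. It assumes 0-based indices, A source banks per region and B regions per segment, and assumes that region r of a bank occupies in-bank addresses r·M to r·M + M − 1; all function names are hypothetical.

```python
def column_location(j, A, B):
    """Map 0-based column j to (segment, region, bank): A columns form a
    region (one per bank) and B regions form a segment."""
    C = A * B                           # columns held by one segment
    segment, within = divmod(j, C)
    region, bank = divmod(within, A)
    return segment, region, bank


def element_address(row, col, M, A, B):
    """(segment, bank, in-bank address) of element (row, col), assuming each
    region stores its M-value column at addresses region*M .. region*M + M - 1."""
    segment, region, bank = column_location(col, A, B)
    return segment, bank, region * M + row


def zero_pad_columns(N, A):
    """Number of all-zero columns needed so the last region fills all A banks."""
    return (-N) % A
```

With A = 13 and B = 32, column 14 lands in region 1 of bank 1, and an 8192-column matrix needs 11 zero columns in its last region.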

In a further design, the matrix covariance computing module uses the reconfigurable approach to build a complex multiply-accumulate unit, which iteratively accumulates the partial results of every segment, region and bank to obtain the lower-triangular covariance matrix.

In a further design, the number of complex multipliers and complex adders used in the complex multiply-accumulate unit equals the number of banks holding source data, denoted b, so each computation takes 2b + 1 input values. The data at the m-th address of all source-data banks are read as complex multiplier input A; the conjugates of the data at addresses 1 to m of all source-data banks are read in turn as complex multiplier input B; the remaining input is the value at the corresponding address of the result bank from the previous pass. Each result is written back to the same address in the same bank.
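A single multiply-accumulate step with its 2b + 1 inputs, and the update of one lower-triangle row from one region, can be sketched in Python as follows. Indices are 0-based, `banks[k][addr]` stands for address `addr` of source bank k, and the function names are hypothetical; the hardware evaluates the b products in parallel rather than in a loop.

```python
def mac(a_in, b_in_conj, previous):
    """One step: previous + sum(a * conj(b)) over the b banks. The inputs are
    b values for A, b already-conjugated values for B and one previous result."""
    assert len(a_in) == len(b_in_conj)
    return previous + sum(x * y for x, y in zip(a_in, b_in_conj))


def update_row(banks, m, row):
    """Accumulate one region's contribution to row m of the lower triangle.
    row[j] holds the stored partial result for position (m, j), j <= m."""
    a_in = [bank[m] for bank in banks]                  # address m of every bank
    for j in range(m + 1):                              # addresses 0..m, conjugated
        b_in = [bank[j].conjugate() for bank in banks]
        row[j] = mac(a_in, b_in, row[j])                # written back in place
    return row


def input_count(b):
    """Inputs per step: b for A, b for B, plus the previous result."""
    return 2 * b + 1
```

With b = 13 source banks, `input_count(13)` gives 27.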

In a further design, the conjugate symmetry module reads data in turn from the banks holding the lower-triangular covariance matrix, determines the row and column of each element in the lower-triangular matrix, and then, following the rule that each matrix column is stored in one bank, writes the element and its conjugate-symmetric counterpart into the reused source-data banks until the complete complex covariance matrix is obtained.

In a further design, when the acceleration core distributes the data read from DDR, it derives, from the address passed by the DMA, the row and column of each incoming element in the complex matrix, and then stores the element into the corresponding source-data bank according to the column-wise distribution rule.

In a further design, the address mapping used by the conjugate symmetry module is as follows: elements of the lower-triangular covariance matrix are read in turn from the result banks; from the bank number and in-bank address of each element, its row and column in the original lower-triangular matrix are derived; then, following the rule that each column of the complete covariance matrix is stored in one bank, the element and its conjugate-symmetric counterpart are written into the reconfigured banks.
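The address decoding step can be sketched in Python. The result layout is an assumption not stated in the text: the lower triangle is taken to be packed row by row as k = i(i+1)/2 + j (0-based) and spread over the result banks one bank after another. The function names are hypothetical.

```python
from math import isqrt


def decode_lower(bank, addr, depth):
    """Recover (row, col) of a lower-triangular element from its result bank
    number and in-bank address, under the assumed packed row-major layout."""
    k = bank * depth + addr             # global packed index
    i = (isqrt(8 * k + 1) - 1) // 2     # largest i with i*(i+1)/2 <= k
    j = k - i * (i + 1) // 2
    return i, j


def mirror_targets(bank, addr, depth):
    """Positions in the full matrix that receive the value (i, j) and its
    conjugate (j, i)."""
    i, j = decode_lower(bank, addr, depth)
    return (i, j), (j, i)
```

For d = 8192, the last of the 32896 points of a 256 × 256 result sits at address 127 of bank 4 and decodes to position (255, 255).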

The advantages of the present invention are as follows:

The complex covariance matrix computing system based on an iterative and reconfigurable approach of the present invention uses a parameterized design: for different storage and computing resources, it reconfigures into the scheme that maximizes resource utilization to perform the complex covariance matrix computation. The method reduces the source-data computation load, improves resource utilization and speeds up the hardware implementation, providing a useful reference for designing covariance matrix computation in signal processing.

Brief description of the drawings

Fig. 1 is the hardware implementation architecture for computing the complex covariance matrix in the present invention.

Fig. 2 is the hardware implementation architecture for computing the complex covariance matrix in the conventional approach.

Fig. 3 is a schematic diagram of the source data arrangement in the present invention.

Fig. 4 is a schematic diagram of the address mapping for source data transfer in the present invention.

Fig. 5 is a flow chart of small-point source data storage in the present invention.

Fig. 6 is a schematic diagram of the computing unit design in the present invention.

Fig. 7 is a schematic diagram of source data reading during computation in the present invention.

Fig. 8 is a schematic diagram of the conjugate-symmetric address mapping of the lower-triangular matrix in the present invention.

Fig. 9 compares the performance of the present invention and the conventional approach in computing the complex covariance matrix.

Detailed description of the embodiments

The solution of the present invention is described in detail below with reference to the drawings.

As shown in Fig. 1, the complex covariance matrix computing system based on an iterative and reconfigurable approach of this example mainly comprises three modules: DMA interface function, matrix covariance computation and conjugate symmetry. The DMA interface function module is responsible for (1) storing data read by DMA from the off-chip DDR memory (hereinafter DDR) into the on-chip SRAM memory (hereinafter SRAM) in a partitioned manner, and (2) writing the final result back to DDR by DMA. The matrix covariance computing module iteratively polls the source data in each region of the SRAM and computes the lower-triangular covariance matrix. The conjugate symmetry module, based on the conjugate symmetry of the covariance matrix, obtains the complete complex covariance matrix from the lower-triangular covariance matrix through address mapping and reconfigured storage. The invention supports covariance computation on complex matrices with any number of columns and reduces both the source-data computation load of conventional hardware implementations and the time spent repeatedly writing result data back to DDR. It balances computing and storage resources to maximize parallelism, builds the computing unit in a reconfigurable way, and establishes a dedicated address mapping rule to obtain the complex covariance matrix.

The storage resources of the on-chip SRAM memory are divided into k banks, each of depth d. If m banks are allocated to store the lower-triangular covariance matrix, covariance computation on an M × N complex matrix requires M² ≤ md, with N arbitrary. If the computation parallelism is b, the complex matrix is classified as small-point when M² ≤ bd; otherwise it is classified as large-point.

The invention is described in detail below through one implementation example, which was verified on a cycle-accurate system-level model built in SystemC.

The present invention computes the complex covariance matrix on the hardware architecture shown in Fig. 2. In the example, matrix X is of size M × N (M ≤ 256, N ≤ 8K), and the complex covariance result Y is an M × M matrix. (In the following, "%" denotes the modulo operation.)

Take an SRAM of 2 MB, divided into 32 banks each of depth 8K, as an example. The hardware processes data by regions (large-point matrices are additionally processed in segments), repeatedly filling the source data area with blocks of columns. To support ping-pong operation for large-point matrices: the complex covariance result Y has at most 256 × 256 points, so the lower-triangular matrix has at most 256 × (256 + 1)/2 = 32896 points, which requires at least 5 banks. Of the 32 available banks, 5 therefore store the computation result and the remaining banks store source data with ping-pong operation, so the number of banks that store source data at one time is A = floor[(32 − 5)/2] = 13. (The floor function returns the largest integer not greater than its argument.)
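The bank allocation above is simple arithmetic. The Python sketch below reproduces it (the function name is hypothetical). The 8-byte word size is an inference: 2 MB spread over 32 banks of depth 8K works out to 8 bytes per address, which matches the 64-bit data items the DMA delivers.

```python
def bank_budget(total_banks=32, depth=8192, m_max=256):
    """Reproduce the example's bank allocation for an M_max x M_max result."""
    tri_points = m_max * (m_max + 1) // 2             # lower-triangle size
    result_banks = -(-tri_points // depth)             # ceil: banks for results
    source_banks = (total_banks - result_banks) // 2   # floor: ping-pong halves
    return tri_points, result_banks, source_banks


tri, m_banks, A = bank_budget()
sram_bytes = 32 * 8192 * 8   # 32 banks x 8K addresses x 8 bytes (64-bit words)
```

This yields 32896 lower-triangle points, 5 result banks and A = 13 source banks, and the SRAM size comes out to exactly 2 MiB.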

Since each bank has a depth of 8K and the M values of each column (M ≤ 256) are stored sequentially in a bank, B = floor[8K/M] regions are defined. The A banks of one region store A columns, so all banks together store at most C = A × B columns. (A large-point matrix is one whose data size exceeds C × M.)

Taking an M × N matrix with N ≤ C as an example, the source data are arranged as shown in Fig. 3. If N > C, this arrangement is repeated, requiring D = ceil(N/C) transfers in total (the ceil function returns the smallest integer greater than or equal to its argument). The last region occupies (N − 1) % A + 1 banks, and the remaining banks of that region are filled with zeros.
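The partition parameters follow directly from these formulas. The Python sketch below evaluates them (the helper and dictionary key names are hypothetical). Its small/large test follows the definition given in parentheses above (data size exceeding C × M, that is, N > C), and the DMA mode follows the rule that small-point matrices use one-dimensional and large-point matrices two-dimensional transfers.

```python
def partition(M, N, A=13, depth=8192):
    """Partition parameters for an M x N complex matrix with A source banks."""
    B = depth // M                  # regions per segment: floor(8K / M)
    C = A * B                       # columns per segment
    D = -(-N // C)                  # segments to transfer: ceil(N / C)
    last_banks = (N - 1) % A + 1    # banks used by the last region
    large = N > C                   # M*N exceeds C*M: large-point matrix
    return {
        "B": B, "C": C, "D": D,
        "last_region_banks": last_banks,
        "zero_filled_banks": A - last_banks,
        "large": large,
        "dma_mode": "2-D" if large else "1-D",
    }
```

For M = 256 and N = 8192 this gives B = 32, C = 416 and D = 20; the last region uses 2 banks and 11 are zero-filled. A 64 × 100 matrix fits in a single one-dimensional transfer, with 9 banks in its last region.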

The AXI bus used in this example is 256 bits wide, so the DMA splits each transfer into four 64-bit data items and sends them to the DMA interface function of the matrix covariance computation, which writes them into the corresponding banks according to the address mapping rule, as shown in Fig. 4. For large-point computation, the partition size of each segment should therefore be a multiple of 4. The number of columns transferred each time is then exactly a multiple of 4, so two data items are never written into the same bank in the same cycle, and write stalls that would slow the overall computation are avoided. For small-point computation, the DMA completes the transfer in a single pass and the number of columns is not necessarily a multiple of 4, so two data items may be written into the same bank in the same cycle; such writes must be delayed and performed in turn, as shown in Fig. 5. For the last region of the final segment in a large-point computation, the DMA uses the two-dimensional transfer mode, so if the remaining number of columns is not a multiple of 4, it is automatically zero-padded up to a multiple of 4. Although extra zero-padded columns are written into the banks, the last region would have to be zero-filled afterwards anyway, so this costs nothing; on the contrary, it reduces the number of columns still to be zero-filled and shortens the source data storage time.
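The bank conflict condition can be explored with a small software model. Three things here are assumptions rather than statements from the text. First, lane 0 is taken to occupy the lowest 64 bits of the beat. Second, elements are assumed to stream row by row with the column index varying fastest. Third, column c is assumed to go to source bank c % A. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def split_beat(beat256):
    """Split a 256-bit AXI beat into four 64-bit words, lowest lane first
    (the lane order is an assumption)."""
    return [(beat256 >> (64 * lane)) & MASK64 for lane in range(4)]


def has_bank_conflict(num_rows, num_cols, A=13, lanes=4):
    """Model: each beat carries `lanes` consecutive elements of a row-major
    stream, and column c is written to bank c % A. Return True if any beat
    would write two elements into the same bank in the same cycle."""
    stream = [c for _ in range(num_rows) for c in range(num_cols)]
    for start in range(0, len(stream), lanes):
        banks = [c % A for c in stream[start:start + lanes]]
        if len(set(banks)) < len(banks):
            return True
    return False
```

Under this model, a 16-column transfer never conflicts, because no beat straddles two rows. A 14-column transfer does conflict: the beat holding columns 12, 13, 0 and 1 sends both column 13 and column 0 to bank 0. Some widths that are not multiples of 4 happen to avoid conflicts, which fits the text's wording that two writes "may" coincide.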

The reconfigurable computing unit is reconfigured into a 5-stage fully pipelined complex multiply-accumulate unit with 27 complex inputs, as shown in Fig. 6. It uses 13 complex multipliers and 13 complex adders in total.
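The text does not describe the internal structure of the unit. One structure consistent with the stated numbers, offered here only as an assumption, has the b = 13 products plus the previous partial result (14 addends) reduced by a binary adder tree. Such a tree needs 13 two-input adders and 4 levels, and one multiplier stage brings the total to 5. The Python sketch below checks this arithmetic; the helper name is hypothetical.

```python
def adder_tree(n_inputs):
    """Two-input adders and tree levels needed to reduce n_inputs values."""
    adders, levels, width = 0, 0, n_inputs
    while width > 1:
        pairs = width // 2
        adders += pairs
        width = pairs + width % 2       # an odd leftover passes to the next level
        levels += 1
    return adders, levels


b = 13
inputs = 2 * b + 1                      # b values for A, b for B, 1 previous result
adders, levels = adder_tree(b + 1)      # 13 products + previous partial sum
stages = 1 + levels                     # assumed: one multiplier stage + tree levels
```

This gives 27 inputs, 13 adders and 5 stages, matching the figures above. It shows only that such a layout is possible, not that Fig. 6 uses it.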

Because the complex covariance result is conjugate symmetric, only its lower triangle is computed, which shortens computation time; the upper triangle is then obtained by conjugate-symmetric extension. Source data are read during computation as shown in Fig. 7. The specific steps are as follows:

1) Divide the matrix into D segments by columns, import one segment at a time, and place it into the source-data banks in the form of Fig. 3;

2) For each region occupied by the current segment, perform the following operations (input Pre_Region Data is the previously stored result at the corresponding position):

Read the 1st value from all A banks simultaneously as input I1, and the conjugate of the 1st value as input I2;

Read the 2nd value from all A banks simultaneously as input I1, and read the 1st and 2nd values in turn and take their conjugates as input I2;

Read the 3rd value from all A banks simultaneously as input I1, and read the 1st, 2nd and 3rd values in turn and take their conjugates as input I2;

...;

Read the M-th value from all A banks simultaneously as input I1, and read the 1st, 2nd, ..., M-th values in turn and take their conjugates as input I2;

3) Repeat step 2) to complete the computation for all regions of the current segment;

4) Repeat steps 2) and 3) to complete the computation for all segments, obtaining M(M + 1)/2 results;

5) Extend the M(M + 1)/2 results by conjugate symmetry into M × M results and store them in a new storage array.
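The steps above can be modelled in software to check that the iterative, lower-triangle-only schedule reproduces $B = AA^H$. The following is a minimal Python sketch under these assumptions: 0-based indices, a packed lower-triangle store k = i(i+1)/2 + j, and hypothetical function and parameter names. It models the data flow only, not the cycle timing.

```python
def covariance_iterative(a, A, B):
    """Model of steps 1) to 5): a is an M x N complex matrix given as rows;
    A = source banks per region, B = regions per segment."""
    M, N = len(a), len(a[0])
    C = A * B                                       # columns per segment
    cols = [[a[i][j] for i in range(M)] for j in range(N)]
    cols += [[0j] * M for _ in range((-N) % A)]     # zero-pad the last region
    acc = [0j] * (M * (M + 1) // 2)                 # packed lower triangle
    for s in range(0, len(cols), C):                # step 1): one segment at a time
        segment = cols[s:s + C]
        for r in range(0, len(segment), A):         # steps 2) and 3): each region
            region = segment[r:r + A]               # one column per bank
            for i in range(M):                      # I1: i-th value of every bank
                for j in range(i + 1):              # I2: conjugates of values 0..i
                    k = i * (i + 1) // 2 + j
                    acc[k] += sum(col[i] * col[j].conjugate() for col in region)
    full = [[0j] * M for _ in range(M)]             # step 5): conjugate symmetry
    for i in range(M):
        for j in range(i + 1):
            v = acc[i * (i + 1) // 2 + j]
            full[j][i] = v.conjugate()              # upper triangle
            full[i][j] = v                          # lower triangle and diagonal
    return full


X = [[complex(i + j, i - 2 * j) for j in range(7)] for i in range(3)]
Y = covariance_iterative(X, A=2, B=2)   # 7 columns: 2 segments, 1 zero column
```

The result matches the direct sum over all columns. Only M(M + 1)/2 accumulations are performed, instead of M².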

Because the source data storage area can be reused once computation is complete, a new storage array is built in it to hold the M × M matrix covariance result. Each DMA transfer is 256 bits wide, and fetching 4 values at a time should be convenient, so the new storage array is planned as bank0 to bank15. The computed M × M complex covariance matrix is stored into it column by column. The conjugate-symmetric address mapping of the lower-triangular matrix is shown in Fig. 8.
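One placement consistent with "column by column into bank0 to bank15" is sketched below in Python. The placement itself is an assumption: column j is taken to go to bank j % 16, with its M values stored consecutively from address (j // 16) · M. The exact mapping is the one in Fig. 8. Function names are hypothetical.

```python
def result_location(row, col, M, nbanks=16):
    """Assumed mapping: column col of the M x M result goes to bank
    col % nbanks, with its rows stored from address (col // nbanks) * M."""
    return col % nbanks, (col // nbanks) * M + row


def place_full(lower, M, nbanks=16):
    """lower maps (i, j), j <= i, to lower-triangular values. Each value and
    its conjugate mirror are written into the banks."""
    depth = -(-M // nbanks) * M
    banks = [[0j] * depth for _ in range(nbanks)]
    for (i, j), v in lower.items():
        b, addr = result_location(j, i, M, nbanks)   # mirror (j, i): conjugate
        banks[b][addr] = v.conjugate()
        b, addr = result_location(i, j, M, nbanks)   # original (i, j); wins on diagonal
        banks[b][addr] = v
    return banks
```

Under this mapping, any 4 consecutive columns of one row fall in 4 different banks, so 4 values can be fetched together for one 256-bit DMA beat.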

The performance of this example is evaluated as follows:

1) the number of segments occupying all B regions is floor(N/C);
2) the number of regions occupied by the final segment is ceil((N % C)/A);
3) in each region, fetching input I1 takes M cycles, and fetching inputs I2 and Pre_Region Data in parallel takes M × (M + 1)/2 cycles;
4) the conjugate-symmetric extension of the lower-triangular matrix takes M × (M + 1)/2 cycles;
5) the computation time of the complex multiply-accumulate unit is T1, and the times for reading data from and storing data into banks are T2 and T3 respectively.

The total number of execution cycles for the complex covariance matrix is then

$$\left\lfloor \frac{N}{C} \right\rfloor \left[\left(\frac{M(M+1)}{2} + M\right) B\right] + \left\lceil \frac{N \bmod C}{A} \right\rceil \left[\frac{M(M+1)}{2} + M\right] + \frac{M(M+1)}{2} + T_1 + T_2 + T_3,$$

whereas the number of cycles for computing the complex covariance matrix in the conventional approach is $\dfrac{MN(M+4)}{4}$.
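Both cycle-count formulas can be evaluated directly. The Python sketch below (function names hypothetical) drops T1, T2 and T3, as the next paragraph does, and uses the example's A = 13 source banks of depth 8K.

```python
def cycles_proposed(M, N, A=13, depth=8192):
    """Cycle count of the iterative scheme, ignoring T1, T2 and T3."""
    B = depth // M                      # regions per segment
    C = A * B                           # columns per segment
    tri = M * (M + 1) // 2              # lower-triangle points
    full_segments = N // C              # floor(N / C)
    last_regions = -(-(N % C) // A)     # ceil((N % C) / A)
    return full_segments * (tri + M) * B + last_regions * (tri + M) + tri


def cycles_conventional(M, N):
    """Cycle count of the conventional approach: M*N*(M + 4)/4."""
    return M * N * (M + 4) / 4
```

For M = 256 and N = 8192 this gives 20,951,808 cycles for the iterative scheme and 136,314,880 for the conventional formula, roughly a 6.5-fold reduction at that size.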

Because T1, T2 and T3 are small relative to the total number of cycles, they can be neglected. Fig. 9 compares the performance of the two implementations above when computing the covariance of complex matrices of different sizes. The figure clearly shows that computing the complex covariance matrix in the iterative and reconfigurable manner takes far fewer cycles than the conventional approach. The complex covariance matrix computing system based on an iterative and reconfigurable approach of the present invention supports covariance computation on complex matrices with any number of columns. It reduces both the source-data computation load of conventional hardware implementations and the time spent repeatedly writing result data back to DDR. It balances computing and storage resources to maximize parallelism, builds the computing unit in a reconfigurable way, and establishes a dedicated address mapping rule to obtain the complex covariance matrix, greatly improving resource utilization and hardware execution speed. Since covariance computation is a typical operation in signal processing, this hardware implementation method has good reference value and broad application prospects.

Claims

1. A complex covariance matrix computing system based on an iterative and reconfigurable approach, comprising an on-chip SRAM memory, an off-chip DDR memory, a reconfigurable unit, a DMA controller and an acceleration core, wherein the acceleration core is communicatively connected to the on-chip SRAM memory and to the reconfigurable unit, the DMA controller is communicatively connected to the reconfigurable unit, and the off-chip DDR memory is communicatively connected to the DMA controller through a bus, characterized in that the acceleration core comprises:

a matrix covariance computing module, which iteratively polls the source data in each region of the on-chip SRAM memory and computes the lower-triangular covariance matrix;

a conjugate symmetry module, which, based on the conjugate symmetry of the covariance matrix, obtains the complete complex covariance matrix from the lower-triangular covariance matrix through address mapping and reconfigured storage, forming the final result;

a DMA interface function module, which stores data read by DMA from the off-chip DDR memory into the on-chip SRAM memory in a partitioned manner, and writes the final result back to the off-chip DDR memory by DMA.

2. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 1, characterized in that the storage resources of the on-chip SRAM memory are divided into k banks, each of depth d; if m banks are allocated to store the lower-triangular covariance matrix, covariance computation on an M × N complex matrix requires M² ≤ md, with N arbitrary; and if the computation parallelism is b, the complex matrix is classified as small-point when M² ≤ bd, and as large-point otherwise.

3. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 2, characterized in that the DMA one-dimensional transfer mode is used if the complex matrix is small-point, and the DMA two-dimensional transfer mode is used if it is large-point.

4. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 1, characterized in that the matrix covariance computing module stores the source data of the complex matrix column by column, with one column per bank, the columns currently held across all banks forming one region and the data loaded into all banks in one pass forming one segment, and computes the total size of a single segment, the total number of segments, the number of regions in the final segment and the number of columns in the last region.

5. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 4, characterized in that zero padding is applied if any region is not completely filled with columns, and source data exceeding the maximum number of points that can be stored at one time are processed in segments (batches) using ping-pong operation.

6. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 1, characterized in that the matrix covariance computing module uses the reconfigurable approach to build a complex multiply-accumulate unit, which iteratively accumulates the partial results of every segment, region and bank to obtain the lower-triangular covariance matrix.

7. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 6, characterized in that the number of complex multipliers and complex adders used in the complex multiply-accumulate unit equals the number of banks holding source data, denoted b, so that each computation takes 2b + 1 input values; the data at the m-th address of all source-data banks are read as complex multiplier input A; the conjugates of the data at addresses 1 to m of all source-data banks are read in turn as complex multiplier input B; the remaining input is the value at the corresponding address of the result bank from the previous pass; and each result is written back to the same address in the same bank.

8. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 1, characterized in that the conjugate symmetry module reads data in turn from the banks holding the lower-triangular covariance matrix, determines the row and column of each element in the lower-triangular matrix, and then, following the rule that each matrix column is stored in one bank, writes the element and its conjugate-symmetric counterpart into the reused source-data banks until the complete complex covariance matrix is obtained.

9. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 1, characterized in that, when the acceleration core distributes the data read from DDR, it derives, from the address passed by the DMA, the row and column of each incoming element in the complex matrix, and stores the element into the corresponding source-data bank according to the column-wise distribution rule.

10. The complex covariance matrix computing system based on an iterative and reconfigurable approach according to claim 4, characterized in that the address mapping used by the conjugate symmetry module is as follows: elements of the lower-triangular covariance matrix are read in turn from the result banks; from the bank number and in-bank address of each element, its row and column in the original lower-triangular matrix are derived; and then, following the rule that each column of the complete covariance matrix is stored in one bank, the element and its conjugate-symmetric counterpart are written into the reconfigured banks.