CN109446478A - A kind of complex covariance matrix computing system based on iteration and restructural mode - Google Patents
- Publication number
- CN109446478A (application CN201811284263.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The present invention relates to a complex covariance matrix computing system based on iteration and reconfiguration, comprising an on-chip SRAM memory, an off-chip DDR memory, a reconfigurable unit, a DMA controller, and an acceleration core. The acceleration core includes: a matrix covariance computing module, which polls the source data of each region of the on-chip SRAM memory by iterative calculation and computes the lower-triangular covariance matrix; a conjugate symmetry module, which uses the conjugate symmetry of the covariance matrix to obtain the complete complex covariance matrix from the lower-triangular covariance matrix through address mapping and reconstructed storage, forming the final operation result; and a DMA interface function module, which stores the data read by DMA from the off-chip DDR memory into the on-chip SRAM memory in a partitioned manner. Beneficial effects: the invention supports covariance computation for complex matrices with any number of columns, and reduces both the source-data computation of conventional hardware implementations and the time spent repeatedly writing result data back to DDR.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a complex covariance matrix computing system based on iteration and reconfiguration.
Background
Computing a covariance matrix is a typical operation in signal processing. It is a key component of multi-level estimators, spatial spectrum estimation, coherent source number detection, and affine-invariant pattern recognition, and is widely used in radar, sonar, digital image processing, and related fields. Covariance matrices are also widely applied in image matching and image pattern recognition. The computation is complex, however: computing, for example, the covariance matrix of the multi-dimensional random variable formed by several feature vectors in each region of an image takes a long time, which is a major obstacle to real-time image covariance matrix computation on a general-purpose personal computer.
With the rapid development of the integrated circuit industry, high performance and real-time operation are pursued relentlessly in the embedded field. At present, most hardware implementations of the complex covariance matrix are designed on platforms such as DSPs, GPUs, and FPGAs. The present invention is instead based on a reconfigurable intelligent acceleration core and proposes a hardware implementation that computes the complex covariance matrix by iteration and reconfiguration. Compared with conventional hardware implementations, it achieves higher resource utilization and faster hardware execution. Because covariance computation is a typical signal-processing operation, this hardware implementation offers a useful reference and broad application prospects.
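In statistics and probability theory, a covariance matrix is a matrix whose elements are the covariances between the components of a random vector. Let $X = (X_1, X_2, X_3, \ldots, X_n)^T$ be an n-dimensional random variable. The matrix
$$C = (c_{ij})_{n \times n} = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}$$
is called the covariance matrix of the n-dimensional random variable X, denoted D(X), where $c_{ij} = \mathrm{Cov}(X_i, X_j)$, $i, j = 1, \ldots, n$, is the covariance of the components $X_i$ and $X_j$ of X. Because $c_{ij} = c_{ji}$, the covariance matrix is symmetric.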
As shown in Figure 1, the conventional hardware implementation works as follows. For an M × N complex matrix $A = [a_{ij}]$, its complex covariance matrix B is given by $B = AA^H$. The matrix A is first transposed and then conjugated to obtain $A^H$, and each element of B is then obtained as
$$b_{ij} = \sum_{k=1}^{N} a_{ik}\,\overline{a_{jk}}, \quad i, j = 1, \ldots, M.$$
Because this approach is implemented as a matrix multiplication, computing the covariance of a large-point complex matrix involves moving data in and out many times, which makes memory access time excessive. The approach also fails to exploit the conjugate symmetry of the covariance matrix to reduce the amount of computation. Both factors lengthen the covariance computation. In many practical applications, inefficient covariance matrix computation becomes a major bottleneck, which shows the reference value and application prospects of the present invention.
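As an illustration of the point about conjugate symmetry, the following minimal Python sketch compares the full product B = AA^H with a version that computes only the entries on or below the diagonal. It is a software model, not the patent's hardware; the function names and the sample matrix are hypothetical.

```python
def covariance_full(a):
    """Compute all M*M entries of B = A * A^H as a plain matrix product."""
    m, n = len(a), len(a[0])
    return [[sum(a[i][k] * a[j][k].conjugate() for k in range(n))
             for j in range(m)]
            for i in range(m)]


def covariance_lower(a):
    """Compute only entries with j <= i; the others follow from conjugation."""
    m, n = len(a), len(a[0])
    return {(i, j): sum(a[i][k] * a[j][k].conjugate() for k in range(n))
            for i in range(m) for j in range(i + 1)}


# Hypothetical 2 x 3 complex input matrix.
A = [[1 + 2j, 3 - 1j, 0 + 1j],
     [2 - 1j, 1 + 1j, 4 + 0j]]
full = covariance_full(A)
lower = covariance_lower(A)
```

For an M × M result the lower-triangle version evaluates M(M+1)/2 entries instead of M*M, and each skipped entry is recovered as the conjugate of its mirror.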
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and to provide a complex covariance matrix computing system based on iteration and reconfiguration that effectively reduces the amount of source-data computation, makes full use of storage resources, speeds up computation, and thereby improves the overall performance of the algorithms that use it. It is realized by the following technical scheme:
The complex covariance matrix computing system based on iteration and reconfiguration comprises an on-chip SRAM memory, an off-chip DDR memory, a reconfigurable unit, a DMA controller, and an acceleration core. The acceleration core is communicatively connected to the on-chip SRAM memory and to the reconfigurable unit, the DMA controller is communicatively connected to the reconfigurable unit, and the off-chip DDR memory is communicatively connected to the DMA controller through a bus. The acceleration core includes:
A matrix covariance computing module, which polls the source data of each region of the on-chip SRAM memory by iterative calculation and computes the lower-triangular covariance matrix;
A conjugate symmetry module, which uses the conjugate symmetry of the covariance matrix to obtain the complete complex covariance matrix from the lower-triangular covariance matrix through address mapping and reconstructed storage, forming the final operation result;
A DMA interface function module, which stores the data read by DMA from the off-chip DDR memory into the on-chip SRAM memory in a partitioned manner, and writes the operation result back to the off-chip DDR memory by DMA.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, the storage resource of the on-chip SRAM memory is divided into k banks, each of depth d. If m banks are allocated to store the lower-triangular covariance matrix, covariance computation on a complex matrix of size M × N requires $M^2 \le md$, while N may take any value. If the degree of computational parallelism is b, the complex matrix is classified as small-point when $M^2 \le bd$; if this condition is not met, it is classified as large-point.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, if the complex matrix is small-point, the one-dimensional data transfer mode of the DMA is used; if it is large-point, the two-dimensional data transfer mode of the DMA is used.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, the matrix covariance computing module stores the source data of the complex matrix by column, one column per bank. One column stored in each of the banks forms a zone, and all banks filled once form a segment. The module also computes the total size of a single partition, the total number of segments, the number of zones in the last segment, and the number of columns in the last zone.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, if the columns of any zone are not completely filled, zero padding is used; if the source data exceed the maximum number of points that can be stored at one time, they are processed segment by segment (in batches) using ping-pong operation.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, the matrix covariance computing module constructs a complex multiply-accumulate computing unit in a reconfigurable manner and iterates over the temporary result data of each segment, each zone, and each bank to obtain the lower-triangular covariance matrix.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, the number of complex multipliers and the number of complex adders used in the complex multiply-accumulate computing unit both equal the number of banks that store source data, denoted b, so the computing unit takes 2b+1 input data per operation. The data at the M-th address of all source data banks are read as complex multiplier input A; the conjugates of the data at addresses 1 to M of all source data banks are read in turn as complex multiplier input B; the remaining input is the value previously stored at the corresponding address of the result bank. Each computed result is written back to the same address of the same bank.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, the conjugate symmetry module reads data in turn from the banks holding the lower-triangular covariance matrix and parses the row and column of each datum in the lower triangular matrix. Following the rule that one matrix column is stored in one bank, it then stores the datum and its conjugate-symmetric counterpart into the reused source data banks, until the complete complex covariance matrix is obtained.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, when the acceleration core distributes the data read from DDR, it parses the row and column of the incoming data in the complex matrix from the address supplied by the DMA, and then stores the data in the corresponding source data bank according to the by-column distribution rule.
In a further design of the complex covariance matrix computing system based on iteration and reconfiguration, the address mapping used by the conjugate symmetry module is as follows: the data of the lower-triangular covariance matrix are read in turn from the result banks; the row and column of each datum in the original lower triangular matrix are parsed from its bank number and its address within the bank; then, following the rule that one column of the complete covariance matrix is stored in one bank, the current datum and its conjugate-symmetric counterpart are stored into the reconstructed banks.
The advantages of the present invention are as follows:
The complex covariance matrix computing system based on iteration and reconfiguration is designed in a parameterized way: for different storage and computing resources, it reconfigures the scheme that maximizes resource utilization to carry out the complex covariance matrix computation. The method reduces the amount of source-data computation, improves resource utilization, and speeds up the hardware implementation, providing a useful reference for the design and implementation of covariance matrix computation in signal processing.
Brief description of the drawings
Fig. 1 is the hardware implementation architecture diagram for computing the complex covariance matrix in the present invention.
Fig. 2 is the hardware implementation architecture diagram for computing the complex covariance matrix in the conventional approach.
Fig. 3 is a schematic diagram of the source data arrangement in the present invention.
Fig. 4 is a schematic diagram of the source data transfer address mapping in the present invention.
Fig. 5 is a flowchart of small-point source data storage in the present invention.
Fig. 6 is a schematic diagram of the computing unit design in the present invention.
Fig. 7 is a schematic diagram of source data reading during computation in the present invention.
Fig. 8 is a schematic diagram of the conjugate-symmetric address mapping of the lower triangular matrix in the present invention.
Fig. 9 is a performance comparison between the present invention and the conventional approach for computing the complex covariance matrix.
Specific embodiment
The scheme of the present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the complex covariance matrix computing system based on iteration and reconfiguration in this example mainly comprises three modules: the DMA interface function, the matrix covariance operation, and conjugate symmetry. The DMA interface function module has two main tasks: first, storing the data read by DMA from the off-chip DDR memory (hereinafter DDR) into the on-chip SRAM memory (hereinafter SRAM) in a partitioned manner; second, writing the final operation result back to DDR by DMA. The matrix covariance computing module polls the source data of each SRAM region by iterative calculation and computes the lower-triangular covariance matrix. The conjugate symmetry module uses the conjugate symmetry of the covariance matrix to obtain the complete complex covariance matrix from the lower-triangular covariance matrix through address mapping and reconstructed storage. The invention supports covariance computation for complex matrices with any number of columns, reduces the source-data computation of conventional hardware implementations and the time spent repeatedly writing result data back to DDR, balances computing and storage resources to maximize multi-channel parallelism, constructs the computing unit in a reconfigurable manner, and establishes specific address mapping rules to obtain the complex covariance matrix.
The storage resource of the on-chip SRAM memory is divided into k banks, each of depth d. If m banks are allocated to store the lower-triangular covariance matrix, covariance computation on a complex matrix of size M × N requires $M^2 \le md$, while N may take any value. If the degree of computational parallelism is b, the complex matrix is classified as small-point when $M^2 \le bd$; if this condition is not met, it is classified as large-point.
If the complex matrix is small-point, the one-dimensional data transfer mode of the DMA is used; if it is large-point, the two-dimensional data transfer mode of the DMA is used.
The matrix covariance computing module stores the source data of the complex matrix by column, one column per bank. One column stored in each of the banks forms a zone, and all banks filled once form a segment. The module also computes the total size of a single partition, the total number of segments, the number of zones in the last segment, and the number of columns in the last zone.
If the columns of any zone are not completely filled, zero padding is used; if the source data exceed the maximum number of points that can be stored at one time, they are processed segment by segment (in batches) using ping-pong operation.
The matrix covariance computing module constructs a complex multiply-accumulate computing unit in a reconfigurable manner and iterates over the temporary result data of each segment, each zone, and each bank to obtain the lower-triangular covariance matrix.
The number of complex multipliers and the number of complex adders used in the complex multiply-accumulate computing unit both equal the number of banks that store source data, denoted b, so the computing unit takes 2b+1 input data per operation. The data at the M-th address of all source data banks are read as complex multiplier input A; the conjugates of the data at addresses 1 to M of all source data banks are read in turn as complex multiplier input B; the remaining input is the value previously stored at the corresponding address of the result bank. Each computed result is written back to the same address of the same bank.
The conjugate symmetry module reads data in turn from the banks holding the lower-triangular covariance matrix and parses the row and column of each datum in the lower triangular matrix. Following the rule that one matrix column is stored in one bank, it then stores the datum and its conjugate-symmetric counterpart into the reused source data banks, until the complete complex covariance matrix is obtained.
When the acceleration core distributes the data read from DDR, it parses the row and column of the incoming data in the complex matrix from the address supplied by the DMA, and then stores the data in the corresponding source data bank according to the by-column distribution rule.
The address mapping used by the conjugate symmetry module is as follows: the data of the lower-triangular covariance matrix are read in turn from the result banks; the row and column of each datum in the original lower triangular matrix are parsed from its bank number and its address within the bank; then, following the rule that one column of the complete covariance matrix is stored in one bank, the current datum and its conjugate-symmetric counterpart are stored into the reconstructed banks.
The invention is described in detail below through an example implementation, for which a cycle-accurate system-level model was built in SystemC for verification.
The present invention computes the complex covariance matrix based on the hardware implementation architecture shown in Fig. 2. In the example, the matrix X is assumed to be of order M × N (M ≤ 256, N ≤ 8K), and the complex covariance result Y is an M × M matrix (below, "%" denotes the modulo operation):
Take a 2 MB SRAM as an example (divided into 32 banks, each of depth 8K). The hardware processes data by zones (large-point inputs are additionally split into segments), repeatedly filling the source data area block by block, by column. To support large-point ping-pong operation, note that the complex covariance result Y has at most 256*256 points, so the lower triangular matrix has at most 256*(256+1)/2 = 32896 points and needs at least 5 banks for storage. Of the 32 available banks, 5 therefore hold the computed results, and the remaining banks store source data in ping-pong fashion, so the number of banks that store source data at one time is A = floor[(32-5)/2] = 13. (The floor function returns the largest integer not greater than its argument.)
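The bank budget above can be checked with a few lines of Python. This is only a restatement of the arithmetic in the text; the constant names are illustrative.

```python
import math

TOTAL_BANKS = 32          # banks in the 2 MB SRAM of the example
BANK_DEPTH = 8 * 1024     # entries per bank
M_MAX = 256               # largest supported row count

# Points in the lower triangle (diagonal included) of a 256 x 256 result.
lower_points = M_MAX * (M_MAX + 1) // 2
# Banks needed to hold those points.
result_banks = math.ceil(lower_points / BANK_DEPTH)
# The remaining banks are split into two ping-pong halves for source data.
source_banks = (TOTAL_BANKS - result_banks) // 2
```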
Each bank has depth 8K, and data are stored in each bank sequentially, M numbers per column (M is at most 256), so B = floor[8K/M] zones are defined. The A banks of each zone store A columns of data, and all banks together store at most C = A*B columns. (Large-point definition: the data size of the complex matrix exceeds C × M.)
Taking an M × N matrix with N ≤ C as an example, the source data are arranged as shown in Fig. 3. If N > C, the arrangement is repeated cyclically, and D = ceil(N/C) transfers are needed in total, where the ceil function returns the smallest integer greater than or equal to its argument. The last zone occupies (N-1)%A+1 banks, and the remaining banks of that zone are filled with zeros.
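The partition quantities A, B, C, and D defined above can be gathered into one small helper. The sketch below follows the formulas in the text; the function name, the dictionary keys, and the sample size 256 × 1000 are hypothetical.

```python
import math


def partition_plan(m_rows, n_cols, source_banks=13, bank_depth=8 * 1024):
    """Return the zone/segment layout for an m_rows x n_cols complex matrix."""
    zones = bank_depth // m_rows                       # B = floor(8K / M)
    cols_per_segment = source_banks * zones            # C = A * B
    segments = math.ceil(n_cols / cols_per_segment)    # D = ceil(N / C)
    last_zone_banks = (n_cols - 1) % source_banks + 1  # banks used in last zone
    return {
        "zones": zones,
        "cols_per_segment": cols_per_segment,
        "segments": segments,
        "last_zone_banks": last_zone_banks,
        "zero_padded_banks": source_banks - last_zone_banks,
        "large_point": n_cols > cols_per_segment,      # data size exceeds C * M
    }


plan = partition_plan(256, 1000)
```

With M = 256 and N = 1000 the last zone holds 12 real columns, so one bank of that zone is zero-filled.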
The AXI bus used in this example has a data width of 256 bits. The DMA splits each transfer into four 64-bit data and passes them to the DMA interface function of the matrix covariance operation, which stores them in the corresponding banks according to the address mapping shown in Fig. 4. For large-point operation, the partition size of each segment should therefore be a multiple of 4. The number of columns in each transfer is then exactly a multiple of 4, two data are never written to the same bank at the same time, and delayed writes do not slow the overall computation. For small-point operation, the DMA completes the transfer in one pass and the number of columns is not necessarily a multiple of 4, so two data may be written to the same bank at the same time; in that case the writes must be delayed and performed in turn, as shown in Fig. 5. For the last zone of the last segment in large-point operation, the DMA uses the two-dimensional transfer mode, so if the number of remaining columns is not a multiple of 4, it is automatically zero-padded to a multiple of 4. Although this writes extra zero-padded columns into the banks, the last zone has to be zero-filled afterwards anyway, so the extra columns do no harm; they reduce the number of columns still to be zero-filled and shorten the source data storage time.
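The bank-conflict argument can be modelled in software under some assumptions that the source does not state and that Fig. 4 may contradict: the matrix is laid out row-major in DDR, each 64-bit lane carries one complex sample, and column c is stored in bank c % A at address (c // A) * M + row. The function names are hypothetical.

```python
def bank_slot(elem_index, m_rows, n_cols, source_banks=13):
    """Map a row-major element index to (bank, address) under by-column storage."""
    row, col = divmod(elem_index, n_cols)
    zone, bank = divmod(col, source_banks)   # assumed: bank = column mod A
    return bank, zone * m_rows + row


def beat_has_conflict(beat_index, m_rows, n_cols, source_banks=13, lanes=4):
    """Return True if two lanes of one 256-bit beat target the same bank."""
    total = m_rows * n_cols
    first = beat_index * lanes
    banks = [bank_slot(e, m_rows, n_cols, source_banks)[0]
             for e in range(first, min(first + lanes, total))]
    return len(set(banks)) < len(banks)
```

In this model a column count that is a multiple of 4 keeps every beat inside one row, so its four lanes land in four different banks; with 14 columns a beat can straddle two rows and hit bank 0 twice, which corresponds to the delayed, in-turn writes described for small-point operation.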
The reconfigurable computing unit is reconfigured into a complex multiply-accumulate unit with 27 complex data inputs and a 5-stage full pipeline, as shown in Fig. 6, using 13 complex multipliers and 13 complex adders in total.
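One plausible way to account for these figures is a multiplier stage followed by a binary adder tree that also folds in the previous result. The Python sketch below models that structure; the tree arrangement is an assumption, since the source gives only the counts (27 inputs, 13 multipliers, 13 adders, 5 stages), and the function name is hypothetical.

```python
def mac_tree(prev, in_a, in_b_conj):
    """Model one multiply-accumulate step: prev + sum(in_a[k] * in_b_conj[k]).

    in_b_conj is assumed to be conjugated already, as the text describes
    for multiplier input B. Returns (result, adders_used, adder_levels).
    """
    values = [a * b for a, b in zip(in_a, in_b_conj)] + [prev]
    adders = levels = 0
    while len(values) > 1:
        # Pairwise additions for this tree level.
        nxt = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        adders += len(nxt)
        if len(values) % 2:          # an odd value passes through unchanged
            nxt.append(values[-1])
        values = nxt
        levels += 1
    return values[0], adders, levels


# 13 products plus the previous partial sum: 2 * 13 + 1 = 27 inputs.
result, adders, levels = mac_tree(1 + 0j, [1j] * 13, [-1j] * 13)
```

With 13 products and one previous value, the tree uses 13 adders in 4 levels, which together with the multiplier stage matches a 5-stage pipeline.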
Because the complex covariance result is conjugate symmetric, only the lower triangle of the complex covariance matrix is computed in order to shorten computation time; the upper triangle is obtained by conjugate-symmetric extension. The way source data are read during computation is shown in Fig. 7. The specific steps are as follows:
1) Divide the matrix into D segments by column, import one segment at a time, and place it in the source data banks in the layout of Fig. 3;
2) For each zone occupied by the current segment, perform the following operations (the input Pre_Region Data is the previously accumulated result at the corresponding position):
Read the 1st number from the A banks simultaneously as input I1, and take the conjugate of the 1st number as input I2;
Read the 2nd number from the A banks simultaneously as input I1, and read the 1st and 2nd numbers in turn, conjugated, as input I2;
Read the 3rd number from the A banks simultaneously as input I1, and read the 1st, 2nd, and 3rd numbers in turn, conjugated, as input I2;
……;
Read the M-th number from the A banks simultaneously as input I1, and read the 1st, 2nd, ……, M-th numbers in turn, conjugated, as input I2;
3) Repeat step 2) to complete the operation for all zones of the current segment;
4) Repeat steps 2) and 3) for all segments to obtain M(M+1)/2 results;
5) Extend the M(M+1)/2 results to M*M results by conjugate symmetry and store them in a new storage array.
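The steps above can be sketched as a software model in Python. It is a simplified illustration rather than the hardware design: segments and zones are collapsed into one loop over groups of A columns, the results are assumed to be stored in row-major lower-triangle order, and all names and the sample matrix are hypothetical.

```python
def lower_triangle_iterative(a, source_banks=13):
    """Accumulate the lower triangle of A * A^H zone by zone (steps 1 to 4)."""
    m, n = len(a), len(a[0])
    result = [0j] * (m * (m + 1) // 2)           # models the result banks
    for start in range(0, n, source_banks):      # one zone = up to A columns
        zone = [row[start:start + source_banks] for row in a]
        # Zero-pad a partially filled last zone.
        zone = [r + [0j] * (source_banks - len(r)) for r in zone]
        for i in range(m):                       # input I1: i-th number per bank
            for j in range(i + 1):               # input I2: conj of numbers 1..i
                k = i * (i + 1) // 2 + j
                result[k] += sum(x * y.conjugate()
                                 for x, y in zip(zone[i], zone[j]))
    return result


def expand_conjugate_symmetric(result, m):
    """Step 5: rebuild the full M x M matrix from the lower triangle."""
    full = [[0j] * m for _ in range(m)]
    k = 0
    for i in range(m):
        for j in range(i + 1):
            full[i][j] = result[k]
            full[j][i] = result[k].conjugate()
            k += 1
    return full


A = [[1 + 2j, 3 - 1j, 0 + 1j],
     [2 - 1j, 1 + 1j, 4 + 0j]]
lower = lower_triangle_iterative(A, source_banks=2)   # two zones, one padded
full = expand_conjugate_symmetric(lower, len(A))
```

Each pass over a zone adds that zone's contribution to the stored partial result, which is why the result banks are both read and written on every step.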
Because the source data storage area can be reused once the computation is complete, a new storage array is constructed there to hold the M*M matrix covariance result. So that each DMA transfer corresponds to 256 bits, and for the convenience of fetching 4 numbers at a time, the new storage array is planned as bank0 to bank15, and the computed M*M complex covariance matrix is stored into it column by column. The conjugate-symmetric address mapping of the lower triangular matrix is shown in Fig. 8.
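The address parsing can be illustrated as follows. The exact mapping is defined by Fig. 8, which is not reproduced here, so this Python sketch rests on assumptions: results are stored in row-major lower-triangle order, column c of the full matrix goes to bank c % 16, and the address inside the bank is (c // 16) * M + row. The function names are hypothetical.

```python
import math


def tri_index_to_row_col(k):
    """Recover (row, col) of the k-th lower-triangle element, row-major order."""
    row = (math.isqrt(8 * k + 1) - 1) // 2
    col = k - row * (row + 1) // 2
    return row, col


def destination_slots(k, m_rows, out_banks=16):
    """Return (bank, address) for element k and, off the diagonal, its mirror."""
    row, col = tri_index_to_row_col(k)

    def slot(r, c):
        # Assumed layout: one matrix column per bank, banks reused cyclically.
        return c % out_banks, (c // out_banks) * m_rows + r

    slots = [slot(row, col)]
    if row != col:
        slots.append(slot(col, row))   # conjugate-symmetric counterpart
    return slots
```

For example, element k = 1 is parsed as row 1, column 0; it is written to bank 0, and its conjugate goes to the slot of row 0, column 1 in bank 1.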
The performance of this example is evaluated as follows: 1) the number of segments that occupy all B zones is floor(N/C); 2) the number of zones occupied by the last segment is ceil((N%C)/A); 3) for each zone, fetching input I1 takes M cycles, and fetching inputs I2 and Pre_Region Data in parallel takes M × (M+1)/2 cycles; 4) the conjugate-symmetric extension of the lower triangular matrix takes M × (M+1)/2 cycles; 5) the computation time of the complex multiply-accumulate unit is T1, and the times to read data from and store data to the banks are T2 and T3 respectively. The total number of execution cycles for the complex covariance matrix is then: floor(N/C)*[(M*(M+1)/2+M)*B] + ceil[(N%C)/A]*[M*(M+1)/2+M] + M*(M+1)/2 + T1 + T2 + T3. The number of cycles for the conventional approach is: M*N*(M+4)/4. Because T1, T2, and T3 are small relative to the total number of cycles, they can be neglected. The performance of the two implementations on complex matrices of different sizes is compared in Fig. 9, which clearly shows that computing the complex covariance matrix by iteration and reconfiguration takes far fewer cycles than the conventional approach. The complex covariance matrix computing system based on iteration and reconfiguration supports covariance computation for complex matrices with any number of columns, reduces the source-data computation of conventional hardware implementations and the time spent repeatedly writing result data back to DDR, balances computing and storage resources to maximize multi-channel parallelism, constructs the computing unit in a reconfigurable manner, and establishes specific address mapping rules to obtain the complex covariance matrix, greatly improving resource utilization and the speed of the hardware implementation. Because covariance computation is a typical signal-processing operation, this hardware implementation offers a useful reference and broad application prospects.
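The two cycle-count formulas above can be evaluated directly. The Python sketch below transcribes them with T1, T2, and T3 folded into one optional overhead term; the function names and the sample size M = 256, N = 8192 are illustrative, and the numbers come from the formulas, not from measurements.

```python
import math


def iterative_cycles(m, n, a=13, depth=8 * 1024, overhead=0):
    """Cycle estimate for the iterative, reconfigurable scheme."""
    b = depth // m                    # zones per segment, B
    c = a * b                         # columns per segment, C
    per_zone = m * (m + 1) // 2 + m   # I2 fetch plus I1 fetch for one zone
    return ((n // c) * per_zone * b
            + math.ceil((n % c) / a) * per_zone
            + m * (m + 1) // 2        # conjugate-symmetric extension
            + overhead)               # T1 + T2 + T3


def traditional_cycles(m, n):
    """Cycle estimate for the conventional matrix-multiplication approach."""
    return m * n * (m + 4) // 4


ours = iterative_cycles(256, 8192)
baseline = traditional_cycles(256, 8192)
```

For this sample size the formulas give roughly 21 million cycles against roughly 136 million, a ratio of about 6.5.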
Claims (10)
1. a kind of complex covariance matrix computing system based on iteration and restructural mode, including outside on piece SRAM memory, piece
DDR memory, reconfigurable cell, dma controller and accelerate core, the acceleration core respectively on piece SRAM memory, can weigh
The connection of structure unit communication, dma controller and reconfigurable cell communicate to connect, and the outer DDR memory of piece passes through bus and dma controller
Communication connection, it is characterised in that the acceleration core includes:
Matrix covariance computing module, each region source data of poll on piece SRAM memory by way of iterative calculation, and count
Calculate lower triangle covariance matrix;
It is conjugated symmetrical module, according to the conjugate symmetry matter of covariance matrix, lower triangle covariance matrix is passed through into address of cache
Complete complex covariance matrix is obtained with the mode of reconstruct storage, forms final operation result;
DMA interface function module will be stored on piece by partitioned mode by the dma mode data that DDR memory is read in outside piece
SRAM memory;And the operation result is write back into DDR memory outside piece by dma mode.
2. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 1, characterized in that the storage resources of the on-chip SRAM memory are divided into k banks, each of depth d; if m banks are allocated for storing the lower triangular covariance matrix, then a covariance operation on a complex matrix of size M × N must satisfy M² ≤ m·d, and N may take any value; if the computation parallelism is b, the complex matrix to be computed is classified as small-point when M² ≤ b·d, and as large-point when this condition is not satisfied.
3. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 2, characterized in that the one-dimensional data transmission mode of the DMA is used if the complex matrix to be computed is small-point, and the two-dimensional data transmission mode of the DMA is used if the complex matrix to be computed is large-point.
4. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 1, characterized in that the matrix covariance computing module stores the source data according to the rule that one column of the complex matrix to be computed is stored in one bank, one column stored in each of the banks in a single pass forms one region, and one complete fill of all the banks forms one segment; and the module computes the total region size, the total number of segments, the number of regions in the last segment, and the number of columns in the last region.
5. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 4, characterized in that a zero-padding mechanism is used if the columns of any region are not completely filled, and, if the source data exceed the maximum number of points that can be stored in a single pass, the data are processed in segments (batches) using ping-pong operation.
6. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 1, characterized in that the matrix covariance computing module uses the reconfigurable mode to construct complex multiply-accumulate computing units, which iterate over the temporary result data of each segment, each region, and each bank to obtain the lower triangular covariance matrix.
7. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 6, characterized in that the number of complex multipliers and complex adders used in the complex multiply-accumulate computing unit equals the number of banks storing source data, denoted b, and the number of inputs to each computing unit is 2b+1; the data at the m-th address of all source data banks are read as input A of the complex multipliers; the conjugates of the data at addresses 1 to M of all source data banks are read in turn as input B of the complex multipliers; the remaining input is the value at the corresponding storage address in the result bank from the previous storage operation, and each calculated result is written back to the same address in the same bank.
8. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 1, characterized in that the conjugate symmetry module reads data in sequence from the banks holding the lower triangular covariance matrix, resolves the row and column of each datum in the lower triangular matrix, and then, following the rule that one column of the matrix is stored in one bank, stores the datum and its conjugate-symmetric datum in the source data banks, until the complete complex covariance matrix is obtained.
9. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 1, characterized in that, when the acceleration core distributes the data read from the DDR, it resolves the row and column of the incoming data in the complex matrix to be computed from the address passed in by the DMA, and then stores the data in the corresponding source data bank according to the column-wise distribution principle.
10. The complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 4, characterized in that the address mapping used in the conjugate symmetry module is as follows: the data of the lower triangular covariance matrix are read in sequence from the result banks; the row and column of each datum in the original lower triangular matrix are resolved from the number of the bank holding it and its address within that bank; then, according to the distribution principle that one column of the complete covariance matrix is stored in one bank, the current datum and its conjugate-symmetric datum are stored in the reconstructed banks.
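The sizing and distribution rules recited in claims 2, 3, 5, and 9 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed hardware: all function names are hypothetical, the source matrix is assumed to arrive in column-major order with a linear element index, and columns are assumed to be assigned to banks round-robin, with each group of b columns forming one region.

```python
def classify_matrix(m_rows, result_banks, parallelism, bank_depth):
    """Apply the size conditions of claim 2 (M^2 <= m*d and M^2 <= b*d)."""
    if m_rows * m_rows > result_banks * bank_depth:
        raise ValueError("result does not fit: M^2 > m*d")
    return "small" if m_rows * m_rows <= parallelism * bank_depth else "large"


def dma_mode(size_class):
    """Claim 3: one-dimensional DMA for small-point, two-dimensional otherwise."""
    return "1D" if size_class == "small" else "2D"


def partition_columns(n_cols, parallelism):
    """Return (regions, columns in last region, zero-padded columns).

    Assumes each region holds one column per bank, i.e. b columns.
    """
    regions = -(-n_cols // parallelism)            # ceiling division
    last_region_cols = n_cols - (regions - 1) * parallelism
    padded_cols = regions * parallelism - n_cols   # columns filled with zeros
    return regions, last_region_cols, padded_cols


def locate(addr, m_rows, parallelism):
    """Resolve a linear element index into (row, col, bank, address in bank).

    Assumes column-major source order and round-robin bank assignment.
    """
    col, row = divmod(addr, m_rows)
    bank = col % parallelism
    bank_addr = (col // parallelism) * m_rows + row
    return row, col, bank, bank_addr


size_class = classify_matrix(m_rows=32, result_banks=8, parallelism=4, bank_depth=128)
mode = dma_mode(size_class)
layout = partition_columns(n_cols=10, parallelism=4)
position = locate(addr=17, m_rows=4, parallelism=4)
```

With these example parameters, M² = 1024 fits in the m·d = 1024 result entries but exceeds b·d = 512, so the matrix is treated as large-point and the two-dimensional DMA mode is selected.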
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811284263.4A CN109446478B (en) | 2018-10-30 | 2018-10-30 | Complex covariance matrix calculation system based on iteration and reconfigurable mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811284263.4A CN109446478B (en) | 2018-10-30 | 2018-10-30 | Complex covariance matrix calculation system based on iteration and reconfigurable mode |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446478A (en) | 2019-03-08 |
CN109446478B (en) | 2021-09-28 |
Family
ID=65550425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811284263.4A Active CN109446478B (en) | 2018-10-30 | 2018-10-30 | Complex covariance matrix calculation system based on iteration and reconfigurable mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446478B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002037259A1 (en) * | 2000-11-01 | 2002-05-10 | Bops, Inc. | Methods and apparatus for efficient complex long multiplication and covariance matrix implementation |
EP1215507A2 (en) * | 2000-12-12 | 2002-06-19 | Matsushita Electric Industrial Co., Ltd. | Radio-wave arrival-direction estimating apparatus and directional variable transceiver |
CN101211333A (en) * | 2006-12-30 | 2008-07-02 | 北京邮电大学 | Signal processing method, device and system |
CN103685110A (en) * | 2013-12-17 | 2014-03-26 | 京信通信***(中国)有限公司 | Predistortion processing method and system and predistortion factor arithmetic unit |
CN105426345A (en) * | 2015-12-25 | 2016-03-23 | 南京大学 | Matrix inverse operation method |
CN105630735A (en) * | 2015-12-25 | 2016-06-01 | 南京大学 | Coprocessor based on reconfigurable computational array |
CN105893333A (en) * | 2016-03-25 | 2016-08-24 | 合肥工业大学 | Hardware circuit for calculating covariance matrix in MUSIC algorithm |
US9686069B2 (en) * | 2015-05-22 | 2017-06-20 | ZTE Canada Inc. | Adaptive MIMO signal demodulation using determinant of covariance matrix |
Non-Patent Citations (2)
Title |
---|
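Yu Dong et al.: "Design of a high-precision large-point two-dimensional FFT processor", Modern Radar * |
He Zishu et al.: "MUSIC angle estimation algorithm based on conjugate reconstruction of the data matrix", Journal of University of Electronic Science and Technology of China * |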
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109614582A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | The lower triangular portions storage device of self adjoint matrix and parallel read method |
CN111045965A (en) * | 2019-10-25 | 2020-04-21 | 南京大学 | Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method |
CN111723336A (en) * | 2020-06-01 | 2020-09-29 | 南京大学 | Cholesky decomposition-based arbitrary-order matrix inversion hardware acceleration system adopting loop iteration mode |
CN111723336B (en) * | 2020-06-01 | 2023-01-24 | 南京大学 | Cholesky decomposition-based arbitrary-order matrix inversion hardware acceleration system adopting loop iteration mode |
Also Published As
Publication number | Publication date |
---|---|
CN109446478B (en) | 2021-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111178519B (en) | Convolutional neural network acceleration engine, convolutional neural network acceleration system and method | |
CN108241890B (en) | Reconfigurable neural network acceleration method and architecture | |
CN104915322B (en) | A kind of hardware-accelerated method of convolutional neural networks | |
Pınar et al. | Fast optimal load balancing algorithms for 1D partitioning | |
CN107341544A (en) | A kind of reconfigurable accelerator and its implementation based on divisible array | |
CN108805266A (en) | A kind of restructural CNN high concurrents convolution accelerator | |
TW201913460A (en) | Chip device and related products | |
CN111667051A (en) | Neural network accelerator suitable for edge equipment and neural network acceleration calculation method | |
CN107392309A (en) | A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA | |
CN111242289A (en) | Convolutional neural network acceleration system and method with expandable scale | |
CN110110844B (en) | Convolutional neural network parallel processing method based on OpenCL | |
CN103970720B (en) | Based on extensive coarseness imbedded reconfigurable system and its processing method | |
CN109446478A (en) | A kind of complex covariance matrix computing system based on iteration and restructural mode | |
CN102135951B (en) | FPGA (Field Programmable Gate Array) implementation method based on LS-SVM (Least Squares-Support Vector Machine) algorithm restructured at runtime | |
CN110647719B (en) | Three-dimensional FFT (fast Fourier transform) calculation device based on FPGA (field programmable Gate array) | |
CN106156851A (en) | The accelerator pursued one's vocational study towards the degree of depth and method | |
CN110222818A (en) | A kind of more bank ranks intertexture reading/writing methods for the storage of convolutional neural networks data | |
CN109993293A (en) | A kind of deep learning accelerator suitable for stack hourglass network | |
CN110069444A (en) | A kind of computing unit, array, module, hardware system and implementation method | |
CN209708122U (en) | A kind of computing unit, array, module, hardware system | |
CN105955896B (en) | A kind of restructural DBF hardware algorithm accelerator and control method | |
CN112732630A (en) | Floating-point matrix multiplier many-core parallel optimization method for deep learning | |
CN106156142A (en) | The processing method of a kind of text cluster, server and system | |
CN114492753A (en) | Sparse accelerator applied to on-chip training | |
Wang et al. | A scalable FPGA engine for parallel acceleration of singular value decomposition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||