CN104794002B - Multi-way parallel partitioning method and system - Google Patents
- Publication number
- CN104794002B
- Authority
- CN
- China
- Prior art keywords
- result
- multi-way parallel
- way
- partitioning method
- operand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Complex Calculations (AREA)
Abstract
The present invention relates to a multi-way parallel partitioning method based on specific resources. The method calculates the total amount of computation and from it derives the amount of computation and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel operation IPs. In this method, the number of results for each way is obtained from a top-level configuration parameter by shift-and-add operations, giving the total number of results. The total number of results determines the address sequence generated for each way, and the operation ends when the number of computed results reaches that total.
Description
Technical field
The present invention relates to a parallel partitioning technique for implementing the autocorrelation algorithm on a specific hardware system with limited computing and storage resources, and in particular to a multi-way parallel partitioning method based on specific resources and its architecture.
Background technique
Digital signal processing is an important technical tool. It is widely used in engineering fields such as multimedia, data communication, radar imaging, geological exploration, and aerospace, and in recent years it has also become one of the theoretical foundations of emerging disciplines such as artificial intelligence, pattern recognition, and neural networks. With the rapid development of semiconductor process technology, the continuous improvement in chip performance has made real-time processing of large volumes of data possible. The main algorithms used in digital signal processing are filtering, convolution, correlation, and spectral analysis.
Autocorrelation describes the degree of correlation of a random signal x(t) between any two different instants t1 and t2. The autocorrelation function of a continuous signal is defined as:
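The expression itself is not reproduced in this text. One standard time-average form for a real signal, consistent with the properties stated next, is sketched here as an assumption; it is not necessarily the exact expression used in the original filing:

```latex
R_{XX}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\, x(t+\tau)\, dt
```

With this form, R_XX(0) is the time average of x(t)^2, which matches the statement that the value at τ = 0 is the mean square value of the signal.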
Mathematically, it is easy to see that R_XX(τ) is an even function, and that the autocorrelation function of a sinusoidal signal is itself a sinusoid. The autocorrelation function of a periodic signal is still a periodic signal of the same frequency. At τ = 0, R_XX(τ) reaches its maximum, which is the mean square value of the signal, indicating that the correlation is strongest at that point.
In digital signal processing the signal is discrete, so the autocorrelation function can be defined in the following form, which is a typical multiply-accumulate structure.
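The discrete expression is likewise not reproduced in this text. A form consistent with the sliding-window ordering described in the embodiment (first result A(0)A(N-1)*, last result A(N-1)A(0)*) can be written as follows, where N stands for number and the output index k is a notation assumed here:

```latex
R(k) = \sum_{i=\max(0,\,k-N+1)}^{\min(k,\,N-1)} A(i)\, A^{*}(N-1-k+i), \qquad k = 0, 1, \dots, 2N-2
```

For k = 0 the sum has the single term A(0)A*(N-1), for k = N-1 it has all N terms, and R(2N-2-k) is the complex conjugate of R(k).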
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by providing a multi-way parallel partitioning method based on specific resources and its architecture, realized by the following technical solution:
The multi-way parallel partitioning method based on specific resources calculates the total amount of computation to obtain the amount of computation and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel operation IPs. It is characterized in that the number of results for each way is obtained from a top-level configuration parameter by shift-and-add operations, giving the total number of results; the total number of results determines the address sequence generated for each way; and the operation ends when the number of computed results reaches the total number of results.
In a further refinement of the multi-way parallel partitioning method based on specific resources, the method is based on the autocorrelation algorithm. Let the number of points of the input vector be number. The first element of the input vector is multiplied by the last element of the conjugate of the input vector to give the first result. The conjugate vector is then slid one position to the right, and the overlapping elements of the two vectors are multiplied and accumulated to give the second result. Repeating these steps completes the autocorrelation algorithm, and the final result has number*2-1 points.
In a further refinement of the method, the sequence of final results is symmetric about its middle value, with the values on either side being conjugates of each other.
In a further refinement of the method, only the first half of the results is computed. After the first half has been computed, DMA transfer is started: the first half of the results is transferred out in order, and then transferred out again in reverse order with each value conjugated.
In a further refinement of the method, the number of multiply-accumulate operations per result increases from 1 to number, and the partition based on the amount of computation assigns the four ways 0.5*number, 0.2071*number, 0.1589*number, and 0.1334*number results, respectively.
Using the multi-way parallel partitioning method based on specific resources, a hardware architecture for multi-way parallel partitioning based on specific resources is provided. It includes computing resources and storage resources that can be called by the top-level module of the autocorrelation algorithm, and is characterized in that it further includes:
Multiply-accumulators, the core modules, which receive the source data delivered by the address generation unit and are responsible for processing the data stream;
An address generation unit, which generates the required address sequences and feeds the source data in order into the multiply-accumulators of the computing resources;
A memory bank switching module, which treats different memory blocks as a single memory block when addressing them.
In a further refinement of the hardware architecture, the multiply-accumulators support four-way operation; each way has two source data inputs, so eight data inputs are needed in total.
In a further refinement of the hardware architecture, each multiply-accumulator contains one complex multiplier and three complex adders to implement pipelined processing. The complex multiplier and the three complex adders are connected in series, and the two data inputs of each adder are gated under the control of the corresponding combinational logic.
In a further refinement of the hardware architecture, the memory bank switching module is formed by connecting at least two memory blocks.
The advantages of the present invention are as follows:
The multi-way parallel partitioning method based on specific resources provided by the invention calculates the total amount of computation and derives the amount of computation and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel operation IPs and the performance gain is maximized. The design uses four parallel ways, and the number of results for each way is obtained from the top-level configuration parameter (the vector length in points) by shift-and-add operations. This number determines the address sequence generated by the AGU of each way and also serves as the key condition for deciding when the operation ends. The invention combines the computing and storage resources of the hardware system to realize a hardware design of the autocorrelation algorithm for vector lengths from 16 to 128k points. It describes the memory usage in detail and gives the hardware parallelism for each vector-length interval, the basic module design, the system integration, and quantified performance figures.
Brief description of the drawings
Fig. 1 is a sliding-window diagram of the autocorrelation algorithm.
Fig. 2 is a schematic diagram of the multiply-accumulator.
Fig. 3 is the module interconnection diagram of the autocorrelation hardware design.
Detailed description of the embodiments
The solution of the present invention is described in detail below with reference to the accompanying drawings.
The multi-way parallel partitioning method based on specific resources provided in this embodiment calculates the total amount of computation to obtain the amount of computation and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel operation IPs. It is characterized in that the number of results for each way is obtained from a top-level configuration parameter by shift-and-add operations, giving the total number of results; the total number of results determines the address sequence generated for each way; and the operation ends when the number of computed results reaches the total number of results.
The method is based on the autocorrelation algorithm. Let the number of points of the input vector be number. The first element of the input vector is multiplied by the last element of the conjugate of the input vector to give the first result. The conjugate vector is then slid one position to the right, and the overlapping elements of the two vectors are multiplied and accumulated to give the second result. Repeating these steps completes the autocorrelation algorithm, and the final result has number*2-1 points.
The autocorrelation algorithm is implemented mainly by a sliding window. Taking a 1024-point vector A as an example, the sliding-window process is shown in Fig. 1. First, the first element of vector A is multiplied by the last element of the conjugate of A, giving the first result, A(0)A(1023)*. The conjugate vector is then slid one position to the right and the overlapping elements of the two vectors are multiplied and accumulated, giving the second result, A(0)A(1022)* + A(1)A(1023)*. The third result is obtained in the same way: A(0)A(1021)* + A(1)A(1022)* + A(2)A(1023)*. This continues until the last window position, which gives the final result, A(1023)A(0)*.
If the vector has number points, the autocorrelation result has number*2-1 points. It is easy to see mathematically that the final result sequence is symmetric about its middle value, with the values on either side being conjugates of each other.
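The sliding-window procedure above can be sketched as a small software reference model. This is only an illustrative model, not the hardware design; the function name `autocorrelate` and the test vector are assumptions introduced here.

```python
def autocorrelate(a):
    """Sliding-window autocorrelation of a complex sequence a.

    Result k (0-based) pairs a[i] with conj(a[n - 1 - k + i]) for every i
    where both indices are valid, matching the A(0)A(1023)* ordering.
    """
    n = len(a)
    results = []
    for k in range(2 * n - 1):
        shift = n - 1 - k  # current offset of the conjugate vector
        acc = 0j
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                acc += a[i] * a[j].conjugate()
        results.append(acc)
    return results


# Hypothetical 4-point input, used only to illustrate the properties.
a = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 4j]
r = autocorrelate(a)
print(len(r))  # number * 2 - 1 results
print(all(abs(r[k] - r[-1 - k].conjugate()) < 1e-12 for k in range(len(r))))
```

The second printed value checks the conjugate symmetry about the middle result, which is the property the half-computation optimization relies on.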
The expression of the autocorrelation algorithm and the sliding-window diagram show that the algorithm itself involves only basic multiply and add operations. The specific design approach is as follows:
(1) Since the computing resources provide 4 complex multipliers and 16 complex adders, the autocorrelation algorithm can in principle be designed with four parallel ways.
(2) Because the results are conjugate-symmetric, only the first half of the results needs to be computed. After the computation finishes, DMA transfer is started. The first half of the results is transferred out in order, and then the results are transferred out in reverse order with each value conjugated. This halves the overall computation time and doubles the performance.
(3) Given (2), the number of points actually computed is number, and the number of multiply-accumulate operations per result increases from 1 to number. If the results were simply divided evenly among the ways, the actual amount of computation per way would be unbalanced and the computation times would not match. A partition based on the amount of computation is therefore needed: the number of results assigned to each way is set mathematically so that the amounts of computation are roughly equal, which balances the task load and maximizes performance.
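Step (2) can be illustrated with a short software sketch: compute only the first number results, then emit them in order followed by their conjugates in reverse order. The function names `first_half` and `expand` are hypothetical, and skipping the centre value on the reverse pass is an assumption made here so that the output has exactly number*2-1 points.

```python
def first_half(a):
    """Compute only results 0 .. n-1; result k needs k + 1 multiply-accumulates."""
    n = len(a)
    return [
        sum(a[i] * a[n - 1 - k + i].conjugate() for i in range(k + 1))
        for k in range(n)
    ]


def expand(half):
    """Rebuild all 2n-1 results: the half in order, then its conjugates in
    reverse order, skipping the centre value so it is not emitted twice."""
    mirrored = [v.conjugate() for v in reversed(half[:-1])]
    return half + mirrored


# Hypothetical 4-point input, used only for illustration.
a = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 4j]
full = expand(first_half(a))
print(len(full))  # number * 2 - 1
```

The growing cost per result (k + 1 multiply-accumulates for result k) is what makes an even split of results unbalanced, as step (3) points out.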
The total amount of computation is 1+2+...+number, approximately number^2/2. The amount of computation needed for the first half of the results is 1+2+...+number/2, approximately number^2/8, which is exactly 1/4 of the total, so the first way can be assigned exactly the first half of the results.
The amount of computation needed for the first number*x results is 1+2+...+number*x, approximately number^2/2 * x^2. Setting x^2 equal to 1/4, 2/4, and 3/4 in turn gives the cumulative number of results that the first way, the first two ways, and the first three ways must produce; x is 0.5, 0.7071, and 0.8660, respectively. The numbers of results assigned to the four ways are therefore 0.5*number, 0.2071*number, 0.1589*number, and 0.1334*number.
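The derivation above can be checked numerically with a minimal sketch. The function names (`split_points`, `shares`, `workloads`) are hypothetical, and rounding the split points to whole results is an assumption made for the example.

```python
import math


def split_points(ways=4):
    """Cumulative fractions x_k with x_k ** 2 = k / ways, for k = 1 .. ways."""
    return [math.sqrt(k / ways) for k in range(1, ways + 1)]


def shares(ways=4):
    """Fraction of the `number` computed results assigned to each way."""
    bounds = [0.0] + split_points(ways)
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def workloads(number, ways=4):
    """Exact multiply-accumulate count per way when result k costs k operations."""
    cuts = [0] + [round(number * x) for x in split_points(ways)]
    return [sum(range(lo + 1, hi + 1)) for lo, hi in zip(cuts, cuts[1:])]


print([round(s, 4) for s in shares()])
print(workloads(1024))
```

Evaluated directly, the first three shares come out as 0.5, 0.2071, and 0.1589, and the fourth as 1 - 0.8660, about 0.1340, which is slightly different from the 0.1334 quoted in the text. For number = 1024 the four per-way workloads agree to within about one percent.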
A digital circuit has only high and low levels, corresponding to the two values 1 and 0. Multiplication by an integer can always be implemented by left shifts and additions. Multiplication by a fraction, however, can be implemented by right shifts and additions only for certain fractions. The Verilog code of the design is as follows:
// Cumulative split points: approximately 0.5, 0.7071 and 0.8660 of number
assign constant1 = (number >> 1);
assign constant2 = (number >> 1) + (number >> 3) + (number >> 4) + (number >> 5);
assign constant3 = (number >> 1) + (number >> 2) + (number >> 3);
// Number of results assigned to each of the four ways
assign result_num1 = (number >> 1);
assign result_num2 = (number >> 3) + (number >> 4) + (number >> 5);
assign result_num3 = (number >> 2) - (number >> 4) - (number >> 5);
assign result_num4 = number - (number >> 1) - (number >> 2) - (number >> 3);
Table 1: Comparison of the parallel task partition of the autocorrelation algorithm with the theoretical values
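The body of Table 1 is not reproduced in this text. As a rough stand-in, the shift-and-add expressions can be modelled with integer arithmetic and compared against the theoretical fractions quoted in the description. This is a sketch under assumptions: the function name `shift_add_split` and the choice of number = 1024 are introduced here, and the printed values are computed by this model, not taken from the original table.

```python
def shift_add_split(number):
    """Integer model of the result_num assign statements (per-way result counts)."""
    n1 = number >> 1
    n2 = (number >> 3) + (number >> 4) + (number >> 5)
    n3 = (number >> 2) - (number >> 4) - (number >> 5)
    n4 = number - (number >> 1) - (number >> 2) - (number >> 3)
    return [n1, n2, n3, n4]


THEORY = [0.5, 0.2071, 0.1589, 0.1334]  # fractions quoted in the description

number = 1024  # example vector length, chosen for illustration
counts = shift_add_split(number)
for way, (c, t) in enumerate(zip(counts, THEORY), start=1):
    print(f"way {way}: shift-add {c} ({c / number:.4f}), theory {t * number:.1f}")
```

The shift-and-add fractions are 0.5, 0.21875, 0.15625, and 0.125, and the four counts always sum to number because result_num4 is defined as the remainder.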
Using the multi-way parallel partitioning method based on specific resources, a hardware architecture for multi-way parallel partitioning based on specific resources is provided. It includes computing resources and storage resources that can be called by the top-level module of the autocorrelation algorithm, and is characterized in that it further includes:
Multiply-accumulators, the core modules, which receive the source data delivered by the address generation unit and are responsible for processing the data stream;
An address generation unit, which generates the required address sequences and feeds the source data in order into the multiply-accumulators of the computing resources;
A memory bank switching module, which treats different memory blocks as a single memory block when addressing them.
For the autocorrelation algorithm, the sliding-window process shows that the dependencies between data make it impossible to split the source vector across the parallel ways. This means that even with four parallel ways, every way still needs the complete vector to be stored. Each multiply-accumulator has two source data inputs, so 8 inputs are needed in total. Fig. 2 shows the schematic of the multiply-accumulator. Each multiply-accumulator contains 1 complex multiplier and 3 complex adders; because each operation IP has a three-stage pipeline, three adders are used to implement pipelined processing. The MUXes on the two data inputs of each adder are gated under the control of the corresponding combinational logic.
When the number of points is small (say, less than 8k), only 8 banks are needed to store the source data. When the number of points is not higher than 16k, every two banks form one memory_switch, so 16 banks are needed to provide the eight data streams, and the result data is stored in the remaining 16 banks. Four multiply-accumulator ways are called internally, and the address of each input and output datum is controlled by the corresponding Operend_AGU and result_AGU. Fig. 3 shows the module interconnection diagram of the autocorrelation hardware design.
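The idea of a memory_switch that presents two banks as one address space can be sketched in software. This is an illustrative model only: the class name `MemorySwitch`, the 8k-point bank depth (inferred from the statement that a 64k-point vector fills 8 banks), and the mapping in which the high part of the address selects the bank are all assumptions.

```python
BANK_DEPTH = 8 * 1024  # points per bank, inferred from "64k points need 8 banks"


class MemorySwitch:
    """Presents several equally sized banks as one contiguous address space."""

    def __init__(self, num_banks=2, depth=BANK_DEPTH):
        self.depth = depth
        self.banks = [[0] * depth for _ in range(num_banks)]

    def locate(self, address):
        """Map a flat address to (bank index, offset inside that bank)."""
        bank, offset = divmod(address, self.depth)
        if not 0 <= bank < len(self.banks):
            raise IndexError(f"address {address} is outside the switched banks")
        return bank, offset

    def write(self, address, value):
        bank, offset = self.locate(address)
        self.banks[bank][offset] = value

    def read(self, address):
        bank, offset = self.locate(address)
        return self.banks[bank][offset]


switch = MemorySwitch()            # two banks, 16k points in total
switch.write(8192, 3 + 4j)         # first location of the second bank
print(switch.locate(8192), switch.read(8192))
```

With two 8k banks per switch, an address generator can step through a 16k-point vector without knowing where the bank boundary lies.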
When the vector length is higher than 16k but not higher than 64k points, take the interval maximum of 64k as an example. Storing the complete vector takes exactly 8 banks, so the two data streams occupy 16 banks. Taking the storage of the results into account, the algorithm can no longer support parallel processing, so for vectors in this interval only one multiply-accumulator way computes and no parallelism is possible.
When the number of points is greater than 64k and not greater than 128k, again take the interval maximum of 128k as an example. Each data stream needs 16 banks, so the two streams take all 32 banks to store the source operands, and the results can only be stored in the smaller constant memory. This interval likewise does not support parallelism and relies on a single multiply-accumulator. The constant memory holds 8K in total, while there can be up to 256k-1 results, so the results cannot all be stored at once. In the design, therefore, when the address counter of the AGU module reaches 8k, the address sequence needed for the last result of the current batch has been generated, which means that the constant memory will be full a dozen or so clock cycles later. The AGU then generates an interrupt signal and sends it to the module top level. After 14 clk cycles (a value determined by simulation), the top level pulls low the enable signals to the multiply-accumulator and the result_AGU module, suspending all modules, and then issues a finish signal to tell the master controller to carry the data out of the constant memory. When this is done, the master controller sends a start signal to the autocorrelation top level, and the internal logic restarts all modules so that execution resumes. After all operations have finished, finish_all is pulled high.
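The interval rules and the pause-and-drain behaviour described above can be summarized in a small model. This is a sketch under assumptions: the names `parallel_ways` and `drain_bursts` are hypothetical, the four-way case for lengths up to 16k is one reading of the text, and the model counts only burst sizes, not clock cycles or the 14-cycle delay.

```python
K = 1024
CONSTANT_MEMORY_DEPTH = 8 * K  # results held before the AGU raises its interrupt


def parallel_ways(number):
    """Degree of parallelism for each vector-length interval described in the text."""
    if not 16 <= number <= 128 * K:
        raise ValueError("supported vector lengths are 16 to 128k points")
    return 4 if number <= 16 * K else 1


def drain_bursts(total_results, depth=CONSTANT_MEMORY_DEPTH):
    """Yield the size of each burst the master controller must carry away."""
    remaining = total_results
    while remaining > 0:
        burst = min(depth, remaining)
        yield burst
        remaining -= burst


bursts = list(drain_bursts(2 * 128 * K - 1))  # up to 256k - 1 results
print(parallel_ways(128 * K), len(bursts), bursts[-1])
```

For the 128k case this gives 32 pause-and-drain cycles, the last one carrying 8191 results, which illustrates why the interrupt and restart handshake is needed.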
Table 2 lists the run times for three representative vector lengths, which meet the project's performance requirements (the system runs at a 1 GHz clock).
Table 2: Performance of the autocorrelation algorithm
Claims (8)
1. A multi-way parallel partitioning method, in which the total amount of computation is calculated to obtain the amount of computation and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel operation IPs, characterized in that the number of results for each way is obtained from a top-level configuration parameter by shift-and-add operations, giving the total number of results; the total number of results determines the address sequence generated for each way; and the operation ends when the number of computed results reaches the total number of results; the method is based on the autocorrelation algorithm: with the number of points of the input vector set to number, the first element of the input vector is multiplied by the last element of the conjugate of the input vector to give the first result; the conjugate vector is then slid one position to the right and the overlapping elements of the two vectors are multiplied and accumulated to give the second result; the steps for obtaining the first result and the second result are repeated to complete the autocorrelation algorithm; and the final result has number*2-1 points.
2. The multi-way parallel partitioning method according to claim 1, characterized in that the sequence of final results is symmetric about its middle value, with the values on either side being conjugates of each other.
3. The multi-way parallel partitioning method according to claim 2, characterized in that only the first half of the results is computed; after the first half has been computed, DMA transfer is started; and after the first half of the results has been transferred out in order, the first half of the results is transferred out in reverse order with each value conjugated.
4. The multi-way parallel partitioning method according to claim 3, characterized in that the number of multiply-accumulate operations per result increases from 1 to number, and the partition based on the amount of computation assigns the four ways 0.5*number, 0.2071*number, 0.1589*number, and 0.1334*number results, respectively.
5. A multi-way parallel partitioning system using the multi-way parallel partitioning method according to any one of claims 1-4, comprising computing resources and storage resources that can be called by the top-level module of the autocorrelation algorithm, characterized in that it further comprises:
several multiply-accumulators, the core modules, which receive the source data delivered by the address generation unit and are responsible for processing the data stream;
an address generation unit, which generates the required address sequences and feeds the source data in order into the multiply-accumulators of the computing resources;
a memory bank switching module, which treats different memory blocks as a single memory block when addressing them.
6. The multi-way parallel partitioning system according to claim 5, characterized in that the multiply-accumulators support four-way operation, each way has two source data inputs, and eight data inputs are needed in total.
7. The multi-way parallel partitioning system according to claim 6, characterized in that each multiply-accumulator contains one complex multiplier and three complex adders to implement pipelined processing, the complex multiplier and the three complex adders are connected in series, and the two data inputs of each adder are gated under the control of the corresponding combinational logic.
8. The multi-way parallel partitioning system according to claim 5, characterized in that the memory bank switching module is formed by connecting at least two memory blocks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410836965.4A CN104794002B (en) | 2014-12-29 | 2014-12-29 | Multi-way parallel partitioning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104794002A CN104794002A (en) | 2015-07-22 |
CN104794002B true CN104794002B (en) | 2019-03-22 |
Family
ID=53558813
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410836965.4A Active CN104794002B (en) | Multi-way parallel partitioning method and system | 2014-12-29 | 2014-12-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104794002B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108089839A (en) * | 2017-10-11 | 2018-05-29 | 南开大学 | Method for implementing cross-correlation computation based on FPGA |
CN108762719B (en) * | 2018-05-21 | 2023-06-06 | 南京大学 | Parallel generalized inner product reconstruction controller |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101931449A (en) * | 2010-08-27 | 2010-12-29 | 中国科学院上海微***与信息技术研究所 | Distributed type digital beam formation network and digital beam formation processing method |
CN102214086A (en) * | 2011-06-20 | 2011-10-12 | 复旦大学 | General-purpose parallel acceleration algorithm based on multi-core processor |
CN102521047A (en) * | 2011-11-15 | 2012-06-27 | 重庆邮电大学 | Method for realizing interrupted load balance among multi-core processors |
CN103927290A (en) * | 2014-04-18 | 2014-07-16 | 南京大学 | Inverse operation method for lower triangle complex matrix with any order |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130159680A1 (en) * | 2011-12-19 | 2013-06-20 | Wei-Yu Chen | Systems, methods, and computer program products for parallelizing large number arithmetic |
Also Published As
Publication number | Publication date |
---|---|
CN104794002A (en) | 2015-07-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||