CN104794002B - A multi-way parallel partitioning method and system - Google Patents

Info

Publication number
CN104794002B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410836965.4A
Other languages
Chinese (zh)
Other versions
CN104794002A (en)
Inventor
潘红兵
李丽
黄炎
何书专
李伟
沙金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Application filed by Nanjing University
Priority to CN201410836965.4A
Publication of CN104794002A
Application granted
Publication of CN104794002B
Legal status: Active

Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention relates to a multi-way parallel partitioning method based on specific resources. By calculating the total operation count, the method obtains the operation count and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel arithmetic IP cores. In this method the number of results produced by each way is derived from a top-level configuration parameter, and the result total is obtained by shift-and-add. The result total determines the generation of each way's address sequence, and the operation ends when the number of computed results reaches the result total.

Description

A multi-way parallel partitioning method and system
Technical field
The present invention relates to parallel partitioning techniques for implementing the autocorrelation algorithm on a specific hardware system with limited computing and storage resources, and in particular to a multi-way parallel partitioning method based on specific resources and its architecture.
Background
Digital signal processing, as an important technical tool, is not only widely used in engineering fields such as multimedia, data communication, radar imaging, geological exploration and aerospace, but in recent years has also become one of the theoretical foundations of emerging disciplines such as artificial intelligence, pattern recognition and neural networks, so its reach is very broad. With the rapid development of semiconductor process technology, the continual improvement in chip performance has made real-time processing of large volumes of data possible. The main algorithms used in digital signal processing are mostly filtering, convolution, correlation and spectral analysis of data.
Autocorrelation describes the degree of correlation of a random signal x(t) between any two different instants t1 and t2. For a continuous signal, the autocorrelation function is defined as:
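The formula itself is not reproduced in this text. A commonly used time-average definition, assumed here because it agrees with the properties stated next (an even function for real signals, with a maximum at τ = 0 equal to the mean-square value), is:

```latex
R_{XX}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\, x(t+\tau)\, \mathrm{d}t
```

The patent may have used an equivalent ensemble-average form such as $R_{XX}(t_1, t_2) = E[x(t_1)\,x(t_2)]$; the choice between the two is an assumption of this sketch.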
Mathematically it is easy to see that R_XX(τ) is an even function, and that the autocorrelation function of a sine or cosine signal is again a sinusoid. The autocorrelation function of a periodic signal is a periodic signal of the same frequency. At τ = 0, R_XX(τ) reaches its maximum, which is the mean-square value of the signal, indicating that the correlation is strongest at this point.
For digital signal processing the signal is discrete, so the autocorrelation function can be defined in the following form, which is a typical multiply-accumulate structure.
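The discrete formula is likewise not reproduced here. A form consistent with the sliding-window description in the embodiment (first result A(0)A(1023)*, second result A(0)A(1022)* + A(1)A(1023)*, and so on) can be written as follows, where N stands for the vector length `number` and the index k is notation introduced only for this sketch:

```latex
R(k) = \sum_{i=0}^{k} A(i)\, A^{*}(N-1-k+i), \qquad k = 0, 1, \dots, N-1,
\qquad\text{and}\qquad
R(2N-2-k) = R^{*}(k).
```

Each result is a sum of products, which is why the text calls it a multiply-accumulate structure; result k needs k+1 complex multiplications.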
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by providing a multi-way parallel partitioning method based on specific resources and its architecture, realized by the following technical solution:
In the multi-way parallel partitioning method based on specific resources, the total operation count is calculated to obtain the operation count and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel arithmetic IP cores. The method is characterized in that the number of results produced by each way is derived from a top-level configuration parameter, and the result total is obtained by shift-and-add; the result total determines the generation of each way's address sequence, and the operation ends when the number of computed results reaches the result total.
A further design of the method is that it is based on the autocorrelation algorithm. Let the number of points of the input vector be number. The first element of the input vector is multiplied by the last element of the conjugate vector of the input vector to obtain the first result. The conjugate vector is then slid one position to the right, and the corresponding elements of the two vectors are multiplied and accumulated to obtain the second result. Repeating these steps completes the autocorrelation algorithm, and the final result after processing has number*2-1 points.
A further design of the method is that the sequence of the final result is symmetric about its middle element, with the elements on the left and right being conjugate-symmetric.
A further design of the method is that only the first half of the results is computed. DMA transfer starts after the first-half computation finishes: the first-half results are moved out in order, and then moved out again in reverse order with each value conjugated.
A further design of the method is that the multiply-accumulate count of the input vector increases from 1 to number, and the partitioning based on operation count assigns the four ways 0.5*number, 0.2071*number, 0.1589*number and 0.1334*number results, in that order.
Using the above multi-way parallel partitioning method, a hardware architecture for multi-way parallel partitioning based on specific resources is provided. It comprises computing resources and storage resources that can be called by the top-level module of the autocorrelation algorithm, and is characterized in that it further comprises:
Core module (multiply-accumulator): receives the source data delivered by the address generator unit and is responsible for processing the data stream;
Address generator unit: generates the required address sequences and feeds the source data in order into the multiply-accumulators of the computing resources;
Memory bank switching module: handles the addressing of each memory block when different memory blocks are treated as a single memory block.
A further design of the hardware architecture is that the multiply-accumulators support four-way operation, each way has two source data inputs, and eight data inputs are needed in total.
A further design of the hardware architecture is that each multiply-accumulator contains one complex multiplier and three complex adders used to implement pipelined processing. The complex multiplier and the three complex adders are connected in series, and the two data inputs of each adder are each selected under the control of the corresponding combinational logic.
A further design of the hardware architecture is that the memory bank switching module is formed by connecting at least two memory blocks.
The advantages of the present invention are as follows:
The multi-way parallel partitioning method based on specific resources provided by the invention calculates the total operation count to obtain the operation count and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel arithmetic IP cores and the largest performance gain is obtained. The design uses four-way parallelism, and the number of results produced by each way is obtained from the top-level configuration parameter (that is, the vector length) by shift-and-add. This value determines the address sequence generated by each way's AGU and also serves as the key condition for deciding that the operation has finished. The invention combines the computing and storage resources of the hardware system to implement an autocorrelation hardware design for vector lengths in the range 16~128k, describes the memory usage in detail, and gives the hardware parallelism for each length interval, the basic module design, the system integration and quantified performance figures.
Brief description of the drawings
Fig. 1 is a diagram of the autocorrelation algorithm sliding window.
Fig. 2 is a schematic diagram of the multiply-accumulator.
Fig. 3 is the module interconnection diagram of the autocorrelation algorithm hardware design.
Specific embodiment
The solution of the invention is described in detail below with reference to the drawings.
The multi-way parallel partitioning method based on specific resources provided in this embodiment calculates the total operation count to obtain the operation count and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel arithmetic IP cores. The method is characterized in that the number of results produced by each way is derived from a top-level configuration parameter, and the result total is obtained by shift-and-add; the result total determines the generation of each way's address sequence, and the operation ends when the number of computed results reaches the result total.
The method is based on the autocorrelation algorithm. Let the number of points of the input vector be number. The first element of the input vector is multiplied by the last element of the conjugate vector of the input vector to obtain the first result. The conjugate vector is then slid one position to the right, and the corresponding elements of the two vectors are multiplied and accumulated to obtain the second result. Repeating these steps completes the autocorrelation algorithm, and the final result after processing has number*2-1 points.
The autocorrelation algorithm is implemented mainly by a sliding window. Taking a 1024-point vector A as an example, the sliding-window process is shown in Fig. 1. First, the first element of vector A is multiplied by the last element of the conjugate vector of A to obtain the first result, A(0)A(1023)*. The conjugate vector is then slid one position to the right and the corresponding elements of the two vectors are multiplied and accumulated to obtain the second result, A(0)A(1022)* + A(1)A(1023)*. The third result is obtained in the same way as A(0)A(1021)* + A(1)A(1022)* + A(2)A(1023)*. This continues until the window reaches the last position, which gives the final result, A(1023)A(0)*.
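The sliding-window procedure just described can be modeled in software as follows. This is a minimal behavioral sketch in Python, not the hardware design; the function name `autocorr_sliding` and the 0-based result index `k` are introduced only for illustration.

```python
def autocorr_sliding(a):
    """Sliding-window autocorrelation of a complex sequence.

    Result k (0-based) multiplies a[i] by the conjugate of a[n-1-k+i]
    over the overlapping positions and accumulates the products, so
    result 0 is a[0]*conj(a[n-1]) and the last is a[n-1]*conj(a[0]).
    """
    n = len(a)
    conj = [x.conjugate() for x in a]
    results = []
    for k in range(2 * n - 1):
        shift = n - 1 - k            # offset of the conjugate vector
        acc = 0j
        for i in range(n):
            j = i + shift
            if 0 <= j < n:           # only overlapping positions contribute
                acc += a[i] * conj[j]
        results.append(acc)
    return results
```

For an n-point input the function returns 2n-1 values, matching the number*2-1 result length stated above.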
Assuming the vector has number points, the result of the autocorrelation algorithm has number*2-1 points. Mathematically it is easy to see that the final result sequence is symmetric about its middle element, with the elements on the left and right being conjugate-symmetric.
From the expression and the sliding-window diagram of the autocorrelation algorithm it can be seen that the algorithm itself involves only basic multiply-add operations. The specific design is as follows:
(1) Since the computing resources can provide 4 groups of complex multipliers and 16 groups of complex adders, the autocorrelation algorithm can in theory be designed with four-way parallelism.
(2) The result is conjugate-symmetric, so only the first half of the results needs to be computed, and DMA transfer starts after that computation finishes. The first-half results are moved out in order, and then moved out again in reverse order with each value conjugated. This halves the overall computation time and doubles the performance.
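Step (2) can be checked with a small software model. The sketch below is an illustration under the same indexing assumptions as the sliding-window description; the function names are hypothetical, and the DMA transfer is modeled simply as list concatenation. It computes the first number results (up to and including the middle element) and produces the rest by reversing and conjugating.

```python
def autocorr_full(a):
    """Reference: all 2n-1 sliding-window results, computed directly."""
    n = len(a)
    out = []
    for k in range(2 * n - 1):
        shift = n - 1 - k
        out.append(sum((a[i] * a[i + shift].conjugate()
                        for i in range(n) if 0 <= i + shift < n), 0j))
    return out


def autocorr_half_then_mirror(a):
    """Compute only results 0..n-1, then mirror them with conjugation."""
    n = len(a)
    first_half = []
    for k in range(n):
        acc = 0j
        for i in range(k + 1):       # result k needs k+1 multiply-accumulates
            acc += a[i] * a[n - 1 - k + i].conjugate()
        first_half.append(acc)
    # Reverse order, skipping the middle element, each value conjugated.
    mirrored = [r.conjugate() for r in reversed(first_half[:-1])]
    return first_half + mirrored
```

Both functions return the same 2n-1 values, while the second performs roughly half of the multiply-accumulate operations.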
(3) Given (2), the autocorrelation algorithm actually computes number results, and the multiply-accumulate count of the source data increases from 1 to number. If the results were simply divided evenly among the ways, the actual operation count of each way would be unbalanced and the run times would not match. A partitioning based on operation count is therefore needed: the share of results assigned to each way is set mathematically so that the operation counts are roughly equal, which balances the task load and maximizes performance.
The total operation count is 1+2+...+number, approximately number^2/2. The operation count needed to obtain the first half of the results is 1+2+...+number/2, approximately number^2/8, which is exactly 1/4 of the total, so the first way can be assigned exactly the first half of the results.
The operation count needed to obtain the first number*x results is 1+2+...+number*x, approximately number^2/2 * x^2. Setting x^2 equal to 1/4, 2/4 and 3/4 in turn gives the number of results that the first way, the first two ways and the first three ways must produce in total; x is 0.5, 0.7071 and 0.8660 respectively. The numbers of results assigned to the four ways are therefore 0.5*number, 0.2071*number, 0.1589*number and 0.1334*number, in that order.
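The derivation above can be reproduced numerically. The following Python sketch is illustrative only (the function name and the `ways` parameter are assumptions): it solves x^2 = k/4 for the cumulative boundaries and takes differences to get each way's share.

```python
import math


def theoretical_shares(ways=4):
    """Cumulative boundaries x_k = sqrt(k/ways) and per-way result shares."""
    bounds = [math.sqrt(k / ways) for k in range(1, ways + 1)]
    shares = [bounds[0]] + [bounds[k] - bounds[k - 1] for k in range(1, ways)]
    return bounds, shares


bounds, shares = theoretical_shares()
print([round(b, 4) for b in bounds])   # cumulative fractions of the results
print([round(s, 4) for s in shares])   # fraction of the results per way
```

Computed this way, the first three shares round to 0.5, 0.2071 and 0.1589 as in the text. The fourth comes out at about 0.1340 (1 - 0.8660), slightly different from the 0.1334 listed above.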
Digital circuits have only high and low levels, corresponding to the two values 1 and 0. Multiplication by an integer can always be implemented by left-shifting and adding, but multiplication by a fraction can be implemented by right-shifting and adding only for particular fractions. The Verilog code of the design is as follows:
assign constant1 = (number >> 1);                                                   // 1/2 = 0.5
assign constant2 = (number >> 1) + (number >> 3) + (number >> 4) + (number >> 5);   // 1/2 + 1/8 + 1/16 + 1/32 = 0.71875
assign constant3 = (number >> 1) + (number >> 2) + (number >> 3);                   // 1/2 + 1/4 + 1/8 = 0.875
assign result_num1 = (number >> 1);                                                 // 1/2 = 0.5
assign result_num2 = (number >> 3) + (number >> 4) + (number >> 5);                 // 1/8 + 1/16 + 1/32 = 0.21875
assign result_num3 = (number >> 2) - (number >> 4) - (number >> 5);                 // 1/4 - 1/16 - 1/32 = 0.15625
assign result_num4 = number - (number >> 1) - (number >> 2) - (number >> 3);        // 1 - 1/2 - 1/4 - 1/8 = 0.125
Table 1: Parallel task partitioning of the autocorrelation algorithm compared with theoretical values
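The shift-and-add expressions for result_num1 to result_num4 can be modeled in software to see how well they balance the load. The sketch below is a Python behavioral model, not the hardware; it assumes that result k (counted from 1) costs k multiply-accumulate operations, as stated in step (3), and the function names are hypothetical.

```python
def shift_add_partition(number):
    """Per-way result counts, mirroring the result_num1..4 expressions."""
    r1 = number >> 1
    r2 = (number >> 3) + (number >> 4) + (number >> 5)
    r3 = (number >> 2) - (number >> 4) - (number >> 5)
    r4 = number - (number >> 1) - (number >> 2) - (number >> 3)
    return [r1, r2, r3, r4]


def macs_per_way(counts):
    """Multiply-accumulate operations per way when result k costs k MACs."""
    loads, start = [], 0
    for c in counts:
        end = start + c
        # Sum of (start+1) .. end, from the triangular-number formula.
        loads.append(end * (end + 1) // 2 - start * (start + 1) // 2)
        start = end
    return loads


counts = shift_add_partition(1024)
print(counts)                          # [512, 224, 160, 128]
print(macs_per_way(counts))            # [131328, 139888, 130640, 122944]
print(macs_per_way([256] * 4))         # [32896, 98432, 163968, 229504]
```

For number = 1024 the busiest way under the shift-and-add partition performs 139888 multiply-accumulates, against 229504 for an even four-way split of the results. Because result_num4 is defined as the remainder, the four counts always sum to number regardless of truncation in the shifts.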
Using the multi-way parallel partitioning method based on specific resources, a hardware architecture for multi-way parallel partitioning based on specific resources is provided. It comprises computing resources and storage resources that can be called by the top-level module of the autocorrelation algorithm, and is characterized in that it further comprises:
Core module (multiply-accumulator): receives the source data delivered by the address generator unit and is responsible for processing the data stream;
Address generator unit: generates the required address sequences and feeds the source data in order into the multiply-accumulators of the computing resources;
Memory bank switching module: handles the addressing of each memory block when different memory blocks are treated as a single memory block.
For the autocorrelation algorithm, the sliding-window process shows that the dependency between data means the source vector cannot be split for parallel processing. In other words, even with four ways in parallel, each way still needs the complete vector to be stored. Each multiply-accumulator has two source data inputs, so 8 inputs are needed in total. Fig. 2 shows the multiply-accumulator schematic. Each multiply-accumulator contains 1 complex multiplier and 3 complex adders; because each arithmetic IP has a three-stage pipeline, three adders are used in order to achieve pipelined processing. The MUXes on the two data inputs of each adder are each selected under the control of the corresponding combinational logic.
When the number of points is small (say, below 8k), only 8 banks are needed to store the source data. When the number of points is no more than 16k, every two banks form one memory_switch, so 16 banks are needed in total to supply the eight data streams, and the result data is stored in the remaining 16 banks. Four multiply-accumulators are instantiated internally, and the address of each input and output is controlled by the corresponding Operend_AGU and result_AGU. Fig. 3 shows the module interconnection diagram of the autocorrelation algorithm hardware design.
When the vector length is above 16k and no more than 64k, take the interval maximum of 64k as an example. Storing the vector completely requires exactly 8 banks, and the two data streams occupy 16 banks. Taking the storage of results into account, the algorithm cannot support parallel processing in this case, so for vectors in this interval only one multiply-accumulator way computes and there is no parallelism.
When the number of points is greater than 64k and no more than 128k, again take the interval maximum of 128k as an example. Each data stream needs 16 banks, so the two data streams take all 32 banks to store the source operands, and the results can only be stored in the smaller constant memory. This interval likewise does not support parallelism and relies on a single multiply-accumulator. The constant memory holds 8K in total while the number of results is up to 256k-1, so all results cannot be stored at once. In the design, therefore, when the address counter of the AGU module reaches 8k, the address sequence needed for the last value has been generated, which means the constant memory will be full after a dozen or so clock cycles. The AGU then generates an interrupt signal and sends it to the module top level. After 14 clk (a figure determined by simulation), the top level pulls low the enable signals fed to the multiply-accumulator and result_AGU modules, which suspends all modules, and then issues a finish signal to tell the master controller to move the data out of the constant memory. When the transfer is complete, the master controller sends a start signal to the autocorrelation top level, and the internal logic restarts all modules so that execution resumes. After all operations have finished, finish_all is pulled high.
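The bank allocation described for the different vector-length intervals can be summarized in a small planning function. This Python sketch rests on assumptions inferred from the text rather than stated in it as parameters: a bank capacity of 8k points (a 64k vector needs exactly 8 banks) and 32 banks in total. The names `plan`, `BANK_POINTS` and `TOTAL_BANKS` are hypothetical.

```python
K = 1024
BANK_POINTS = 8 * K    # assumed: the text says a 64k vector fills exactly 8 banks
TOTAL_BANKS = 32       # assumed: the text refers to "all 32 banks"


def plan(number):
    """Return (parallel ways, banks for source data, where results are stored)."""
    if not 16 <= number <= 128 * K:
        raise ValueError("vector length outside the 16~128k range")
    banks_per_stream = -(-number // BANK_POINTS)   # ceiling division
    ways = 4 if number <= 16 * K else 1            # parallel only up to 16k
    streams = 2 * ways                             # two source inputs per way
    source_banks = streams * banks_per_stream
    results_in = "banks" if number <= 64 * K else "constant memory"
    return ways, source_banks, results_in


for n in (8 * K, 16 * K, 64 * K, 128 * K):
    print(n, plan(n))
```

At the interval maxima this reproduces the figures in the text: 8 and 16 source banks with four ways, 16 source banks with one way, and all 32 banks with results redirected to the constant memory.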
Table 2 gives the run times for three characteristic vector lengths, which meet the performance requirements of the project (the system runs at a 1 GHz clock frequency).
Table 2: Performance figures of the autocorrelation algorithm

Claims (8)

1. A multi-way parallel partitioning method, in which the total operation count is calculated to obtain the operation count and the number of results that each way should actually be assigned, so that the task load is balanced across the parallel arithmetic IP cores, characterized in that the number of results produced by each way is derived from a top-level configuration parameter and the result total is obtained by shift-and-add; the result total determines the generation of each way's address sequence, and the operation ends when the number of computed results reaches the result total; the method is based on the autocorrelation algorithm: the number of points of the input vector is set to number, the first element of the input vector is multiplied by the last element of the conjugate vector of the input vector to obtain the first result, the conjugate vector is then slid one position to the right and the corresponding elements of the two vectors are multiplied and accumulated to obtain the second result, and the steps of obtaining the first result and the second result are repeated to complete the autocorrelation algorithm, the final result after processing having number*2-1 points.
2. The multi-way parallel partitioning method according to claim 1, characterized in that the sequence of the final result is symmetric about its middle element, with the elements on the left and right being conjugate-symmetric.
3. The multi-way parallel partitioning method according to claim 2, characterized in that only the first half of the results is computed; DMA transfer starts after the first-half computation finishes, the first-half results are moved out in order, and the first-half results are then moved out again in reverse order with each value conjugated.
4. The multi-way parallel partitioning method according to claim 3, characterized in that the multiply-accumulate count of the input vector increases from 1 to number, and the partitioning based on operation count assigns the four ways 0.5*number, 0.2071*number, 0.1589*number and 0.1334*number results, in that order.
5. A multi-way parallel partitioning system using the multi-way parallel partitioning method according to any one of claims 1-4, comprising computing resources and storage resources that can be called by the top-level module of the autocorrelation algorithm, characterized in that it further comprises:
Core module (several multiply-accumulators): receives the source data delivered by the address generator unit and is responsible for processing the data stream;
Address generator unit: generates the required address sequences and feeds the source data in order into the multiply-accumulators of the computing resources;
Memory bank switching module: handles the addressing of each memory block when different memory blocks are treated as a single memory block.
6. The multi-way parallel partitioning system according to claim 5, characterized in that the multiply-accumulators support four-way operation, each way has two source data inputs, and eight data inputs are needed in total.
7. The multi-way parallel partitioning system according to claim 6, characterized in that each multiply-accumulator contains one complex multiplier and three complex adders used to implement pipelined processing, the complex multiplier and the three complex adders are connected in series, and the two data inputs of each adder are each selected under the control of the corresponding combinational logic.
8. The multi-way parallel partitioning system according to claim 5, characterized in that the memory bank switching module is formed by connecting at least two memory blocks.
Application number: CN201410836965.4A; Priority date: 2014-12-29; Filing date: 2014-12-29; Title: A multi-way parallel partitioning method and system; Status: Active; Publication: CN104794002B (en)

Publications (2)

Publication Number Publication Date
CN104794002A CN104794002A (en) 2015-07-22
CN104794002B true CN104794002B (en) 2019-03-22

Family

ID=53558813

Country Status (1)

Country Link
CN (1) CN104794002B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant