CN104794002B

CN104794002B - Multi-way parallel partitioning method and system

Info

Publication number: CN104794002B
Application number: CN201410836965.4A
Authority: CN
Inventors: 潘红兵; 李丽; 黄炎; 何书专; 李伟; 沙金
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2014-12-29
Filing date: 2014-12-29
Publication date: 2019-03-22
Anticipated expiration: 2034-12-29
Also published as: CN104794002A

Abstract

The invention relates to a multi-way parallel partitioning method based on specific resources. The method computes the total operation count and from it derives the operation count and the number of results that each path should actually be assigned, so that the task load is balanced across the parallel arithmetic IPs. The number of results for each path is derived from a top-level configuration parameter by shift-and-add, which yields the result total. The result total determines the address sequence generated for each path, and the operation ends when the number of computed results reaches the result total.

Description

A multi-way parallel partitioning method and system

Technical field

The present invention relates to parallel partitioning techniques for implementing the autocorrelation algorithm on a specific hardware system with limited arithmetic and storage resources, and in particular to a multi-way parallel partitioning method based on specific resources and its architecture.

Background art

As an important technical tool, digital signal processing is widely used in engineering fields such as multimedia, data communication, radar imaging, geological exploration, and aerospace, and in recent years it has also become one of the theoretical foundations of emerging disciplines such as artificial intelligence, pattern recognition, and neural networks, so its reach is very broad. The rapid development of semiconductor process technology and the continuous improvement in chip performance have made real-time processing of large volumes of data possible. The main algorithms used in digital signal processing are filtering, convolution, correlation, and spectrum analysis of data.

Autocorrelation describes the degree of correlation of a random signal x(t) between any two different times t1 and t2. The autocorrelation function of a continuous signal is defined as:
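The formula itself is not reproduced in this text. A commonly used time-average form, given here only as an assumption that is consistent with the R_XX(τ) notation used in the next paragraph (with τ = t2 - t1), is:

```latex
R_{XX}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\, x(t+\tau)\, dt
```

At τ = 0 this reduces to the time average of x(t)^2, which matches the statement that the maximum equals the mean-square value of the signal.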

Mathematically, it is easy to see that R_XX(τ) is an even function, and that the autocorrelation function of a sinusoidal signal is itself sinusoidal. The autocorrelation function of a periodic signal is a periodic signal of the same frequency. At τ = 0, R_XX(τ) attains its maximum, which equals the mean-square value of the signal, indicating that the degree of correlation is greatest at that point.

In digital signal processing the signal is discrete, so the autocorrelation function can be defined in the following form, which is a typical multiply-accumulate structure.
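The discrete formula is likewise not reproduced in this text. One form that agrees with the sliding-window results listed in the embodiment is sketched below as an assumption; N stands for the number of points (number), k for the result index from 0 to 2N-2, and A* for the complex conjugate:

```latex
R(k) = \sum_{i=\max(0,\,k-N+1)}^{\min(k,\,N-1)} A(i)\, A^{*}(N-1-k+i),
\qquad k = 0, 1, \dots, 2N-2
```

For k = 0 this gives A(0)A*(N-1), and for k = 2N-2 it gives A(N-1)A*(0), the first and last results of the sliding window.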

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art described above by providing a multi-way parallel partitioning method based on specific resources and its architecture, realized by the following technical solution:

In the multi-way parallel partitioning method based on specific resources, the total operation count is computed to obtain the operation count and the number of results that each path should actually be assigned, so that the task load is balanced across the parallel arithmetic IPs. The method is characterized in that the number of results for each path is obtained from a top-level configuration parameter by shift-and-add to give a result total; the result total determines the generation of the address sequence for each path, and the operation ends when the number of computed results reaches the result total.

A further design of the multi-way parallel partitioning method based on specific resources is that the method is based on the autocorrelation algorithm. Let the number of points of the input vector be number. The first element of the input vector is multiplied by the last element of the conjugate of the input vector to obtain the first result. The conjugate vector is then slid one position to the right, and the corresponding elements of the two vectors are multiplied and accumulated to obtain the second result. Repeating these steps completes the autocorrelation algorithm, and the final result has number*2-1 points.

A further design of the multi-way parallel partitioning method based on specific resources is that the final result sequence is symmetric about its middle element, the elements on the left and right being conjugate-symmetric.

A further design of the multi-way parallel partitioning method based on specific resources is that only the first half of the results is computed. After the first-half computation ends, a DMA transfer is started; once the first-half results have been transferred in order, the first-half results are conjugated and moved out one by one in reverse order.

A further design of the multi-way parallel partitioning method based on specific resources is that the multiply-accumulate count of the input vector increases from 1 to number, and the partition based on operation count sets the numbers of results assigned to the four paths to 0.5*number, 0.2071*number, 0.1589*number, and 0.1334*number in turn.

Using the multi-way parallel partitioning method based on specific resources, a hardware architecture for multi-way parallel partitioning based on specific resources is provided, comprising arithmetic resources and storage resources that can be called by the autocorrelation algorithm top-level module, and characterized by further comprising:

Multiply-accumulators, the core module, which receive the source data delivered by the address generation unit and are responsible for processing the data stream;

An address generation unit, which generates the required address sequences and feeds the source data in order into the multiply-accumulators of the arithmetic resources;

A memory bank switching module, which allows different memory blocks to be treated as a single memory block when each memory block is addressed.

A further design of the hardware architecture for multi-way parallel partitioning based on specific resources is that the multiply-accumulators support four-way operation. Each has two source data inputs, so eight data inputs are needed in total.

A further design of the hardware architecture for multi-way parallel partitioning based on specific resources is that each multiply-accumulator contains one complex multiplier and three complex adders used to realize pipelined processing. The complex multiplier and the three complex adders are connected in series, and the two data inputs of each adder are gated under the control of corresponding combinational logic.

A further design of the hardware architecture for multi-way parallel partitioning based on specific resources is that the memory bank switching module is composed of at least two connected memory blocks.

The advantages of the present invention are as follows:

The multi-way parallel partitioning method based on specific resources provided by the invention computes the total operation count to obtain the operation count and the number of results that each path should actually be assigned, so that the task load is balanced across the parallel arithmetic IPs and the performance gain is maximized. The design uses four parallel paths, and the number of results for each path is obtained from a top-level configuration parameter (the number of vector points) by shift-and-add. This value determines the address sequence generated by the AGU of each path and also serves as the key condition for deciding when the operation ends. The invention combines the arithmetic and storage resources of the hardware system to realize a hardware design of the autocorrelation algorithm for vector lengths from 16 to 128k. It describes memory usage in detail and gives, for each range of vector lengths, the hardware parallelism, the basic module design, the system integration, and quantified performance figures.

Brief description of the drawings

Fig. 1 is a diagram of the sliding window of the autocorrelation algorithm.

Fig. 2 is a schematic diagram of the multiply-accumulator.

Fig. 3 is the module interconnection diagram of the autocorrelation hardware design.

Detailed description of the embodiments

The solution of the present invention is described in detail below with reference to the accompanying drawings.

The multi-way parallel partitioning method based on specific resources provided in this embodiment computes the total operation count to obtain the operation count and the number of results that each path should actually be assigned, so that the task load is balanced across the parallel arithmetic IPs. In this method the number of results for each path is obtained from a top-level configuration parameter by shift-and-add to give a result total. The result total determines the generation of the address sequence for each path, and the operation ends when the number of computed results reaches the result total.

The method is based on the autocorrelation algorithm. Let the number of points of the input vector be number. The first element of the input vector is multiplied by the last element of the conjugate of the input vector to obtain the first result. The conjugate vector is then slid one position to the right, and the corresponding elements of the two vectors are multiplied and accumulated to obtain the second result. Repeating these steps completes the autocorrelation algorithm, and the final result has number*2-1 points.

The autocorrelation algorithm is implemented mainly by a sliding window. For a 1024-point vector A, the sliding-window process is shown in Fig. 1. First, the first element of A is multiplied by the last element of the conjugate of A to give the first result, A(0)A(1023)*. The conjugate vector is then slid one position to the right and the corresponding elements of the two vectors are multiplied and accumulated to give the second result, A(0)A(1022)* + A(1)A(1023)*. The third result is obtained in the same way as A(0)A(1021)* + A(1)A(1022)* + A(2)A(1023)*. This continues until the last slide, which gives the final result, A(1023)A(0)*.
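The sliding-window process can be modelled in a few lines of software. The following is a minimal behavioral sketch in Python, not the hardware design; the function name `sliding_autocorr` and the short test vector are illustrative assumptions.

```python
def sliding_autocorr(a):
    """Return the 2*N-1 sliding-window results for the vector a."""
    n = len(a)
    conj = [complex(v).conjugate() for v in a]
    results = []
    for k in range(2 * n - 1):
        # For result k, a[i] is paired with conj[i + shift].
        # shift starts at n-1 (only a[0] overlaps conj[n-1]) and
        # decreases by one for every slide to the right.
        shift = n - 1 - k
        acc = 0j
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                acc += a[i] * conj[j]
        results.append(acc)
    return results


# Small illustrative vector (values chosen arbitrarily).
a = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 1j]
r = sliding_autocorr(a)
print(len(r))  # number*2-1 results
```

With a 4-point vector, `r[0]` corresponds to A(0)A(3)*, `r[1]` to A(0)A(2)* + A(1)A(3)*, and `r[6]` to A(3)A(0)*, mirroring the 1024-point pattern described above.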

If the vector has number points, the result of the autocorrelation algorithm has number*2-1 points. Mathematically it is easy to see that the final result sequence is symmetric about its middle element, with the elements on the left and right being conjugates of each other.
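The conjugate symmetry means the second half of the results can be derived from the first half without further multiply-accumulates. The sketch below illustrates this in Python under that reading; `first_half` and `full_from_half` are hypothetical helper names, and the test vector is arbitrary.

```python
def first_half(a):
    """Compute only results 0..N-1, up to and including the middle one."""
    n = len(a)
    # Result k uses k+1 products: a[i] * conj(a[i + n - 1 - k]), i = 0..k.
    return [sum(a[i] * a[i + n - 1 - k].conjugate() for i in range(k + 1))
            for k in range(n)]


def full_from_half(half):
    """Rebuild all 2*N-1 results by mirroring the first half."""
    # Result (2N-2-k) is the conjugate of result k, so walk the first
    # half backwards (skipping the middle element) and conjugate.
    mirrored = [v.conjugate() for v in reversed(half[:-1])]
    return half + mirrored


a = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 1j]
full = full_from_half(first_half(a))
print(len(full))
```

This is the software counterpart of moving the first-half results out in order and then moving them out again in reverse order with conjugation applied.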

From the expression and the sliding-window diagram of the autocorrelation algorithm, it can be seen that the algorithm itself involves only basic multiply-add operations. The specific design is as follows:

(1) Since the arithmetic resources provide 4 groups of complex multipliers and 16 groups of complex adders, the autocorrelation algorithm can in theory be designed with four-way parallelism.

(2) The operation results are conjugate-symmetric, so only the first half of the results needs to be computed, and the DMA transfer is started after the computation ends. After the first-half results have been transferred in order, the results are conjugated and moved out one by one in reverse order. The overall operation time is thereby halved and performance is doubled.

(3) On the basis of (2), the number of result points actually computed by the autocorrelation algorithm is number, and the multiply-accumulate count of the source data increases from 1 to number. If the results were simply divided equally among the paths, the actual workload of each path would be unbalanced and the operation times would not match. A partitioning method based on operation count is therefore needed: the number of results assigned to each path is set mathematically so that the operation counts are roughly equal, which balances the task load and maximizes performance.

The total operation count is 1+2+...+number, approximately number²/2. The operation count needed to obtain the first half of the results is 1+2+...+number/2, approximately number²/8, which is exactly 1/4 of the total, so the first path can be assigned exactly the first half of the results.

The operation count needed to obtain the first number*x results is 1+2+...+number*x, approximately number²/2 * x². Setting x² equal to 1/4, 2/4, and 3/4 in turn gives the total number of results that the first path, the first two paths, and the first three paths need to produce, with x equal to 0.5, 0.7071, and 0.8660 respectively. The numbers of results assigned to the four paths are therefore 0.5*number, 0.2071*number, 0.1589*number, and 0.1334*number in turn.
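The partition can be checked numerically. The following Python sketch, with hypothetical helper names `split_fractions` and `workloads`, derives the per-path fractions from x = sqrt(k/4) and counts the multiply-accumulates each path would perform for a 1024-point vector. Note that 1 - 0.8660 evaluates to about 0.1340, so the 0.1334 quoted above appears to be rounded slightly differently.

```python
import math


def split_fractions(ways=4):
    """Fraction of the results assigned to each path."""
    # Cumulative boundary x_k satisfies x_k**2 = k / ways.
    bounds = [math.sqrt(k / ways) for k in range(ways + 1)]
    return [bounds[k + 1] - bounds[k] for k in range(ways)]


def workloads(number, ways=4):
    """Multiply-accumulate count per path for a given vector length."""
    bounds = [round(number * math.sqrt(k / ways)) for k in range(ways + 1)]
    # Result k (1-based) costs k multiply-accumulates, so a path that
    # covers results lo+1..hi costs hi*(hi+1)/2 - lo*(lo+1)/2.
    return [hi * (hi + 1) // 2 - lo * (lo + 1) // 2
            for lo, hi in zip(bounds, bounds[1:])]


fractions = split_fractions()
loads = workloads(1024)
print([round(f, 4) for f in fractions])
print(loads)
```

The four workloads come out within about one percent of each other, whereas an equal split of the results would give the last path several times the work of the first.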

Digital circuits have only high and low levels, corresponding to the two values 1 and 0. Multiplication by an integer can always be implemented by left-shift-and-add. For multiplication by a fraction, however, only certain fractions can be implemented by right-shift-and-add. The Verilog code of the design is as follows:

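// Cumulative boundaries. The shift amount of constant1 is missing in the source text; 1 is inferred from x = 0.5.
assign constant1 = (number >> 1);

assign constant2 = (number >> 1) + (number >> 3) + (number >> 4) + (number >> 5);  // 0.71875 * number

assign constant3 = (number >> 1) + (number >> 2) + (number >> 3);  // 0.875 * number

// Number of results per path.
assign result_num1 = (number >> 1);  // 0.5 * number

assign result_num2 = (number >> 3) + (number >> 4) + (number >> 5);  // 0.21875 * number

assign result_num3 = (number >> 2) - (number >> 4) - (number >> 5);  // 0.15625 * number

assign result_num4 = number - (number >> 1) - (number >> 2) - (number >> 3);  // remainder, 0.125 * number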
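To see how closely the shift-and-add values track the theoretical fractions, the expressions can be evaluated in software. The Python sketch below mirrors the four result_num expressions; the function name `shift_add_split` and the choice of number = 1024 are illustrative assumptions, and the printed comparison is not a reproduction of Table 1.

```python
def shift_add_split(number):
    """Evaluate the four result_num shift-and-add expressions."""
    r1 = number >> 1
    r2 = (number >> 3) + (number >> 4) + (number >> 5)
    r3 = (number >> 2) - (number >> 4) - (number >> 5)
    r4 = number - (number >> 1) - (number >> 2) - (number >> 3)
    return [r1, r2, r3, r4]


# Fractions as quoted in the text.
THEORY = [0.5, 0.2071, 0.1589, 0.1334]

number = 1024
hw = shift_add_split(number)
for path, (got, frac) in enumerate(zip(hw, THEORY), start=1):
    print(f"path {path}: shift-add {got}, theory {frac * number:.1f}")
```

The four expressions always sum to number, because the terms of result_num2 and result_num3 cancel against those subtracted in result_num4, so no result is dropped or counted twice even when number is not a multiple of 32.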

Table 1. Comparison of the parallel task partition of the autocorrelation algorithm with the theoretical values

Using the multi-way parallel partitioning method based on specific resources, a hardware architecture for multi-way parallel partitioning based on specific resources is provided, comprising arithmetic resources and storage resources that can be called by the autocorrelation algorithm top-level module, as described below.

For the autocorrelation algorithm, the sliding-window process shows that the dependence between data items means the source vector cannot be split across parallel paths. Even with four parallel paths, each path still needs the complete vector to be stored. Each multiply-accumulator has two source data inputs, so 8 inputs are needed in total. Fig. 2 shows the schematic of the multiply-accumulator. Each multiply-accumulator contains 1 complex multiplier and three complex adders. Because each arithmetic IP has a three-stage pipeline, three adders are used in order to realize pipelined processing. The MUXes on the two data inputs of each adder are gated under the control of corresponding combinational logic.

When the number of points is small (say, less than 8k), only 8 banks are needed to store the source data. When the number of points is not higher than 16k, every two banks are combined into one memory_switch, and 16 banks in total are needed to provide the eight data streams, while the result data are stored in the remaining 16 banks. Four multiply-accumulators are called internally, and the address of each input and output data item is controlled by the corresponding Operend_AGU and result_AGU. Fig. 3 shows the module interconnection diagram of the autocorrelation hardware design.
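The idea of treating two banks as one memory block can be pictured as an address mapping. The Python sketch below is an assumption-heavy illustration: the bank depth of 8k points is inferred from the statement that a 64k vector needs 8 banks, and the contiguous (non-interleaved) layout and the name `switch_address` are hypothetical, since the text does not specify how memory_switch maps addresses.

```python
BANK_DEPTH = 8 * 1024  # assumed points per bank (inferred, not stated)


def switch_address(logical_addr, banks_in_switch=2):
    """Map a logical address to (bank index, offset inside that bank)."""
    if not 0 <= logical_addr < BANK_DEPTH * banks_in_switch:
        raise IndexError("address outside the memory_switch range")
    # Contiguous layout assumed: bank 0 holds the first BANK_DEPTH
    # points, bank 1 the next BANK_DEPTH points, and so on.
    return divmod(logical_addr, BANK_DEPTH)


print(switch_address(8191))  # last address of the first bank
print(switch_address(8192))  # first address of the second bank
```

Under these assumptions, an AGU could issue a single linear address sequence for a 16k-point vector while the switching module selects the bank.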

When the number of vector points is higher than 16k and not higher than 64k, take the interval maximum of a 64k vector as an example. Storing the vector completely requires exactly 8 banks, so the two data streams occupy 16 banks. Once storage of the results is taken into account, the algorithm can no longer support parallel processing, so for vectors in this interval only one multiply-accumulator path computes, without parallelism.

When the number of points is greater than 64k and not greater than 128k, again taking the interval maximum of a 128k vector as an example, the two data streams occupy all the banks: each data stream needs 16 banks, so all 32 banks are needed to store the source operands. The results can therefore only be stored in the smaller constant memory, and this interval likewise does not support parallelism, relying on a single multiply-accumulator. The constant memory holds 8K in total, while the number of results is up to 256k-1, so the results cannot all be stored at once. In the design, when the address counter of the AGU module reaches 8k, the address sequence needed to obtain the last of these results has been generated, which means the constant memory will be full a dozen or so clock cycles later. The AGU then generates an interrupt signal and sends it to the module top level. After 14 clk cycles (a value determined by simulation), the top level pulls low the enable signals fed to the multiply-accumulator and the result_AGU module, which halts all modules, and then issues a finish signal to inform the master controller, which moves the data out of the constant memory. When the transfer is complete, the master controller sends a start signal to the autocorrelation top level, and the internal logic restarts all modules so that execution resumes. After all operations have finished, finish_all is pulled high.
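The three operating ranges can be summarized as a small planning function. The Python sketch below is a reading of the text, not the actual control logic: the bank depth of 8k points is inferred from the 64k example, the 32-bank total from the 128k example, and the names `plan` and `drain_count` are hypothetical.

```python
K = 1024
BANK_DEPTH = 8 * K       # assumed points per bank (inferred from the text)
CONST_MEM_DEPTH = 8 * K  # constant memory capacity stated as 8K


def plan(points):
    """Return parallel ways, source banks and result storage for a length."""
    if not 16 <= points <= 128 * K:
        raise ValueError("vector length outside the 16 to 128k range")
    ways = 4 if points <= 16 * K else 1
    banks_per_stream = -(-points // BANK_DEPTH)  # ceiling division
    # Every multiply-accumulator path has two source data streams.
    source_banks = banks_per_stream * 2 * ways
    store = "banks" if points <= 64 * K else "constant memory"
    return {"ways": ways, "source_banks": source_banks, "result_store": store}


def drain_count(points):
    """How many times the constant memory fills for 2*points-1 results."""
    results = 2 * points - 1
    return -(-results // CONST_MEM_DEPTH)


for n in (8 * K, 16 * K, 64 * K, 128 * K):
    print(n, plan(n))
print(drain_count(128 * K))
```

Under these assumptions, each fill of the constant memory corresponds to one interrupt, finish, and start handshake with the master controller before computation resumes.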

Table 2 gives the running times for three representative vector lengths, which meet the performance requirements of the project (with the system running at a 1 GHz clock).

Table 2. Performance of the autocorrelation algorithm

Claims

1. A multi-way parallel partitioning method, in which the total operation count is computed to obtain the operation count and the number of results that each path should actually be assigned, so that the task load is balanced across the parallel arithmetic IPs, characterized in that the number of results for each path is obtained from a top-level configuration parameter by shift-and-add to give a result total; the result total determines the generation of the address sequence for each path, and the operation ends when the number of computed results reaches the result total; the method is based on the autocorrelation algorithm, wherein the number of points of the input vector is set to number, the first element of the input vector is multiplied by the last element of the conjugate of the input vector to obtain a first result, the conjugate vector is then slid one position to the right and the corresponding elements of the two vectors are multiplied and accumulated to obtain a second result, and the steps of obtaining the first result and the second result are repeated to complete the autocorrelation algorithm, the final result after processing having number*2-1 points.

2. The multi-way parallel partitioning method according to claim 1, characterized in that the final result sequence is symmetric about its middle element, the elements on the left and right being conjugate-symmetric.

3. The multi-way parallel partitioning method according to claim 2, characterized in that only the first half of the results is computed; a DMA transfer is started after the first-half computation ends, and after the first-half results have been transferred in order, the first-half results are conjugated and moved out one by one in reverse order.

4. The multi-way parallel partitioning method according to claim 3, characterized in that the multiply-accumulate count of the input vector increases from 1 to number, and the partition based on operation count sets the numbers of results assigned to the four paths to 0.5*number, 0.2071*number, 0.1589*number, and 0.1334*number in turn.

5. A multi-way parallel partitioning system using the multi-way parallel partitioning method according to any one of claims 1 to 4, comprising arithmetic resources and storage resources that can be called by the autocorrelation algorithm top-level module, characterized by further comprising:

Several multiply-accumulators, the core module, which receive the source data delivered by the address generation unit and are responsible for processing the data stream;

An address generation unit, which generates the required address sequences and feeds the source data in order into the multiply-accumulators of the arithmetic resources;

6. The multi-way parallel partitioning system according to claim 5, characterized in that the multiply-accumulators support four-way operation, each has two source data inputs, and eight data inputs are needed in total.

7. The multi-way parallel partitioning system according to claim 6, characterized in that each multiply-accumulator contains one complex multiplier and three complex adders used to realize pipelined processing, the complex multiplier and the three complex adders are connected in series, and the two data inputs of each adder are gated under the control of corresponding combinational logic.

8. The multi-way parallel partitioning system according to claim 5, characterized in that the memory bank switching module is composed of at least two connected memory blocks.