CN100594491C - Reconstructable digital signal processor - Google Patents


Info

Publication number
CN100594491C
Authority
CN
China
Prior art keywords
module
data
complex multiplication
output
coefficient
Prior art date
Legal status
Active
Application number
CN200610086398A
Other languages
Chinese (zh)
Other versions
CN1900927A (en)
Inventor
洪一
郭二辉
赵斌
洪灏
彭勇俊
陈风波
Current Assignee
Anhui Core Century Technology Co., Ltd.
Original Assignee
CETC 38 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 38 Research Institute
Priority to CN200610086398A
Publication of CN1900927A
Application granted
Publication of CN100594491C
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention discloses a reconfigurable digital signal processor (DSP) whose internal hardware resources can be reconfigured according to different application requirements, so as to realize different forms of filtering operation. The invention combines the advantages of application-specific integrated circuits (ASIC) and general-purpose DSPs: it has arithmetic capability comparable to that of large-scale dedicated devices and is suitable for fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), FIR pulse-group processing, and other real-time digital signal processing fields. It is simple to use and low in cost.

Description

Reconstructable digital signal processor
Technical field
The invention discloses a reconfigurable digital signal processor (DSP) for real-time digital signal processing such as fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), FIR pulse-group processing, and correlation processing.
Background
Since the 1960s, with the rapid development of computing and information technology, digital signal processing has grown quickly into an independent discipline and is widely used in many fields. With the fast progress of large-scale integrated circuit and semiconductor technology and ever-rising real-time processing requirements, digital signal processing capability has increased exponentially and plays an increasingly important role in scientific research, military, and civilian applications; digital signal processing devices have become an essential enabler of rapid progress in these fields. In real-time digital signal processing, the most widely used filtering operations are FFT, IFFT, FIR pulse-group processing, and correlation processing. Current hardware implementations fall into three categories: general-purpose digital signal processors, field-programmable gate arrays (FPGA)/complex programmable logic devices (CPLD), and application-specific integrated circuits (ASIC). On the one hand, each of these has limitations. A general-purpose DSP offers programming flexibility and universality, but its arithmetic capability is limited. A large FPGA/CPLD has abundant internal hardware resources, but firmware logic must be developed separately for each application, labor costs are high, and high-capacity FPGAs/CPLDs are expensive. A conventional ASIC has a fixed architecture and hard-wired connections, so its functions are narrow and its range of application is severely limited. On the other hand, the technical requirements of digital signal processing keep rising: the computational load of wideband operation keeps growing, the number of array-processing channels keeps increasing, the number of cooperative and non-cooperative targets to be processed keeps growing, and the required signal processing speed keeps rising. For example, a 1024-point complex FFT must run at more than 100 MHz, and some applications require more than 500 MHz. The three kinds of devices above find it increasingly difficult to meet real-time digital signal processing requirements in function, price, adaptability, and ease of use.
Summary of the invention
The technical problem addressed by this invention is to provide a reconfigurable digital signal processor for real-time digital signal processing that has the arithmetic capability of a dedicated large-scale integrated circuit, can adapt to different real-time processing scenarios such as FFT, IFFT, FIR pulse-group processing, and correlation processing, and is simple to use and inexpensive.
The technical solution used in the present invention is:
The internal hardware structure and interconnect of the reconfigurable digital signal processor can be restructured by configuration control words, so that various filtering operations such as FFT/IFFT, FIR pulse-group processing, and correlation processing can be realized.
The main architecture comprises an input unit, an output unit, a data exchange unit, and 4 basic units. The basic units contain 160 real floating-point multiply-accumulators (MACs), distributed evenly among the 4 basic units (40 per unit).
The organization of the hardware can be restructured by configuration control words: through the configuration of control words and control signals, the organization of the 160 real floating-point MACs and the data exchange unit can be changed to select different working modes, adapting the device to three different computing tasks: FFT/IFFT, FIR pulse-group processing, and correlation.
The hardware scheduling scheme combines centralized and distributed control in a two-level method: control words are first decoded by the global module (first-level decoding) and then decoded again by each basic unit (second-level decoding).
The overall architecture uses two-level control: a global control module coordinates the 4 basic units, and each basic unit has its own local control logic. The data exchange unit between the 4 basic units sends the data in each basic unit to the other 3 basic units as required by the control settings.
The input unit receives control words, performs first-level decoding on them, and distributes control words to each unit. Control words share the same port with coefficient input 1. The control word receiver module receives the control word and passes it to the first-level decoding module, which on the one hand generates global control signals used to produce on-chip timing and to synchronize coefficients and data, and on the other hand sends control information, through the control word distribution module, to the other units on the chip. The coefficient synchronization module synchronizes coefficient inputs 1 and 2, and the data synchronization module synchronizes data inputs 1 and 2.
The data exchange unit is a multi-input, multi-output switch network that exchanges data among the 4 basic units.
The output unit sorts the results from each basic unit and outputs them in different formats. In FFT/IFFT and FIR pulse-group processing, the basic-unit output sorting module arranges the basic-unit outputs in channel order; in correlation mode, it reshapes the data from each basic unit into a continuous, frame-by-frame output stream at the same rate as the input data. The modulus module converts real/imaginary output into modulus/phase-angle output. The logarithm module converts the input modulus to a logarithmic representation. The floating-point/fixed-point conversion module converts the output of the real/imaginary module or the modulus module from floating-point to fixed-point format. The exponent normalization module unifies the exponents of floating-point results to a fixed value by shifting the mantissas accordingly. Each of these 4 format conversion modules is implemented twice, one set per output port, so that the two output ports can independently output in any of the formats.
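A minimal sketch of two of the output conversions described above, modulus/phase and logarithm, written in floating-point Python. The function names are hypothetical, and the logarithm is assumed to be in decibels (20·log10); the text does not state the base or scale of the logarithmic representation.

```python
import math


def to_modulus_phase(re: float, im: float) -> tuple[float, float]:
    """Real/imaginary output -> (modulus, phase angle in radians)."""
    return math.hypot(re, im), math.atan2(im, re)


def to_log(modulus: float, floor_db: float = -300.0) -> float:
    """Modulus -> logarithmic representation.

    ASSUMPTION: decibels (20*log10); the patent does not state the base.
    A zero modulus is clamped to `floor_db` instead of raising an error.
    """
    if modulus <= 0.0:
        return floor_db
    return 20.0 * math.log10(modulus)


mod, phase = to_modulus_phase(3.0, 4.0)
print(mod, math.degrees(phase), to_log(mod))
```

In the device each output port has its own copy of these converters, so two such pipelines could run side by side with different format settings.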
In each basic unit, the data memory consists of 8 two-port RAMs of 512 × 40, used for input buffering of operands, temporary storage of intermediate results, and output buffering of results. The data cache can feed data to every complex MAC and complex MAC submatrix simultaneously. The coefficient memory consists of 10 two-port RAMs of 256 × 32, storing the weighting coefficients for correlation processing, the FIR filter coefficients, and the weighting coefficients for FFT. The FFT iteration coefficients are a fixed table implemented in dedicated logic. Each complex MAC corresponds to 4 real floating-point MACs; each complex MAC submatrix contains 16 real floating-point MACs, equivalent to 4 complex MACs. The two complex MAC submatrices and the complex MACs are combined in different ways, depending on the mode, to perform different operations. The real floating-point MAC is divided into 5 parts: fixed-point multiplication, truncation, exponent adjustment, fixed-point addition, and exponent judgment.
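The resource counts above can be checked with simple arithmetic. The sketch below (constant and function names are illustrative, not from the patent) totals the MACs per basic unit and shows why one complex MAC needs 4 real MACs: $(a+jb)(c+jd) = (ac-bd) + j(ad+bc)$ takes four real multiplications.

```python
# Per-basic-unit resources as described in the text.
COMPLEX_MACS_PER_UNIT = 2        # complex MAC A and complex MAC B
SUBMATRICES_PER_UNIT = 2         # complex MAC submatrix A and B
REAL_MACS_PER_COMPLEX_MAC = 4
REAL_MACS_PER_SUBMATRIX = 16
BASIC_UNITS = 4

real_macs_per_unit = (COMPLEX_MACS_PER_UNIT * REAL_MACS_PER_COMPLEX_MAC
                      + SUBMATRICES_PER_UNIT * REAL_MACS_PER_SUBMATRIX)
complex_macs_per_unit = (COMPLEX_MACS_PER_UNIT
                         + SUBMATRICES_PER_UNIT * REAL_MACS_PER_SUBMATRIX
                         // REAL_MACS_PER_COMPLEX_MAC)
total_real_macs = BASIC_UNITS * real_macs_per_unit


def complex_mac(acc_re, acc_im, x_re, x_im, c_re, c_im):
    """acc += x * c, expressed as four real multiply-accumulates."""
    acc_re = acc_re + x_re * c_re   # real MAC 1
    acc_re = acc_re - x_im * c_im   # real MAC 2
    acc_im = acc_im + x_re * c_im   # real MAC 3
    acc_im = acc_im + x_im * c_re   # real MAC 4
    return acc_re, acc_im


print(real_macs_per_unit, complex_macs_per_unit, total_real_macs)  # 40 10 160
```

The 10 complex MACs per unit agree with the "10 complex MACs in cascade" of the correlation mode and the 10 FIR channels per unit mentioned later in the text.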
The three filtering operations use the following organizations:
For FFT/IFFT, complex MAC submatrix A and complex MAC submatrix B each serve as the node of one iteration of the radix-8 algorithm, and the 4 complex MACs in a submatrix complete one radix-8 iteration stage. Complex MAC A and complex MAC B, located in front of the submatrices, act as windowing units and apply the window to the data. The windowed data enter a complex MAC submatrix, which performs one radix-8 iteration stage. The output of each stage is first written to the data cache and then sent through the data exchange unit to the corresponding other complex MAC submatrix for the next stage.
For FIR pulse-group processing, the MACs in the 2 complex MAC modules and the 2 complex MAC submatrix modules of a basic unit are used in parallel, and each complex MAC handles one channel of the FIR pulse group. With conjugate operation, each complex MAC handles 2 channels.
For correlation, the complex MACs and the complex MAC submatrices are connected in series, equivalent to 10 complex MACs in cascade; the MACs inside the basic unit are then configured as a cascade.
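The cascade configuration for correlation can be pictured as a chain of complex MAC stages that pass a running sum from one stage to the next. The sketch below assumes the standard correlation definition $r[l] = \sum_m \overline{c[m]}\,x[l+m]$ (conjugated, time-reversed reference as the tap weights); the patent does not spell out the tap arrangement, and the function name is hypothetical. In the device the chain length per basic unit would be 10; here it is simply `len(ref)`.

```python
def correlate_cascade(x, ref):
    """Sliding correlation r[l] = sum_m conj(ref[m]) * x[l + m].

    Each tap is one complex MAC stage; the running sum `acc` is passed from
    stage to stage, mimicking MACs connected in series. Output index n
    corresponds to lag l = n - (len(ref) - 1).
    """
    taps = [complex(r).conjugate() for r in reversed(ref)]
    delay = [0j] * len(taps)                 # delay[t] holds x[n - t]
    out = []
    for sample in x:
        delay = [complex(sample)] + delay[:-1]
        acc = 0j
        for h, d in zip(taps, delay):        # stage t: acc += h_t * x[n - t]
            acc += h * d
        out.append(acc)
    return out


y = correlate_cascade([0, 0, 1, 2, 3, 0, 0], [1, 2, 3])
print(max(range(len(y)), key=lambda n: abs(y[n])))  # 4, i.e. lag 2
```

The peak at output index 4 (lag 2) is where the reference `[1, 2, 3]` lines up with the same pattern in the input.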
The hardware scheduling scheme is as follows:
Hardware resources are scheduled by control words and control signals using a two-level scheme that combines global control with local control inside the basic units. Control words and control signals first enter the global control module for first-level decoding and distribution. This module first receives the control word and then generates global control signals from it. These global controls include coordinating the actions of the 4 basic units, determining the output format of the results, and setting the on-chip master clock. The second function of the global control module is to send control information to the local control module of each basic unit. This information includes the working mode, the subtype of the working mode, the number of computation points, the number of channels, and the number of processed channels.
After receiving the information sent by the global control module, the local control module in each basic unit performs second-level decoding and converts it into detailed hardware scheduling control signals.
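The two-level decoding can be sketched as two functions: a global decoder that extracts the fields listed above from the 64-bit control word, and a per-unit decoder that turns them into local settings. The bit layout, the field widths, and the round-robin channel split are all hypothetical; the patent lists the fields but not their encoding or the exact local mapping.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    FFT_IFFT = 0
    FIR_PULSE_GROUP = 1
    CORRELATION = 2


@dataclass(frozen=True)
class UnitControl:
    """Information the global control module sends to each basic unit."""
    mode: Mode
    subtype: int        # e.g. FFT / IFFT / FFT+IFFT
    points: int         # number of computation points
    num_channels: int
    proc_channels: int  # number of channels actually processed


def global_decode(word: int) -> UnitControl:
    """First-level decode of a 64-bit control word (HYPOTHETICAL bit layout)."""
    return UnitControl(
        mode=Mode(word & 0x3),
        subtype=(word >> 2) & 0x3,
        points=1 << ((word >> 4) & 0xF),   # stored as log2(points)
        num_channels=(word >> 8) & 0xFF,
        proc_channels=(word >> 16) & 0xFF,
    )


def local_decode(ctrl: UnitControl, unit: int, num_units: int = 4) -> list[int]:
    """Second-level decode in one basic unit: which channels it handles.

    HYPOTHETICAL round-robin split; the patent does not give the mapping.
    """
    return [ch for ch in range(ctrl.proc_channels) if ch % num_units == unit]


word = Mode.FIR_PULSE_GROUP | (8 << 4) | (40 << 8) | (40 << 16)
ctrl = global_decode(word)
print(ctrl.mode.name, ctrl.points, local_decode(ctrl, 1)[:3])  # FIR_PULSE_GROUP 256 [1, 5, 9]
```

Keeping the global stage to field extraction and pushing the detailed control-signal generation into each unit mirrors the centralized/distributed split described in the text.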
The present invention offers significant technical advances and the following benefits:
The invention combines the advantages of conventional ASICs and general-purpose digital signal processors.
High-speed processing: the invention adopts a reconfigurable dedicated digital signal processor scheme. The chip contains abundant hardware resources, including 160 floating-point MACs, so its processing power is no less than that of conventional ASICs. Because the algorithms inside the device are implemented entirely in hardware, it achieves computing performance that general-purpose DSP chips cannot match. Internal computation is floating-point. The maximum FFT length on a single chip is 4096 points; a 4096-point FFT takes 25.6 µs, and a 4096-point FFT+IFFT takes 51.2 µs. FIR pulse-group processing supports up to 128 channels; the maximum filter length is 256 for 80 channels or fewer and 128 for more than 80 channels. For correlation processing, the maximum single-channel length on a single chip is 4000, up to 16 channels can be processed in parallel, and the correspondence between channel data and coefficient groups is very flexible.
Reconfigurability: unlike traditional ASIC chips, the internal resources of the device can be restructured according to different application requirements, so that various filtering operations such as FFT, IFFT, FIR pulse-group processing, and correlation processing can be realized. This increases the flexibility of the device and extends its range of application.
Ease of use: to a certain extent, the invention incorporates the flexibility of general-purpose DSPs by configuring the device with a dedicated 64-bit function control word to meet different application requirements. It is very easy to use: no tedious programming and debugging are required, nor the logic design and timing analysis needed for an FPGA. Only the 64-bit control word needs to be sent, and the operand data are input according to the standard timing. Whenever the control word changes, the internal resource structure of the device is reorganized and the device operates in another mode.
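The single-chip limits quoted above for high-speed processing can be collected into a small configuration check. This is only a sketch for reasoning about the stated numbers; the function names are hypothetical, and treating exactly 80 channels as part of the "80 channels or fewer" case is an assumption about the boundary.

```python
FFT_MAX_POINTS = 4096
FFT_4096_TIME_US = 25.6
FIR_MAX_CHANNELS = 128
CORR_MAX_LENGTH = 4000
CORR_MAX_CHANNELS = 16


def fir_max_filter_length(channels: int) -> int:
    """Maximum FIR filter length for a given channel count (single chip).

    ASSUMPTION: exactly 80 channels falls in the '80 channels or fewer' case.
    """
    if not 1 <= channels <= FIR_MAX_CHANNELS:
        raise ValueError(f"FIR pulse-group mode supports 1..{FIR_MAX_CHANNELS} channels")
    return 256 if channels <= 80 else 128


def correlation_supported(length: int, channels: int) -> bool:
    """True if a correlation job fits the single-chip limits."""
    return 1 <= length <= CORR_MAX_LENGTH and 1 <= channels <= CORR_MAX_CHANNELS


# Implied average FFT throughput: points per microsecond = Msamples/s.
throughput = FFT_MAX_POINTS / FFT_4096_TIME_US
print(round(throughput, 6), fir_max_filter_length(64), fir_max_filter_length(100))
```

The quoted 25.6 µs for 4096 points corresponds to an average of 160 million complex samples per second.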
Description of drawings
Fig. 1 Hardware architecture block diagram (1)
Fig. 2 Basic unit structure block diagram
Fig. 3 Real multiply-accumulator structure block diagram
Fig. 4 Basic unit configuration block diagram for FFT
Fig. 5 Basic unit configuration block diagram for FIR pulse-group processing
Fig. 6 Basic unit configuration block diagram for correlation
Fig. 7 Two-level hardware resource scheduling architecture
Fig. 8 Hardware architecture block diagram (2)
Fig. 9 Input unit implementation block diagram
Fig. 10 Output unit implementation block diagram
Fig. 11 Timing diagram of 32-bit control word input
Fig. 12 Timing diagram of 16-bit control word input
Embodiment
The invention is further described below with reference to the accompanying drawings.
The core of the invention is 160 multiply-accumulators distributed evenly across 4 basic units (basic unit). By configuring control words and control signals, these 160 MACs and other hardware resources such as the memories and the data exchange unit can work in different organizations, and these organizations determine the different working modes of the device. In addition, the invention schedules hardware resources with a two-level method that combines centralized and distributed control.
The embodiment is described at 3 levels: the first is the device hardware architecture, the second is the organization of the hardware, and the third is the hardware resource scheduling method. Each level is described below.
I. Main hardware architecture
The hardware architecture of the invention is shown in Fig. 1 and Fig. 8.
The invention uses a fully synchronous design, i.e. the entire chip has only one clock domain. Considering function partitioning and balanced use of logic resources, the top level of the device is divided into 7 parts: the 4 basic units plus 3 other parts, namely the input unit, the output unit, and the data exchange unit.
The device uses a two-level control architecture: a global control module coordinates the 4 basic units, and each basic unit has its own local control logic. The 4 basic units form the main body of the device and perform the main functions such as data caching, computation, and address generation. The data exchange unit between the 4 basic units sends data in each basic unit to the other 3 basic units as required by the control settings. To suit different applications, the device integrates several auxiliary function modules, including a modulus circuit, a logarithm circuit, an exponent normalization circuit, and a floating-point/fixed-point conversion circuit. These dedicated modules can output the results of the basic units in different ways, and the two output ports are fully independent of each other. In addition, the device integrates a phase-locked loop (PLL) for frequency multiplication, so the internal computing clock can either be supplied directly from outside or derived by multiplying a low-speed external clock in the internal PLL.
The implementation of the input unit is shown in Fig. 9. Except for the test input pins used for design for testability, all functional input pins first enter the input unit. The input unit performs the following tasks: receiving control words, performing first-level decoding on them, distributing control words to each unit, synchronizing the input coefficients, synchronizing the input data, and generating the global clock signals required on chip. Each module of the input unit is briefly described below. Control words share the same port (coeff_in[31:0]) with coefficient input 1, so the input control word must be received first; this is the task of the control word receiver module. The receiver consists mainly of a group of registers into which the 64-bit control word is written in 2 or 4 writes, according to an off-chip enable signal. After the control word is received, the first-level decoding module decodes part of it into global control signals, which are used to generate on-chip timing and to synchronize coefficients and data. The control word distribution module decodes the 64-bit control word and sends the results to the other units on the chip. Because coefficient input 1 also carries the control word while coefficient input 2 does not, the two coefficient inputs have different delays, so a synchronization module must align them. Similarly, data inputs 1 and 2 do not have identical functions, so the data of these two inputs must be synchronized according to the working mode and data type.
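A behavioral sketch of the control word receiver: a 64-bit word is captured in 2 chunks of 32 bits or 4 chunks of 16 bits, one chunk per clock in which the off-chip enable is high. The class name is hypothetical, and the chunk order (least-significant chunk first) is an assumption; the text and the timing figures it cites are not reproduced here.

```python
class ControlWordReceiver:
    """Collects a 64-bit control word from the shared coeff_in port.

    ASSUMPTIONS (not specified in the text): the first captured chunk is the
    least-significant one, and a chunk is captured on every clock in which
    the off-chip enable is high.
    """

    def __init__(self, port_width: int = 32):
        if port_width not in (16, 32):
            raise ValueError("port width must be 16 or 32 bits")
        self.width = port_width
        self.mask = (1 << port_width) - 1
        self.chunks: list[int] = []

    def clock(self, enable: bool, bus: int):
        """Advance one clock; return the complete 64-bit word when ready, else None."""
        if not enable:
            return None
        self.chunks.append(bus & self.mask)
        if len(self.chunks) < 64 // self.width:
            return None
        word = 0
        for i, chunk in enumerate(self.chunks):
            word |= chunk << (i * self.width)
        self.chunks = []
        return word


rx = ControlWordReceiver(16)
results = [rx.clock(en, v) for en, v in
           [(1, 0xCDEF), (0, 0xFFFF), (1, 0x89AB), (1, 0x4567), (1, 0x0123)]]
print(hex(results[-1]))  # 0x123456789abcdef
```

The clock with enable low is ignored, which is how a register group driven by an external enable would skip bus cycles that carry coefficients rather than control data.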
The implementation of the output unit is shown in Fig. 10. Its main function is to sort the results of the basic units and output them in different formats. Filtering operations are completed in the basic units, and each of the 4 basic units outputs its own results, so these outputs must be sorted to ensure that the final results are output in order. The "basic unit output sorting" module works as follows: when the device is in FFT/IFFT or FIR pulse-group mode, it arranges the basic-unit outputs in channel order; in correlation mode, the basic units output in frames of 10 values with gaps between frames, and the module reshapes the basic-unit output into a continuous, frame-by-frame data stream at the same rate as the input data. The modulus module converts real/imaginary output into modulus/phase-angle output. The logarithm module converts the input modulus to a logarithmic representation. The floating-point/fixed-point conversion module converts the output of the real/imaginary module or the modulus module from floating-point to fixed-point format. The exponent normalization module works similarly to the floating-point/fixed-point conversion module: by shifting the mantissa accordingly, it unifies the exponents of floating-point results to a fixed value. Each of these 4 format conversion modules is implemented twice, one set per output port of the chip, which ensures that the two output ports can independently output in any of the formats.
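Exponent normalization and floating-point/fixed-point conversion both come down to shifting a mantissa by an exponent difference. The sketch below works on (mantissa, exponent) pairs where value = mantissa × 2^exponent. The target exponent, the output widths, saturation on overflow, and truncation (floor) on right shifts are assumptions, since the text does not specify them; all names are hypothetical.

```python
def saturate(value: int, width: int) -> int:
    """Clamp to the range of a signed `width`-bit integer."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, value))


def normalize_exponent(mant: int, exp: int, target_exp: int, width: int = 20) -> int:
    """Re-express mant * 2**exp as m * 2**target_exp by shifting the mantissa."""
    shift = exp - target_exp
    m = mant << shift if shift >= 0 else mant >> -shift  # arithmetic right shift
    return saturate(m, width)


def float_to_fixed(mant: int, exp: int, width: int = 32) -> int:
    """Convert mant * 2**exp to a plain signed integer (no fraction bits)."""
    return saturate(mant << exp, width)


block = [(12, 2), (-8, 1), (3, 0)]  # (mantissa, exponent) pairs
print([normalize_exponent(m, e, 1) for m, e in block])  # [24, -8, 1]
```

After normalization every value in `block` shares the exponent 1, so downstream logic only needs the mantissas; the last entry shows the precision lost when a small exponent is raised.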
The data exchange unit is in fact a multi-input, multi-output switch network. It receives 8 groups of inputs from the 4 basic units and uses selectors to switch any input group to any output group. Data can thus move freely among the 4 basic units, providing the data paths needed for the basic units to cooperate when implementing particular algorithms.
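Functionally, the data exchange unit behaves like a crossbar: each output group has a selector that picks one of the 8 input groups. The sketch below models one selector setting. The mapping of input groups to units (group `2*u` for unit u's A side, `2*u + 1` for its B side) is a hypothetical convention, not taken from the patent.

```python
def exchange(inputs: list, select: list[int]) -> list:
    """Route input group select[o] to output group o (any input to any output).

    ASSUMPTION: `inputs` holds the 8 groups from the 4 basic units, with
    group 2*u for unit u's A side and 2*u + 1 for its B side.
    """
    if len(inputs) != 8:
        raise ValueError("expected 8 input groups")
    if any(not 0 <= s < len(inputs) for s in select):
        raise ValueError("selector out of range")
    return [inputs[s] for s in select]


groups = [f"u{u}{side}" for u in range(4) for side in "AB"]
select = list(range(8))                  # identity routing by default
for u in range(4):                       # send each unit's A side to the next unit's B side
    select[2 * ((u + 1) % 4) + 1] = 2 * u
print(exchange(groups, select))
```

Because any input can drive any output, the same hardware supports the FFT hand-off between submatrices as well as the reordering needed in the other modes.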
The 4 basic units are the main part of the invention and perform most functions such as data caching, computation, and address generation. Their structure is shown in Fig. 2; all 4 basic units share the same architecture. The "data memory" consists of 8 two-port RAMs of 512 × 40, used for input buffering of operands, temporary storage of intermediate FFT results, and output buffering of FFT results. As the figure shows, the "data cache" can feed data to every "complex MAC" and "complex MAC submatrix" simultaneously. The "coefficient memory" consists of 10 two-port RAMs of 256 × 32, storing the filter coefficients for correlation processing and FIR filtering and the weighting coefficients for FFT. The "FFT iteration coefficients" are a fixed table implemented in dedicated logic that supplies the iteration coefficients for FFT/IFFT. In the figure, each "complex MAC" corresponds to 4 real floating-point MACs, and each "complex MAC submatrix" contains 16 real floating-point MACs, equivalent to 4 "complex MACs". The two "complex MAC submatrices" and the "complex MACs" can be combined in different ways, depending on the mode, to perform different operations.
For FFT/IFFT, the "complex MAC" acts as a weighting multiplier that weights the input data requiring multistage operation, which are then sent to the "complex MAC submatrix" for radix-8 iteration. In other words, for FFT/IFFT, "complex MAC A" and "complex MAC submatrix A" in a basic unit form one radix-8 computing core, and "complex MAC B" and "complex MAC submatrix B" form another. The whole chip thus has 8 such cores, and the FFT/IFFT computation is decomposed across these 8 cores for parallel processing.
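A quick way to see how the supported lengths map onto radix-8 iteration is to factor each length into powers of 8. The sketch below reports the number of full radix-8 stages and the leftover factor; how the device handles a leftover factor of 2 or 4 (for 256, 1024, and 2048 points) is not stated in the text. The per-core workload assumes the 8 cores share each stage's butterflies evenly, which is also an assumption. Function names are hypothetical.

```python
def radix8_plan(points: int) -> tuple[int, int]:
    """Split an FFT length into full radix-8 stages plus a leftover factor."""
    if points < 8 or points & (points - 1):
        raise ValueError("points must be a power of two >= 8")
    stages, rest = 0, points
    while rest % 8 == 0:
        rest //= 8
        stages += 1
    return stages, rest


def butterflies_per_core(points: int, cores: int = 8) -> float:
    """Radix-8 butterflies per stage per core, ASSUMING an even split over the cores."""
    return points // 8 / cores


for n in (256, 512, 1024, 2048, 4096):
    print(n, radix8_plan(n), butterflies_per_core(n))
```

For 4096 points this gives 4 radix-8 stages of 512 butterflies each, or 64 butterflies per core per stage under the even-split assumption.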
For FIR pulse-group processing, the MACs in the 2 "complex MAC" modules and the 2 "complex MAC submatrix" modules of a basic unit are used in parallel, and each complex MAC handles one channel of the FIR pulse group. With conjugate operation, each complex MAC handles 2 channels; with additional time multiplexing, it can handle even more channels.
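A minimal model of the parallel FIR organization: every channel gets its own complex MAC, which accumulates the filter taps for that channel, and the channels run side by side. This is a behavioral sketch with a hypothetical function name; it does not model the conjugate-operation trick that lets one complex MAC serve 2 channels, or time multiplexing.

```python
def fir_pulse_group(channels, coeffs):
    """Multi-channel FIR: y_c[n] = sum_t h_c[t] * x_c[n - t].

    Each channel c is assigned its own complex MAC, which accumulates the
    taps one after another; the channels are independent (parallel use).
    """
    outputs = []
    for x, h in zip(channels, coeffs):       # one complex MAC per channel
        y = []
        for n in range(len(x)):
            acc = 0j
            for t, ht in enumerate(h):       # sequential tap accumulation
                if n - t >= 0:
                    acc += ht * x[n - t]
            y.append(acc)
        outputs.append(y)
    return outputs


print(fir_pulse_group([[1, 0, 0], [1, 1, 1]], [[1, 2], [1j, 1]]))
```

With 10 complex MACs per basic unit used this way, one unit covers the 10 channels mentioned in the text for FIR filtering.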
For correlation, the "complex MAC" and the "complex MAC submatrix" are connected in series, equivalent to 10 complex MACs in cascade. For FIR filtering, the "complex MAC" and the "complex MAC submatrix" are used in parallel, giving 10 filter channels. For FFT, the "complex MAC" acts as a weighting multiplier, and the weighted data are then passed to the "complex MAC submatrix" for iteration.
The invention performs complex arithmetic on floating-point data, and each complex operation consists of 4 real operations, so the real floating-point MAC is the main arithmetic component of the device. As shown in the structure block diagram of Fig. 3, the real floating-point MAC is divided into 5 parts: fixed-point multiplication, truncation, exponent adjustment, fixed-point addition, and exponent judgment.
Operand data processed by the device are 20 bits wide in total, in the format 4-bit unsigned exponent + 16-bit signed mantissa, covering the dynamic range $-2^{15}\times 2^{15}$ to $(2^{15}-1)\times 2^{15}$. Coefficients are 16-bit signed fixed-point numbers with range $-2^{15}$ to $2^{15}-1$. For intermediate data, balancing computational precision against hardware area and speed, a 24-bit floating-point format is used: 4-bit unsigned exponent + 20-bit signed mantissa. The computation in the MAC proceeds in the following 5 steps.
1. Fixed-point multiplication: multiplies the 16-bit signed mantissa of the data by the 16-bit signed coefficient in fixed point.
2. Truncation: preserves as much precision of the fixed-point product as possible. Based on the number of redundant sign bits in the 32-bit product, it determines the 20-bit mantissa of the 24-bit intermediate value to be kept and the corresponding 4-bit exponent. If the 32-bit product has $k$ redundant sign bits and only 20 bits can be kept, then, to obtain maximum precision, it is best to remove these $k$ redundant sign bits (shift left by $k$ bits) and subtract $k$ from the exponent. The mantissa then consists of one sign bit followed by 19 data bits. This amounts to applying the following operations to the 32-bit product $a_{31}a_{30}a_{29}\cdots a_2a_1a_0\times 2^{e}$:
Remove redundant sign bits and truncate: $a_{31}a_{30}a_{29}\cdots a_2a_1a_0 \rightarrow a_{31-k}a_{30-k}a_{29-k}\cdots a_{31-18-k}a_{31-19-k}$
Subtract $k$ from the exponent: $e \rightarrow e-k$
When $e > k > 12$, i.e. when the left shift exceeds 12 bits, zeros must be padded at the low end. Note also that the exponent after subtracting $k$ must not be less than 0. Truncation must therefore consider both the number of redundant sign bits and the size of the exponent: if the original exponent is smaller than the number of redundant sign bits, i.e. $e < k$, only $e$ redundant sign bits can be removed and the exponent becomes 0:
Remove redundant sign bits and truncate: $a_{31}a_{30}a_{29}\cdots a_2a_1a_0 \rightarrow a_{31-e}a_{30-e}a_{29-e}\cdots a_{31-18-e}a_{31-19-e}$
Exponent: $e \rightarrow 0$
3. Exponent adjustment: in floating-point addition, the mantissas can be added only when the exponents of addend and augend are equal. The exponent adjustment part adjusts the exponent of the addend or augend so that they match. Suppose $A_1 = a_1\times 2^{e_1}$ and $A_2 = a_2\times 2^{e_2}$ are added with $e_1 < e_2$; then the exponent $e_1$ of $A_1$ is adjusted to $e_2$, and the mantissa of $A_1$ is sign-extended by $e_2-e_1$ bits and shifted right by $e_2-e_1$ bits.
4. Fixed-point addition: after exponent adjustment the exponents of addend and augend are equal, and the two adjusted, sign-extended mantissas can be added in fixed point, giving a 21-bit fixed-point sum.
5. Exponent judgment: because the exponent adjustment in step 3 always raises the smaller exponent to the larger one, after the fixed-point addition the exponent must be re-evaluated together with the 21-bit fixed-point sum, and redundant sign bits must be removed. This is the role of the exponent judgment part.
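The data format and the five MAC steps can be modeled with Python integers, whose `>>` is an arithmetic (floor) shift and so matches dropping low-order two's-complement bits. This is a behavioral sketch under stated assumptions, not the patented circuit: the exponent is assumed to sit in bits 19..16 of the 20-bit data word, intermediate values are (20-bit mantissa, exponent) pairs carrying the implicit 2^12 scale left by truncation, and exponent overflow beyond 4 bits is not modeled. All names are hypothetical.

```python
ACC_MANT_BITS = 20  # signed mantissa of the 24-bit intermediate format


def unpack_data(word: int) -> tuple[int, int]:
    """20-bit data word -> (signed 16-bit mantissa, 4-bit unsigned exponent).

    ASSUMPTION: exponent in bits 19..16, mantissa in bits 15..0.
    """
    mant = word & 0xFFFF
    if mant & 0x8000:
        mant -= 1 << 16
    return mant, (word >> 16) & 0xF


def data_value(word: int) -> int:
    mant, exp = unpack_data(word)
    return mant * (1 << exp)


def redundant_sign_bits(x: int, width: int) -> int:
    """Number of bits below the sign bit that merely repeat it."""
    sign = (x >> (width - 1)) & 1
    k = 0
    for pos in range(width - 2, -1, -1):
        if (x >> pos) & 1 != sign:
            break
        k += 1
    return k


def truncate(product: int, e: int) -> tuple[int, int]:
    """Step 2: keep 20 bits of a 32-bit product, removing up to e redundant sign bits."""
    shift = min(redundant_sign_bits(product, 32), e)  # exponent must stay >= 0
    return (product << shift) >> 12, e - shift        # zeros fill in when shift > 12


def align(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int, int]:
    """Step 3: raise the smaller exponent, arithmetic-shifting its mantissa right."""
    if e1 < e2:
        m1, e1 = m1 >> (e2 - e1), e2
    elif e2 < e1:
        m2 = m2 >> (e1 - e2)
    return m1, m2, e1


def judge_exponent(s: int, e: int) -> tuple[int, int]:
    """Step 5: bring a 21-bit sum back to a normalized 20-bit mantissa."""
    if not -(1 << 19) <= s < (1 << 19):   # sum needs 21 bits: shift right once
        s, e = s >> 1, e + 1
    shift = min(redundant_sign_bits(s, ACC_MANT_BITS), e)
    return s << shift, e - shift


def mac(acc: tuple[int, int], data_word: int, coeff: int) -> tuple[int, int]:
    """acc + data * coeff, with acc as (20-bit mantissa, exponent)."""
    d_mant, d_exp = unpack_data(data_word)
    p_mant, p_exp = truncate(d_mant * coeff, d_exp)   # steps 1 and 2
    a, b, e = align(acc[0], acc[1], p_mant, p_exp)     # step 3
    return judge_exponent(a + b, e)                    # steps 4 and 5


word = (3 << 16) | 1000            # mantissa 1000, exponent 3
print(mac((0, 0), word, 2000))     # (3906, 0)
```

In the example the product 2,000,000 has 10 redundant sign bits but an exponent of only 3, so only 3 bits are removed and the exponent drops to 0, which is the $e < k$ case of step 2.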
II. Organization of hardware resources
The invention is a reconfigurable dedicated digital signal processor. "Reconfigurable" means that hardware resources can be organized by configuring different control words so that they operate in different modes. The invention can be configured in three classes of working mode: FFT/IFFT, FIR pulse-group processing, and correlation processing. Within each class, key parameters can be further adjusted by the control word to meet different processing requirements. The organization of hardware resources in each of the three classes is described below with reference to the drawings.
1. FFT/IFFT
The basic formula of the discrete Fourier transform (DFT) is:
$$X(k) = \sum_{i=0}^{N-1} w(i)\,x(i)\,e^{-j2\pi ki/N}, \qquad k = 0, 1, \ldots, N-1$$
Here w(i) is the DFT weighting factor, x(i) is the input data, and $e^{-j2\pi ki/N}$ is the twiddle factor. N is the number of DFT points. If N is a composite number, a long DFT can be converted into DFTs of two shorter lengths $N_1$ and $N_2$ (with $N = N_1 N_2$). If $N_1$ and $N_2$ can be factored further, the decomposition can continue, and the amount of computation falls further.
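The sketch below illustrates the decomposition. It computes the weighted DFT directly, and also from shorter DFTs using the standard Cooley-Tukey index mapping for N = N1·N2: input index N2·q + r, output index k1 + N1·k2, and a twiddle multiplication between the two stages. With N1 = N2 = 8 this corresponds to two radix-8 stages of a 64-point transform. It is a plain-Python illustration of the mathematics, not the device's dataflow, and the function names are hypothetical.

```python
import cmath


def dft(x, w=None):
    """Direct weighted DFT: X(k) = sum_i w(i) x(i) exp(-j 2 pi k i / N)."""
    n = len(x)
    if w is None:
        w = [1.0] * n
    return [sum(w[i] * x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(n)]


def dft_two_factor(x, n1, n2):
    """Compute an (n1*n2)-point DFT from n1-point and n2-point DFTs."""
    n = n1 * n2
    assert len(x) == n
    # Stage 1: one n1-point DFT per residue r, over the inputs x[n2*q + r].
    inner = [dft([x[n2 * q + r] for q in range(n1)]) for r in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        # Twiddle factors W_N^(r*k1) applied between the two stages.
        col = [inner[r][k1] * cmath.exp(-2j * cmath.pi * r * k1 / n) for r in range(n2)]
        # Stage 2: one n2-point DFT per k1; the result lands at k = k1 + n1*k2.
        for k2, val in enumerate(dft(col)):
            out[k1 + n1 * k2] = val
    return out
```

The weighting w(i) is applied to the input before any decomposition. A windowed transform is therefore `dft_two_factor([w[i] * x[i] for i in range(N)], 8, 8)`.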
Depending on the decomposition method, FFT algorithms include radix-2, radix-4, radix-8 and mixed-radix variants. The FFT/IFFT of this invention uses radix-8 operations. This operating mode is subdivided into three types: FFT, IFFT and FFT+IFFT. The transform length can be 256, 512, 1024, 2048 or 4096 points, and for lengths of 2048 points or fewer, two data streams can be transformed (FFT or IFFT) simultaneously. Whatever the transform length, in this mode the 4 basic units (basic_unit) of the device are all configured as shown in Figure 4. The "complex multiply-accumulate submatrix A" and "complex multiply-accumulate submatrix B" shown in Figure 2 each serve as the node of one radix-8 iteration, and the 4 complex multiply-accumulators in each submatrix together complete one radix-8 iteration stage. The "complex multiply-accumulator A" and "complex multiply-accumulator B" in front of the submatrices act as windowing units that apply the window to the data. The windowed data then enter a complex multiply-accumulate submatrix for one radix-8 iteration stage. The output of each stage is first written to the data buffer. It is then sent, through the data exchange unit in Figure 1, to the other complex multiply-accumulate submatrix for the next iteration stage. The hardware organization of a single basic unit in FFT/IFFT mode is described below.
When the GA3816 device performs an FFT of more than 256 points, it operates internally in a time-multiplexed manner: the larger the transform, the higher the multiplexing factor. The data memory is configured as a ping-pong data buffer. The device's total data storage is 32 × 256 × 40 bits, of which 2 × 4 × 256 × 40 bits is allocated to each basic unit. Half of each basic unit's data-buffer address space, i.e. 4 × 256 × 40 bits, is set aside as the input data buffer. The 4 basic units of a single device can therefore perform an FFT of up to 4096 points (4 × 4 × 256).
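A ping-pong buffer lets one bank be filled with the next frame while the other bank is being processed. The class below is a hypothetical software model of that idea. The capacity constants simply restate the figures given in the text.

```python
UNITS = 4
INPUT_WORDS_PER_UNIT = 4 * 256                 # half of the 2 x 4 x 256 words per unit
MAX_FFT_POINTS = UNITS * INPUT_WORDS_PER_UNIT  # 4096, matching the text


class PingPongBuffer:
    """Two equal banks: one receives input while the other is read."""

    def __init__(self, depth):
        self.banks = [[0j] * depth, [0j] * depth]
        self.write_bank = 0  # index of the bank currently being filled

    def write(self, addr, value):
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        # Reads always come from the bank that is not being written.
        return self.banks[1 - self.write_bank][addr]

    def swap(self):
        # At a frame boundary the freshly filled bank becomes the read bank.
        self.write_bank ^= 1
```

Writing a frame, calling `swap()`, and then writing the next frame leaves the first frame readable and undisturbed until the following swap.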
When performing an FFT, the input data must be windowed to suppress sidelobes. The GA3816 contains 40 × 256 × 32 bits of dual-port RAM as coefficient memory, of which 10 × 256 × 32 bits is allocated to each basic unit. In each basic unit, 8 × 256 × 32 bits is used as the window-function buffer. This buffer is also split into two groups, each with an addressing depth of 1024. The two groups supply window coefficients to the windowing units in front of the two complex multiply-accumulate submatrices.
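The text does not name the window function. The sketch below uses a periodic Hann window, purely as an assumed example, to show why windowing suppresses sidelobes. For a tone that falls between DFT bins, the leakage far from the tone drops sharply once the window is applied.

```python
import cmath
import math


def spectrum_mag(x):
    """Magnitude of the direct DFT of x."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n)]


N = 64
# A complex tone halfway between bins 10 and 11: the worst case for leakage.
tone = [cmath.exp(2j * math.pi * 10.5 * i / N) for i in range(N)]
# Periodic Hann window, used here only as an assumed example window.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

rect = spectrum_mag(tone)                                  # no window
wind = spectrum_mag([w * s for w, s in zip(hann, tone)])   # windowed

# Leakage about 20 bins from the tone, relative to each spectrum's peak.
rect_leak = rect[30] / max(rect)
hann_leak = wind[30] / max(wind)
print(f"rectangular: {rect_leak:.2e}, Hann: {hann_leak:.2e}")
```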
Because the twiddle factors of the FFT/IFFT follow a regular pattern, the invention stores the twiddle factors for the radix-8 FFT in a fixed table. The maximum FFT/IFFT length of a single chip is 4096 points. The chip contains 8 such twiddle-factor tables of 512 × 32 bits, distributed evenly across the four basic units (basic_unit). When the transform length is less than 4096, the twiddle factors are extracted from this table. A separate coefficient table of 4 × 8 × 32 bits stores the coefficients used by the radix-8 operations.
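One way to serve every supported length from a single 4096-entry table is to read it with a stride of 4096/N, since $W_N^k = W_{4096}^{k \cdot 4096/N}$. The sketch below assumes this stride scheme and a table holding $e^{-j2\pi m/4096}$. The text does not describe the table's internal layout or word format, and the names are hypothetical.

```python
import cmath
import math

N_MAX = 4096
# Full-length table W_4096^m = exp(-j*2*pi*m/4096), m = 0..4095
# (8 x 512 entries = 4096, matching the table size stated in the text).
TWIDDLE = [cmath.exp(-2j * math.pi * m / N_MAX) for m in range(N_MAX)]


def twiddle(n_points, k):
    """W_N^k for a smaller N, read from the 4096-entry table with stride 4096/N."""
    assert N_MAX % n_points == 0, "N must divide 4096"
    stride = N_MAX // n_points
    return TWIDDLE[(k * stride) % N_MAX]


# Radix-8 constants W_8^k, k = 0..7, comparable to the small separate table.
RADIX8 = [twiddle(8, k) for k in range(8)]
```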
When the FFT length is 2048 points or fewer, the invention provides a second hardware organization for FFT/IFFT, in which two data streams are transformed simultaneously. In dual-stream input mode, basic_unit0 and basic_unit2 form one group that computes the first stream, and basic_unit1 and basic_unit3 form another group that computes the second stream. Although the hardware is split into two groups, the radix-8 computation in each group follows the same principle. The two data streams are input simultaneously through data input ports 1 and 2, and the two groups' results are output in parallel through output ports 1 and 2. This mode can be used to transform two independent data sets at the same time, or to perform a two-dimensional FFT/IFFT on one data set.
2. FIR pulse-group processing
The basic function performed by FIR pulse-group processing is a matrix operation: $Y = HX$
where $X = (X_0, X_1, \ldots, X_{N-1})^T$ and $Y = (Y_0, Y_1, \ldots, Y_{M-1})^T$, with $N \ge M$.
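$$H = \begin{pmatrix} h_{0,0} & h_{0,1} & \cdots & h_{0,N-1} \\ h_{1,0} & h_{1,1} & \cdots & h_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ h_{M-1,0} & h_{M-1,1} & \cdots & h_{M-1,N-1} \end{pmatrix}$$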
In implementation, this matrix operation can be treated as a set of multiply-accumulate operations. Consider the product of one row of H (hereafter the "coefficient matrix") with X (hereafter the "data"), where the operands are complex. The basic operation is then a multiply-accumulate:
$$Y_i = \sum_{j=0}^{N-1} h_{i,j} X_j, \qquad i = 0, 1, \ldots, M-1$$
FIR pulse-group processing is therefore M groups of multiply-accumulate operations, one for each output $Y_i$. Each group comprises N multiplications and N − 1 additions.
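The multiply-accumulate view of $Y = HX$ can be checked with a short model that also counts operations. With M outputs and N inputs it performs M·N multiplications and M·(N − 1) additions. The function name is hypothetical.

```python
def fir_pulse_group(H, X):
    """Y = H X computed as M independent multiply-accumulate chains.

    Returns (Y, multiplications, additions) so the operation count can be checked.
    """
    M, N = len(H), len(X)
    assert M <= N and all(len(row) == N for row in H)
    Y, mults, adds = [], 0, 0
    for i in range(M):
        acc = H[i][0] * X[0]        # the first product starts the accumulator
        mults += 1
        for j in range(1, N):
            acc += H[i][j] * X[j]   # one complex multiply and one add per step
            mults += 1
            adds += 1
        Y.append(acc)
    return Y, mults, adds
```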
In addition to the basic form, the invention provides several extended forms of FIR pulse-group processing: sliding FIR pulse-group processing, parallel FIR pulse-group processing of two data sets, two-channel-addition FIR pulse-group processing, and others. Whether the basic form or an extended form is used, in FIR pulse-group mode the hardware resources in the 4 basic units (basic_unit) are all organized as shown in Figure 5.
In the organization shown in Figure 5, the 4 complex multiply-accumulators in "complex multiply-accumulate submatrix A", together with "complex multiply-accumulator A", form "parallel multiply-accumulate array 1". The 4 complex multiply-accumulators in "complex multiply-accumulate submatrix B", together with "complex multiply-accumulator B", form "parallel multiply-accumulate array 2". Within a parallel multiply-accumulate array, one complex multiply-accumulator handles 1 FIR pulse-group channel (2 channels when conjugate coefficients are used). One array can therefore process 5 channels in parallel (10 in the conjugate case), and one basic unit can process 10 channels in parallel (20 in the conjugate case). When the input data rate equals the chip's computation clock, the 4 basic units can thus process at most 80 FIR pulse-group channels in parallel (4 × 20, i.e. with conjugate coefficients). If the parallel multiply-accumulate arrays are time-multiplexed, up to 128 channels can be processed.
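The text states that conjugate coefficients double the channel count but does not say how. One plausible mechanism, offered only as an assumption, is that $hx$ and $h^{*}x$ share the same four real partial products, so one complex multiply-accumulator could produce two channels at once. The sketch also restates the channel arithmetic of this paragraph. Both function names are hypothetical.

```python
def conj_pair_mac(h, x):
    """Return (h*x, conj(h)*x) computed from the same four real products."""
    a, b = h.real, h.imag
    c, d = x.real, x.imag
    ac, bd, ad, bc = a * c, b * d, a * d, b * c   # shared partial products
    y_h = complex(ac - bd, ad + bc)               # (a + jb)(c + jd)
    y_hconj = complex(ac + bd, ad - bc)           # (a - jb)(c + jd)
    return y_h, y_hconj


def parallel_channels(units=4, arrays_per_unit=2, macs_per_array=5, conj=False):
    """Channels processed in parallel: one per complex MAC, two if conjugate."""
    return units * arrays_per_unit * macs_per_array * (2 if conj else 1)
```

With the default arguments, `parallel_channels(conj=True)` gives the 80 channels stated above, and `parallel_channels(units=1)` gives the 10 channels per basic unit.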
The FIR pulse-group coefficients are stored in the coefficient memory shown in Figure 2. Each basic unit has 10 dual-port RAMs of 256 × 32 bits for coefficient storage, one for each multiplier unit. Each RAM supplies coefficients in turn to one complex multiply-accumulator in the parallel multiply-accumulate array. A multiply-accumulator, together with the dual-port RAM that feeds it, forms one FIR pulse-group channel.
Each basic unit (basic_unit) has 8 dual-port RAMs of 512 × 40 bits as data memory. In FIR pulse-group processing, all 8 of these RAMs are used as data buffers. In every basic unit, the data buffer receives the same data from the data input port. The data presented at each multiply-accumulator's data input are multiplied and accumulated with the coefficients of the different channels.
The second hardware organization for FIR pulse-group processing handles two sets of data in parallel. In this organization, the first basic unit (basic_unit0) and the third basic unit (basic_unit2) form one group, and the second basic unit (basic_unit1) and the fourth basic unit (basic_unit3) form the other. The parallel multiply-accumulate arrays in the first group process the data arriving on the "data input 1" port, and those in the second group process the data arriving on the "data input 2" port. The data memory and coefficient memory in the first group store the first data set and its coefficients, and those in the second group store the second data set and its coefficients. A single device can thus perform FIR pulse-group processing on two different data sets concurrently.
The mode described above computes two data sets in parallel by dividing the chip's hardware resources into two groups, each processing one data set independently. In a similar way, when the number of channels is at most 40 and the filter length is at least 80, arithmetic speed can be increased by splitting the data evenly into two segments at half the filter length. The first segment enters through data input port 1 and the second through data input port 2. The two on-chip hardware groups then multiply and accumulate their own segments simultaneously, and their multiply-accumulate results are added to obtain the complete result. This is the third hardware organization in FIR pulse-group mode.
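The third organization relies on splitting one dot product into two partial sums. The sketch below shows that the two half-length multiply-accumulates, once added, equal the full result. The split point is half the filter length, as described, and the function names are hypothetical.

```python
def can_split(channels, filter_length):
    """Conditions stated in the text for the third FIR organization."""
    return channels <= 40 and filter_length >= 80


def split_fir(h, x):
    """One channel's length-N dot product, computed as two half-length partial sums."""
    assert len(h) == len(x)
    half = len(h) // 2
    part1 = sum(hi * xi for hi, xi in zip(h[:half], x[:half]))  # group 1, data input port 1
    part2 = sum(hi * xi for hi, xi in zip(h[half:], x[half:]))  # group 2, data input port 2
    return part1 + part2  # final addition of the two multiply-accumulate results
```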
3. Correlation processing
Correlation processing takes N consecutive samples at a time, in a sliding manner, from a continuously sampled data sequence and filters them. Its characteristic is that two adjacent filtering operations share N − 1 identical data samples. Its mathematical expression is:
$$y_n = \sum_{i=0}^{N-1} h_i\, x_{n+i}$$
Here N is the filter length, $x_i$ is the input signal, and $h_i$ are the corresponding filter coefficients.
The hardware configuration of a basic unit in correlation mode is shown in Figure 6. The multiply-accumulators inside the basic unit are configured as a cascade. Each basic unit contains 10 floating-point complex multiply-accumulators. The output of the first complex multiply-accumulator is fed into the second, where it is added to the second unit's product. That sum becomes the output of the second complex multiply-accumulator and is fed into the third as one operand of its addition, and so on, so that all 10 complex multiply-accumulators are cascaded. Each basic unit (basic_unit) can therefore be organized as a cascade of 10 complex multiply-accumulators. By reusing this "cascaded multiply-accumulate array" a different number of times, it can compute correlations with filter lengths of 10 × N points (N = 1, 2, 3, …, 100). The 4 basic units are cascaded in the same way. The partial multiply-accumulate sum produced by one basic unit is sent to the next, where it becomes the addend of the first complex multiply-accumulator in that unit's cascaded array. The 4 basic units thus accumulate in sequence, and the final correlation result emerges from the fourth basic unit (basic_unit3).
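The cascade can be modelled sequentially in Python. The model assumes the correlation form $y_n = \sum_{i=0}^{N-1} h_i x_{n+i}$. It also assumes that the taps are divided into four contiguous blocks, one per basic unit, each processed by reusing that unit's 10-MAC chain. The real pipeline overlaps these operations in time; this sketch reproduces only the flow of partial sums, and the names are hypothetical.

```python
MACS_PER_UNIT = 10  # complex MACs cascaded inside one basic unit
UNITS = 4           # basic units cascaded on the chip


def unit_chain(coeffs, data, carry_in):
    """One basic unit: its 10-MAC chain, reused len(coeffs) // 10 times."""
    assert len(coeffs) == len(data) and len(coeffs) % MACS_PER_UNIT == 0
    acc = carry_in
    for base in range(0, len(coeffs), MACS_PER_UNIT):   # one reuse of the chain
        for m in range(MACS_PER_UNIT):
            # MAC m adds its product to the partial sum handed on by MAC m-1.
            acc = acc + coeffs[base + m] * data[base + m]
    return acc


def correlate_output(h, x, n):
    """y_n = sum_i h[i] * x[n + i], accumulated through the 4 cascaded units."""
    block = len(h) // UNITS
    assert block * UNITS == len(h) and block % MACS_PER_UNIT == 0
    carry = 0
    for u in range(UNITS):
        lo = u * block
        # Unit u receives unit u-1's partial sum as the addend of its first MAC.
        carry = unit_chain(h[lo:lo + block], x[n + lo:n + lo + block], carry)
    return carry  # the final result leaves the last unit (basic_unit3)


def correlate(h, x):
    """Slide the length-N window one sample at a time; neighbours share N-1 samples."""
    return [correlate_output(h, x, n) for n in range(len(x) - len(h) + 1)]
```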
Every basic unit in the device has data memory of the same capacity: 8 dual-port RAMs of 512 × 40 bits. In correlation mode, the data memory in each basic unit is addressed as one unified data buffer. During input, the data buffer of every basic unit stores the same data. During computation, the data buffer in a basic unit supplies the same data to its 10 cascaded complex multiply-accumulators in the same clock cycle.
The correlation coefficients are held in the 10 dual-port RAMs of 256 × 32 bits in each basic unit. Within a basic unit, each 32-bit dual-port RAM corresponds to one complex multiply-accumulator in the cascaded multiply-accumulate array and supplies coefficients to it in turn. The coefficients are distributed across these 10 RAMs in the order $h_n, h_{n+1}, \ldots, h_{n+9}$, in RAM 1 through RAM 10. Consequently, within any one dual-port RAM, the coefficients stored at addresses n and n+1 differ in index by 10. This layout matches the 10 complex multiply-accumulators of the cascaded array. The pipeline of the cascaded array performs 10 multiply-accumulate operations in the same clock cycle, so the coefficient buffer must supply 10 consecutive coefficients in that cycle.
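The storage rule maps coefficient $h_i$ to RAM number (i mod 10) + 1 at address ⌊i/10⌋, assuming $h_0$ is stored at address 0 of RAM 1. The sketch below builds that layout. It shows that reading all 10 RAMs at one address yields 10 consecutive coefficients, and the names are hypothetical.

```python
NUM_RAMS = 10


def coeff_location(i):
    """(RAM number 1..10, address) that holds coefficient h_i."""
    return i % NUM_RAMS + 1, i // NUM_RAMS


def build_rams(h):
    """Distribute the coefficient sequence over the 10 RAMs."""
    rams = [[] for _ in range(NUM_RAMS)]
    for i, coeff in enumerate(h):
        ram, addr = coeff_location(i)
        assert len(rams[ram - 1]) == addr  # each RAM fills its addresses in order
        rams[ram - 1].append(coeff)
    return rams


def read_cycle(rams, addr):
    """One clock cycle: all 10 RAMs read at the same address."""
    return [ram[addr] for ram in rams]
```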
III. Scheduling of hardware resources
The invention schedules hardware resources through a two-level scheduling architecture, combining centralized and distributed control. Figure 7 shows the device's two-level scheduling framework. Control words and control signals first enter the "global control" module for first-level decoding and control-word distribution. This module receives the control word and generates global control signals from it. These global controls include coordinating the actions of the 4 basic units, deciding the output format of the results, and configuring the on-chip master clock. The second function of the global control module is to send control information to the "local control" module of each basic unit. Control words relating to the various computation parameters are decoded in the global control module and sent to each basic unit. The information sent includes the operating mode, its subtype, the number of points, the number of channels, and the number of channels to be processed.
After receiving information from the global control module, the local control module in each basic unit (basic_unit) performs second-level decoding, converting it into detailed hardware scheduling control signals. For example, it sets certain selector switches according to the operating mode and its type. It decides, from the number of channels to be processed, how coefficients are stored and how coefficient memory addresses are generated. It also determines the multiplexing factor of the multiply-accumulate arrays from the number of points, and hence whether, and when, certain registers are updated. Dividing hardware scheduling into two levels mainly keeps the control logic well structured and economical, and makes the implementation clearer.
Within each basic unit, key parameters are decoded by arithmetic. In FFT/IFFT, FIR pulse-group and correlation modes alike, the storage and access addresses of data and coefficients, and the multiplexing factor of the multiply-accumulate arrays, follow regular patterns. The control word can therefore be decoded arithmetically from these patterns to obtain all control information and timing signals. For example, in FIR pulse-group processing with a filter order of 40 and a point count of 40, a dedicated decoding multiplier computes that 40 × 40 = 1600 coefficients are needed in total. When coefficient write addresses are generated, addresses 0 to 1599 are produced in sequence and the coefficients are written to the corresponding buffer. Arithmetic decoding is also a trade-off between scheduling complexity and resource use. First, an 8 × 8 fixed-point multiplier does not occupy many resources. Second, enumerating every operating state of the processor and decoding through lookup tables would be tedious and laborious, and the storage and lookup logic would consume considerable resources.
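The arithmetic decoding in the example above can be sketched as follows. The sketch assumes that the two factors come from the Length and Chan_num fields of Table 2 (filter length = length + 1, channels = chan_num + 1). The text only says that the point count is 40, so treating it as the channel count is an assumption. `decode_fir` is a hypothetical name.

```python
def decode_fir(length_field, chan_num_field):
    """Derive FIR pulse-group parameters arithmetically from control-word fields."""
    taps = length_field + 1          # filter length N = length + 1
    channels = chan_num_field + 1    # channel count = chan_num + 1 (assumed factor)
    total = taps * channels          # the job of the small decoding multiplier
    return taps, channels, range(total)  # coefficient write addresses 0 .. total-1
```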
The control word used for hardware-resource scheduling is 64 bits long. Its interpretation in FFT/IFFT, FIR pulse-group and correlation modes is given in Tables 1, 2 and 3 respectively.
Control word | Name | Meaning and value
Chip_id (3:0) | Cascade chip position | Not used; may be held at 0000.
Chip_num (3:0) | Number of cascaded chips | Not used; may be held at 0000.
Chan_num (7:0) | Channel count | Not used; may be held at 00000000.
Work_model (4:0) | Operating mode | FFT: 10000; IFFT: 10010; FFT+IFFT: 10011; dual-stream FFT: 10100; dual-stream IFFT: 10110.
Coeff_num (4:0) | Number of coefficient groups | Not used; may be held at 00000.
Coeff_chan (4:0) | Coefficient channel | Not used; may be held at 00000.
Conj | Conjugate select | Not used; may be set to '0'.
Length (7:0) | Filter length | The low four bits, length(3:0), give the FFT/IFFT data length (also the number of points in one set of weighting coefficients). The length is 2 raised to the unsigned value of length(3:0). Supported codes: "1000" = 256 points; "1001" = 512 points; "1010" = 1024 points; "1011" = 2048 points; "1100" = 4096 points. The unsigned value of length(6:4) plus 1 is the number of radix-8 stages into which the FFT/IFFT is split, which depends on the transform length. 256 and 512 points require three stages, code "010"; 1024 to 4096 points require four stages, code "011". The most significant bit, length(7), is currently unused and set to '0'.
Sel_coeff_ram (4:0) | Coefficient base region | In FFT/IFFT mode the coefficient set can be switched during computation. The integer value of sel_coeff_ram(4:0) selects which coefficient set in coefficient memory is used. The number of available sets depends on the transform length: 2 sets at 4096 points, 4 at 2048, 8 at 1024, 16 at 512 and 32 at 256. sel_coeff_ram therefore ranges from "00000" to "11111".
Sel_data | Weighting enable | In FIR pulse-group mode, selects whether the input data is weighted. Always '0' in FFT mode.
Sel_coeff_pp | Coefficient ping-pong | Selects whether coefficients are written in ping-pong fashion in dual-stream mode: '0' = no ping-pong; '1' = ping-pong.
Result1_mod (1:0) | Output port 1 format | Final output port 1 supports four result formats: 00 = I/Q data; 01 = magnitude and phase; 10 = logarithm and phase; 11 = correlation cascade data.
Sel_AGC1 (1:0) | Output port 1 gain control | Output port 1 can keep the input data format, with I and Q each output as 20-bit floating-point values. It can instead normalize the floating-point output using the value of control word AGC1(3:0) as the reference, or convert I/Q to 20-bit fixed-point values. 00 or 11: floating-point output; 01: exponent-normalized output; 10: fixed-point output.
AGC1 (3:0) | Output port 1 normalization level | When sel_AGC1 = 01 or 10, a reference exponent is needed. AGC1(3:0) is that reference exponent and is used for both exponent-normalized and fixed-point output.
Result2_mod (1:0) | Output port 2 format | Same as result1_mod(1:0): 00 = I/Q data; 01 = magnitude and phase; 10 = logarithm and phase; 11 = correlation cascade data.
Sel_AGC2 (1:0) | Output port 2 gain control | Same as sel_AGC1(1:0): 00 or 11 = floating-point output; 01 = exponent-normalized output; 10 = fixed-point output.
AGC2 (3:0) | Output port 2 normalization level | Same as AGC1(3:0). When sel_AGC2 = 01 or 10, the output floating-point exponent uses AGC2 as the reference for exponent normalization or float-to-fixed conversion.
Table 1
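The Length(7:0) coding in Table 1 is arithmetic. length(3:0) = log2 N, and length(6:4) = (number of radix-8 stages) − 1, where the stage count is ⌈log2 N / 3⌉. The sketch below encodes the field and reproduces the coefficient-set counts listed for sel_coeff_ram, which equal 8192/N. The function names are hypothetical.

```python
def fft_length_field(n_points):
    """Length(7:0) for FFT/IFFT mode, following Table 1."""
    if n_points not in (256, 512, 1024, 2048, 4096):
        raise ValueError("supported lengths: 256, 512, 1024, 2048, 4096")
    log2n = n_points.bit_length() - 1     # length(3:0): N = 2 ** length(3:0)
    stages = (log2n + 2) // 3             # radix-8 stages = ceil(log2(N) / 3)
    return ((stages - 1) << 4) | log2n    # length(6:4) = stages - 1, length(7) = 0


def coeff_sets(n_points):
    """Switchable coefficient sets listed in Table 1 (2 at 4096 up to 32 at 256)."""
    return 8192 // n_points
```

For a 1024-point FFT this gives 0b00111010, i.e. length(6:4) = "011" and length(3:0) = "1010".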
Control word | Name | Meaning and value
Chip_id (3:0) | Cascade chip position | Held at 0000 in FIR mode.
Chip_num (3:0) | Number of cascaded chips | Held at 0000 in FIR mode.
Chan_num (7:0) | Channel count | Number of channels for FIR pulse-group processing, set as the application requires. For example, for 50 parallel channels, chan_num + 1 = 50.
Work_model (4:0) | Operating mode | 01000 for FIR pulse-group mode; 00010 for sliding FIR pulse-group mode; 00100 for FIR pulse-group mode with two data sets computed in parallel; 00001 for two-channel-addition FIR pulse-group mode. To implement a DFT with FIR pulse-group processing, use the ordinary FIR pulse-group value 01000 and also set control word sel_data to '1'.
Coeff_num (4:0) | Number of coefficient groups | Held at 00000 in FIR pulse-group mode.
Coeff_chan (4:0) | Coefficient channel | Held at 00000 in FIR pulse-group mode.
Conj | Conjugate select | Selects whether coefficients are conjugated in FIR pulse-group mode. If the filter bank uses conjugate coefficients between channels, the effective computation rate doubles. '0' = no conjugation; '1' = conjugation.
Length (7:0) | Filter length | Filter length in FIR pulse-group mode: N = length + 1.
Sel_coeff_ram (4:0) | Coefficient base region | Selects the coefficient region used during computation in FIR pulse-group mode. The high 2 bits are held at 00; the low 3 bits select which region of coefficient memory is used.
Sel_data | Weighting enable | In FIR pulse-group mode, selects whether the input data is weighted: '0' = no weighting; '1' = weighting. This bit is '1' if and only if the chip is using FIR pulse-group mode to compute a DFT.
Sel_coeff_pp | Coefficient ping-pong | Fixed at '0' in FIR pulse-group mode.
Result1_mod (1:0) | Output port 1 format | Final output port 1 supports four result formats: 00 = I/Q data; 01 = magnitude and phase; 10 = logarithm and phase; 11 = correlation cascade data.
Sel_AGC1 (1:0) | Output port 1 gain control | Output port 1 can keep the input data format, with I and Q each output as 20-bit floating-point values. It can instead normalize the floating-point output using the value of control word AGC1(3:0) as the reference, or convert I/Q to 20-bit fixed-point values. 00 or 11: floating-point output; 01: exponent-normalized output; 10: fixed-point output.
AGC1 (3:0) | Output port 1 normalization level | When sel_AGC1 = 01 or 10, a reference exponent is needed. AGC1(3:0) is that reference exponent and is used for both exponent-normalized and fixed-point output.
Result2_mod (1:0) | Output port 2 format | Same as result1_mod(1:0): 00 = I/Q data; 01 = magnitude and phase; 10 = logarithm and phase; 11 = correlation cascade data.
Sel_AGC2 (1:0) | Output port 2 gain control | Same as sel_AGC1(1:0): 00 or 11 = floating-point output; 01 = exponent-normalized output; 10 = fixed-point output.
AGC2 (3:0) | Output port 2 normalization level | Same as AGC1(3:0). When sel_AGC2 = 01 or 10, the output floating-point exponent uses AGC2 as the reference for exponent normalization or float-to-fixed conversion.
Table 2
Control word | Name | Meaning and value
Chip_id (3:0) | Cascade chip position | In multi-chip cascade use, the position of this chip in the cascade chain. For example, if this chip is the third in the chain, chip_id + 1 = 3. For single-chip use, chip_id is held at 0000.
Chip_num (3:0) | Number of cascaded chips | In multi-chip cascade use, the total number of devices in the cascade chain. For example, if 3 devices are cascaded, chip_num + 1 = 3. For single-chip use, chip_num is held at 0000.
Chan_num (7:0) | Channel count | Ranges from 00000000 to 00001111, set as the application requires. For example, for 7 parallel channels, chan_num + 1 = 7.
Work_model (4:0) | Operating mode | 11000 in correlation mode.
Coeff_num (4:0) | Number of coefficient groups | Total number of coefficient groups needed for multichannel operation. It must not exceed the current channel count chan_num(7:0). For example, with 3 coefficient groups, coeff_num + 1 = 3.
Coeff_chan (4:0) | Coefficient channel | Correspondence between coefficient groups and channels, used together with coeff_num and chan_num. The relation is coeff_chan + 1 = ⌊(chan_num + 1)/(coeff_num + 1)⌋ + 1, i.e. (chan_num + 1) divided by (coeff_num + 1), rounded down, plus 1. For example, if chan_num + 1 = 7 and coeff_num + 1 = 3, then coeff_chan + 1 = 3. coeff_chan + 1 is the number of channels served by each coefficient group. In this example (7 channels, 3 coefficient groups, coeff_chan + 1 = 3), channels 1, 2 and 3 use coefficient group 1, channels 4, 5 and 6 use group 2, and channel 7 uses group 3.
Conj | Conjugate select | Held at '0' in correlation mode.
Length (7:0) | Filter length | Filter length of the correlation (also the number of points in one coefficient group). For a window length X, X = 40 × (length + 1).
Sel_coeff_ram (4:0) | Coefficient read base region | In correlation mode, coefficient memory is divided into two regions so that coefficients can be replaced in real time during computation. The high 4 bits of sel_coeff_ram(4:0) are held at 0000, and the least significant bit selects which region's coefficients are used.
Sel_data | Weighting enable | Fixed at '0' in correlation mode.
Sel_coeff_pp | Coefficient ping-pong | Fixed at '0' in correlation mode.
Result1_mod (1:0) | Output 1 mode | 00: result1 outputs I/Q directly; 01: result1 outputs magnitude and phase; 10: result1 outputs logarithm and phase; 11: result1 carries correlation cascade data.
Sel_AGC1 (1:0) | Output 1 gain-control select | 00 or 11: no gain processing on result1; 01: result1 is exponent-normalized; 10: result1 is converted from floating point to fixed point.
AGC1 (3:0) | Output 1 gain | When sel_AGC1 is 01 or 10, the exponent of floating-point result 1 is normalized according to the value of AGC1.
Result2_mod (1:0) | Output 2 mode | 00: result2 outputs I/Q directly; 01: result2 outputs magnitude and phase; 10: result2 outputs logarithm and phase; 11: result2 carries correlation cascade data.
Sel_AGC2 (1:0) | Output 2 gain-control select | 00 or 11: no gain processing on result2; 01: result2 is exponent-normalized; 10: result2 is converted from floating point to fixed point.
AGC2 (3:0) | Output 2 gain | When sel_AGC2 is 01 or 10, the exponent of floating-point result 2 is normalized according to the value of AGC2.
Table 3
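The Table 3 relations between channels, coefficient groups and window length can be expressed directly. The sketch follows the formula as written (integer part of the quotient, plus 1) and reproduces the 7-channel, 3-group example. The names are hypothetical.

```python
def coeff_chan_field(chan_num, coeff_num):
    """Field value coeff_chan, from coeff_chan + 1 = floor(ch / groups) + 1."""
    channels, groups = chan_num + 1, coeff_num + 1
    assert groups <= channels, "coefficient groups must not exceed channels"
    per_group = channels // groups + 1   # channels served by each coefficient group
    return per_group - 1


def coeff_group_of(channel, coeff_chan):
    """1-based coefficient group used by a 1-based channel number."""
    return (channel - 1) // (coeff_chan + 1) + 1


def correlation_window(length_field):
    """Window length X = 40 * (length + 1)."""
    return 40 * (length_field + 1)
```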
The 64-bit control word is input through the coefficient input port coeff_in, in either 32-bit or 16-bit mode. Loading is enabled by pins control_en1 and control_en2, and each half of the word is loaded while its own enable is low. In 32-bit mode, the control word is sent as two groups using all 32 bits of coeff_in(31:0). control_en1 and control_en2 are each held low for one coeff_en cycle: the first 32 bits of the control word are sent in the coeff_en cycle in which control_en1 is low, and the last 32 bits in the coeff_en cycle in which control_en2 is low. In 16-bit mode, the control word occupies the upper 16 bits of coeff_in(31:0). control_en1 and control_en2 are each held low for two coeff_en cycles: the first 32 bits of the control word are sent during the two cycles in which control_en1 is low, and the last 32 bits during the two cycles in which control_en2 is low. The timing of 32-bit and 16-bit control-word input is shown in Figures 11 and 12 respectively.
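The split of the 64-bit control word into bus transfers can be sketched as follows. The text does not define which bits form the "first 32 bits" or the order of the two 16-bit pieces. This sketch assumes that bits 63..32 go first and that the upper 16-bit piece of each half is sent first. The function name is hypothetical.

```python
def control_word_transfers(word, bus_width):
    """Split a 64-bit control word into (active-low enable pin, coeff_in value) transfers.

    One tuple per coeff_en cycle. Assumes bits 63..32 are the "first 32 bits"
    and, in 16-bit mode, that the upper 16-bit piece of each half goes first.
    """
    assert 0 <= word < 1 << 64
    first, last = word >> 32, word & 0xFFFFFFFF
    transfers = []
    for pin, half in (("control_en1", first), ("control_en2", last)):
        if bus_width == 32:
            transfers.append((pin, half))                 # uses all of coeff_in(31:0)
        elif bus_width == 16:
            for piece in (half >> 16, half & 0xFFFF):     # two cycles per enable
                transfers.append((pin, piece << 16))      # carried on coeff_in(31:16)
        else:
            raise ValueError("bus_width must be 32 or 16")
    return transfers
```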
On a general-purpose signal-processing board with this invention as the main processor, three computations were performed: a 1024-point FFT, 10th-order FIR pulse-group processing, and a 360-point correlation of a linear FM signal. The corresponding control-word settings are given in Tables 4, 5 and 6 respectively.
Control word | Name | Meaning and value
Work_model (4:0) | Operating mode | FFT: 10000.
Length (7:0) | Filter length | length(3:0) gives the FFT/IFFT data length (also the number of points in one set of weighting coefficients) as a power of 2. Here it is "1010", i.e. 1024 points. The unsigned value of length(6:4) plus 1 is the number of radix-8 stages, which depends on the transform length. A 1024-point transform needs four stages, so the code here is "011". length(7) is set to '0'.
Result1_mod (1:0) | Output port 1 format | Output port 1 supports four result formats; 01 is used here: magnitude and phase.
Sel_AGC1 (1:0) | Output port 1 gain control | Output port 1 can output 20-bit floating-point I/Q in the input format, exponent-normalized values referenced to AGC1(3:0), or 20-bit fixed-point values. 10 is used here: fixed-point output.
AGC1 (3:0) | Output port 1 normalization level | When sel_AGC1 = 01 or 10, AGC1(3:0) provides the reference exponent for exponent-normalized or fixed-point output. Here the reference is 1011.
Table 4
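The Length field of Table 4 can be derived from the transform size. The text gives only the 1024-point example, so the stage count below is an assumption: ceil(log2 N / 3), the number of radix-8 stages with any final lower-radix stage counted as one, which reproduces the "011" code for 1024 points. `fft_length_field` is a hypothetical helper.

```python
WORK_MODEL_FFT = 0b10000  # Work_model value for FFT, from Table 4


def fft_length_field(n_points: int) -> int:
    """Encode the 8-bit Length control word for an n_points FFT/IFFT."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("FFT length must be a power of two")
    log2n = n_points.bit_length() - 1   # goes into length(3:0)
    if log2n > 15:
        raise ValueError("length(3:0) can encode exponents up to 15 only")
    stages = -(-log2n // 3)             # assumed: ceil(log2n / 3) radix-8 stages
    return ((stages - 1) << 4) | log2n  # length(6:4) = stages - 1, length(7) = 0
```

For 1024 points this returns 0b00111010, i.e. length (6:4) = 011 and length (3:0) = 1010, matching Table 4.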
Control word Name Meaning and value
Chan_num (7:0) Number of channels Number of channels for FIR pulse-group processing: number of channels = chan_num + 1. This example is a 10th-order FIR pulse-group operation, so chan_num is set to 00001001.
Work_model (4:0) Operating mode The value for FIR pulse-group mode is 01000, so it is set to 01000 here.
Conj Conjugate select Selects whether coefficients are conjugated in FIR pulse-group mode. If the filter bank uses conjugate coefficients between channels, the effective computation speed doubles. Set to '1' here to select conjugation.
Length (7:0) Filter length Filter length in FIR pulse-group mode: filter length N = length + 1. This example is a 10th-order FIR pulse-group operation, so length is set to 00001001.
Result1_mod (1:0) Output port 1 output mode Final result output port 1 supports 4 output modes. Set to 00 here: output I/Q data.
Sel_AGC1 (1:0) Output port 1 gain control Final result output port 1 can output I and Q as 20-bit floating-point values each, following the format of the input data; it can output floating-point values normalized to the reference given by control word AGC1 (3:0); or it can convert I/Q to 20-bit fixed-point output. Sel_AGC1 (1:0) is set to 10 here: fixed-point output.
AGC1 (3:0) Output port 1 normalization level When sel_AGC1 = 01 or 10, a reference exponent must be specified; AGC1 (3:0) is this reference exponent and serves as the reference for both exponent-normalized output and fixed-point output. The reference AGC1 is set to 0111 here.
Table 5
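One plausible reading of the conjugate option is that a channel with coefficients conj(h) can reuse the four real products xr·hr, xi·hi, xr·hi and xi·hr already formed for the channel with coefficients h, combining them only with different signs; this would let one complex multiply-accumulator serve two channels, consistent with claim 1's statement that each one handles 2 channels in conjugate operation. The sketch demonstrates that arithmetic identity for one tap vector; it is not the device's datapath, and `fir_conjugate_pair` is a hypothetical name.

```python
def fir_conjugate_pair(xs: list[complex], hs: list[complex]) -> tuple[complex, complex]:
    """Return (sum x*h, sum x*conj(h)) using one set of four real products per tap."""
    rr = ii = ri = ir = 0.0
    for x, h in zip(xs, hs):
        rr += x.real * h.real
        ii += x.imag * h.imag
        ri += x.real * h.imag
        ir += x.imag * h.real
    y = complex(rr - ii, ri + ir)        # channel with coefficients h
    y_conj = complex(rr + ii, ir - ri)   # channel with coefficients conj(h)
    return y, y_conj
```

The second output costs no extra multiplications, only two extra additions, which is where the doubling of effective speed would come from.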
Control word Name Meaning and value
Chip_id (3:0) Cascade position When multiple chips are cascaded, the position of this chip in the cascade chain. A single chip is used here, so chip_id is set to 0000.
Chip_num (3:0) Number of cascaded chips When multiple chips are cascaded, the total number of devices in the cascade chain. A single chip is used here, so chip_num is set to 0000.
Chan_num (7:0) Number of channels Number of channels: number of channels = chan_num + 1. A single channel is used here, so chan_num is set to 00000000.
Work_model (4:0) Operating mode The value for correlation mode is 11000, so work_model is set to 11000 here.
Coeff_num (4:0) Number of coefficient sets The total number of coefficient sets required for multichannel operation; it must not exceed the current number of channels. There is only one coefficient set here, so coeff_num is set to 00000.
Coeff_chan (4:0) Coefficient channel mapping The correspondence between each coefficient set and the channels. Here one coefficient set corresponds to one channel, so coeff_chan is set to 00000.
Length (7:0) Filter length The number of points of the correlation: if the window length is X, then X = 40 × (length + 1). This is a 360-point correlation, so length is set to 00001001.
Result1_mod (1:0) Output 1 mode Selects the output mode of the result1 output port. Result1_mod is set to 10 here: result1 outputs logarithm and phase angle.
Table 6
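The output modes in Tables 3 to 6 (I/Q, magnitude/phase, logarithm/phase) can be modelled per complex sample. This is a sketch under assumptions: the base and scaling of the logarithm are not given in the text, so a dB scale (20·log10 of the magnitude) is assumed, phase is in radians, and code 11 is rejected because Table 3 assigns it to correlation cascading rather than to a format. `format_output` is a hypothetical name.

```python
import math


def format_output(i: float, q: float, mode: int) -> tuple[float, float]:
    """Convert one complex result (I, Q) according to a 2-bit Result_mod code."""
    if mode == 0b00:                       # I/Q directly
        return i, q
    magnitude = math.hypot(i, q)
    phase = math.atan2(q, i)               # radians, in [-pi, pi]
    if mode == 0b01:                       # magnitude and phase angle
        return magnitude, phase
    if mode == 0b10:                       # logarithm and phase angle
        # Assumption: logarithm expressed in dB (20*log10 of the magnitude).
        log_mag = 20.0 * math.log10(magnitude) if magnitude > 0 else -math.inf
        return log_mag, phase
    raise ValueError("code 11 selects correlation cascading, not an output format")
```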

Claims (2)

1. A reconfigurable digital signal processor, characterized in that: the internal hardware structure and hardware interconnections of the processor can be rearranged by configuration control words, so as to implement filtering operations in the forms of fast Fourier transform/inverse fast Fourier transform, FIR pulse-group processing and correlation processing;
The hardware structure comprises an input unit, an output unit, a data exchange unit and 4 elementary units; the elementary units contain 160 real floating-point multiply-accumulators, evenly distributed among the 4 elementary units;
The input unit receives control words, performs first-level decoding on them, and distributes control words to the output unit, the data exchange unit and the 4 elementary units. The input unit comprises a control word receiving module, a first-level decoding module, a control word distribution module, a coefficient synchronization module, a data synchronization module and an on-chip clock signal generation module; control words and coefficient input 1 share the same port. The control word receiving module receives control words and passes them to the first-level decoding module, whose decoding on the one hand produces global control signals used to generate on-chip timing and to synchronize coefficients and data, and on the other hand sends control information through the control word distribution module to the output unit, the data exchange unit and the 4 elementary units, respectively. The coefficient synchronization module synchronizes coefficient input 1 and coefficient input 2, and the data synchronization module synchronizes data input 1 and data input 2;
The data exchange unit is a multi-input, multi-output switch network that exchanges data among the 4 elementary units;
The output unit sorts the operation results of the elementary units and outputs them in different formats. The output unit comprises an elementary-unit result sorting module, a logarithm module, a floating-point/fixed-point conversion module, a magnitude module and an exponent normalization module. When working in FFT/IFFT or FIR pulse-group processing mode, the result sorting module arranges the outputs of the elementary units in channel order; when working in correlation processing mode, it reorganizes the data output frame by frame from each elementary unit into a continuous data stream at the same rate as the input data. The magnitude module converts the output format from real/imaginary part to magnitude/phase angle; the logarithm module converts the input magnitude into a logarithmic representation; the floating-point/fixed-point conversion module converts the real/imaginary output, or the output of the magnitude module, from floating-point format to fixed-point format; the exponent normalization module unifies the exponents of floating-point results to a fixed value by shifting the mantissas accordingly. Each of these 4 format conversion modules is provided in two sets, one set per output port, so that the two output ports can independently output in any of the formats;
Each elementary unit comprises data memory, complex multiply-accumulators, complex multiply-accumulator submatrices, coefficient memory and a timing control circuit. The data memory comprises eight 512 × 40 dual-port RAMs, used for input buffering of operand data, temporary storage of intermediate results and output buffering of operation results; the data cache can feed data simultaneously to every complex multiply-accumulator and complex multiply-accumulator submatrix. The coefficient memory comprises ten 256 × 32 dual-port RAMs, used to store the weighting coefficients for correlation processing, the FIR filter coefficients and the weighting coefficients for FFT; the coefficients of the FFT iterations form a fixed table implemented in dedicated logic. Each complex multiply-accumulator corresponds to 4 real floating-point multiply-accumulators; each complex multiply-accumulator submatrix comprises 16 real floating-point multiply-accumulators, equivalent to 4 complex multiply-accumulators. The two complex multiply-accumulator submatrices and the complex multiply-accumulators are combined in different ways depending on the mode to perform different operations. The real floating-point multiply-accumulator is divided into 5 parts: a fixed-point multiplication part, a truncation part, an exponent adjustment part, a fixed-point addition part and an exponent judgment part;
The organization of the hardware can be reconfigured by the configuration control words: through the configuration of control words and control signals, the organization of the 160 real floating-point multiply-accumulators and of the data exchange unit can be changed to select different operating modes suited to three different computing tasks: FFT/IFFT, FIR pulse-group processing and correlation;
The organizations adopted for the three filtering operations are, respectively:
For FFT/IFFT: the two complex multiply-accumulator submatrices each serve as a node of one iteration of the radix-8 algorithm, and the 4 complex multiply-accumulators in each submatrix perform one stage of radix-8 iteration. The complex multiply-accumulator in front of each submatrix serves as the windowing unit and applies the window to the data; the windowed data then enter the submatrix for one stage of radix-8 iteration. The output of each iteration stage is first written to the data cache and then sent via the data exchange unit to the complex multiply-accumulator submatrix in another elementary unit for the next stage of iteration;
For FIR pulse-group processing: the multiply-accumulators in the 2 complex multiply-accumulator modules and the 2 complex multiply-accumulator submatrix modules inside an elementary unit are used in parallel, each complex multiply-accumulator corresponding to one channel of the FIR pulse group, or to 2 channels in conjugate operation;
For correlation: the complex multiply-accumulators and the complex multiply-accumulator submatrices are used in series, equivalent to 10 complex multiply-accumulators connected in series.
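The resource counts in claim 1 are self-consistent: 160 real multiply-accumulators over 4 elementary units gives 40 per unit, i.e. two 16-unit submatrices plus two complex multiply-accumulators of 4 real units each, or 10 complex multiply-accumulators, the same 10 that are chained in correlation mode. The behavioural sketch below models one complex multiply-accumulator as four real ones and computes a sliding correlation whose reference taps are split across a chain of such units. It ignores floating-point formats, pipelining and timing; the conjugated-reference definition of correlation and all class and function names are assumptions for illustration.

```python
class RealMac:
    """One real multiply-accumulator."""

    def __init__(self) -> None:
        self.acc = 0.0

    def mac(self, a: float, b: float) -> None:
        self.acc += a * b


class ComplexMac:
    """A complex multiply-accumulator built from four real MACs."""

    def __init__(self) -> None:
        self.rr, self.ii, self.ri, self.ir = (RealMac() for _ in range(4))

    def mac(self, x: complex, h: complex) -> None:
        self.rr.mac(x.real, h.real)
        self.ii.mac(x.imag, h.imag)
        self.ri.mac(x.real, h.imag)
        self.ir.mac(x.imag, h.real)

    def value(self) -> complex:
        return complex(self.rr.acc - self.ii.acc, self.ri.acc + self.ir.acc)


def correlate(x: list[complex], ref: list[complex], n_macs: int = 10) -> list[complex]:
    """r[k] = sum_n x[k+n] * conj(ref[n]), with ref split across a chain of n_macs MACs."""
    seg = -(-len(ref) // n_macs)   # taps handled by each MAC (ceil)
    out = []
    for k in range(len(x) - len(ref) + 1):
        partial = 0j               # partial sum passed down the chain
        for i in range(n_macs):
            m = ComplexMac()
            for n in range(i * seg, min((i + 1) * seg, len(ref))):
                m.mac(x[k + n], ref[n].conjugate())
            partial += m.value()
        out.append(partial)
    return out
```

Splitting the reference across a chain leaves the result unchanged; it only divides the multiply work among the units, which is the point of connecting them in series.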
2. The reconfigurable digital signal processor according to claim 1, characterized in that the hardware scheduling scheme adopted is as follows:
The hardware scheduling scheme combines centralized and distributed two-level scheduling: control words and control signals first enter the global control module for first-level decoding and control word distribution. This module first receives the control words and then generates global control signals from them, which coordinate the actions of the 4 elementary units, determine the output format of the results and set the on-chip master clock. The second function of the global control module is to send control information to the local control module of each elementary unit, including: the operating mode, the sub-type of the operating mode, the number of points, the number of channels and the number of channels to be processed;
After receiving the information sent by the global control module, the local control module in each elementary unit performs second-level decoding on it and converts it into detailed hardware scheduling control signals.
CN200610086398A 2006-07-14 2006-07-14 Reconstructable digital signal processor Active CN100594491C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200610086398A CN100594491C (en) 2006-07-14 2006-07-14 Reconstructable digital signal processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200610086398A CN100594491C (en) 2006-07-14 2006-07-14 Reconstructable digital signal processor

Publications (2)

Publication Number Publication Date
CN1900927A CN1900927A (en) 2007-01-24
CN100594491C true CN100594491C (en) 2010-03-17

Family

ID=37656814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200610086398A Active CN100594491C (en) 2006-07-14 2006-07-14 Reconstructable digital signal processor

Country Status (1)

Country Link
CN (1) CN100594491C (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782893B (en) * 2009-01-21 2014-12-24 上海芯豪微电子有限公司 Reconfigurable data processing platform
CN102087640B (en) * 2009-12-08 2013-06-05 中兴通讯股份有限公司 Method and device for realizing Fourier transform
CN102122275A (en) * 2010-01-08 2011-07-13 上海芯豪微电子有限公司 Configurable processor
CN101833540B (en) * 2010-04-07 2012-06-06 华为技术有限公司 Signal processing method and device
CN102043761B (en) * 2011-01-04 2012-06-13 东南大学 Fourier transform implementation method based on reconfigurable technology
US9372663B2 (en) * 2011-10-27 2016-06-21 Intel Corporation Direct digital synthesis of signals using maximum likelihood bit-stream encoding
CN103390071A (en) * 2012-05-07 2013-11-13 北京大学深圳研究生院 Hierarchical interconnection structure of reconfigurable operator array
CN103543984B (en) 2012-07-11 2016-08-10 世意法(北京)半导体研发有限责任公司 Modified form balance throughput data path architecture for special related application
CN103543983B (en) * 2012-07-11 2016-08-24 世意法(北京)半导体研发有限责任公司 For improving the novel data access method of the FIR operating characteristics in balance throughput data path architecture
CN104932992B (en) * 2015-07-08 2017-10-03 中国电子科技集团公司第五十四研究所 A kind of flexible retransmission method of the variable Digital Microwave of bandwidth granularity
CN105206282B (en) * 2015-09-24 2019-12-13 深圳市冠旭电子股份有限公司 noise collection method and device
CN106959936A (en) * 2016-01-08 2017-07-18 福州瑞芯微电子股份有限公司 A kind of the hardware-accelerated of FFT realizes device and method
US9942074B1 (en) * 2016-11-30 2018-04-10 Micron Technology, Inc. Wireless devices and systems including examples of mixing coefficient data specific to a processing mode selection
US10027523B2 (en) 2016-11-30 2018-07-17 Micron Technology, Inc. Wireless devices and systems including examples of mixing input data with coefficient data
CN106951211B (en) * 2017-03-27 2019-10-18 南京大学 A kind of restructural fixed and floating general purpose multipliers
CN106951394A (en) * 2017-03-27 2017-07-14 南京大学 A kind of general fft processor of restructural fixed and floating
CN109782661B (en) * 2019-01-04 2020-10-16 中国科学院声学研究所东海研究站 System and method for realizing reconfigurable and multi-output real-time processing based on FPGA
US10886998B2 (en) 2019-02-22 2021-01-05 Micron Technology, Inc. Mixing coefficient data specific to a processing mode selection using layers of multiplication/accumulation units for wireless communication
CN110247872B (en) * 2019-03-25 2021-11-23 南京杰思微电子技术有限公司 Synchronous detection method and device for power line carrier communication chip
CN110674456B (en) * 2019-09-26 2022-11-22 电子科技大学 Time-frequency conversion method of signal acquisition system
CN111158636B (en) * 2019-12-03 2022-04-05 中国人民解放军战略支援部队信息工程大学 Reconfigurable computing structure and routing addressing method and device of computing processing array
CN111064912B (en) * 2019-12-20 2022-03-22 江苏芯盛智能科技有限公司 Frame format conversion circuit and method
CN111782581B (en) * 2020-07-30 2024-01-12 中国电子科技集团公司第十四研究所 Reconfigurable signal processing operation unit and recombination unit based on same

Also Published As

Publication number Publication date
CN1900927A (en) 2007-01-24

Similar Documents

Publication Publication Date Title
CN100594491C (en) Reconstructable digital signal processor
CN108805266B (en) Reconfigurable CNN high-concurrency convolution accelerator
US10387122B1 (en) Residue number matrix multiplier
CN106775599A (en) Many computing unit coarseness reconfigurable systems and method of recurrent neural network
CN110276450A (en) Deep neural network structural sparse system and method based on more granularities
CN102043761B (en) Fourier transform implementation method based on reconfigurable technology
US20020002573A1 (en) Processor with reconfigurable arithmetic data path
CN100573440C (en) A kind of parallel-to-serial adder and multiplier
CN103678257A (en) Positive definite matrix floating point inversion device based on FPGA and inversion method thereof
CN101782893A (en) Reconfigurable data processing platform
CN101149730B (en) Optimized discrete Fourier transform method and apparatus using prime factor algorithm
CN110765709A (en) FPGA-based 2-2 fast Fourier transform hardware design method
CN103176767A (en) Implementation method of floating point multiply-accumulate unit low in power consumption and high in huff and puff
CN110163359A (en) A kind of computing device and method
WO2018027706A1 (en) Fft processor and algorithm
CN110851779A (en) Systolic array architecture for sparse matrix operations
CN104090737A (en) Improved partial parallel architecture multiplying unit and processing method thereof
CN105095152A (en) Configurable 128 point fast Fourier transform (FFT) device
CN104268124B (en) A kind of FFT realizes apparatus and method
CN107368459A (en) The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions matrix multiplication
CN100547580C (en) Be used to realize the method and apparatus of the fast orthogonal transforms of variable-size
RU2689819C1 (en) Vector multiformat multiplier
CN106385311A (en) Chaotic signal generator of complex chaotic simplified system based on FPGA
RU185346U1 (en) VECTOR MULTIFORM FORMAT
CN107358292A (en) A kind of convolution accelerator module design method based on chemical reaction network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20070124

Assignee: China Information & Electronice Development Inc., Ltd., Hefei

Assignor: No.38 Inst., China Electronic Sci. & Tech. Group Co.

Contract record no.: 2013340000054

Denomination of invention: Reconstructable digital signal processor

Granted publication date: 20100317

License type: Exclusive License

Record date: 20130422

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
TR01 Transfer of patent right

Effective date of registration: 20191119

Address after: 5 / F, airborne center, 38 new area, No. 199, Xiangzhang Avenue, hi tech Zone, Hefei City, Anhui Province 230000

Patentee after: Anhui core Century Technology Co., Ltd.

Address before: 230031 Hefei thirty-eighth Research Institute, 9023 mailbox, Anhui, China

Patentee before: No.38 Inst., China Electronic Sci. & Tech. Group Co.

TR01 Transfer of patent right