CN103838704A - FFT accelerator with high throughput rate - Google Patents

Publication number
CN103838704A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310739716.9A
Other languages
Chinese (zh)
Inventor
潘红兵
吕飞
李丽
姚馨
田静
徐淼
魏子君
陈辉
李伟
何书专
沙金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Application filed by Nanjing University
Priority to CN201310739716.9A
Publication of CN103838704A

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a high-throughput FFT accelerator comprising a data storage module, an address generation module and an FFT acceleration module. The data storage module reads, writes and transfers data; the address generation module provides the data storage module with the target addresses for data transfer; and the FFT acceleration module performs the FFT on the data output by the data storage module. The accelerator adopts a single-path delay feedback (SDF) structure, which gives high throughput while saving on-chip storage resources. It supports pipelined input through an Advanced eXtensible Interface (AXI) on the one hand and ping-pong output on the other. On input, no buffer space is needed: the data are fed directly into the FFT computation unit. On output, the data are output in bit-reversed order through a buffer.

Description

A high-throughput FFT accelerator
Technical field
The present invention relates to FFT accelerators, and in particular to a high-throughput FFT accelerator.
Background
With the development of communication and radar technology, the FFT is widely used in fields such as wireless communication, speech recognition, image processing and spectrum analysis. In particular, since the emergence of orthogonal frequency division multiplexing (OFDM), processing very long FFTs faster and more flexibly has become an increasingly important problem. FFT hardware architectures fall into two main categories: pipelined FFTs and memory-based FFTs.
A memory-based FFT needs relatively few storage units and arithmetic units, so it consumes little hardware. However, the input and output of every stage share the same block of RAM, so the input of the next frame can start only after the current frame has been processed completely. This increases data transfer time and lowers both data throughput and arithmetic-unit utilization. The main shortcoming of the memory-based FFT is therefore that it cannot process FFT data continuously. Existing FFT hardware structures are constrained by on-chip storage resources, so their computational efficiency and throughput are low.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a high-throughput FFT accelerator, realized by the following technical scheme:
The high-throughput FFT accelerator comprises
Data storage module, for reading, writing and transferring data;
Address generation module, which provides the data storage module with the target addresses for data transfer;
FFT acceleration module, which performs the FFT on the data output by the data storage module.
In a further design of the high-throughput FFT accelerator, the address generation module bit-reverses the binary representation of the target address so that it corresponds to the input address of the FFT acceleration module.
In a further design of the high-throughput FFT accelerator, the FFT acceleration module builds a multi-stage pipeline and comprises
Twiddle factor generation module, for generating and outputting twiddle factors;
Core operation unit, which performs the butterfly operation for its pipeline stage, carries out the complex multiplication with the received twiddle factor, and outputs the result.
In a further design of the high-throughput FFT accelerator, the core operation unit comprises
Butterfly operation module, which performs the butterfly operation according to
$$b_i = a_i + a_{i+N/2}, \qquad b_{i+N/2} = a_i - a_{i+N/2}, \qquad i = 0, \ldots, N/2 - 1$$
where N is the number of input data items of any one stage of the multi-stage pipeline, $a_i$ and $a_{i+N/2}$ are the two elements of that stage whose indices are N/2 apart, and $b_i$ and $b_{i+N/2}$ are the intermediate FFT values output by the butterfly operation;
Data buffer module, which provides the data storage addresses corresponding to the multi-stage pipeline computation;
Complex multiplication module, which multiplies the intermediate FFT values by the twiddle factors (complex multiplication) and outputs the result.
In a further design of the high-throughput FFT accelerator, the twiddle factor generation module follows the stage index m of the multi-stage pipeline, and the twiddle factors are distributed as follows:
Stage m = 0: $W_{2^{M}}^{r}, \quad r = 0, 1, \ldots, N/2 - 1$
Stage m = 1: $W_{2^{M-1}}^{r}, \quad r = 0, 1, \ldots, N/4 - 1$
......
Stage m = M-1: $W_{2}^{r}, \quad r = 0$
Stage m (general): $W_{2^{M-m}}^{r}, \quad r = 0, 1, \ldots, 2^{M-m-1} - 1$
Here M is a constant, the expression in Figure BDA0000449462480000026 (formula image not reproduced) is the twiddle factor, and r is a self-incrementing register variable used to output the twiddle factors of each stage in pipelined fashion, so that the twiddle factors of each stage and the intermediate FFT values are delivered to the complex multiplication module synchronously.
In a further design of the high-throughput FFT accelerator, the multi-stage pipeline is a 17-stage pipeline.
In a further design of the high-throughput FFT accelerator, the accelerator also comprises two Advanced eXtensible Interface (AXI) ports, through which the data storage module is connected to the address generation module and the FFT acceleration module respectively.
The advantages of the invention are as follows:
The high-throughput FFT accelerator of the invention is based on an FPGA chip and exploits its abundant logic and DSP resources. It adopts a single-path delay feedback (SDF) structure, which gives high throughput while saving on-chip storage resources. The accelerator supports pipelined input through the AXI interface on the one hand and ping-pong output on the other. On input, no buffer space is needed: the data are sent directly to the FFT arithmetic unit for FFT computation. On output, the data are output in bit-reversed order through a buffer.
Brief description of the drawings
Fig. 1 is the FFT top-level schematic;
Fig. 2 is the Radix-2 SDF FFT architecture diagram;
Fig. 3 is a schematic of the butterfly unit;
Fig. 4 is the verification flow of the FFT accelerator.
Embodiment
The scheme of the invention is described in detail below with reference to the accompanying drawings.
Fast Fourier Transform (FFT) processors with large point counts are usually implemented on field programmable gate arrays (FPGAs). The high-throughput FFT accelerator of this embodiment is based on an FPGA chip, uses the radix-2 decimation-in-frequency (DIF) FFT algorithm, and adopts a single-path delay feedback (SDF) hardware architecture. The system test platform of this embodiment is based on ISE, the Xilinx FPGA design tool.
As shown in Fig. 1, the high-throughput FFT accelerator of this embodiment comprises a data storage module, an address generation module and an FFT acceleration module. The data storage module reads, writes and transfers data; in this embodiment it is a QDR memory. The address generation module provides the data storage module with the target addresses for data transfer; in this embodiment it is the addr_gen_unit module. The FFT acceleration module performs the FFT on the data output by the data storage module; in this embodiment it is the Radix-2 SDF FFT module. The accelerator also comprises two Advanced eXtensible Interface (AXI) ports, through which the data storage module is connected to the address generation module and the FFT acceleration module respectively.
QDR: this embodiment uses one QDR memory with a depth of 256K words and a word width of 64 bits for ping-pong operation. Because on-chip memory is limited, an external QDR is used as the input memory. Under control of a configuration signal, one or more groups of 128K points are streamed through the AXI interface to the FFT computation module. When a group has been processed, its 128K results are first written to the first 128K addresses of the QDR, at the addresses produced by addr_gen_unit, and are then read out through the AXI-4 interface. While the data in the first 128K addresses are being read out, the next group of results is written to the last 128K addresses of the QDR. Proceeding in this way completes the output of all the data. This saves data transfer time and greatly improves the efficiency of the FFT module.
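The ping-pong use of the two 128K halves can be sketched as below. This is only an illustration of the alternation described above: the function names `write_base` and `read_base` and the frame-index convention are assumptions introduced here, not signals or logic from the patent.

```python
# Illustrative sketch of ping-pong addressing over a 256K-word memory.
# HALF and DEPTH follow the figures in the text; everything else is hypothetical.
HALF = 128 * 1024   # words per FFT frame (128K points)
DEPTH = 2 * HALF    # total QDR depth in words (256K)


def write_base(frame_index):
    """Base address of the half that receives the results of this frame."""
    return (frame_index % 2) * HALF


def read_base(frame_index):
    """Base address of the half that is read out while this frame is written."""
    return ((frame_index + 1) % 2) * HALF


if __name__ == "__main__":
    for k in range(4):
        print(k, hex(write_base(k)), hex(read_base(k)))
```

Because the write half and the read half never coincide for the same frame index, reading out one frame can overlap with writing the next.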
addr_gen_unit: provides the QDR addresses used to write data_out_stage into the QDR. In the radix-2 DIF FFT algorithm the computed output arrives in bit-reversed order, so each output item must be placed at its corresponding position in the QDR. Write the output sequence numbers in binary. For N=8 the output order is 0 to 7, which in binary is 000, 001, 010, 011, 100, 101, 110, 111. Reversing the bits of each sequence number gives 000, 100, 010, 110, 001, 101, 011, 111, that is, 0, 4, 2, 6, 1, 5, 3, 7, which corresponds one-to-one with the output addresses of addr_gen_unit.
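The bit-reversal mapping can be checked with a few lines of Python. This is a software sketch of the address pattern only; the helper names `bit_reverse` and `reversed_addresses` are chosen here for illustration and do not describe how addr_gen_unit is implemented in hardware.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the current low bit
        index >>= 1
    return result


def reversed_addresses(n):
    """Write address for each output sequence number 0..n-1 (n a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    print(reversed_addresses(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For the 128K-point case the same function would be called with 17 address bits.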
The FFT acceleration module builds a multi-stage pipeline and comprises a twiddle factor generation module and a core operation unit. The twiddle factor generation module generates and outputs the twiddle factors; in this embodiment it is the w_gen module. The core operation unit performs the butterfly operation for its pipeline stage, carries out the complex multiplication with the received twiddle factor, and outputs the result; in this embodiment it is the stage unit. Separating w_gen and stage into two modules simplifies the control signals of each stage and also ensures that the data of each stage and the output of w_gen are transmitted synchronously, so the data never have to stall and wait, which saves clock cycles.
As shown in Fig. 2, the core operation unit comprises a butterfly operation module, a data buffer module and a complex multiplication module. The butterfly operation module performs the butterfly operation according to
$$b_i = a_i + a_{i+N/2}, \qquad b_{i+N/2} = a_i - a_{i+N/2}, \qquad i = 0, \ldots, N/2 - 1$$
where N is the number of input data items of any one stage of the multi-stage pipeline, $a_i$ and $a_{i+N/2}$ are the two elements of that stage whose indices are N/2 apart, and $b_i$ and $b_{i+N/2}$ are the intermediate FFT values output by the butterfly operation. In this embodiment it is the BF module; see Fig. 3. The data buffer module provides the data storage addresses corresponding to the multi-stage pipeline computation; in this embodiment it is the fifo module. The complex multiplication module multiplies the intermediate FFT values by the twiddle factors (complex multiplication) and outputs the result; in this embodiment it is the mul module.
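As a software illustration of the butterfly equations, the sketch below applies them to one block of N values. The function name `butterfly_stage` is hypothetical, and the list-based form ignores the streaming FIFO behaviour of the hardware BF module.

```python
def butterfly_stage(a):
    """Apply b[i] = a[i] + a[i+N/2] and b[i+N/2] = a[i] - a[i+N/2] to one block."""
    n = len(a)
    half = n // 2
    b = [0] * n
    for i in range(half):
        b[i] = a[i] + a[i + half]          # sum output
        b[i + half] = a[i] - a[i + half]   # difference output, later multiplied by a twiddle factor
    return b


if __name__ == "__main__":
    print(butterfly_stage([1, 2, 3, 4]))  # [4, 6, -2, -2]
```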
w_gen follows the stage index m of the multi-stage pipeline, and the twiddle factors are distributed as follows:
Stage m = 0: $W_{2^{M}}^{r}, \quad r = 0, 1, \ldots, N/2 - 1$
Stage m = 1: $W_{2^{M-1}}^{r}, \quad r = 0, 1, \ldots, N/4 - 1$
......
Stage m = M-1: $W_{2}^{r}, \quad r = 0$
Stage m (general): $W_{2^{M-m}}^{r}, \quad r = 0, 1, \ldots, 2^{M-m-1} - 1$
Here M is a constant, the expression in Figure BDA0000449462480000046 (formula image not reproduced) is the twiddle factor, and r is a self-incrementing register variable used to output the twiddle factors of each stage in pipelined fashion, so that the twiddle factors of each stage and the intermediate FFT values are delivered to the complex multiplication module synchronously. As the formulas show, the twiddle factor depends on r and m; its generation is controlled by the start signals (start0, start1, ..., start16).
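The stage-by-stage twiddle schedule can be exercised in a small software model of a radix-2 DIF FFT. The sketch assumes the conventional definition $W_L^r = e^{-j 2\pi r / L}$ and $M = \log_2 N$, since the formula image is not reproduced here; the names `twiddles`, `dif_fft`, `bit_reverse` and `naive_dft` are illustrative only and the code models the arithmetic, not the hardware pipeline.

```python
import cmath


def twiddles(M, m):
    """W_{2^(M-m)}^r for r = 0 .. 2^(M-m-1) - 1 (assumed convention exp(-2j*pi*r/L))."""
    size = 1 << (M - m)
    return [cmath.exp(-2j * cmath.pi * r / size) for r in range(size // 2)]


def dif_fft(x):
    """Radix-2 DIF FFT; the result is returned in bit-reversed order."""
    n = len(x)
    M = n.bit_length() - 1
    data = list(x)
    for m in range(M):
        size = 1 << (M - m)          # block length handled at stage m
        half = size // 2
        w = twiddles(M, m)
        for start in range(0, n, size):
            for r in range(half):
                top = data[start + r]
                bottom = data[start + r + half]
                data[start + r] = top + bottom                 # butterfly sum
                data[start + r + half] = (top - bottom) * w[r]  # difference times twiddle
    return data


def bit_reverse(index, bits):
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def naive_dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = dif_fft(x)
    natural = [y[bit_reverse(k, 3)] for k in range(8)]  # undo the bit-reversed order
    ref = naive_dft(x)
    print(max(abs(p - q) for p, q in zip(natural, ref)))
```

Reading the result through the bit-reversed index recovers the natural-order spectrum, which is the role the address generation step plays on output.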
Because the multi-stage pipeline has 17 stages, the fifo module is needed in each of the 17 stages, so 17 FIFOs are required. Their depths are, in order, $N/2, N/2^{2}, \ldots, N/2^{17}$. Subject to the limits of block memory, the first 13 FIFOs can be generated directly from the IP core using block memory. The minimum depth supported by the IP is 16, however, and the last 4 FIFOs have depths of 8, 4, 2 and 1, so these are buffered in registers (reg) instead, which saves clock cycles and also reduces the consumption of storage resources.
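The split between block-memory FIFOs and register FIFOs follows directly from the depths. The sketch below assumes N = 2^17 (128K points, matching the 17 stages); the constant names are hypothetical.

```python
N = 1 << 17          # assumed transform length: 128K points
STAGES = 17          # pipeline stages, from the text
MIN_IP_DEPTH = 16    # minimum FIFO depth supported by the IP, from the text

# Depth of the FIFO at each stage: N/2, N/4, ..., N/2^17
depths = [N >> (s + 1) for s in range(STAGES)]

block_ram_fifos = [d for d in depths if d >= MIN_IP_DEPTH]
register_fifos = [d for d in depths if d < MIN_IP_DEPTH]

if __name__ == "__main__":
    print(len(block_ram_fifos), register_fifos)  # 13 [8, 4, 2, 1]
    print(sum(depths))                           # total buffered words, N - 1
```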
System testing followed the flow shown in Fig. 4. Each module in the FFT accelerator performs its function and produces the expected result, so the output of the FFT accelerator can be judged reliable.
An FFT accelerator could be obtained directly from the IP core supplied with ISE, but that FFT IP core supports at most a 64K-point FFT. A 128K-point FFT accelerator could be built from two 64K-point FFT IP cores, which would solve that problem. More importantly, however, the FFT IP core consumes a large amount of on-chip storage: extrapolating from the block memory consumption of the 64K-point FFT IP core supplied with ISE, a 128K-point FFT IP would need approximately 484 36Kb block memories, which is too much on-chip storage. A custom FFT accelerator meeting the requirements therefore has to be designed. In this design, only the FIFO units consume block memory. After synthesis, the whole FFT accelerator consumes 234 36Kb block memories. After optimization, the operating frequency of the FFT accelerator can reach 300MHz and the data throughput can reach 2.2GB/s.
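The quoted throughput can be cross-checked with simple arithmetic. The sketch below assumes, which the text does not state explicitly, that the accelerator accepts one 64-bit word per clock cycle; under that assumption the 2.2GB/s figure is consistent with 300MHz when gigabytes are counted in powers of two.

```python
def throughput_bytes_per_second(clock_hz, word_bits):
    """Bytes per second for one word per clock cycle (assumed, not stated in the text)."""
    return clock_hz * word_bits / 8


rate = throughput_bytes_per_second(300e6, 64)

if __name__ == "__main__":
    print(rate / 1e9)      # decimal gigabytes per second
    print(rate / 2 ** 30)  # binary gigabytes (GiB) per second
```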

Claims (7)

1. A high-throughput FFT accelerator, characterized by comprising
Data storage module, for reading, writing and transferring data;
Address generation module, which provides the data storage module with the target addresses for data transfer;
FFT acceleration module, which performs the FFT on the data output by the data storage module.
2. The high-throughput FFT accelerator according to claim 1, characterized in that the address generation module bit-reverses the binary representation of the target address so that it corresponds to the input address of the FFT acceleration module.
3. The high-throughput FFT accelerator according to claim 1, characterized in that the FFT acceleration module builds a multi-stage pipeline and comprises
Twiddle factor generation module, for generating and outputting twiddle factors;
Core operation unit, which performs the butterfly operation for its pipeline stage, carries out the complex multiplication with the received twiddle factor, and outputs the result.
4. The high-throughput FFT accelerator according to claim 3, characterized in that the core operation unit comprises
Butterfly operation module, which performs the butterfly operation according to
$$b_i = a_i + a_{i+N/2}, \qquad b_{i+N/2} = a_i - a_{i+N/2}, \qquad i = 0, \ldots, N/2 - 1$$
where N is the number of input data items of any one stage of the multi-stage pipeline, $a_i$ and $a_{i+N/2}$ are the two elements of that stage whose indices are N/2 apart, and $b_i$ and $b_{i+N/2}$ are the intermediate FFT values output by the butterfly operation;
Data buffer module, which provides the data storage addresses corresponding to the multi-stage pipeline computation;
Complex multiplication module, which multiplies the intermediate FFT values by the twiddle factors (complex multiplication) and outputs the result.
5. The high-throughput FFT accelerator according to claim 4, characterized in that the twiddle factor generation module follows the stage index m of the multi-stage pipeline, and the twiddle factors are distributed as follows:
Stage m = 0: $W_{2^{M}}^{r}, \quad r = 0, 1, \ldots, N/2 - 1$
Stage m = 1: $W_{2^{M-1}}^{r}, \quad r = 0, 1, \ldots, N/4 - 1$
......
Stage m = M-1: $W_{2}^{r}, \quad r = 0$
Stage m (general): $W_{2^{M-m}}^{r}, \quad r = 0, 1, \ldots, 2^{M-m-1} - 1$
Here M is a constant, the expression in Figure FDA0000449462470000021 (formula image not reproduced) is the twiddle factor, and r is a self-incrementing register variable used to output the twiddle factors of each stage in pipelined fashion, so that the twiddle factors of each stage and the intermediate FFT values are delivered to the complex multiplication module synchronously.
6. The high-throughput FFT accelerator according to claim 5, characterized in that the multi-stage pipeline is a 17-stage pipeline.
7. The high-throughput FFT accelerator according to claim 5, characterized by further comprising two Advanced eXtensible Interface (AXI) ports, through which the data storage module is connected to the address generation module and the FFT acceleration module respectively.
CN201310739716.9A 2014-03-20 2014-03-20 FFT accelerator with high throughput rate Pending CN103838704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310739716.9A CN103838704A (en) 2014-03-20 2014-03-20 FFT accelerator with high throughput rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310739716.9A CN103838704A (en) 2014-03-20 2014-03-20 FFT accelerator with high throughput rate

Publications (1)

Publication Number Publication Date
CN103838704A true CN103838704A (en) 2014-06-04

Family

ID=50802220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310739716.9A Pending CN103838704A (en) 2014-03-20 2014-03-20 FFT accelerator with high throughput rate

Country Status (1)

Country Link
CN (1) CN103838704A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106936446A (en) * 2017-03-10 2017-07-07 南京大学 A kind of high speed decoder and its interpretation method based on Non-Binary LDPC Coded
CN108182169A (en) * 2017-12-18 2018-06-19 北京时代民芯科技有限公司 Efficient FFT implementation methods in a kind of MTD wave filters
CN109073746A (en) * 2016-06-16 2018-12-21 德州仪器公司 radar hardware accelerator
CN109948113A (en) * 2019-03-04 2019-06-28 东南大学 A kind of Two-dimensional FFT accelerator based on FPGA

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100576520B1 (en) * 2003-12-01 2006-05-03 전자부품연구원 Variable fast fourier transform processor using iteration algorithm
CN101231632A (en) * 2007-11-20 2008-07-30 西安电子科技大学 Method for processing floating-point FFT by FPGA
CN101339546A (en) * 2008-08-07 2009-01-07 那微微电子科技(上海)有限公司 Address mappings method and operand parallel FFT processing system
CN102129419A (en) * 2011-03-04 2011-07-20 中山大学 Fast Fourier transform-based processor
CN102495721A (en) * 2011-12-02 2012-06-13 南京大学 Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHUENN-YUH LEE ET AL.: "A Low-Power VLSI Architecture for a Shared-Memory FFT Processor with a Mixed-Radix Algorithm and a Simple Memory Control Scheme", 《PROCEEDINGS. 2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS》 *
丁智泉 等: "高速浮点FFT处理器的FPGA实现", 《四川理工学院学报(自然科学版)》 *
何星 等: "流水线结构FFT/IFFT处理器的设计与实现", 《微电子学与计算机》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109073746A (en) * 2016-06-16 2018-12-21 德州仪器公司 radar hardware accelerator
US11579242B2 (en) 2016-06-16 2023-02-14 Texas Instruments Incorporated Radar hardware accelerator
CN109073746B (en) * 2016-06-16 2023-05-16 德州仪器公司 Radar hardware accelerator
CN106936446A (en) * 2017-03-10 2017-07-07 南京大学 A kind of high speed decoder and its interpretation method based on Non-Binary LDPC Coded
CN108182169A (en) * 2017-12-18 2018-06-19 北京时代民芯科技有限公司 Efficient FFT implementation methods in a kind of MTD wave filters
CN108182169B (en) * 2017-12-18 2021-06-08 北京时代民芯科技有限公司 Method for realizing high-efficiency FFT in MTD filter
CN109948113A (en) * 2019-03-04 2019-06-28 东南大学 A kind of Two-dimensional FFT accelerator based on FPGA

Similar Documents

Publication Publication Date Title
Zhai et al. Energy-efficient subthreshold processor design
CN103412284B (en) Matrix transposition method in SAR imaging system based on DSP chip
CN103984560A (en) Embedded reconfigurable system based on large-scale coarseness and processing method thereof
CN103838704A (en) FFT accelerator with high throughput rate
CN103970718A (en) Quick Fourier transformation implementation device and method
CN201226025Y (en) Processor for pulse Doppler radar signal
CN104268122A (en) Point-changeable floating point FFT (fast Fourier transform) processor
Xie et al. An energy-efficient FPGA-based embedded system for CNN application
Xiao et al. Reduced memory architecture for CORDIC-based FFT
CN102209962A (en) Method and device for computing matrices for discrete fourier transform (dft) coefficients
CN103577161A (en) Big data frequency parallel-processing method
CN103544111B (en) A kind of hybrid base FFT method based on real-time process
CN102129419B (en) Based on the processor of fast fourier transform
CN103197287A (en) High-speed real-time frequency domain pulse compression device and processing method thereof
CN105955705B (en) A kind of restructural multi-channel detection algorithm accelerator
Zong-ling et al. The design of lightweight and multi parallel CNN accelerator based on FPGA
CN113779499A (en) Fast Fourier algorithm optimization method and system based on high-level comprehensive tool
CN104657334A (en) FFT (Fast Fourier Transform) radix-2-4-8 mixed-radix butterfly operator and application thereof
CN106782669A (en) A kind of self calibration scalability SRAM delay test circuits
Hou et al. An FPGA-based multi-core system for synthetic aperture radar data processing
CN102353838A (en) Rapid high precision frequency measuring realization method by applying FPGA chip
CN103761074B (en) A kind of configuration method for pipeline-architecturfixed-point fixed-point FFT word length
Feng et al. Implementation and optimisation of pulse compression algorithm on open CL‐based FPGA
Zhang et al. Small area high speed configurable FFT processor
CN105337759A (en) Internal and external ratio measurement method based on community structure, and community discovery method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140604