CN107229596B - Non-pipeline fast Fourier transform processor and operation control method thereof - Google Patents

Non-pipeline fast Fourier transform processor and operation control method thereof

Info

Publication number
CN107229596B
Authority
CN
China
Prior art keywords
register
data
result
stores
subtraction result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610177927.1A
Other languages
Chinese (zh)
Other versions
CN107229596A (en)
Inventor
董旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ali Corp
Original Assignee
Ali Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ali Corp
Priority to CN201610177927.1A
Publication of CN107229596A
Application granted
Publication of CN107229596B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141: Discrete Fourier transforms
    • G06F17/142: Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm

Landscapes

  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a non-pipelined fast Fourier transform processor and an operation control method thereof. The fast Fourier transform processor includes a control logic circuit, a first processing core, and a second processing core. The first processing core is coupled to the control logic circuit. The second processing core is coupled to the control logic circuit and the first processing core. The control logic circuit provides a first control instruction to the first processing core and a second control instruction to the second processing core. The first processing core is controlled by the first control instruction to perform fast Fourier transforms according to a plurality of in-phase operation data and a plurality of quadrature intermediate data. The second processing core is controlled by the second control instruction to perform fast Fourier transforms according to a plurality of quadrature operation data and a plurality of in-phase intermediate data.

Description

Non-pipeline fast Fourier transform processor and operation control method thereof
Technical Field
The present invention relates to a fast Fourier transform processor, and more particularly, to a non-pipelined fast Fourier transform processor and an operation control method thereof.
Background
The digital terrestrial multimedia/television broadcasting (DTMB) system has gradually become the digital multimedia/television transmission standard in China owing to its high transmission and spectrum efficiency, strong resistance to multipath interference, good channel estimation performance, and suitability for mobile reception. The 3780-point fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) modules are among the key modules of the Chinese DTMB system. Because 3780 is not a power of 2, these modules cannot directly use the mature radix-2 and radix-4 algorithms, so the 3780-point FFT and IFFT modules need an algorithm and a hardware circuit implementation that combine good computational efficiency with reasonable hardware resources.
Disclosure of Invention
The invention provides a non-pipeline fast Fourier transform processor and an operation control method thereof, and the non-pipeline fast Fourier transform processor can reduce hardware cost.
The non-pipelined fast Fourier transform processor of the invention includes a control logic circuit, a first processing core, and a second processing core. The first processing core is coupled to the control logic circuit. The second processing core is coupled to the control logic circuit and the first processing core. The control logic circuit provides a first control instruction to the first processing core and a second control instruction to the second processing core. The first processing core receives a plurality of in-phase operation data and, from the second processing core, a plurality of quadrature intermediate data. Controlled by the first control instruction, it performs 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms in sequence according to the in-phase operation data and the quadrature intermediate data, and sequentially provides a plurality of in-phase intermediate data and a plurality of in-phase converted data. The second processing core receives a plurality of quadrature operation data and the in-phase intermediate data. Controlled by the second control instruction, it performs 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms according to the quadrature operation data and the in-phase intermediate data, and sequentially provides the quadrature intermediate data and a plurality of quadrature converted data.
The operation control method of the non-pipelined fast Fourier transform processor includes the following steps. A first control instruction and a second control instruction are provided to a first processing core and a second processing core, respectively, by a control logic circuit. The first processing core is controlled by the first control instruction to perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms in sequence according to a plurality of in-phase operation data and a plurality of quadrature intermediate data from the second processing core, and to sequentially provide a plurality of in-phase intermediate data and a plurality of in-phase converted data. The second processing core is controlled by the second control instruction to perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms according to a plurality of quadrature operation data and the in-phase intermediate data, and to sequentially provide the quadrature intermediate data and a plurality of quadrature converted data.
Based on the above, the non-pipelined FFT processor and its operation control method can greatly reduce memory usage, because the processor does not need to buffer intermediate results. In addition, the first processing core and the second processing core are fully reused: each can perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms, which reduces the number of logic gates in the circuit.
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
FIG. 1 is a system diagram of a non-pipelined fast Fourier transform processor according to one embodiment of the invention.
FIG. 2 is a system diagram illustrating a first processing core and a second processing core according to an embodiment of the invention.
FIG. 3 is a timing diagram illustrating operation control of the first processing core performing 5-point FFT of in-phase operation data according to an embodiment of the invention.
FIG. 4 is a timing diagram illustrating operation control of the second processing core performing a 5-point fast Fourier transform of quadrature operation data according to an embodiment of the invention.
FIG. 5 is a flowchart illustrating an operation control method of a non-pipelined FFT processor according to an embodiment of the invention.
Description of the reference numerals
100: non-pipelined fast Fourier transform processor
110: state master control circuit
120: input/output control circuit
130: storage unit
140: address mapping circuit
150: first buffer circuit
160: first processing core
170: second processing core
180: control logic circuit
190: second buffer circuit
210: first adder array
220: first multiplier array
230: first register set
240: second adder array
250: second multiplier array
260: second register set
A11, A21: first adder
a11, a21: first addition result
A12, A22: second adder
a12, a22: second addition result
A13, A23: third adder
a13, a23: third addition result
A14, A24: fourth adder
a14, a24: fourth addition result
a15, a25: fifth addition result
a16, a26: sixth addition result
a17, a27: seventh addition result
a18, a28: eighth addition result
b11, b21: first subtraction result
b12, b22: second subtraction result
b13, b23: third subtraction result
b14, b24: fourth subtraction result
b15, b25: fifth subtraction result
b16, b26: sixth subtraction result
b17, b27: seventh subtraction result
b18, b28: eighth subtraction result
b19, b29: ninth subtraction result
CM1: first control instruction
CM2: second control instruction
D1~D3780: input data
DIB: in-phase intermediate data
DOI: in-phase operation data
DOQ: quadrature operation data
DQB: quadrature intermediate data
DTI: in-phase converted data
DTQ: quadrature converted data
I1, Q1: first operation data
I2, Q2: second operation data
I3, Q3: third operation data
I4, Q4: fourth operation data
I5, Q5: fifth operation data
IB1: first in-phase intermediate data
IB2: second in-phase intermediate data
IT1: first in-phase converted data
IT2: second in-phase converted data
IT3: third in-phase converted data
IT4: fourth in-phase converted data
IT5: fifth in-phase converted data
m11, m21: first multiplication result
M11, M21: first multiplier
m12, m22: second multiplication result
M12, M22: second multiplier
m13, m23: third multiplication result
M13, M23: third multiplier
m14, m24: fourth multiplication result
m15, m25: fifth multiplication result
QB1: first quadrature intermediate data
QB2: second quadrature intermediate data
QT1: first quadrature converted data
QT2: second quadrature converted data
QT3: third quadrature converted data
QT4: fourth quadrature converted data
QT5: fifth quadrature converted data
R11, R21: first register
R12, R22: second register
R13, R23: third register
R14, R24: fourth register
R15, R25: fifth register
R16, R26: sixth register
S510, S520, S530: steps
Detailed Description
FIG. 1 is a system diagram of a non-pipelined fast Fourier transform processor according to one embodiment of the invention. Referring to FIG. 1, in the present embodiment, a non-pipelined FFT processor 100 includes a state master control circuit 110, an input/output control circuit 120, a storage unit 130, an address mapping circuit 140, a first buffer circuit 150, a first processing core 160, a second processing core 170, a control logic circuit 180, and a second buffer circuit 190.
The state master control circuit 110 is coupled to the input/output control circuit 120 and the address mapping circuit 140, and controls their data reading and data transmission operations. The input/output control circuit 120 is coupled to the storage unit 130 and the state master control circuit 110, receives the 3780-point input data D1~D3780, and is controlled by the state master control circuit 110 to store the input data D1~D3780 in, or read it from, the storage unit 130. The address mapping circuit 140 is coupled to the storage unit 130 and the state master control circuit 110, and is controlled by the state master control circuit 110 to read the storage unit 130 to provide the in-phase operation data DOI and the quadrature operation data DOQ, and to store the in-phase converted data DTI and the quadrature converted data DTQ in the storage unit 130.
The first buffer circuit 150 is coupled to the address mapping circuit 140, the first processing core 160, and the second processing core 170, and buffers the in-phase operation data DOI and the quadrature operation data DOQ. The first processing core 160 is coupled to the first buffer circuit 150, the second processing core 170, and the control logic circuit 180. It receives the in-phase operation data DOI from the first buffer circuit 150, a plurality of quadrature intermediate data DQB from the second processing core 170, and a first control instruction CM1 provided by the control logic circuit 180. The second processing core 170 is coupled to the first buffer circuit 150, the first processing core 160, and the control logic circuit 180. It receives the quadrature operation data DOQ from the first buffer circuit 150, a plurality of in-phase intermediate data DIB from the first processing core 160, and a second control instruction CM2 provided by the control logic circuit 180.
In this embodiment, a prime factor algorithm (PFA) is used to decompose the 3780 points (corresponding to the 3780-point input data D1~D3780) into 35 points and 108 points. The 35 points are then decomposed into 5 points and 7 points, the 108 points into 4 points and 27 points, and finally the 27 points into 9 points and 3 points. In other words, the first processing core 160 and the second processing core 170 perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms to complete the fast Fourier transform of the 3780-point input data D1~D3780 by the prime factor algorithm. Here 3, 4, 5, and 7 are co-prime and therefore require no rotation (twiddle) factors, whereas 3 and 9 are not co-prime and therefore require 27 (i.e., 9 × 3) rotation factors.
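The two kinds of decomposition can be checked numerically. The following is a minimal sketch under stated assumptions, not the circuit of the embodiment: it assumes the Good-Thomas index mapping as the prime factor algorithm for co-prime lengths, and a Cooley-Tukey split with rotation factors for 27 = 9 × 3. The function names `dft`, `pfa_dft`, and `ct_dft` are hypothetical, and a direct O(N²) DFT stands in for the small-point transforms.

```python
import cmath
from math import gcd


def dft(x):
    """Direct O(N^2) DFT, used here as the small-point transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def pfa_dft(x, n1, n2):
    """Good-Thomas DFT of length n1*n2 for co-prime n1, n2 (no rotation factors)."""
    n = n1 * n2
    assert len(x) == n and gcd(n1, n2) == 1
    # Input index map: n = (n2*i + n1*j) mod N
    grid = [[x[(n2 * i + n1 * j) % n] for j in range(n2)] for i in range(n1)]
    rows = [dft(row) for row in grid]                      # n2-point DFTs
    cols = [dft([rows[i][k2] for i in range(n1)])          # n1-point DFTs
            for k2 in range(n2)]
    # Output index map from the Chinese remainder theorem
    q2 = pow(n2, -1, n1)  # inverse of n2 modulo n1 (Python 3.8+)
    q1 = pow(n1, -1, n2)  # inverse of n1 modulo n2
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * q2 * k1 + n1 * q1 * k2) % n] = cols[k2][k1]
    return out


def ct_dft(x, n1, n2):
    """Cooley-Tukey DFT of length n1*n2; needs n1*n2 rotation factors."""
    n = n1 * n2
    assert len(x) == n
    # Rotation factor table W_N^(j*k1): 9 * 3 = 27 entries when n1=9, n2=3
    tw = [[cmath.exp(-2j * cmath.pi * j * k1 / n) for k1 in range(n1)]
          for j in range(n2)]
    inner = [dft([x[n2 * i + j] for i in range(n1)]) for j in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        outer = dft([inner[j][k1] * tw[j][k1] for j in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = outer[k2]
    return out


# 3780 = 4 * 5 * 7 * 27, with 27 = 9 * 3 handled by the rotation-factor split
assert 4 * 5 * 7 * 27 == 3780
```

Calling `pfa_dft(x, 5, 7)` on a 35-element list and `ct_dft(x, 9, 3)` on a 27-element list should agree with `dft(x)` to within floating-point rounding, which illustrates why only the 9 × 3 stage needs a rotation-factor table.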
Accordingly, the first processing core 160 is controlled by the first control instruction CM1 to perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms in sequence according to the in-phase operation data DOI and the quadrature intermediate data DQB, and to sequentially provide a plurality of in-phase intermediate data DIB and a plurality of in-phase converted data DTI. Similarly, the second processing core 170 is controlled by the second control instruction CM2 to perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms according to the quadrature operation data DOQ and the in-phase intermediate data DIB, and to sequentially provide the quadrature intermediate data DQB and a plurality of quadrature converted data DTQ. The first processing core 160 and the second processing core 170 may perform the 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms using the Winograd small-N algorithm.
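To give a sense of what a Winograd small-N transform looks like, the following is a minimal sketch of the textbook 3-point case in floating point. It is an illustration only: the embodiment does not spell out its 3-point schedule, and the function name `winograd_dft3` is hypothetical.

```python
import cmath
import math


def winograd_dft3(x0, x1, x2):
    """Textbook 3-point Winograd DFT: a few additions and two multiplications."""
    u = 2 * math.pi / 3
    t1 = x1 + x2
    m0 = x0 + t1                          # X[0]
    m1 = (math.cos(u) - 1.0) * t1         # multiplication by -1.5
    m2 = -1j * math.sin(u) * (x1 - x2)    # multiplication by -j*sqrt(3)/2
    s1 = m0 + m1
    return m0, s1 + m2, s1 - m2


def dft(x):
    """Direct DFT used only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

The multiplication `m2` is purely imaginary, so for complex input it moves data between the real and imaginary paths. That exchange is the role the intermediate data DIB and DQB play between the two cores.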
The second buffer circuit 190 is coupled to the address mapping circuit 140, the first processing core 160, and the second processing core 170, and buffers the in-phase converted data DTI and the quadrature converted data DTQ.
In light of the above, memory usage can be greatly reduced because the non-pipelined FFT processor 100 does not need to buffer intermediate results. Moreover, the first processing core 160 and the second processing core 170 are fully reused: each can perform 3-point, 4-point, 5-point, 7-point, and 9-point fast Fourier transforms, which reduces the number of logic gates in the non-pipelined FFT processor 100.
FIG. 2 is a system diagram illustrating a first processing core and a second processing core according to an embodiment of the invention. Referring to fig. 1 and fig. 2, in the present embodiment, the first processing core 160 includes a first adder array 210, a first multiplier array 220 and a first register set 230. The first register set 230 is coupled to the control logic circuit 180 and controlled by the first control instruction CM1 to sequentially provide the in-phase intermediate data DIB to the second processing core 170 and output the in-phase converted data DTI.
The first adder array 210 is coupled to the control logic circuit 180 and the first register set 230, and receives the in-phase operation data DOI and the quadrature intermediate data DQB. The first adder array 210 is controlled by the first control instruction CM1 to add the in-phase operation data DOI, the quadrature intermediate data DQB, and the data of the first register set 230, and to store the addition results in the first register set 230. The first multiplier array 220 is coupled to the control logic circuit 180 and the first register set 230, and is controlled by the first control instruction CM1 to multiply the data of the first register set 230 and store the multiplication results in the first register set 230.
The second processing core 170 includes a second adder array 240, a second multiplier array 250, and a second register set 260. The second register set 260 is coupled to the control logic circuit 180 and is controlled by the second control instruction CM2 to sequentially provide the quadrature intermediate data DQB to the first processing core 160 and to output the quadrature converted data DTQ. The second adder array 240 is coupled to the control logic circuit 180 and the second register set 260, and receives the quadrature operation data DOQ and the in-phase intermediate data DIB. The second adder array 240 is controlled by the second control instruction CM2 to add the quadrature operation data DOQ, the in-phase intermediate data DIB, and the data of the second register set 260, and to store the addition results in the second register set 260.
The second multiplier array 250 is coupled to the control logic circuit 180 and the second register set 260, and is controlled by the second control instruction CM2 to multiply the data of the second register set 260 and store the multiplication results in the second register set 260.
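One way to picture a core driven by control instructions is a small interpreter in which each instruction lists the operations for one clock period. The sketch below is an assumption made for illustration: the micro-operation tuple format, the register names `R1` to `R6`, and the function `run_cycle` are all hypothetical and do not come from the embodiment. It models synchronous registers, so every unit reads the old register contents before any result is written back.

```python
from typing import Dict, List, Tuple

# Hypothetical micro-operation: (operation, source A, source B, destination register)
MicroOp = Tuple[str, str, str, str]


def run_cycle(regs: Dict[str, float], inputs: Dict[str, float],
              consts: Dict[str, float], ops: List[MicroOp]) -> Dict[str, float]:
    """Apply one control instruction: read old state first, then commit all writes."""
    def read(name: str) -> float:
        if name in regs:
            return regs[name]
        if name in inputs:
            return inputs[name]
        return consts[name]

    pending: Dict[str, float] = {}
    for op, a, b, dst in ops:
        if dst in pending:
            raise ValueError(f"register {dst} written twice in one clock period")
        va, vb = read(a), read(b)
        if op == "add":
            pending[dst] = va + vb
        elif op == "sub":
            pending[dst] = va - vb
        elif op == "mul":
            pending[dst] = va * vb
        else:
            raise ValueError(f"unknown operation {op}")
    new_regs = dict(regs)
    new_regs.update(pending)
    return new_regs


regs = {f"R{i}": 0.0 for i in range(1, 7)}
inputs = {"x1": 1.0, "x2": 2.0, "x3": 3.0, "x4": 4.0, "x5": 5.0}
consts = {"k": 2.0}  # placeholder coefficient, not taken from the source

cycle0 = [("add", "x2", "x5", "R1"), ("sub", "x2", "x5", "R2"),
          ("add", "x3", "x4", "R3"), ("sub", "x3", "x4", "R4")]
cycle1 = [("add", "R2", "R4", "R1"), ("sub", "R1", "R3", "R2"),
          ("add", "R1", "R3", "R3"), ("mul", "R2", "k", "R4")]

regs = run_cycle(regs, inputs, consts, cycle0)
regs = run_cycle(regs, inputs, consts, cycle1)
```

Because reads happen before writes, a register can be both a source and a destination in the same clock period, which lets a small register set be reused without buffering intermediate results elsewhere.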
FIG. 3 is a timing diagram illustrating operation control of the first processing core performing 5-point FFT of in-phase operation data according to an embodiment of the invention. Referring to fig. 2 and 3, in the present embodiment, each row represents a resource, each column represents an operation clock period, the operation and storage operations of each column correspond to a first control command CM1, and the first processing core 160 performs a 5-point fast fourier transform of the in-phase operation data DOI.
The in-phase operation data DOI includes first operation data I1, second operation data I2, third operation data I3, fourth operation data I4, and fifth operation data I5. The quadrature intermediate data DQB includes first quadrature intermediate data QB1 and second quadrature intermediate data QB2. The first adder array 210 includes a first adder A11, a second adder A12, a third adder A13, and a fourth adder A14. The first multiplier array 220 includes a first multiplier M11, a second multiplier M12, and a third multiplier M13. The first register set 230 includes a first register R11, a second register R12, a third register R13, a fourth register R14, a fifth register R15, and a sixth register R16.
In the first operation clock period (labeled "0"), the first adder A11 stores the first addition result a11 of the second operation data I2 and the fifth operation data I5 in the first register R11; the second adder A12 stores the first subtraction result b11, obtained by subtracting the fifth operation data I5 from the second operation data I2, in the second register R12; the third adder A13 stores the second addition result a12 of the third operation data I3 and the fourth operation data I4 in the third register R13; and the fourth adder A14 stores the second subtraction result b12, obtained by subtracting the fourth operation data I4 from the third operation data I3, in the fourth register R14.
In the second operation clock period (labeled "1"), the first adder A11 stores the third addition result a13 of the first subtraction result b11 of the second register R12 and the second subtraction result b12 of the fourth register R14 in the first register R11; the second adder A12 stores the third subtraction result b13, obtained by subtracting the second addition result a12 of the third register R13 from the first addition result a11 of the first register R11, in the second register R12; the third adder A13 stores the fourth addition result a14 of the first addition result a11 of the first register R11 and the second addition result a12 of the third register R13 in the third register R13; the first multiplier M11 stores the first multiplication result m11, obtained by multiplying the first subtraction result b11 of the second register R12 by 786, in the fourth register R14; and the second multiplier M12 stores the second multiplication result m12, obtained from the second subtraction result b12 of the fourth register R14, in the fifth register R15.
In the third operation clock period (labeled "2"), the first adder A11 stores the fifth addition result a15 of the first operation data I1 and the fourth addition result a14 of the third register R13 in the first register R11; the first multiplier M11 stores the third multiplication result m13, obtained by multiplying the third addition result a13 of the first register R11 by 486, in the second register R12; the second multiplier M12 stores the fourth multiplication result m14, obtained by multiplying the third subtraction result b13 of the second register R12 by 286, in the third register R13; and the third multiplier M13 stores the fifth multiplication result m15, obtained by multiplying the fourth addition result a14 of the third register R13 by 128, in the sixth register R16.
In the fourth operation clock period (labeled "3"), the first adder A11 stores the fourth subtraction result b14, obtained by subtracting the fifth multiplication result m15 of the sixth register R16 from the first operation data I1, in the second register R12; the second adder A12 stores the fifth subtraction result b15, obtained by subtracting the second multiplication result m12 of the fifth register R15 from the third multiplication result m13 of the second register R12, in the fourth register R14; and the third adder A13 stores the sixth subtraction result b16, obtained by subtracting the third multiplication result m13 of the second register R12 from the first multiplication result m11 of the fourth register R14, in the fifth register R15. The fifth subtraction result b15 and the sixth subtraction result b16 are provided as the first in-phase intermediate data IB1 and the second in-phase intermediate data IB2 of the in-phase intermediate data DIB.
In the fifth operation clock period (labeled "4"), the first adder A11 stores the sixth addition result a16 of the fourth subtraction result b14 of the second register R12 and the fourth multiplication result m14 of the third register R13 in the second register R12; and the second adder A12 stores the seventh subtraction result b17, obtained by subtracting the fourth multiplication result m14 of the third register R13 from the fourth subtraction result b14 of the second register R12, in the third register R13.
In the sixth operation clock period (labeled "5"), the first adder A11 stores the seventh addition result a17 of the sixth addition result a16 of the second register R12 and the first quadrature intermediate data QB1 in the second register R12; the second adder A12 stores the eighth addition result a18 of the seventh subtraction result b17 of the third register R13 and the second quadrature intermediate data QB2 in the third register R13; the third adder A13 stores the eighth subtraction result b18, obtained by subtracting the second quadrature intermediate data QB2 from the seventh subtraction result b17 of the third register R13, in the fourth register R14; and the fourth adder A14 stores the ninth subtraction result b19, obtained by subtracting the first quadrature intermediate data QB1 from the sixth addition result a16 of the second register R12, in the fourth register R14.
After the sixth operation clock period, the fifth addition result a15, the seventh addition result a17, the eighth addition result a18, the eighth subtraction result b18, and the ninth subtraction result b19 are provided as the first in-phase converted data IT1, the second in-phase converted data IT2, the third in-phase converted data IT3, the fourth in-phase converted data IT4, and the fifth in-phase converted data IT5 of the in-phase converted data DTI. The six operation clock periods (labeled "0" through "5") occur consecutively in that order.
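The text does not say how the integer multipliers 786, 486, 286, and 128 were chosen. One plausible reading, offered here as an inference rather than something the source states, is that they are the usual 5-point Winograd coefficients in fixed point. The sketch below computes those coefficients in floating point and rounds them at a scale of 511, a value chosen only because it reproduces all four integers.

```python
import math

U = 2 * math.pi / 5  # 72 degrees

# Textbook 5-point Winograd coefficients, in the order the constants appear
coefficients = [
    math.sin(U) + math.sin(2 * U),          # compared with 786
    math.sin(U),                            # compared with 486
    (math.cos(U) - math.cos(2 * U)) / 2,    # compared with 286
    0.25,                                   # compared with 128
]

SCALE = 511  # ASSUMPTION: inferred from the numbers, not given in the source
quantized = [round(c * SCALE) for c in coefficients]
print(quantized)  # [786, 486, 286, 128]
```

Under this reading, a power-of-two scale of 512 would round the first two coefficients to 788 and 487 instead, so the exact scaling used by the circuit remains an open question.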
FIG. 4 is a timing diagram illustrating operation control of the second processing core performing a 5-point fast Fourier transform of quadrature operation data according to an embodiment of the invention. Referring to FIG. 2 to FIG. 4, in the present embodiment, each row represents a resource, each column represents an operation clock period, the operation and storage operations of each column correspond to a second control instruction CM2, and the second processing core 170 performs a 5-point fast Fourier transform of the quadrature operation data DOQ.
The quadrature operation data DOQ includes first operation data Q1, second operation data Q2, third operation data Q3, fourth operation data Q4, and fifth operation data Q5. The in-phase intermediate data DIB includes first in-phase intermediate data IB1 and second in-phase intermediate data IB2. The second adder array 240 includes a first adder A21, a second adder A22, a third adder A23, and a fourth adder A24. The second multiplier array 250 includes a first multiplier M21, a second multiplier M22, and a third multiplier M23. The second register set 260 includes a first register R21, a second register R22, a third register R23, a fourth register R24, a fifth register R25, and a sixth register R26.
In the first operation clock period (labeled "0"), the first adder A21 stores the first addition result a21 of the second operation data Q2 and the fifth operation data Q5 in the first register R21; the second adder A22 stores the first subtraction result b21, obtained by subtracting the fifth operation data Q5 from the second operation data Q2, in the second register R22; the third adder A23 stores the second addition result a22 of the third operation data Q3 and the fourth operation data Q4 in the third register R23; and the fourth adder A24 stores the second subtraction result b22, obtained by subtracting the fourth operation data Q4 from the third operation data Q3, in the fourth register R24.
In the second operation clock period (labeled "1"), the first adder A21 stores the third addition result a23 of the first subtraction result b21 of the second register R22 and the second subtraction result b22 of the fourth register R24 in the first register R21; the second adder A22 stores the third subtraction result b23, obtained by subtracting the second addition result a22 of the third register R23 from the first addition result a21 of the first register R21, in the second register R22; the third adder A23 stores the fourth addition result a24 of the first addition result a21 of the first register R21 and the second addition result a22 of the third register R23 in the third register R23; the first multiplier M21 stores the first multiplication result m21, obtained by multiplying the first subtraction result b21 of the second register R22 by 786, in the fourth register R24; and the second multiplier M22 stores the second multiplication result m22, obtained from the second subtraction result b22 of the fourth register R24, in the fifth register R25.
In the third operation clock period (labeled "2"), the first adder A21 stores the fifth addition result a25 of the first operation data Q1 and the fourth addition result a24 of the third register R23 in the first register R21; the first multiplier M21 stores the third multiplication result m23, obtained by multiplying the third addition result a23 of the first register R21 by 486, in the second register R22; the second multiplier M22 stores the fourth multiplication result m24, obtained by multiplying the third subtraction result b23 of the second register R22 by 286, in the third register R23; and the third multiplier M23 stores the fifth multiplication result m25, obtained by multiplying the fourth addition result a24 of the third register R23 by 128, in the sixth register R26.
In the fourth operation clock period (labeled "3"), the first adder A21 stores the fourth subtraction result b24, obtained by subtracting the fifth multiplication result m25 of the sixth register R26 from the first operation data Q1, in the second register R22; the second adder A22 stores the fifth subtraction result b25, obtained by subtracting the second multiplication result m22 of the fifth register R25 from the third multiplication result m23 of the second register R22, in the fourth register R24; and the third adder A23 stores the sixth subtraction result b26, obtained by subtracting the third multiplication result m23 of the second register R22 from the first multiplication result m21 of the fourth register R24, in the fifth register R25. The fifth subtraction result b25 and the sixth subtraction result b26 are provided as the first quadrature intermediate data QB1 and the second quadrature intermediate data QB2 of the quadrature intermediate data DQB.
In the fifth operation clock period (labeled "4"), the first adder A21 stores the sixth addition result a26 of the fourth subtraction result b24 of the second register R22 and the fourth multiplication result m24 of the third register R23 in the second register R22; and the second adder A22 stores the seventh subtraction result b27, obtained by subtracting the fourth multiplication result m24 of the third register R23 from the fourth subtraction result b24 of the second register R22, in the third register R23.
In the sixth operation clock period (labeled "5"), the first adder A21 stores the eighth subtraction result b28, obtained by subtracting the first in-phase intermediate data IB1 from the sixth addition result a26 of the second register R22, in the second register R22; the second adder A22 stores the ninth subtraction result b29, obtained by subtracting the second in-phase intermediate data IB2 from the seventh subtraction result b27 of the third register R23, in the third register R23; the third adder A23 stores the seventh addition result a27 of the seventh subtraction result b27 of the third register R23 and the second in-phase intermediate data IB2 in the fourth register R24; and the fourth adder A24 stores the eighth addition result a28 of the sixth addition result a26 of the second register R22 and the first in-phase intermediate data IB1 in the fourth register R24.
After the sixth operation clock period, the fifth addition result a25, the eighth subtraction result b28, the ninth subtraction result b29, the seventh addition result a27, and the eighth addition result a28 are provided as the first quadrature converted data QT1, the second quadrature converted data QT2, the third quadrature converted data QT3, the fourth quadrature converted data QT4, and the fifth quadrature converted data QT5 of the quadrature converted data DTQ. The six operation clock periods (labeled "0" through "5") occur consecutively in that order.
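The two schedules can be followed as a dataflow and compared against a direct 5-point DFT. The sketch below is an illustration under assumptions, not the fixed-point circuit: it works in floating point, uses the textbook Winograd coefficients in place of the integer constants, and assumes the coefficient of the second multiplication (m12, m22) is sin(2π/5) − sin(4π/5), a value the text does not give. The names `core_stages` and `fft5_two_cores` are hypothetical.

```python
import cmath
import math

U = 2 * math.pi / 5
K_M1 = math.sin(U) + math.sin(2 * U)        # text multiplies by 786
K_M2 = math.sin(U) - math.sin(2 * U)        # ASSUMPTION: value not given in the text
K_M3 = math.sin(U)                          # text multiplies by 486
K_M4 = (math.cos(U) - math.cos(2 * U)) / 2  # text multiplies by 286
K_M5 = 0.25                                 # text multiplies by 128


def core_stages(d1, d2, d3, d4, d5):
    """Clock periods 0 to 4 of one core, before the cross-core exchange is used."""
    # Period 0: first and second addition/subtraction results
    a1, b1, a2, b2 = d2 + d5, d2 - d5, d3 + d4, d3 - d4
    # Period 1: third addition/subtraction, fourth addition, first two products
    a3, b3, a4 = b1 + b2, a1 - a2, a1 + a2
    m1, m2 = b1 * K_M1, b2 * K_M2
    # Period 2: fifth addition and the remaining three products
    a5, m3, m4, m5 = d1 + a4, a3 * K_M3, b3 * K_M4, a4 * K_M5
    # Period 3: b5 and b6 become the intermediate data sent to the other core
    b4, b5, b6 = d1 - m5, m3 - m2, m1 - m3
    # Period 4: sixth addition and seventh subtraction
    a6, b7 = b4 + m4, b4 - m4
    return a5, a6, b7, b5, b6


def fft5_two_cores(i_data, q_data):
    """Period 5: each core combines its own results with the other core's data."""
    ia5, ia6, ib7, ib1, ib2 = core_stages(*i_data)   # first core, yields IB1, IB2
    qa5, qa6, qb7, qb1, qb2 = core_stages(*q_data)   # second core, yields QB1, QB2
    it = [ia5, ia6 + qb1, ib7 + qb2, ib7 - qb2, ia6 - qb1]   # IT1..IT5
    qt = [qa5, qa6 - ib1, qb7 - ib2, qb7 + ib2, qa6 + ib1]   # QT1..QT5
    return it, qt


def dft(x):
    """Direct DFT used only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

With these coefficients, `fft5_two_cores(i_data, q_data)` should return the real and imaginary parts of `dft` applied to the complex samples `i + 1j*q`, which shows why the first core adds the quadrature intermediate data while the second core subtracts the in-phase intermediate data.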
FIG. 5 is a flowchart illustrating an operation control method of a non-pipelined FFT processor according to an embodiment of the invention. Referring to FIG. 1, in the present embodiment, the operation control method includes the following steps. First, a first control instruction and a second control instruction are provided to a first processing core and a second processing core, respectively, through a control logic circuit (step S510). Then, the first processing core is controlled by the first control instruction and performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms in sequence according to a plurality of in-phase operation data and a plurality of orthogonal intermediate data from the second processing core, so as to provide a plurality of in-phase intermediate data and a plurality of in-phase transform data in sequence (step S520). Finally, the second processing core is controlled by the second control instruction and performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms according to a plurality of orthogonal operation data and the in-phase intermediate data, so as to provide the orthogonal intermediate data and a plurality of orthogonal transform data in sequence (step S530). The order of steps S510, S520, and S530 is given for illustration, and the embodiment of the invention is not limited thereto. The details of steps S510, S520, and S530 are described in the embodiments of FIG. 1, FIG. 2, FIG. 3, and FIG. 4 and are not repeated here.
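As a rough illustration of why these five transform sizes are used together, note that 3 × 4 × 5 × 7 × 9 = 3780, which is the input length named in claims 9 and 17. The sketch below counts how many small transforms each stage would need if every stage sweeps one full 3780-point frame. That sweep pattern is an assumption made for illustration only; the names `SMALL_N`, `TOTAL` and `transforms_per_stage` are hypothetical and do not appear in the embodiment.

```python
from math import prod

# Transform sizes named in steps S520 and S530
SMALL_N = (3, 4, 5, 7, 9)

# Product of the sizes; equals the 3780-point length named in the claims
TOTAL = prod(SMALL_N)


def transforms_per_stage(total, sizes):
    """Return {size: count} of small transforms per stage.

    ASSUMPTION: each stage processes all `total` points exactly once,
    so a stage of size n needs total // n transforms.
    """
    return {n: total // n for n in sizes}


counts = transforms_per_stage(TOTAL, SMALL_N)
```

Under this assumption the 5-point stage, for example, would run 756 times per frame on each processing core.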
In summary, the non-pipelined FFT processor and the operation control method thereof according to the present invention can substantially reduce memory usage, because the non-pipelined FFT processor does not need to cache intermediate results. In addition, the first processing core and the second processing core of the present invention are fully reused, i.e., each of them can perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms, thereby reducing the number of logic gates in the circuit.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (17)

1. A non-pipelined fast Fourier transform processor, comprising:
a control logic circuit;
a first processing core coupled to the control logic circuit; and
a second processing core coupled to the control logic circuit and the first processing core;
wherein the control logic circuit provides a first control instruction and a second control instruction to the first processing core and the second processing core, respectively,
the first processing core receives a plurality of in-phase operation data and a plurality of orthogonal intermediate data from the second processing core, and is controlled by the first control instruction to perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transform in sequence according to the in-phase operation data and the orthogonal intermediate data, and to provide a plurality of in-phase intermediate data and a plurality of in-phase transformation data in sequence,
the second processing core receives a plurality of orthogonal operation data and the in-phase intermediate data, is controlled by the second control instruction to perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transform according to the orthogonal operation data and the in-phase intermediate data, and sequentially provides the orthogonal intermediate data and a plurality of orthogonal transformation data.
2. The non-pipelined fast Fourier transform processor of claim 1, wherein the first processing core and the second processing core perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms using a Winograd Small-N algorithm.
3. The non-pipelined fast Fourier transform processor of claim 1, wherein the first processing core comprises:
a first register set for sequentially providing the in-phase intermediate data and the in-phase conversion data;
a first adder array coupled to the control logic circuit and the first register set and receiving the in-phase operation data and the quadrature intermediate data, the first adder array controlled by the first control instruction to add the in-phase operation data, the quadrature intermediate data and the data of the first register set, and storing the addition result in the first register set; and
a first multiplier array coupled to the control logic circuit and the first register set, controlled by the first control instruction to multiply the data of the first register set, and storing a multiplication result in the first register set.
4. The non-pipelined fast Fourier transform processor of claim 3, wherein the first processing core performs a 5-point fast Fourier transform of the in-phase operation data comprising a first operation data, a second operation data, a third operation data, a fourth operation data, and a fifth operation data, the quadrature intermediate data comprising a first quadrature intermediate data and a second quadrature intermediate data, and wherein,
during a first operation clock, a first adder stores a first addition result of the second operation data and the fifth operation data in a first register, a second adder stores a first subtraction result of the second operation data minus the fifth operation data in a second register, a third adder stores a second addition result of the third operation data and the fourth operation data in a third register, a fourth adder stores a second subtraction result of the third operation data minus the fourth operation data in a fourth register,
during a second operation clock, the first adder stores a third addition result of the first subtraction result of the second register and the second subtraction result of the fourth register in the first register, the second adder stores a third subtraction result of the first addition result of the first register minus the second addition result of the third register in the second register, the third adder stores a fourth addition result of the first addition result of the first register and the second addition result of the third register in the third register, a first multiplier stores a first multiplication result of the first subtraction result of the second register multiplied by 786 in the fourth register, a second multiplier stores a second multiplication result of the second subtraction result of the fourth register multiplied by 186 in a fifth register,
during a third operation clock, the first adder stores a fifth addition result of the first operation data and the fourth addition result of the third register in the first register, the first multiplier stores a third multiplication result of the third addition result of the first register multiplied by 486 in the second register, the second multiplier stores a fourth multiplication result of the third subtraction result of the second register multiplied by 286 in the third register, a third multiplier stores a fifth multiplication result of the fourth addition result of the third register multiplied by 128 in a sixth register,
during a fourth operation clock, the first adder stores a fourth subtraction result of the first operation data minus the fifth multiplication result of the sixth register in the second register, the second adder stores a fifth subtraction result of the third multiplication result of the second register minus the second multiplication result of the fifth register in the fourth register, the third adder stores a sixth subtraction result of the first multiplication result of the fourth register minus the third multiplication result of the second register in the fifth register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first in-phase intermediate data and a second in-phase intermediate data of the in-phase intermediate data,
during a fifth operation clock, the first adder stores a sixth addition result of the fourth subtraction result of the second register and the fourth multiplication result of the third register in the second register, the second adder stores a seventh subtraction result of the fourth subtraction result of the second register minus the fourth multiplication result of the third register in the third register,
during a sixth operation clock, the first adder stores the sixth addition result of the second register and a seventh addition result of the first orthogonal intermediate data in the second register, the second adder stores the seventh subtraction result of the third register and an eighth addition result of the second orthogonal intermediate data in the third register, the third adder stores an eighth subtraction result of the seventh subtraction result of the third register minus the second orthogonal intermediate data in the fourth register, the fourth adder stores a ninth subtraction result of the sixth addition result of the second register minus the first orthogonal intermediate data in the fourth register,
wherein, after the sixth operation clock period, the fifth addition result, the seventh addition result, the eighth addition result, the eighth subtraction result and the ninth subtraction result are provided as a first in-phase conversion data, a second in-phase conversion data, a third in-phase conversion data, a fourth in-phase conversion data and a fifth in-phase conversion data among the in-phase conversion data.
5. The non-pipelined fast Fourier transform processor of claim 4, wherein the first operational clock period, the second operational clock period, the third operational clock period, the fourth operational clock period, the fifth operational clock period, and the sixth operational clock period are arranged in sequence.
6. The non-pipelined fast Fourier transform processor of claim 1, wherein the second processing core comprises:
a second register set for sequentially providing the orthogonal intermediate data and the orthogonal transform data;
a second adder array coupled to the control logic circuit and the second register set and receiving the quadrature operation data and the in-phase intermediate data, the second adder array controlled by the second control instruction to add the quadrature operation data, the in-phase intermediate data and the data of the second register set, and storing the addition result in the second register set; and
a second multiplier array coupled to the control logic circuit and the second register set, controlled by the second control instruction to multiply the data of the second register set, and storing a multiplication result in the second register set.
7. The non-pipelined fast Fourier transform processor of claim 6 wherein the second processing core performs a 5-point fast Fourier transform of the quadrature operational data comprising a first operational data, a second operational data, a third operational data, a fourth operational data and a fifth operational data, the in-phase intermediate data comprising a first in-phase intermediate data and a second in-phase intermediate data, and wherein,
during a first operation clock, a first adder stores a first addition result of the second operation data and the fifth operation data in a first register, a second adder stores a first subtraction result of the second operation data minus the fifth operation data in a second register, a third adder stores a second addition result of the third operation data and the fourth operation data in a third register, a fourth adder stores a second subtraction result of the third operation data minus the fourth operation data in a fourth register,
during a second operation clock, the first adder stores a third addition result of the first subtraction result of the second register and the second subtraction result of the fourth register in the first register, the second adder stores a third subtraction result of the first addition result of the first register minus the second addition result of the third register in the second register, the third adder stores a fourth addition result of the first addition result of the first register and the second addition result of the third register in the third register, a first multiplier stores a first multiplication result of the first subtraction result of the second register multiplied by 786 in the fourth register, a second multiplier stores a second multiplication result of the second subtraction result of the fourth register multiplied by 186 in a fifth register,
during a third operation clock, the first adder stores a fifth addition result of the first operation data and the fourth addition result of the third register in the first register, the first multiplier stores a third multiplication result of the third addition result of the first register multiplied by 486 in the second register, the second multiplier stores a fourth multiplication result of the third subtraction result of the second register multiplied by 286 in the third register, a third multiplier stores a fifth multiplication result of the fourth addition result of the third register multiplied by 128 in a sixth register,
during a fourth operation clock, the first adder stores a fourth subtraction result of the first operation data minus the fifth multiplication result of the sixth register in the second register, the second adder stores a fifth subtraction result of the third multiplication result of the second register minus the second multiplication result of the fifth register in the fourth register, the third adder stores a sixth subtraction result of the first multiplication result of the fourth register minus the third multiplication result of the second register in the fifth register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first quadrature intermediate data and a second quadrature intermediate data of the quadrature intermediate data,
during a fifth operation clock, the first adder stores a sixth addition result of the fourth subtraction result of the second register and the fourth multiplication result of the third register in the second register, the second adder stores a seventh subtraction result of the fourth subtraction result of the second register minus the fourth multiplication result of the third register in the third register,
during a sixth operation clock, the first adder stores an eighth subtraction result of the sixth addition result of the second register minus the first in-phase intermediate data in the second register, the second adder stores a ninth subtraction result of the seventh subtraction result of the third register minus the second in-phase intermediate data in the third register, the third adder stores the seventh subtraction result of the third register and a seventh addition result of the second in-phase intermediate data in the fourth register, and the fourth adder stores the sixth addition result of the second register and an eighth addition result of the first in-phase intermediate data in the fourth register,
wherein, after the sixth operation clock period, the fifth addition result, the eighth subtraction result, the ninth subtraction result, the seventh addition result, and the eighth addition result are provided as a first orthogonal transform data, a second orthogonal transform data, a third orthogonal transform data, a fourth orthogonal transform data, and a fifth orthogonal transform data among the orthogonal transform data.
8. The non-pipelined fast Fourier transform processor of claim 7, wherein the first operational clock period, the second operational clock period, the third operational clock period, the fourth operational clock period, the fifth operational clock period, and the sixth operational clock period are arranged in sequence.
9. The non-pipelined fast Fourier transform processor of claim 1, further comprising:
a state master control circuit;
an input/output control circuit, coupled to a storage unit and the state master control circuit, for receiving 3780-point input data, so as to be controlled by the state master control circuit to store or read the 3780-point input data in the storage unit; and
an address mapping circuit coupled to the storage unit and the state master control circuit, controlled by the state master control circuit to read the storage unit to provide the in-phase operation data and the quadrature operation data, and further to store the in-phase conversion data and the quadrature conversion data in the storage unit.
10. The non-pipelined fast Fourier transform processor of claim 9, further comprising:
a first cache circuit coupled to the address mapping circuit, the first processing core and the second processing core for caching the in-phase operation data and the quadrature operation data; and
a second cache circuit coupled to the address mapping circuit, the first processing core and the second processing core for caching the in-phase conversion data and the quadrature conversion data.
11. An operation control method for a non-pipelined fast Fourier transform processor, comprising:
providing a first control instruction and a second control instruction to a first processing core and a second processing core respectively through a control logic circuit;
controlling the first processing core through the first control instruction, and performing 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transform by the first processing core according to a plurality of in-phase operation data and a plurality of orthogonal intermediate data from the second processing core in sequence to provide a plurality of in-phase intermediate data and a plurality of in-phase transform data in sequence; and
controlling the second processing core through the second control instruction, and performing, by the second processing core, 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms according to a plurality of orthogonal operation data and the in-phase intermediate data, to sequentially provide the orthogonal intermediate data and a plurality of orthogonal transform data.
12. The method of claim 11, wherein the first processing core and the second processing core perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms using a Winograd Small-N algorithm.
13. The method of claim 11, wherein the in-phase operation data comprises a first operation data, a second operation data, a third operation data, a fourth operation data and a fifth operation data, the quadrature intermediate data comprises a first quadrature intermediate data and a second quadrature intermediate data, and the step of performing, by the first processing core, a 5-point fast Fourier transform of the in-phase operation data comprises:
during a first operation clock, a first adder stores a first addition result of the second operation data and the fifth operation data in a first register, a second adder stores a first subtraction result of the second operation data minus the fifth operation data in a second register, a third adder stores a second addition result of the third operation data and the fourth operation data in a third register, and a fourth adder stores a second subtraction result of the third operation data minus the fourth operation data in a fourth register;
during a second operation clock, the first adder stores a third addition result of the first subtraction result of the second register and the second subtraction result of the fourth register in the first register, the second adder subtracts the second addition result of the third register from the first addition result of the first register to store a third subtraction result in the second register, the third adder stores a fourth addition result of the first addition result of the first register and the second addition result of the third register in the third register, a first multiplier stores a first multiplication result of the first subtraction result of the second register multiplied by 786 in the fourth register, and a second multiplier stores a second multiplication result of the second subtraction result of the fourth register multiplied by 186 in a fifth register;
during a third operation clock, the first adder stores a fifth addition result of the first operation data and the fourth addition result of the third register in the first register, the first multiplier stores a third multiplication result of the third addition result of the first register multiplied by 486 in the second register, the second multiplier stores a fourth multiplication result of the third subtraction result of the second register multiplied by 286 in the third register, and a third multiplier stores a fifth multiplication result of the fourth addition result of the third register multiplied by 128 in a sixth register;
during a fourth operation clock, the first adder stores a fourth subtraction result obtained by subtracting the fifth multiplication result of the sixth register from the first operation data in the second register, the second adder stores a fifth subtraction result obtained by subtracting the second multiplication result of the fifth register from the third multiplication result of the second register in the fourth register, the third adder stores a sixth subtraction result obtained by subtracting the third multiplication result of the second register from the first multiplication result of the fourth register in the fifth register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first in-phase intermediate data and a second in-phase intermediate data of the in-phase intermediate data;
during a fifth operation clock, the first adder stores a sixth addition result of the fourth subtraction result of the second register and the fourth multiplication result of the third register in the second register, and the second adder stores a seventh subtraction result of the fourth subtraction result of the second register minus the fourth multiplication result of the third register in the third register; and
during a sixth operation clock, the first adder stores the sixth addition result of the second register and a seventh addition result of the first orthogonal intermediate data in the second register, the second adder stores the seventh subtraction result of the third register and an eighth addition result of the second orthogonal intermediate data in the third register, the third adder stores an eighth subtraction result of the seventh subtraction result of the third register minus the second orthogonal intermediate data in the fourth register, and the fourth adder stores a ninth subtraction result of the sixth addition result of the second register minus the first orthogonal intermediate data in the fourth register;
after the sixth operation clock period, the fifth addition result, the seventh addition result, the eighth addition result, the eighth subtraction result, and the ninth subtraction result are provided as a first in-phase-converted data, a second in-phase-converted data, a third in-phase-converted data, a fourth in-phase-converted data, and a fifth in-phase-converted data of the in-phase-converted data.
14. The method of claim 13, wherein the first operational clock period, the second operational clock period, the third operational clock period, the fourth operational clock period, the fifth operational clock period, and the sixth operational clock period are arranged in sequence.
15. The method of claim 11, wherein the quadrature operation data comprises a first operation data, a second operation data, a third operation data, a fourth operation data and a fifth operation data, the in-phase intermediate data comprises a first in-phase intermediate data and a second in-phase intermediate data, and the step of performing, by the second processing core, a 5-point fast Fourier transform of the quadrature operation data comprises:
during a first operation clock, a first adder stores a first addition result of the second operation data and the fifth operation data in a first register, a second adder stores a first subtraction result of the second operation data minus the fifth operation data in a second register, a third adder stores a second addition result of the third operation data and the fourth operation data in a third register, and a fourth adder stores a second subtraction result of the third operation data minus the fourth operation data in a fourth register;
during a second operation clock, the first adder stores a third addition result of the first subtraction result of the second register and the second subtraction result of the fourth register in the first register, the second adder subtracts the second addition result of the third register from the first addition result of the first register to store a third subtraction result in the second register, the third adder stores a fourth addition result of the first addition result of the first register and the second addition result of the third register in the third register, a first multiplier stores a first multiplication result of the first subtraction result of the second register multiplied by 786 in the fourth register, and a second multiplier stores a second multiplication result of the second subtraction result of the fourth register multiplied by 186 in a fifth register;
during a third operation clock, the first adder stores a fifth addition result of the first operation data and the fourth addition result of the third register in the first register, the first multiplier stores a third multiplication result of the third addition result of the first register multiplied by 486 in the second register, the second multiplier stores a fourth multiplication result of the third subtraction result of the second register multiplied by 286 in the third register, and a third multiplier stores a fifth multiplication result of the fourth addition result of the third register multiplied by 128 in a sixth register;
during a fourth operation clock, the first adder stores a fourth subtraction result of subtracting the fifth multiplication result of the sixth register from the first operation data in the second register, the second adder stores a fifth subtraction result of subtracting the second multiplication result of the fifth register from the third multiplication result of the second register in the fourth register, the third adder stores a sixth subtraction result of subtracting the third multiplication result of the second register from the first multiplication result of the fourth register in the fifth register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first quadrature intermediate data and a second quadrature intermediate data of the quadrature intermediate data;
during a fifth operation clock, the first adder stores a sixth addition result of the fourth subtraction result of the second register and the fourth multiplication result of the third register in the second register, and the second adder stores a seventh subtraction result of the fourth subtraction result of the second register minus the fourth multiplication result of the third register in the third register; and
during a sixth operation clock, the first adder stores an eighth subtraction result obtained by subtracting the first in-phase intermediate data from the sixth addition result of the second register in the second register, the second adder stores a ninth subtraction result obtained by subtracting the second in-phase intermediate data from the seventh subtraction result of the third register in the third register, the third adder stores the seventh subtraction result of the third register and a seventh addition result of the second in-phase intermediate data in the fourth register, and the fourth adder stores the sixth addition result of the second register and an eighth addition result of the first in-phase intermediate data in the fourth register;
after the sixth operation clock period, the fifth addition result, the eighth subtraction result, the ninth subtraction result, the seventh addition result, and the eighth addition result are provided as a first orthogonally-converted data, a second orthogonally-converted data, a third orthogonally-converted data, a fourth orthogonally-converted data, and a fifth orthogonally-converted data among the orthogonally-converted data.
16. The method of claim 15, wherein the first operational clock period, the second operational clock period, the third operational clock period, the fourth operational clock period, the fifth operational clock period, and the sixth operational clock period are arranged in sequence.
17. The method of claim 11, further comprising:
storing or reading 3780-point input data in a storage unit through an input/output control circuit; and
reading the storage unit through an address mapping circuit to provide the in-phase operation data and the quadrature operation data, and then storing the in-phase conversion data and the quadrature conversion data in the storage unit.
CN201610177927.1A 2016-03-25 2016-03-25 Non-pipeline fast Fourier transform processor and operation control method thereof Active CN107229596B (en)


Publications (2)

Publication Number Publication Date
CN107229596A (en) 2017-10-03
CN107229596B (en) 2020-07-31



Also Published As

Publication number Publication date
CN107229596A (en) 2017-10-03

Similar Documents

Publication Publication Date Title
US10140251B2 (en) Processor and method for executing matrix multiplication operation on processor
US9275014B2 (en) Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
CN103440121B (en) A vector-processor-oriented triangular matrix multiplication vectorization method
US9104584B2 (en) Apparatus and method for performing a complex number operation using a single instruction multiple data (SIMD) architecture
US11874896B2 (en) Methods and apparatus for job scheduling in a programmable mixed-radix DFT/IDFT processor
KR20090018042A (en) Pipeline fft architecture and method
CN102065309B (en) DCT (Discrete Cosine Transform) realizing method and circuit
EP4102354B1 (en) Method, circuit, and soc for performing matrix multiplication operation
Kumar et al. Area and frequency optimized 1024 point Radix-2 FFT processor on FPGA
CN107229596B (en) Non-pipeline fast Fourier transform processor and operation control method thereof
CN114996638A (en) Configurable fast Fourier transform circuit with sequential architecture
CN112799634B (en) Based on base 2^2 MDC NTT structured high performance loop polynomial multiplier
US10127040B2 (en) Processor and method for executing memory access and computing instructions for host matrix operations
Cho et al. Pipelined FFT for wireless communications supporting 128–2048/1536-point transforms
CN104360986B (en) An implementation method for a parallelized matrix inversion hardware unit
Chin et al. Implementation of a two-dimensional FFT/IFFT processor for real-time high-resolution synthetic aperture radar imaging
US9087003B2 (en) Vector NCO and twiddle factor generator
CN111756478A (en) Method and device for realizing QR decomposition of matrix with low complexity
Karlsson et al. Cost-efficient mapping of 3-and 5-point DFTs to general baseband processors
CN111404858A (en) Efficient FFT processing method and device applied to broadband satellite communication system
CN109117454B (en) 3780-point fast Fourier transform processor and operating method thereof
CN1937605B (en) Phase position obtaining device
US20230237121A1 (en) Method for accelerating fast fourier transform based on field programmable gate array
CN115033205B (en) Low-delay high-precision constant value divider
Zhu et al. A configurable distributed systolic array for QR decomposition in MIMO-OFDM systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant