CN107229596B

CN107229596B - Non-pipeline fast Fourier transform processor and operation control method thereof

Info

Publication number: CN107229596B
Application number: CN201610177927.1A
Authority: CN
Inventors: 董旭
Original assignee: Ali Corp
Current assignee: Ali Corp
Priority date: 2016-03-25
Filing date: 2016-03-25
Publication date: 2020-07-31
Anticipated expiration: 2036-03-25
Also published as: CN107229596A

Abstract

The invention provides a non-pipelined fast Fourier transform processor and an operation control method thereof. The transform processor includes a control logic circuit, a first processing core and a second processing core. The first processing core is coupled to the control logic circuit. The second processing core is coupled to the control logic circuit and the first processing core. The control logic circuit provides a first control instruction and a second control instruction to the first processing core and the second processing core, respectively. Under control of the first control instruction, the first processing core performs fast Fourier transforms on a plurality of in-phase operation data and a plurality of quadrature intermediate data. Under control of the second control instruction, the second processing core performs fast Fourier transforms on a plurality of quadrature operation data and a plurality of in-phase intermediate data.

Description

Non-pipeline fast Fourier transform processor and operation control method thereof

Technical Field

The present invention relates to a fast Fourier transform processor, and more particularly, to a non-pipelined fast Fourier transform processor and an operation control method thereof.

Background

The digital terrestrial multimedia/television broadcasting (DTMB) system has gradually become the digital multimedia/television transmission standard in China because of its high transmission (spectral) efficiency, strong resistance to multipath interference, good channel estimation performance and suitability for mobile reception. The 3780-point fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) modules are among the important modules of the Chinese DTMB system. Because 3780 is not a power of 2 (3780 = 2² × 3³ × 5 × 7), these modules cannot directly use the mature radix-2 and radix-4 algorithms, so they need an algorithm and a hardware circuit implementation with good computational efficiency and reasonable hardware resource usage.

Disclosure of Invention

The invention provides a non-pipelined fast Fourier transform processor and an operation control method thereof; the non-pipelined fast Fourier transform processor can reduce hardware cost.

The non-pipelined fast Fourier transform processor of the invention comprises a control logic circuit, a first processing core and a second processing core. The first processing core is coupled to the control logic circuit. The second processing core is coupled to the control logic circuit and the first processing core. The control logic circuit provides a first control instruction and a second control instruction to the first processing core and the second processing core, respectively. The first processing core receives a plurality of in-phase operation data and, from the second processing core, a plurality of quadrature intermediate data; under control of the first control instruction it sequentially performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on the in-phase operation data and the quadrature intermediate data, and sequentially provides a plurality of in-phase intermediate data and a plurality of in-phase transformed data. The second processing core receives a plurality of quadrature operation data and the in-phase intermediate data; under control of the second control instruction it performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on the quadrature operation data and the in-phase intermediate data, and sequentially provides the quadrature intermediate data and a plurality of quadrature transformed data.

The operation control method of the non-pipelined fast Fourier transform processor comprises the following steps. A first control instruction and a second control instruction are provided to a first processing core and a second processing core, respectively, through a control logic circuit. The first processing core is controlled by the first control instruction and sequentially performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on a plurality of in-phase operation data and on a plurality of quadrature intermediate data from the second processing core, to sequentially provide a plurality of in-phase intermediate data and a plurality of in-phase transformed data. The second processing core is controlled by the second control instruction, performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on a plurality of quadrature operation data and the in-phase intermediate data, and sequentially provides the quadrature intermediate data and a plurality of quadrature transformed data.

Based on the above, the non-pipelined FFT processor and its operation control method can greatly reduce memory usage, because the non-pipelined FFT processor does not need to cache intermediate results. In addition, the first processing core and the second processing core are fully reused: each of them can perform the 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms, which reduces the number of logic gates in the circuit.

In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.

Drawings

FIG. 1 is a system diagram of a non-pipelined fast Fourier transform processor according to one embodiment of the invention.

FIG. 2 is a system diagram illustrating a first processing core and a second processing core according to an embodiment of the invention.

FIG. 3 is a timing diagram illustrating operation control of the first processing core performing a 5-point FFT on the in-phase operation data according to an embodiment of the invention.

FIG. 4 is a timing diagram illustrating operation control of the second processing core performing a 5-point FFT on the quadrature operation data according to an embodiment of the invention.

FIG. 5 is a flowchart illustrating an operation control method of a non-pipelined FFT processor according to an embodiment of the invention.

Description of the reference numerals

100: non-pipelined fast Fourier transform processor

110: state master control circuit

120: input/output control circuit

130: storage unit

140: address mapping circuit

150: first buffer circuit

160: first processing core

170: second processing core

180: control logic circuit

190: second buffer circuit

210: first adder array

220: first multiplier array

230: first register group

240: second adder array

250: second multiplier array

260: second register set

A11, A21: first adder

a11, a21: first addition result

A12, A22: second adder

a12, a22: second addition result

A13, A23: third adder

a13, a23: third addition result

A14, A24: fourth adder

a14, a24: fourth addition result

a15, a25: fifth addition result

a16, a26: sixth addition result

a17, a27: seventh addition result

a18, a28: eighth addition result

b11, b21: first subtraction result

b12, b22: second subtraction result

b13, b23: third subtraction result

b14, b24: fourth subtraction result

b15, b25: fifth subtraction result

b16, b26: sixth subtraction result

b17, b27: seventh subtraction result

b18, b28: eighth subtraction result

b19, b29: ninth subtraction result

CM1: first control instruction

CM2: second control instruction

D₁～D₃₇₈₀: input data

DIB: in-phase intermediate data

DOI: in-phase operation data

DOQ: quadrature operation data

DQB: quadrature intermediate data

DTI: in-phase transformed data

DTQ: quadrature transformed data

I1, Q1: first operation data

I2, Q2: second operation data

I3, Q3: third operation data

I4, Q4: fourth operation data

I5, Q5: fifth operation data

IB1: first in-phase intermediate data

IB2: second in-phase intermediate data

IT1: first in-phase transformed data

IT2: second in-phase transformed data

IT3: third in-phase transformed data

IT4: fourth in-phase transformed data

IT5: fifth in-phase transformed data

m11, m21: first multiplication result

M11, M21: first multiplier

m12, m22: second multiplication result

M12, M22: second multiplier

m13, m23: third multiplication result

M13, M23: third multiplier

m14, m24: fourth multiplication result

m15, m25: fifth multiplication result

QB1: first quadrature intermediate data

QB2: second quadrature intermediate data

QT1: first quadrature transformed data

QT2: second quadrature transformed data

QT3: third quadrature transformed data

QT4: fourth quadrature transformed data

QT5: fifth quadrature transformed data

R11, R21: first register

R12, R22: second register

R13, R23: third register

R14, R24: fourth register

R15, R25: fifth register

R16, R26: sixth register

S510, S520, S530: steps

Detailed Description

FIG. 1 is a system diagram of a non-pipelined fast Fourier transform processor according to one embodiment of the invention. Referring to fig. 1, in the present embodiment, a non-pipelined FFT processor 100 includes a state master control circuit 110, an input/output control circuit 120, a storage unit 130, an address mapping circuit 140, a first buffer circuit 150, a first processing core 160, a second processing core 170, a control logic circuit 180 and a second buffer circuit 190.

The state master control circuit 110 is coupled to the input/output control circuit 120 and the address mapping circuit 140, and controls the data reading and data transfer operations of the input/output control circuit 120 and the address mapping circuit 140. The input/output control circuit 120 is coupled to the storage unit 130 and the state master control circuit 110; it receives the 3780-point input data D₁～D₃₇₈₀ and, under control of the state master control circuit 110, stores the input data D₁～D₃₇₈₀ in, or reads it from, the storage unit 130. The address mapping circuit 140 is coupled to the storage unit 130 and the state master control circuit 110; under control of the state master control circuit 110, it reads the storage unit 130 to provide the in-phase operation data DOI and the quadrature operation data DOQ, and stores the in-phase transformed data DTI and the quadrature transformed data DTQ back in the storage unit 130.
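The text does not specify how the address mapping circuit 140 orders its reads and writes. As a hedged sketch, assuming it uses the standard prime-factor index maps for the coprime factors 4, 5, 7 and 27 of 3780 (the Good/Ruritanian map for input addresses and the Chinese-remainder map for output addresses), the snippet below computes both maps and checks that each is a permutation of the 3780 addresses. The function names are hypothetical.

```python
from itertools import product
from math import gcd, prod

# Coprime factors of the 3780-point transform named in the text (4 x 5 x 7 x 27).
N_FACTORS = (4, 5, 7, 27)
N = prod(N_FACTORS)  # 3780


def pairwise_coprime(factors):
    """True if every pair of factors has gcd 1 (required by the PFA)."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(factors) for b in factors[i + 1:])


def all_digits():
    """Every index tuple (n1, n2, n3, n4) with 0 <= ni < Ni."""
    return product(*(range(Ni) for Ni in N_FACTORS))


def input_index(digits):
    """Hypothetical read address: Good (Ruritanian) map n = sum((N/Ni)*ni) mod N."""
    return sum((N // Ni) * ni for Ni, ni in zip(N_FACTORS, digits)) % N


def output_index(digits):
    """Hypothetical write address: CRT map, so k mod Ni == ki for every factor."""
    k = 0
    for Ni, ki in zip(N_FACTORS, digits):
        Mi = N // Ni
        # Mi * (Mi^-1 mod Ni) is 1 modulo Ni and 0 modulo every other factor
        k += Mi * pow(Mi, -1, Ni) * ki
    return k % N
```

For example, `input_index((1, 0, 0, 0))` is 945 (= 3780 / 4). The sketch needs Python 3.8 or later for `math.prod` and `pow(x, -1, m)`.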

The first buffer circuit 150 is coupled to the address mapping circuit 140, the first processing core 160 and the second processing core 170, and buffers the in-phase operation data DOI and the quadrature operation data DOQ. The first processing core 160 is coupled to the first buffer circuit 150, the second processing core 170 and the control logic circuit 180; it receives the in-phase operation data DOI from the first buffer circuit 150, the plurality of quadrature intermediate data DQB from the second processing core 170, and the first control instruction CM1 from the control logic circuit 180. The second processing core 170 is coupled to the first buffer circuit 150, the first processing core 160 and the control logic circuit 180; it receives the quadrature operation data DOQ from the first buffer circuit 150, the plurality of in-phase intermediate data DIB from the first processing core 160, and the second control instruction CM2 from the control logic circuit 180.

In this embodiment, the prime factor algorithm (PFA) decomposes the 3780-point transform of the input data D₁～D₃₇₈₀ into a 35-point transform and a 108-point transform. The 35-point transform is further decomposed into 5-point and 7-point transforms, the 108-point transform into 4-point and 27-point transforms, and finally the 27-point transform into 9-point and 3-point transforms. In other words, the first processing core 160 and the second processing core 170 complete the 3780-point transform of the input data D₁～D₃₇₈₀ with the prime factor algorithm by performing 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms. Because 4, 5, 7 and 27 are pairwise coprime, no twiddle (rotation) factors are needed between those stages; 9 and 3 are not coprime, however, so splitting the 27-point transform into 9-point and 3-point transforms requires 27 (i.e., 9 × 3) twiddle factors.
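To illustrate the two kinds of decomposition described above, the following sketch computes a 35-point DFT with the Good-Thomas prime factor algorithm (5 × 7, coprime, no twiddle factors) and a 27-point DFT with a Cooley-Tukey split into 3 and 9 points, which needs a 9 × 3 table of twiddle factors. It uses plain direct DFTs for the short transforms rather than the Winograd modules of the processor, and the function names are illustrative only.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def pfa(x, N1, N2):
    """Good-Thomas PFA for N = N1*N2 with gcd(N1, N2) == 1 (no twiddles).

    pow(.., -1, ..) raises ValueError if the factors are not coprime.
    """
    N = N1 * N2
    # Input map n = (N2*n1 + N1*n2) mod N, arranged as an N1 x N2 array
    a = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    rows = [dft(r) for r in a]                                   # rows[n1][k2]
    cols = [dft([rows[n1][k2] for n1 in range(N1)]) for k2 in range(N2)]  # [k2][k1]
    # Output map (CRT): k = k1 (mod N1) and k = k2 (mod N2)
    e1 = N2 * pow(N2, -1, N1)
    e2 = N1 * pow(N1, -1, N2)
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(k1 * e1 + k2 * e2) % N] = cols[k2][k1]
    return X


def mixed_radix(x, N1, N2):
    """Cooley-Tukey for N = N1*N2 (factors need not be coprime).

    Uses n = N2*n1 + n2 and k = k1 + N1*k2, plus N2*N1 twiddles W_N^(n2*k1).
    """
    N = N1 * N2
    inner = [dft([x[N2 * n1 + n2] for n1 in range(N1)]) for n2 in range(N2)]  # [n2][k1]
    tw = [[cmath.exp(-2j * cmath.pi * n2 * k1 / N) for k1 in range(N1)]
          for n2 in range(N2)]
    X = [0j] * N
    for k1 in range(N1):
        col = dft([inner[n2][k1] * tw[n2][k1] for n2 in range(N2)])  # over n2 -> k2
        for k2 in range(N2):
            X[k1 + N1 * k2] = col[k2]
    return X
```

For 27 = 3 × 9 the twiddle table `tw` holds 9 × 3 = 27 entries, matching the count given in the text; the PFA path needs no such table.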

Accordingly, under control of the first control instruction CM1, the first processing core 160 sequentially performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on the in-phase operation data DOI and the quadrature intermediate data DQB, and sequentially provides the plurality of in-phase intermediate data DIB and the plurality of in-phase transformed data DTI. Similarly, under control of the second control instruction CM2, the second processing core 170 performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on the quadrature operation data DOQ and the in-phase intermediate data DIB, and sequentially provides the quadrature intermediate data DQB and the plurality of quadrature transformed data DTQ. The first processing core 160 and the second processing core 170 may perform the 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms using the Winograd small-N algorithm.
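Why the two cores must exchange intermediate data follows from the DFT definition. Writing each input sample as I + jQ and θ = 2πnk/N, X[k] = Σ (I·cos θ + Q·sin θ) + j·Σ (Q·cos θ − I·sin θ), so the in-phase output needs a sine sum computed from Q, and the quadrature output needs one computed from I. The sketch below models each core as a real-valued unit that computes its own cosine and sine sums and hands the sine sums to the other core. It uses direct sums, not the Winograd small-N algorithm, and the function names are hypothetical.

```python
import cmath
import math


def cos_sin_sums(v):
    """What one real-valued core can compute from its own sequence v alone:
    C[k] = sum v[n]*cos(2*pi*n*k/N) and S[k] = sum v[n]*sin(2*pi*n*k/N)."""
    N = len(v)
    C = [sum(v[n] * math.cos(2 * math.pi * n * k / N) for n in range(N))
         for k in range(N)]
    S = [sum(v[n] * math.sin(2 * math.pi * n * k / N) for n in range(N))
         for k in range(N)]
    return C, S


def two_core_dft(I, Q):
    """Complex DFT of I + jQ from two real 'cores' that swap their sine sums."""
    C_I, S_I = cos_sin_sums(I)  # in-phase core; S_I is sent to the other core
    C_Q, S_Q = cos_sin_sums(Q)  # quadrature core; S_Q is sent to the other core
    N = len(I)
    re = [C_I[k] + S_Q[k] for k in range(N)]  # final output of the in-phase core
    im = [C_Q[k] - S_I[k] for k in range(N)]  # final output of the quadrature core
    return re, im


def complex_dft(x):
    """Reference complex DFT, used only to check the split computation."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]
```

Only the exchanged sine sums cross between the cores; everything else each core computes from its own real input.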

The second buffer circuit 190 is coupled to the address mapping circuit 140, the first processing core 160 and the second processing core 170, and buffers the in-phase transformed data DTI and the quadrature transformed data DTQ.

In light of the above, memory usage can be greatly reduced because the non-pipelined FFT processor 100 does not need to buffer intermediate results. Moreover, the first processing core 160 and the second processing core 170 are fully reused: each of them can perform the 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms, which reduces the number of logic gates in the non-pipelined FFT processor 100.

FIG. 2 is a system diagram illustrating a first processing core and a second processing core according to an embodiment of the invention. Referring to fig. 1 and fig. 2, in the present embodiment, the first processing core 160 includes a first adder array 210, a first multiplier array 220 and a first register set 230. The first register set 230 is coupled to the control logic circuit 180 and, under control of the first control instruction CM1, sequentially provides the in-phase intermediate data DIB to the second processing core 170 and outputs the in-phase transformed data DTI.

The first adder array 210 is coupled to the control logic circuit 180 and the first register set 230, and receives the in-phase operation data DOI and the quadrature intermediate data DQB. Under control of the first control instruction CM1, the first adder array 210 adds (or subtracts) the in-phase operation data DOI, the quadrature intermediate data DQB and the data held in the first register set 230, and stores the results in the first register set 230. The first multiplier array 220 is coupled to the control logic circuit 180 and the first register set 230; under control of the first control instruction CM1, it multiplies data of the first register set 230 and stores the products in the first register set 230.

The second processing core 170 includes a second adder array 240, a second multiplier array 250 and a second register set 260. The second register set 260 is coupled to the control logic circuit 180 and, under control of the second control instruction CM2, sequentially provides the quadrature intermediate data DQB to the first processing core 160 and outputs the quadrature transformed data DTQ. The second adder array 240 is coupled to the control logic circuit 180 and the second register set 260, and receives the quadrature operation data DOQ and the in-phase intermediate data DIB. Under control of the second control instruction CM2, the second adder array 240 adds (or subtracts) the quadrature operation data DOQ, the in-phase intermediate data DIB and the data held in the second register set 260, and stores the results in the second register set 260.

The second multiplier array 250 is coupled to the control logic circuit 180 and the second register set 260; under control of the second control instruction CM2, it multiplies data of the second register set 260 and stores the products in the second register set 260.

FIG. 3 is a timing diagram illustrating operation control of the first processing core performing a 5-point FFT on the in-phase operation data according to an embodiment of the invention. Referring to fig. 2 and fig. 3, in the present embodiment, each row of the timing diagram represents a hardware resource and each column represents an operation clock period; the arithmetic and storage operations in each column correspond to the first control instruction CM1. The diagram shows the first processing core 160 performing a 5-point fast Fourier transform on the in-phase operation data DOI.

The in-phase operation data DOI includes first operation data I1, second operation data I2, third operation data I3, fourth operation data I4 and fifth operation data I5; the quadrature intermediate data DQB includes first quadrature intermediate data QB1 and second quadrature intermediate data QB2; the first adder array 210 includes a first adder A11, a second adder A12, a third adder A13 and a fourth adder A14; the first multiplier array 220 includes a first multiplier M11, a second multiplier M12 and a third multiplier M13; and the first register set 230 includes a first register R11, a second register R12, a third register R13, a fourth register R14, a fifth register R15 and a sixth register R16.

In the first operation clock period (labeled "0"), the first adder A11 stores the first addition result a11 of the second operation data I2 and the fifth operation data I5 in the first register R11; the second adder A12 stores the first subtraction result b11 (the second operation data I2 minus the fifth operation data I5) in the second register R12; the third adder A13 stores the second addition result a12 of the third operation data I3 and the fourth operation data I4 in the third register R13; and the fourth adder A14 stores the second subtraction result b12 (the third operation data I3 minus the fourth operation data I4) in the fourth register R14.

In the second operation clock period (labeled "1"), the first adder A11 stores in the first register R11 the third addition result a13 of the first subtraction result b11 in the second register R12 and the second subtraction result b12 in the fourth register R14; the second adder A12 stores in the second register R12 the third subtraction result b13 obtained by subtracting the second addition result a12 in the third register R13 from the first addition result a11 in the first register R11; the third adder A13 stores in the third register R13 the fourth addition result a14 of the first addition result a11 in the first register R11 and the second addition result a12 in the third register R13; the first multiplier M11 stores in the fourth register R14 the first multiplication result m11 of the first subtraction result b11 in the second register R12 multiplied by 786; and the second multiplier M12 stores in the fifth register R15 the second multiplication result m12 of the second subtraction result b12 in the fourth register R14 multiplied by a constant coefficient.

In the third operation clock period (labeled "2"), the first adder A11 stores in the first register R11 the fifth addition result a15 of the first operation data I1 and the fourth addition result a14 in the third register R13; the first multiplier M11 stores in the second register R12 the third multiplication result m13 of the third addition result a13 in the first register R11 multiplied by 486; the second multiplier M12 stores in the third register R13 the fourth multiplication result m14 of the third subtraction result b13 in the second register R12 multiplied by 286; and the third multiplier M13 stores in the sixth register R16 the fifth multiplication result m15 of the fourth addition result a14 in the third register R13 multiplied by 128.

In the fourth operation clock period (labeled "3"), the first adder A11 stores in the second register R12 the fourth subtraction result b14 obtained by subtracting the fifth multiplication result m15 in the sixth register R16 from the first operation data I1; the second adder A12 stores in the fourth register R14 the fifth subtraction result b15 obtained by subtracting the second multiplication result m12 in the fifth register R15 from the third multiplication result m13 in the second register R12; and the third adder A13 stores in the fifth register R15 the sixth subtraction result b16 obtained by subtracting the third multiplication result m13 in the second register R12 from the first multiplication result m11 in the fourth register R14. The fifth subtraction result b15 and the sixth subtraction result b16 are provided as the first in-phase intermediate data IB1 and the second in-phase intermediate data IB2 of the in-phase intermediate data DIB.

In the fifth operation clock period (labeled "4"), the first adder A11 stores in the second register R12 the sixth addition result a16 of the fourth subtraction result b14 in the second register R12 and the fourth multiplication result m14 in the third register R13, and the second adder A12 stores in the third register R13 the seventh subtraction result b17 obtained by subtracting the fourth multiplication result m14 in the third register R13 from the fourth subtraction result b14 in the second register R12.

In the sixth operation clock period (labeled "5"), the first adder A11 stores in the second register R12 the seventh addition result a17 of the sixth addition result a16 in the second register R12 and the first quadrature intermediate data QB1; the second adder A12 stores in the third register R13 the eighth addition result a18 of the seventh subtraction result b17 in the third register R13 and the second quadrature intermediate data QB2; the third adder A13 stores in the fourth register R14 the eighth subtraction result b18 obtained by subtracting the second quadrature intermediate data QB2 from the seventh subtraction result b17 in the third register R13; and the fourth adder A14 stores in the fourth register R14 the ninth subtraction result b19 obtained by subtracting the first quadrature intermediate data QB1 from the sixth addition result a16 in the second register R12.

After the sixth operation clock period, the fifth addition result a15, the seventh addition result a17, the eighth addition result a18, the eighth subtraction result b18 and the ninth subtraction result b19 are provided as the first in-phase transformed data IT1, the second in-phase transformed data IT2, the third in-phase transformed data IT3, the fourth in-phase transformed data IT4 and the fifth in-phase transformed data IT5 of the in-phase transformed data DTI. The first to sixth operation clock periods (labeled "0" to "5") occur in this order.
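The six clock periods above can be mirrored in software to check that they yield a 5-point DFT. This is a hedged sketch, not the patented circuit: it uses floating-point coefficients, on the assumption that the text's integer constants are approximately Winograd coefficients scaled by 512 (486 ≈ 512·sin(2π/5), 286 ≈ 512·(cos(2π/5) − cos(4π/5))/2, 128 = 512·0.25, 786 ≈ 512·(sin(2π/5) + sin(4π/5))). The coefficient of the second multiplier M12 is not stated, so sin(2π/5) − sin(4π/5) is assumed. The names `inphase_core_5pt` and `reference_dft` are hypothetical.

```python
import cmath
import math

U = 2 * math.pi / 5
# Assumed floating-point coefficients (see lead-in); K_M12 is an assumption.
K_486 = math.sin(U)                          # ~ 486/512
K_286 = (math.cos(U) - math.cos(2 * U)) / 2  # ~ 286/512
K_128 = 0.25                                 # = 128/512
K_786 = math.sin(U) + math.sin(2 * U)        # ~ 786/512
K_M12 = math.sin(U) - math.sin(2 * U)        # assumed; not given in the text


def inphase_core_5pt(I, QB):
    """Cycle-by-cycle model of the first processing core (5-point, in-phase).

    I = (I1, ..., I5); QB = (QB1, QB2) from the second core.
    Returns (IB, IT) with IB = (IB1, IB2) and IT = (IT1, ..., IT5).
    R[1]..R[6] stand for R11..R16. Each tuple assignment evaluates every
    right-hand side before writing, like registers updated on one clock edge.
    """
    I1, I2, I3, I4, I5 = I
    R = [0.0] * 7
    # Cycle "0": a11, b11, a12, b12
    R[1], R[2], R[3], R[4] = I2 + I5, I2 - I5, I3 + I4, I3 - I4
    # Cycle "1": a13, b13, a14, m11, m12
    R[1], R[2], R[3], R[4], R[5] = (R[2] + R[4], R[1] - R[3], R[1] + R[3],
                                    R[2] * K_786, R[4] * K_M12)
    # Cycle "2": a15 = I1 + a14, m13, m14, m15
    R[1], R[2], R[3], R[6] = (I1 + R[3], R[1] * K_486,
                              R[2] * K_286, R[3] * K_128)
    # Cycle "3": b14 = I1 - m15, b15 = m13 - m12, b16 = m11 - m13
    R[2], R[4], R[5] = I1 - R[6], R[2] - R[5], R[4] - R[2]
    IB = (R[4], R[5])  # in-phase intermediate data IB1, IB2
    # Cycle "4": a16 = b14 + m14, b17 = b14 - m14
    R[2], R[3] = R[2] + R[3], R[2] - R[3]
    # Cycle "5": combine with QB (destination registers not modeled here)
    QB1, QB2 = QB
    a17, a18 = R[2] + QB1, R[3] + QB2
    b18, b19 = R[3] - QB2, R[2] - QB1
    IT = (R[1], a17, a18, b18, b19)
    return IB, IT


def reference_dft(x):
    """Direct 5-point DFT, used only to check the model."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 5) for n in range(5))
            for k in range(5)]
```

With a purely real input the quadrature core produces QB1 = QB2 = 0, so `inphase_core_5pt(I, (0.0, 0.0))` should return IT1 to IT5 equal to the real parts of `reference_dft(I)`, and IB1, IB2 equal to minus the imaginary parts of bins 1 and 2.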

FIG. 4 is a timing diagram illustrating operation control of the second processing core performing a 5-point FFT on the quadrature operation data according to an embodiment of the invention. Referring to fig. 2 to fig. 4, in the present embodiment, each row of the timing diagram represents a hardware resource and each column represents an operation clock period; the arithmetic and storage operations in each column correspond to the second control instruction CM2. The diagram shows the second processing core 170 performing a 5-point fast Fourier transform on the quadrature operation data DOQ.

The quadrature operation data DOQ includes first operation data Q1, second operation data Q2, third operation data Q3, fourth operation data Q4 and fifth operation data Q5; the in-phase intermediate data DIB includes first in-phase intermediate data IB1 and second in-phase intermediate data IB2; the second adder array 240 includes a first adder A21, a second adder A22, a third adder A23 and a fourth adder A24; the second multiplier array 250 includes a first multiplier M21, a second multiplier M22 and a third multiplier M23; and the second register set 260 includes a first register R21, a second register R22, a third register R23, a fourth register R24, a fifth register R25 and a sixth register R26.

In the first operation clock period (labeled "0"), the first adder A21 stores the first addition result a21 of the second operation data Q2 and the fifth operation data Q5 in the first register R21; the second adder A22 stores the first subtraction result b21 (the second operation data Q2 minus the fifth operation data Q5) in the second register R22; the third adder A23 stores the second addition result a22 of the third operation data Q3 and the fourth operation data Q4 in the third register R23; and the fourth adder A24 stores the second subtraction result b22 (the third operation data Q3 minus the fourth operation data Q4) in the fourth register R24.

In the second operation clock period (labeled "1"), the first adder A21 stores in the first register R21 the third addition result a23 of the first subtraction result b21 in the second register R22 and the second subtraction result b22 in the fourth register R24; the second adder A22 stores in the second register R22 the third subtraction result b23 obtained by subtracting the second addition result a22 in the third register R23 from the first addition result a21 in the first register R21; the third adder A23 stores in the third register R23 the fourth addition result a24 of the first addition result a21 in the first register R21 and the second addition result a22 in the third register R23; the first multiplier M21 stores in the fourth register R24 the first multiplication result m21 of the first subtraction result b21 in the second register R22 multiplied by 786; and the second multiplier M22 stores in the fifth register R25 the second multiplication result m22 of the second subtraction result b22 in the fourth register R24 multiplied by a constant coefficient.

In the third operation clock period (labeled "2"), the first adder A21 stores in the first register R21 the fifth addition result a25 of the first operation data Q1 and the fourth addition result a24 in the third register R23; the first multiplier M21 stores in the second register R22 the third multiplication result m23 of the third addition result a23 in the first register R21 multiplied by 486; the second multiplier M22 stores in the third register R23 the fourth multiplication result m24 of the third subtraction result b23 in the second register R22 multiplied by 286; and the third multiplier M23 stores in the sixth register R26 the fifth multiplication result m25 of the fourth addition result a24 in the third register R23 multiplied by 128.

In the fourth operation clock period (labeled "3"), the first adder A21 stores in the second register R22 the fourth subtraction result b24 obtained by subtracting the fifth multiplication result m25 in the sixth register R26 from the first operation data Q1; the second adder A22 stores in the fourth register R24 the fifth subtraction result b25 obtained by subtracting the second multiplication result m22 in the fifth register R25 from the third multiplication result m23 in the second register R22; and the third adder A23 stores in the fifth register R25 the sixth subtraction result b26 obtained by subtracting the third multiplication result m23 in the second register R22 from the first multiplication result m21 in the fourth register R24. The fifth subtraction result b25 and the sixth subtraction result b26 are provided as the first quadrature intermediate data QB1 and the second quadrature intermediate data QB2 of the quadrature intermediate data DQB.

In the fifth operation clock period (labeled "4"), the first adder A21 stores in the second register R22 the sixth addition result a26 of the fourth subtraction result b24 in the second register R22 and the fourth multiplication result m24 in the third register R23, and the second adder A22 stores in the third register R23 the seventh subtraction result b27 obtained by subtracting the fourth multiplication result m24 in the third register R23 from the fourth subtraction result b24 in the second register R22.

In the sixth operation clock period (labeled "5"), the first adder A21 stores in the second register R22 the eighth subtraction result b28 obtained by subtracting the first in-phase intermediate data IB1 from the sixth addition result a26 in the second register R22; the second adder A22 stores in the third register R23 the ninth subtraction result b29 obtained by subtracting the second in-phase intermediate data IB2 from the seventh subtraction result b27 in the third register R23; the third adder A23 stores in the fourth register R24 the seventh addition result a27 of the seventh subtraction result b27 in the third register R23 and the second in-phase intermediate data IB2; and the fourth adder A24 stores in the fourth register R24 the eighth addition result a28 of the sixth addition result a26 in the second register R22 and the first in-phase intermediate data IB1.

After the sixth operation clock period, the fifth addition result a25, the eighth subtraction result b28, the ninth subtraction result b29, the seventh addition result a27 and the eighth addition result a28 are provided as the first quadrature transformed data QT1, the second quadrature transformed data QT2, the third quadrature transformed data QT3, the fourth quadrature transformed data QT4 and the fifth quadrature transformed data QT5 of the quadrature transformed data DTQ. The first to sixth operation clock periods (labeled "0" to "5") occur in this order.
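Because both cores run the same first five clock periods on their own inputs and differ only in the sign pattern of the sixth period, they can be modeled with one shared function. The sketch below runs both cores in lockstep, exchanges IB and QB, and compares IT and QT with the real and imaginary parts of a complex 5-point DFT of I + jQ. It is an illustration under the same assumptions as a floating-point Winograd 5-point DFT (the M12 coefficient is assumed, register allocation is not modeled), and the function names are hypothetical.

```python
import cmath
import math

# Assumed floating-point Winograd coefficients (u = 2*pi/5); the text's integer
# constants appear to be approximately these values scaled by 512.
U = 2 * math.pi / 5
K_486 = math.sin(U)                          # multiplier M11/M21 in cycle "2"
K_286 = (math.cos(U) - math.cos(2 * U)) / 2  # multiplier M12/M22 in cycle "2"
K_128 = 0.25                                 # multiplier M13/M23 in cycle "2"
K_786 = math.sin(U) + math.sin(2 * U)        # multiplier M11/M21 in cycle "1"
K_M12 = math.sin(U) - math.sin(2 * U)        # M12/M22 in cycle "1" (assumed)


def core_cycles_0_to_4(x):
    """Cycles "0" to "4", identical in both cores (x = I1..I5 or Q1..Q5).

    Returns (a15, a16, b17, (b15, b16)) in the first core's naming;
    (b15, b16) is the intermediate data sent to the other core.
    """
    x1, x2, x3, x4, x5 = x
    a1, b1, a2, b2 = x2 + x5, x2 - x5, x3 + x4, x3 - x4            # cycle "0"
    a3, b3, a4 = b1 + b2, a1 - a2, a1 + a2                          # cycle "1"
    m1, m2 = b1 * K_786, b2 * K_M12
    a5, m3, m4, m5 = x1 + a4, a3 * K_486, b3 * K_286, a4 * K_128   # cycle "2"
    b4, b5, b6 = x1 - m5, m3 - m2, m1 - m3                          # cycle "3"
    a6, b7 = b4 + m4, b4 - m4                                       # cycle "4"
    return a5, a6, b7, (b5, b6)


def fft5_two_cores(I, Q):
    """Both cores in lockstep, exchanging intermediate data in cycle "5"."""
    a15, a16, b17, IB = core_cycles_0_to_4(I)  # first processing core
    a25, a26, b27, QB = core_cycles_0_to_4(Q)  # second processing core
    # IT1..IT5 = a15, a17, a18, b18, b19
    IT = (a15, a16 + QB[0], b17 + QB[1], b17 - QB[1], a16 - QB[0])
    # QT1..QT5 = a25, b28, b29, a27, a28
    QT = (a25, a26 - IB[0], b27 - IB[1], b27 + IB[1], a26 + IB[0])
    return IT, QT


def reference_dft(I, Q):
    """Direct complex 5-point DFT of I + jQ, for checking only."""
    x = [complex(i, q) for i, q in zip(I, Q)]
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 5) for n in range(5))
            for k in range(5)]
```

The only data crossing between the cores are the two intermediate values per core, which is what lets each core keep a single real-valued datapath.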

FIG. 5 is a flowchart illustrating an operation control method of a non-pipelined FFT processor according to an embodiment of the invention. Referring to fig. 1, in the present embodiment, the operation control method includes the following steps. First, a first control instruction and a second control instruction are provided to a first processing core and a second processing core, respectively, through a control logic circuit (step S510). Then, the first processing core is controlled by the first control instruction and sequentially performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on the plurality of in-phase operation data and the plurality of quadrature intermediate data from the second processing core, to sequentially provide a plurality of in-phase intermediate data and a plurality of in-phase transformed data (step S520). Finally, the second processing core is controlled by the second control instruction, performs 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms on the plurality of quadrature operation data and the in-phase intermediate data, and sequentially provides the quadrature intermediate data and the plurality of quadrature transformed data (step S530). The order of steps S510, S520 and S530 is illustrative, and the embodiments of the invention are not limited thereto. Details of steps S510, S520 and S530 are given in the embodiments of fig. 1 to fig. 4 and are not repeated here.

In summary, the non-pipelined FFT processor and its operation control method according to the present invention can greatly reduce memory usage, because the non-pipelined FFT processor does not need to cache intermediate results. In addition, the first processing core and the second processing core are fully reused: each of them can perform the 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms, which reduces the number of logic gates in the circuit.

Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims

1. A non-pipelined fast Fourier transform processor, comprising:

a control logic circuit;

a first processing core coupled to the control logic circuit; and

a second processing core coupled to the control logic circuit and the first processing core;

wherein the control logic circuit provides a first control instruction and a second control instruction to the first processing core and the second processing core, respectively,

the first processing core receives a plurality of in-phase operation data and receives a plurality of quadrature intermediate data from the second processing core, and is controlled by the first control instruction to perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms in sequence according to the in-phase operation data and the quadrature intermediate data, and to sequentially provide a plurality of in-phase intermediate data and a plurality of in-phase transform data,

the second processing core receives a plurality of quadrature operation data and the in-phase intermediate data, and is controlled by the second control instruction to perform 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms according to the quadrature operation data and the in-phase intermediate data, and to sequentially provide the quadrature intermediate data and a plurality of quadrature transform data.

2. The non-pipelined fast Fourier transform processor of claim 1, wherein the first processing core and the second processing core perform the 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms using a Winograd Small-N algorithm.
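
The five lengths in claim 2 multiply to the 3780-point length handled in claim 9 (3 × 4 × 5 × 7 × 9 = 3780 = 2² × 3³ × 5 × 7), which is consistent with the background remark that radix-2 and radix-4 algorithms cannot be applied directly. The short Python check below confirms the arithmetic. It also shows that 3 and 9 are not coprime, so if the lengths were combined with a prime-factor (Good-Thomas) index mapping, 27 = 3 × 9 would presumably have to be treated as one factor; the claims do not say which decomposition the processor actually uses, and `pairwise_coprime` is a hypothetical helper.

```python
from math import gcd, prod

# Transform lengths named in claim 2.
SMALL_N = [3, 4, 5, 7, 9]


def pairwise_coprime(factors):
    """True if every pair of factors has greatest common divisor 1."""
    return all(
        gcd(a, b) == 1
        for idx, a in enumerate(factors)
        for b in factors[idx + 1:]
    )


print(prod(SMALL_N))                    # 3780
print(pairwise_coprime(SMALL_N))        # False: 3 and 9 share the factor 3
print(pairwise_coprime([4, 5, 7, 27]))  # True, with 27 = 3 * 9
```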

3. The non-pipelined fast Fourier transform processor of claim 1, wherein the first processing core comprises:

a first register set for sequentially providing the in-phase intermediate data and the in-phase transform data;

a first adder array coupled to the control logic circuit and the first register set and receiving the in-phase operation data and the quadrature intermediate data, wherein the first adder array is controlled by the first control instruction to add the in-phase operation data, the quadrature intermediate data and the data of the first register set, and stores the addition results in the first register set; and

a first multiplier array coupled to the control logic circuit and the first register set, wherein the first multiplier array is controlled by the first control instruction to multiply the data of the first register set, and stores the multiplication results in the first register set.

4. The non-pipelined fast Fourier transform processor of claim 3, wherein the in-phase operation data comprise a first operation data, a second operation data, a third operation data, a fourth operation data and a fifth operation data, the quadrature intermediate data comprise a first quadrature intermediate data and a second quadrature intermediate data, and the first processing core performs a 5-point fast Fourier transform on the in-phase operation data, wherein,

during a first operation clock, a first adder stores in a first register a first addition result of the second operation data plus the fifth operation data, a second adder stores in a second register a first subtraction result of the second operation data minus the fifth operation data, a third adder stores in a third register a second addition result of the third operation data plus the fourth operation data, and a fourth adder stores in a fourth register a second subtraction result of the third operation data minus the fourth operation data,

during a second operation clock, the first adder stores in the first register a third addition result of the first subtraction result in the second register plus the second subtraction result in the fourth register, the second adder stores in the second register a third subtraction result of the first addition result in the first register minus the second addition result in the third register, the third adder stores in the third register a fourth addition result of the first addition result in the first register plus the second addition result in the third register, a first multiplier stores in the fourth register a first multiplication result of the first subtraction result in the second register multiplied by 786, and a second multiplier stores in a fifth register a second multiplication result of the second subtraction result in the fourth register multiplied by 186,

during a third operation clock, the first adder stores in the first register a fifth addition result of the first operation data plus the fourth addition result in the third register, the first multiplier stores in the second register a third multiplication result of the third addition result in the first register multiplied by 486, the second multiplier stores in the third register a fourth multiplication result of the third subtraction result in the second register multiplied by 286, and a third multiplier stores in a sixth register a fifth multiplication result of the fourth addition result in the third register multiplied by 128,

during a fourth operation clock, the first adder stores in the second register a fourth subtraction result of the first operation data minus the fifth multiplication result in the sixth register, the second adder stores in the fourth register a fifth subtraction result of the third multiplication result in the second register minus the second multiplication result in the fifth register, and the third adder stores in the fifth register a sixth subtraction result of the first multiplication result in the fourth register minus the third multiplication result in the second register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first in-phase intermediate data and a second in-phase intermediate data of the in-phase intermediate data,

during a fifth operation clock, the first adder stores in the second register a sixth addition result of the fourth subtraction result in the second register plus the fourth multiplication result in the third register, and the second adder stores in the third register a seventh subtraction result of the fourth subtraction result in the second register minus the fourth multiplication result in the third register,

during a sixth operation clock, the first adder stores in the second register a seventh addition result of the sixth addition result in the second register plus the first quadrature intermediate data, the second adder stores in the third register an eighth addition result of the seventh subtraction result in the third register plus the second quadrature intermediate data, the third adder stores in the fourth register an eighth subtraction result of the seventh subtraction result in the third register minus the second quadrature intermediate data, and the fourth adder stores in the fourth register a ninth subtraction result of the sixth addition result in the second register minus the first quadrature intermediate data,

wherein, after the sixth operation clock, the fifth addition result, the seventh addition result, the eighth addition result, the eighth subtraction result and the ninth subtraction result are provided as a first, a second, a third, a fourth and a fifth in-phase transform data of the in-phase transform data, respectively.
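
The six-clock schedule of claim 4 can be checked with a small register-transfer model. The Python sketch below represents the register set as a dictionary that is rebuilt once per clock, so every read sees the value latched in the previous clock. Several assumptions are made and none is stated in the claim: the integer constants 786, 186, 486, 286 and 128 are read as fixed-point approximations of the standard Winograd 5-point coefficients (used here in floating point); the quadrature intermediate data come from running the same first five clocks on the quadrature operation data, as the matching clock-4 description in claim 7 suggests; the five outputs are, in order, the fifth addition, seventh addition, eighth addition, eighth subtraction and ninth subtraction results; and the clock-6 register destinations are not modelled. The function names are hypothetical.

```python
import math

# Assumed floating-point meanings of the claimed constants (scale 512 assumed).
S1 = math.sin(2 * math.pi / 5)
S2 = math.sin(4 * math.pi / 5)
K786 = S1 + S2    # about 786 / 512
K186 = S1 - S2    # about 186 / 512
K486 = S1         # about 486 / 512
K286 = (math.cos(2 * math.pi / 5) - math.cos(4 * math.pi / 5)) / 2  # about 286 / 512
K128 = 0.25       # exactly 128 / 512


def front_end(a1, a2, a3, a4, a5):
    """Clocks 1-5 of claim 4 for one real channel; returns the register file.

    Each new dictionary is built only from the previous one, which models
    synchronous registers that all update on the same clock edge.
    """
    # Clock 1
    r = {1: a2 + a5, 2: a2 - a5, 3: a3 + a4, 4: a3 - a4}
    # Clock 2
    r = {1: r[2] + r[4], 2: r[1] - r[3], 3: r[1] + r[3],
         4: r[2] * K786, 5: r[4] * K186}
    # Clock 3 (registers that are not written keep their values)
    r = {**r, 1: a1 + r[3], 2: r[1] * K486, 3: r[2] * K286, 6: r[3] * K128}
    # Clock 4: r[4] and r[5] become the intermediate data for the other core
    r = {**r, 2: a1 - r[6], 4: r[2] - r[5], 5: r[4] - r[2]}
    # Clock 5
    r = {**r, 2: r[2] + r[3], 3: r[2] - r[3]}
    return r


def in_phase_core(i_data, q_first, q_second):
    """Clock 6 of claim 4: combine with the two quadrature intermediate data."""
    r = front_end(*i_data)
    return [
        r[1],             # fifth addition result
        r[2] + q_first,   # seventh addition result
        r[3] + q_second,  # eighth addition result
        r[3] - q_second,  # eighth subtraction result
        r[2] - q_first,   # ninth subtraction result
    ]
```

Under these assumptions the five values equal the real parts of the 5-point DFT of the complex input formed from the in-phase and quadrature operation data.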

5. The non-pipelined fast Fourier transform processor of claim 4, wherein the first, second, third, fourth, fifth and sixth operation clocks occur in sequence.

6. The non-pipelined fast Fourier transform processor of claim 1, wherein the second processing core comprises:

a second register set for sequentially providing the quadrature intermediate data and the quadrature transform data;

a second adder array coupled to the control logic circuit and the second register set and receiving the quadrature operation data and the in-phase intermediate data, wherein the second adder array is controlled by the second control instruction to add the quadrature operation data, the in-phase intermediate data and the data of the second register set, and stores the addition results in the second register set; and

a second multiplier array coupled to the control logic circuit and the second register set, wherein the second multiplier array is controlled by the second control instruction to multiply the data of the second register set, and stores the multiplication results in the second register set.

7. The non-pipelined fast Fourier transform processor of claim 6, wherein the quadrature operation data comprise a first operation data, a second operation data, a third operation data, a fourth operation data and a fifth operation data, the in-phase intermediate data comprise a first in-phase intermediate data and a second in-phase intermediate data, and the second processing core performs a 5-point fast Fourier transform on the quadrature operation data, wherein,

during a fourth operation clock, the first adder stores in the second register a fourth subtraction result of the first operation data minus the fifth multiplication result in the sixth register, the second adder stores in the fourth register a fifth subtraction result of the third multiplication result in the second register minus the second multiplication result in the fifth register, and the third adder stores in the fifth register a sixth subtraction result of the first multiplication result in the fourth register minus the third multiplication result in the second register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first quadrature intermediate data and a second quadrature intermediate data of the quadrature intermediate data,

during a sixth operation clock, the first adder stores in the second register an eighth subtraction result of the sixth addition result in the second register minus the first in-phase intermediate data, the second adder stores in the third register a ninth subtraction result of the seventh subtraction result in the third register minus the second in-phase intermediate data, the third adder stores in the fourth register a seventh addition result of the seventh subtraction result in the third register plus the second in-phase intermediate data, and the fourth adder stores in the fourth register an eighth addition result of the sixth addition result in the second register plus the first in-phase intermediate data,

wherein, after the sixth operation clock, the fifth addition result, the eighth subtraction result, the ninth subtraction result, the seventh addition result and the eighth addition result are provided as a first, a second, a third, a fourth and a fifth quadrature transform data of the quadrature transform data, respectively.
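
Claim 7 describes only the fourth and sixth clocks of the quadrature core. The Python sketch below assumes that the first, second, third and fifth clocks match those of claim 4 and writes their results in closed form instead of register by register. It then applies the clock-6 combinations of claim 7, reading the operand of the eighth subtraction as the sixth addition result held in the second register, and returns the five outputs in the order listed above. The floating-point coefficients and the names `winograd5_terms` and `quadrature_core` are assumptions made for illustration.

```python
import math

S1 = math.sin(2 * math.pi / 5)
S2 = math.sin(4 * math.pi / 5)
C = (math.cos(2 * math.pi / 5) - math.cos(4 * math.pi / 5)) / 2


def winograd5_terms(x):
    """Closed-form values left after clocks 1-5 for one real channel x[0..4]."""
    t1, t2 = x[1] + x[4], x[2] + x[3]
    d1, d2 = x[1] - x[4], x[2] - x[3]
    dc = x[0] + t1 + t2                            # fifth addition result
    base = x[0] - 0.25 * (t1 + t2)                 # fourth subtraction result
    x1r = base + C * (t1 - t2)                     # sixth addition result
    x2r = base - C * (t1 - t2)                     # seventh subtraction result
    first_mid = S1 * (d1 + d2) - (S1 - S2) * d2    # fifth subtraction result
    second_mid = (S1 + S2) * d1 - S1 * (d1 + d2)   # sixth subtraction result
    return dc, x1r, x2r, first_mid, second_mid


def quadrature_core(q_data, i_first, i_second):
    """Clock 6 of claim 7, using the two in-phase intermediate data."""
    dc, x1r, x2r, _, _ = winograd5_terms(q_data)
    return [
        dc,              # fifth addition result
        x1r - i_first,   # eighth subtraction result
        x2r - i_second,  # ninth subtraction result
        x2r + i_second,  # seventh addition result
        x1r + i_first,   # eighth addition result
    ]
```

Under these assumptions the outputs equal the imaginary parts of the 5-point DFT of the complex input, complementing the real parts produced by the in-phase core.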

8. The non-pipelined fast Fourier transform processor of claim 7, wherein the first, second, third, fourth, fifth and sixth operation clocks occur in sequence.

9. The non-pipelined fast Fourier transform processor of claim 1, further comprising:

a state master control circuit;

an input/output control circuit coupled to a storage unit and the state master control circuit, for receiving 3780-point input data and, under control of the state master control circuit, storing the 3780-point input data in the storage unit or reading it from the storage unit; and

an address mapping circuit coupled to the storage unit and the state master control circuit, which, under control of the state master control circuit, reads the storage unit to provide the in-phase operation data and the quadrature operation data, and stores the in-phase transform data and the quadrature transform data in the storage unit.

10. The non-pipelined fast Fourier transform processor of claim 9, further comprising:

a first buffer circuit coupled to the address mapping circuit, the first processing core and the second processing core, for buffering the in-phase operation data and the quadrature operation data; and

a second buffer circuit coupled to the address mapping circuit, the first processing core and the second processing core, for buffering the in-phase transform data and the quadrature transform data.

11. An operation control method for a non-pipelined fast Fourier transform processor, comprising:

providing a first control instruction and a second control instruction to a first processing core and a second processing core, respectively, through a control logic circuit;

controlling the first processing core through the first control instruction, and performing, by the first processing core, 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms in sequence according to a plurality of in-phase operation data and a plurality of quadrature intermediate data from the second processing core, to sequentially provide a plurality of in-phase intermediate data and a plurality of in-phase transform data; and

controlling the second processing core through the second control instruction, and performing, by the second processing core, 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms according to a plurality of quadrature operation data and the in-phase intermediate data, to sequentially provide the quadrature intermediate data and a plurality of quadrature transform data.

12. The method of claim 11, wherein the first processing core and the second processing core perform the 3-point, 4-point, 5-point, 7-point and 9-point fast Fourier transforms using a Winograd Small-N algorithm.

13. The method of claim 11, wherein the in-phase operation data comprise a first operation data, a second operation data, a third operation data, a fourth operation data and a fifth operation data, the quadrature intermediate data comprise a first quadrature intermediate data and a second quadrature intermediate data, and the step of performing, by the first processing core, a 5-point fast Fourier transform on the in-phase operation data comprises:

during a first operation clock, a first adder stores a first addition result of the second operation data and the fifth operation data in a first register, a second adder stores a first subtraction result of the second operation data minus the fifth operation data in a second register, a third adder stores a second addition result of the third operation data and the fourth operation data in a third register, and a fourth adder stores a second subtraction result of the third operation data minus the fourth operation data in a fourth register;

during a second operation clock, the first adder stores in the first register a third addition result of the first subtraction result in the second register plus the second subtraction result in the fourth register, the second adder stores in the second register a third subtraction result obtained by subtracting the second addition result in the third register from the first addition result in the first register, the third adder stores in the third register a fourth addition result of the first addition result in the first register plus the second addition result in the third register, a first multiplier stores in the fourth register a first multiplication result of the first subtraction result in the second register multiplied by 786, and a second multiplier stores in a fifth register a second multiplication result of the second subtraction result in the fourth register multiplied by 186;

during a third operation clock, the first adder stores a fifth addition result of the first operation data and the fourth addition result of the third register in the first register, the first multiplier stores a third multiplication result of the third addition result of the first register multiplied by 486 in the second register, the second multiplier stores a fourth multiplication result of the third subtraction result of the second register multiplied by 286 in the third register, and a third multiplier stores a fifth multiplication result of the fourth addition result of the third register multiplied by 128 in a sixth register;

during a fourth operation clock, the first adder stores a fourth subtraction result obtained by subtracting the fifth multiplication result of the sixth register from the first operation data in the second register, the second adder stores a fifth subtraction result obtained by subtracting the second multiplication result of the fifth register from the third multiplication result of the second register in the fourth register, the third adder stores a sixth subtraction result obtained by subtracting the third multiplication result of the second register from the first multiplication result of the fourth register in the fifth register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first in-phase intermediate data and a second in-phase intermediate data of the in-phase intermediate data;

during a fifth operation clock, the first adder stores a sixth addition result of the fourth subtraction result of the second register and the fourth multiplication result of the third register in the second register, and the second adder stores a seventh subtraction result of the fourth subtraction result of the second register minus the fourth multiplication result of the third register in the third register; and

during a sixth operation clock, the first adder stores in the second register a seventh addition result of the sixth addition result in the second register plus the first quadrature intermediate data, the second adder stores in the third register an eighth addition result of the seventh subtraction result in the third register plus the second quadrature intermediate data, the third adder stores in the fourth register an eighth subtraction result of the seventh subtraction result in the third register minus the second quadrature intermediate data, and the fourth adder stores in the fourth register a ninth subtraction result of the sixth addition result in the second register minus the first quadrature intermediate data;

after the sixth operation clock, the fifth addition result, the seventh addition result, the eighth addition result, the eighth subtraction result and the ninth subtraction result are provided as a first, a second, a third, a fourth and a fifth in-phase transform data of the in-phase transform data, respectively.
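
Claims 4 and 13 give the multiplier constants as bare integers. One interpretation, which the claims do not state, is that they are the standard Winograd 5-point coefficients sin(2π/5) + sin(4π/5), sin(2π/5) - sin(4π/5), sin(2π/5), (cos(2π/5) - cos(4π/5))/2 and 1/4, each scaled by 512 (nine fractional bits); every claimed value then lies within two units of its scaled coefficient, with 786 the furthest off. The Python sketch below prints that comparison and includes a hypothetical `fixed_mul` helper showing how such a constant could be applied to integer samples. The word length, scale and rounding of the actual circuit are unknown.

```python
import math

FRAC_BITS = 9              # assumed: coefficients scaled by 2**9 = 512
SCALE = 1 << FRAC_BITS

S1 = math.sin(2 * math.pi / 5)
S2 = math.sin(4 * math.pi / 5)

# Claimed integer constant -> Winograd 5-point coefficient it appears to encode.
COEFFS = {
    786: S1 + S2,
    186: S1 - S2,
    486: S1,
    286: (math.cos(2 * math.pi / 5) - math.cos(4 * math.pi / 5)) / 2,
    128: 0.25,
}


def fixed_mul(sample, coeff):
    """Integer sample times a scaled constant, rescaled by an arithmetic shift."""
    return (sample * coeff) >> FRAC_BITS  # Python's >> floors toward minus infinity


for claimed, value in COEFFS.items():
    print(f"{claimed:4d} vs {value * SCALE:8.3f}  (coefficient {value:.6f})")
```

Truncating instead of rounding, as the floor shift does, is the cheapest hardware choice but adds a small negative bias; whether the processor rounds is not disclosed.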

14. The method of claim 13, wherein the first, second, third, fourth, fifth and sixth operation clocks occur in sequence.

15. The method of claim 11, wherein the quadrature operation data comprise a first operation data, a second operation data, a third operation data, a fourth operation data and a fifth operation data, the in-phase intermediate data comprise a first in-phase intermediate data and a second in-phase intermediate data, and the step of performing, by the second processing core, a 5-point fast Fourier transform on the quadrature operation data comprises:

during a fourth operation clock, the first adder stores a fourth subtraction result of subtracting the fifth multiplication result of the sixth register from the first operation data in the second register, the second adder stores a fifth subtraction result of subtracting the second multiplication result of the fifth register from the third multiplication result of the second register in the fourth register, the third adder stores a sixth subtraction result of subtracting the third multiplication result of the second register from the first multiplication result of the fourth register in the fifth register, wherein the fifth subtraction result and the sixth subtraction result are provided as a first quadrature intermediate data and a second quadrature intermediate data of the quadrature intermediate data;

during a sixth operation clock, the first adder stores in the second register an eighth subtraction result obtained by subtracting the first in-phase intermediate data from the sixth addition result in the second register, the second adder stores in the third register a ninth subtraction result obtained by subtracting the second in-phase intermediate data from the seventh subtraction result in the third register, the third adder stores in the fourth register a seventh addition result of the seventh subtraction result in the third register plus the second in-phase intermediate data, and the fourth adder stores in the fourth register an eighth addition result of the sixth addition result in the second register plus the first in-phase intermediate data;

after the sixth operation clock, the fifth addition result, the eighth subtraction result, the ninth subtraction result, the seventh addition result and the eighth addition result are provided as a first, a second, a third, a fourth and a fifth quadrature transform data of the quadrature transform data, respectively.

16. The method of claim 15, wherein the first, second, third, fourth, fifth and sixth operation clocks occur in sequence.

17. The method of claim 11, further comprising:

storing 3780-point input data in a storage unit, or reading it from the storage unit, through an input/output control circuit; and

reading the storage unit through an address mapping circuit to provide the in-phase operation data and the quadrature operation data, and then storing the in-phase transform data and the quadrature transform data in the storage unit.