CN103699355A - Variable-order pipeline serial multiply-accumulator - Google Patents

Variable-order pipeline serial multiply-accumulator

Info

Publication number
CN103699355A
Authority
CN
China
Prior art keywords
multiply-accumulate
group
order
signal
pipeline
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310738598.XA
Other languages
Chinese (zh)
Other versions
CN103699355B (en)
Inventor
潘红兵
黄炎
李伟
李丽
何书专
沙金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-12-30
Filing date
2013-12-30
Publication date
2014-04-02
Application filed by Nanjing University
Priority to CN201310738598.XA
Publication of CN103699355A
Application granted
Publication of CN103699355B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a variable-order pipelined serial multiply-accumulator comprising one group of multipliers, three groups of accumulators, and a corresponding control circuit. The multipliers multiply two channels of input data and output the products. The first group of accumulators accumulates the products; once that accumulation ends, the other two groups add up the results held in the pipeline stages of the first group, so that the first group can go on to process the next stage of data. The control circuit adds extra control signals and control logic to eliminate the head and tail zero-padding steps of the algorithm. In use, the multiply-accumulator can therefore omit head and tail zero padding, together with the redundant multiplications and accumulations it causes, and so achieve performance close to the theoretical estimate.

Description

A variable-order pipelined serial multiply-accumulator
Technical field
The present invention relates to multiply-accumulators, and in particular to a variable-order pipelined serial multiply-accumulator.
Background
Digital signal processing has long been an important technical tool, widely used across engineering fields. In recent years, with advances in science and technology, it has also become one of the theoretical foundations of emerging disciplines such as artificial intelligence, and both its importance and its range of application are considerable.
Most of the main algorithms used in digital signal processing perform filtering, convolution, correlation, or spectral analysis on data. These algorithms share a similar framework, namely the multiply-accumulate structure. Taking an N-th order FIR filter as an example, its function expression may be defined, in the standard direct form, as y(n) = Σ_{k=0}^{N-1} h(k)·x(n−k), where the h(k) are the filter coefficients.
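The multiply-accumulate structure behind an N-tap FIR filter can be sketched in a few lines of Python. This is a minimal illustration assuming the standard direct-form expression y(n) = Σ h(k)·x(n−k); the function name `fir_serial_mac` and the choice to treat samples before the start of the input as zero are assumptions made for the example, not details taken from the patent.

```python
def fir_serial_mac(x, h):
    """Direct-form FIR: one multiply-accumulate per tap, per output sample."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0                       # the accumulator is cleared for each output
        for k in range(n_taps):
            if n - k >= 0:              # assumption: samples before x[0] are zero
                acc += h[k] * x[n - k]  # the single multiply-accumulate step
        y.append(acc)
    return y
```

The inner loop is the part a serial multiply-accumulator executes in hardware: one multiplier and one accumulator reused N times per output.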
There are usually several ways to implement an FIR filter, for example a single-multiplier serial filter, a serial filter based on symmetric FIR coefficients, a parallel filter, and a semi-parallel filter.
A general serial filter is a multiply-accumulate structure that needs only one multiplier and one accumulator; its average pipeline time is N cycles per output, as shown in Fig. 1. A parallel filter needs N multipliers and N-1 adders; from the entry of the first data sample to the first filtered output it takes N + log N cycles, after which the pipeline time is only 1 clock cycle per output, as shown in Fig. 2. A semi-parallel filter combines the parallel architecture with the multiply-accumulate architecture, trading off area against speed, so its area and performance necessarily fall between the two.
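The cycle counts quoted above can be compared with a short calculation. The sketch below assumes that "log N" means the base-2 logarithm rounded up (the depth of a balanced adder tree) and that each architecture produces outputs back to back; both function names are hypothetical.

```python
import math

def serial_total_cycles(n_taps, n_outputs):
    """Serial multiply-accumulate filter: N cycles for every output."""
    return n_taps * n_outputs

def parallel_total_cycles(n_taps, n_outputs):
    """Parallel filter: N + log N cycles to the first output, then 1 per output."""
    first_output = n_taps + math.ceil(math.log2(n_taps))  # assumption: log base 2
    return first_output + (n_outputs - 1)
```

For a 16-tap filter producing 100 outputs this gives 1600 cycles for the serial structure against 119 for the parallel one, at the cost of 16 multipliers instead of 1.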
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a variable-order pipelined serial multiply-accumulator, realized by the following technical scheme:
The variable-order pipelined serial multiply-accumulator comprises
One group of multipliers, for multiplying two channels of input data and outputting the multiplication results;
Three groups of accumulators: the first group accumulates the multiplication results; after that accumulation ends, the second and third groups successively add the results held in the pipeline stages of the first group, which ensures that the first group can go on to process the next stage of data;
A corresponding control circuit, which adds extra control signals and control logic in order to eliminate the head and tail zero-padding operations of the algorithm.
In a further design of the variable-order pipelined serial multiply-accumulator, each accumulator is connected to a corresponding data selector (multiplexer).
In a further design of the variable-order pipelined serial multiply-accumulator, the corresponding control circuit comprises
An order register, for recording the multiply-accumulate order and outputting it;
A counter module, for receiving the multiply-accumulate order and the start signal of the multiply-accumulate operation, counting the multiply-accumulate order, and passing the start signal on;
Four controllers, three of which control the gating of the data selectors, while the remaining one controls the output of the multiply-accumulator;
A control logic unit, which receives, through the controllers, the registered signals of the counter module and of the multiply-accumulate order held in the order register, and outputs the multiply-accumulate result and an output-valid signal.
In a further design of the variable-order pipelined serial multiply-accumulator, the data selectors are connected to the counter module through their corresponding controllers, so that the counter module performs the gating of the data selectors.
In a further design of the variable-order pipelined serial multiply-accumulator, the output-valid signal serves as the write-enable input signal for storing the multiply-accumulate result.
The advantages of the present invention are as follows:
The variable-order multiply-accumulator provided by the invention makes it possible, during computation, to omit the head and tail zero-padding operations of the algorithm and the redundant multiply-accumulates they would generate, thereby achieving performance close to the theoretical estimate.
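To give a feel for the saving, the sketch below counts multiply-accumulate operations for a full convolution of a data block with an N-tap kernel, once with every output padded to a fixed order N and once with a variable order that covers only the terms that really overlap. The full-convolution scenario, the function name `mac_counts`, and the numbers are assumptions chosen for illustration; the patent itself does not state these figures.

```python
def mac_counts(data_len, n_taps):
    """Return (fixed_order_macs, variable_order_macs) for a full convolution."""
    n_outputs = data_len + n_taps - 1
    # Fixed order: zeros are padded at head and tail so every output takes N MACs.
    fixed = n_outputs * n_taps
    # Variable order: each output only uses taps whose data index is valid.
    variable = 0
    for n in range(n_outputs):
        first_tap = max(0, n - (data_len - 1))
        last_tap = min(n, n_taps - 1)
        variable += last_tap - first_tap + 1
    return fixed, variable
```

With 8 data samples and 4 taps, the fixed-order count is 44 while the variable-order count is 32, so the padded version spends 12 operations multiplying by zero.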
Brief description of the drawings
Fig. 1 is the basic multiply-accumulate structure.
Fig. 2 is the parallel FIR filter architecture.
Fig. 3 is a schematic diagram of the pipelined serial multiply-accumulator.
Fig. 4 is the design architecture diagram of the variable-order pipelined serial multiply-accumulator.
Fig. 5 is the interconnection diagram of the basic modules for system simulation.
Detailed description
The scheme of the invention is described in detail below with reference to the drawings.
The variable-order pipelined serial multiply-accumulator provided by this embodiment comprises one group of multipliers, three groups of accumulators, and a corresponding control circuit. The multipliers multiply two channels of input data and output the multiplication results. Of the three groups of accumulators, the first group accumulates the multiplication results; after that accumulation ends, the second and third groups successively add the results held in the pipeline stages of the first group, which ensures that the first group can go on to process the next stage of data. The control circuit adds extra control signals and control logic in order to eliminate the head and tail zero-padding operations of the algorithm. Each accumulator is connected to a corresponding data selector.
The control circuit comprises an order register, a counter module, a control logic unit, and four controllers. The order register records the multiply-accumulate order and outputs it. The counter module receives the multiply-accumulate order and the start signal of the multiply-accumulate operation, counts the multiply-accumulate order, and passes the start signal on. Of the four controllers, three control the gating of the data selectors and the remaining one controls the output of the multiply-accumulator. The control logic unit receives, through the controllers, the registered signals of the counter module and of the multiply-accumulate order held in the order register, and outputs the multiply-accumulate result and an output-valid signal. The data selectors are connected to the counter module through their corresponding controllers, so that the counter module performs the gating of the data selectors.
The design architecture of the variable-order pipelined serial multiply-accumulator is shown in Fig. 4. Assume that each arithmetic IP core has three internal pipeline stages and that its output is registered once more (for timing closure), which is equivalent to a four-stage pipeline. As shown in the figure, start is the start signal of the multiply-accumulate operation, order_mul_accu is the signal that records the order, din0 and din1 are the two data inputs, dout is the multiply-accumulate result, and wen is the output-valid signal, which also serves as the write-enable input of the memory. Internally the module mainly consists of one multiplier, three adders with their respective muxes, four control blocks, a counter module, and registered signals. Controllers 1 to 3 control the mux gating of the three adders respectively, and controller 4 is responsible for output. All input signals of the control logic come from the registered versions of "count" (the counter) and "order_mul_accu".
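The role of the three adders can be illustrated with a behavioural model. When the accumulating adder has a four-cycle latency and a new product arrives every cycle, each product can only be added to the sum that entered the adder four cycles earlier, so four interleaved partial sums build up in the pipeline; the other two adders then fold them into one result. The sketch below is not cycle-accurate and the reduction order (pairs first, then the pair sums) is one plausible schedule assumed for the example. Only the names din0, din1 and dout come from the description; `pipelined_mac` and `latency` are hypothetical.

```python
def pipelined_mac(din0, din1, latency=4):
    """Behavioural model of one variable-order multiply-accumulate."""
    if latency % 2:
        raise ValueError("this sketch assumes an even adder latency")
    # Adder 1: product i joins the partial sum that is `latency` cycles old,
    # so `latency` independent partial sums accumulate in its pipeline.
    partial = [0.0] * latency
    for i, (a, b) in enumerate(zip(din0, din1)):
        partial[i % latency] += a * b
    # Adder 2: adds neighbouring partial sums, one pair after another.
    pair_sums = [partial[j] + partial[j + 1] for j in range(0, latency, 2)]
    # Adder 3: folds the pair sums into the final result (dout).
    dout = 0.0
    for s in pair_sums:
        dout += s
    return dout
```

Because the order is simply the number of input pairs, an order shorter than the pipeline depth (for example 3) needs no zero padding: `pipelined_mac([1.5, 2.0, 3.0], [2.0, 2.0, 2.0])` evaluates to 13.0. While adders 2 and 3 finish the reduction, adder 1 would already be free to start the next accumulation in hardware, which this sequential model does not show.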
To verify the function and performance of the multiply-accumulate module, other submodules are needed to complete the system, as shown in Fig. 5. The system mainly comprises the multiply-accumulate module, the AGU (address generator), the memory, the top-level module, and the testbench.
Multiply-accumulate module
One single-precision floating-point multiplier IP and three adder IPs (each with three internal pipeline stages) are instantiated. The design work is mainly RTL coding of the four control blocks, the counter, the output control, and similar logic, together with shift-registering the input signal order_mul_accu and the counter.
Memory module
First a submodule is defined: a register file written in Verilog, with a data width of 64 bits and a depth of 8K. Three banks of this submodule are then instantiated in the memory top level, used to store the two channels of source data and the multiply-accumulate results respectively.
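A software stand-in for this memory organisation can help when preparing test data. The sketch below models one bank as a 64-bit wide, 8K deep array and builds three of them; the class name `RegisterBank`, the helper `make_memory`, and the decision to raise an error on out-of-range addresses are assumptions for illustration, while the bank names follow the description.

```python
class RegisterBank:
    """One memory bank: `depth` words of `width_bits` bits each."""

    def __init__(self, width_bits=64, depth=8 * 1024):
        self.mask = (1 << width_bits) - 1
        self.depth = depth
        self.cells = [0] * depth

    def _check(self, addr):
        if not 0 <= addr < self.depth:
            raise IndexError(f"address {addr} outside 0..{self.depth - 1}")

    def write(self, addr, data, wen=True):
        """Store `data` (truncated to the word width) only when wen is high."""
        self._check(addr)
        if wen:
            self.cells[addr] = data & self.mask

    def read(self, addr):
        self._check(addr)
        return self.cells[addr]


def make_memory():
    """bank1 and bank2 hold the source data, bank3 holds the results."""
    return {name: RegisterBank() for name in ("bank1", "bank2", "bank3")}
```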
AGU module
Generates the vector address sequences required by a specific algorithm. It mainly comprises a read module and a write module. The read module designs the address sequence for the source data according to the requirements of the algorithm and sends it, together with the chip-select and read-enable signals, to bank1 and bank2 to fetch data; the values read, rdata1 and rdata2, are passed to the multiply-accumulate module for computation. The write module uses the "wen" output of the multiply-accumulator to control address incrementing, and sends this address signal, together with the wen and wdata signals of the multiply-accumulate module, to bank3 for data storage.
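The two halves of the AGU can be sketched as follows. The read pattern depends on the algorithm being run, so the linear sequence used here is purely an assumption, as are the names `read_addresses` and `WriteAddressGenerator`; the only behaviour taken from the description is that the write address advances when wen is asserted.

```python
def read_addresses(base1, base2, order):
    """Yield (bank1_addr, bank2_addr) pairs for one multiply-accumulate.

    Assumption: both operands are read from consecutive addresses.
    """
    for k in range(order):
        yield base1 + k, base2 + k


class WriteAddressGenerator:
    """Write side of the AGU: the bank3 address advances once per wen pulse."""

    def __init__(self, base=0):
        self.addr = base

    def step(self, wen):
        """Return the address to write this cycle, or None when wen is low."""
        if not wen:
            return None
        current = self.addr
        self.addr += 1
        return current
```

Driving the write generator with the wen sequence low, high, low, high yields the addresses None, 0, None, 1, so results land in consecutive bank3 locations regardless of how many cycles each multiply-accumulate took.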
Top-level module
Interconnects the multiply-accumulate module, the memory module, and the AGU module at the top level.
Testbench module
Defines the clock, the reset signal, and the parameters of the algorithm; loads data files into bank1 and bank2 of the memory for initialization; sends the start signal to launch the computation; and ends the simulation after receiving the finish signal from the top-level module.
System testing
The operating system is Linux; functional simulation is carried out with the VCS simulation and verification tool, together with the Design Compiler synthesis tool.
(1) Basic module test: the multiply-accumulator submodule is designed from the multiplier and adder units, and the timing of each control logic block is the main test target; the AGU module is tested to determine whether the address sequences it generates are as expected.
(2) Integration test: test source operands are loaded into the designated memory, and the testbench generates the start signal to launch the algorithm while passing in the configuration parameters. When the algorithm finishes, the top level sends back the finish signal. The results are obtained by inspecting the corresponding memory and are compared with the results of a MATLAB run.
(3) DC synthesis: the code is checked further; logic synthesis is performed with TSMC's 40nm technology library to observe whether the multiply-accumulator module has timing slack.
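The comparison against reference results in step (2) can be sketched as a tolerance check, since single-precision hardware results will rarely match a double-precision reference bit for bit. The function name `matches_reference` and the tolerance values are assumptions for illustration; the description only says that the memory contents are compared with MATLAB output.

```python
import math

def matches_reference(result, reference, rel_tol=1e-6, abs_tol=1e-12):
    """True when every simulated value agrees with the reference within tolerance."""
    if len(result) != len(reference):
        return False
    return all(
        math.isclose(r, g, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, g in zip(result, reference)
    )
```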
In summary, the variable-order multiply-accumulator provided by this embodiment makes it possible, during computation, to omit the head and tail zero-padding operations of the algorithm and the redundant multiply-accumulates they would generate, thereby achieving performance close to the theoretical estimate.

Claims (5)

1. A variable-order pipelined serial multiply-accumulator, characterized by comprising
One group of multipliers, for multiplying two channels of input data and outputting the multiplication results;
Three groups of accumulators: the first group accumulates the multiplication results; after that accumulation ends, the second and third groups successively add the results held in the pipeline stages of the first group, which ensures that the first group can go on to process the next stage of data;
A corresponding control circuit, which adds extra control signals and control logic in order to eliminate the head and tail zero-padding operations of the algorithm.
2. The variable-order pipelined serial multiply-accumulator according to claim 1, characterized in that each accumulator is connected to a corresponding data selector.
3. The variable-order pipelined serial multiply-accumulator according to claim 2, characterized in that the corresponding control circuit comprises
An order register, for recording the multiply-accumulate order and outputting it;
A counter module, for receiving the multiply-accumulate order and the start signal of the multiply-accumulate operation, counting the multiply-accumulate order, and passing the start signal on;
Four controllers, three of which control the gating of the data selectors, while the remaining one controls the output of the multiply-accumulator;
A control logic unit, which receives, through the controllers, the registered signals of the counter module and of the multiply-accumulate order held in the order register, and outputs the multiply-accumulate result and an output-valid signal.
4. The variable-order pipelined serial multiply-accumulator according to claim 3, characterized in that the data selectors are connected to the counter module through their corresponding controllers, so that the counter module performs the gating of the data selectors.
5. The variable-order pipelined serial multiply-accumulator according to claim 4, characterized in that the output-valid signal serves as the write-enable input signal for storing the multiply-accumulate result.
CN201310738598.XA 2013-12-30 2013-12-30 Variable-order pipeline serial multiply-accumulator Active CN103699355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310738598.XA CN103699355B (en) 2013-12-30 2013-12-30 Variable-order pipeline serial multiply-accumulator

Publications (2)

Publication Number Publication Date
CN103699355A true CN103699355A (en) 2014-04-02
CN103699355B CN103699355B (en) 2017-02-08

Family

ID=50360896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310738598.XA Active CN103699355B (en) 2013-12-30 2013-12-30 Variable-order pipeline serial multiply-accumulator

Country Status (1)

Country Link
CN (1) CN103699355B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4507725A (en) * 1982-07-01 1985-03-26 Rca Corporation Digital filter overflow sensor
US5923273A (en) * 1996-11-18 1999-07-13 Crystal Semiconductor Corporation Reduced power FIR filter
CN1963745A (en) * 2006-12-01 2007-05-16 浙江大学 High speed split multiply accumulator apparatus
CN101834615A (en) * 2009-03-12 2010-09-15 普然通讯技术(上海)有限公司 Implementation method of Reed-Solomon encoder
CN101916177A (en) * 2010-07-26 2010-12-15 清华大学 Configurable multi-precision fixed point multiplying and adding device
CN102053186A (en) * 2009-11-10 2011-05-11 北京普源精电科技有限公司 Digital oscilloscope with variable-order digital filter
CN102629189A (en) * 2012-03-15 2012-08-08 湖南大学 Water floating point multiply-accumulate method based on FPGA

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
刘艳萍: "《EDA技术及应用教程》", 31 August 2012, 北京航空航天大学出版社 *
徐远泽等: "FIR滤波器的FPGA实现方法", 《现代电子技术》 *
王堃: "基于多核的并行程序设计及优化", 《中国优秀硕士学位论文全文数据库 信息科学辑》 *
西瑞克斯(北京)通信设备有限公司: "《无线通信的MATLAB和FPGA实现》", 30 June 2009, 人民邮电出版社 *
陆光华等: "《数字信号处理》", 31 October 2005, 西安电子科技大学出版社 *
黄晓林: "NOC多核处理器FPGA开发板的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
黄炎等: "NCS算法的并行化设计实现", 《计算机工程与设计》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504205A (en) * 2014-12-29 2015-04-08 南京大学 Parallelizing two-dimensional division method of symmetrical FIR (Finite Impulse Response) algorithm and hardware structure of parallelizing two-dimensional division method
CN104504205B (en) * 2014-12-29 2017-09-15 南京大学 A kind of two-dimentional dividing method of the parallelization of symmetrical FIR algorithm and its hardware configuration
CN106325812A (en) * 2015-06-15 2017-01-11 华为技术有限公司 Processing method and device for multiplication and accumulation operation
CN106325812B (en) * 2015-06-15 2019-03-08 华为技术有限公司 It is a kind of for the processing method and processing device for multiplying accumulating operation
CN109976707A (en) * 2019-03-21 2019-07-05 西南交通大学 A kind of variable bit width multiplier automatic generating method
CN117555515A (en) * 2024-01-11 2024-02-13 成都市晶蓉微电子有限公司 Digital ASIC serial-parallel combined multiplier for balancing performance and area
CN117555515B (en) * 2024-01-11 2024-04-02 成都市晶蓉微电子有限公司 Digital ASIC serial-parallel combined multiplier for balancing performance and area

Also Published As

Publication number Publication date
CN103699355B (en) 2017-02-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant