CN1187698C - Design method of built-in parallel two-dimensional discrete wavelet conversion VLSI structure - Google Patents

Design method of built-in parallel two-dimensional discrete wavelet conversion VLSI structure

Info

Publication number
CN1187698C
Authority
CN
China
Prior art keywords
row
filtering
technology
hardware
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB031146023A
Other languages
Chinese (zh)
Other versions
CN1448871A (en)
Inventor
兰旭光
郑南宁
周宁
梅魁志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CNB031146023A
Publication of CN1448871A
Application granted
Publication of CN1187698C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Image Processing (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention discloses a design method for an inherently parallel VLSI architecture for the two-dimensional discrete wavelet transform based on the lifting algorithm. Using a shift-and-add technique, the multiplications in the filters are converted into shift and adder operations, which greatly reduces the amount of computation and saves hardware resources and cost. Using a line buffer technique, the row and column filters are connected and operate simultaneously, which saves a large amount of memory. Using a delay register technique, pipelined processing is realized, which improves both processing speed and hardware utilization. Using a unified symmetric extension technique, the structure is made general. The invention is an efficient hardware implementation scheme for the two-dimensional discrete wavelet transform with several advantages: through optimized design of the row filter and the column filter, hardware utilization reaches 100%; two outputs are produced in each working clock cycle, so parallelism is achieved without additional hardware cost; and the resulting hardware structure is simple and therefore easy to implement in VLSI.

Description

VLSI architecture design method for the inherently parallel two-dimensional discrete wavelet transform
One, technical field
The invention belongs to the field of VLSI design. Specifically, it relates to a design method for an inherently parallel, efficient VLSI architecture for the lifting-based two-dimensional discrete wavelet transform (2-D DWT), intended for JPEG2000 hardware implementations.
Two, background art
The 2-D DWT has good time-frequency analysis properties and is used in many practical systems, such as coding systems based on the JPEG2000 standard and video processing. Given the requirements on speed and area, the compression system needs to be implemented on a chip. In most existing 2-D DWT chip architectures, the two stages of the fast discrete wavelet transform (row filtering and column filtering) use exactly the same structure. Although this reduces control complexity, storing the intermediate results of row filtering requires a large amount of memory. This lowers hardware utilization, increases hardware resource consumption and chip cost, and limits the speed at which the chip can process data.
Three, summary of the invention
In view of the defects and shortcomings of the background art described above, the object of the invention is to provide a design method for an inherently parallel VLSI architecture for the lifting-based 2-D DWT that has high hardware utilization, low cost, low hardware overhead, and a simple structure.
To achieve this object, the solution adopted by the invention is a VLSI architecture design method for the inherently parallel two-dimensional discrete wavelet transform, carried out as follows:
1) using the "shift-and-add" technique, the multiplications in the filters are converted into shift register and adder operations;
2) using the "line buffer (Line Buffer)" technique, the high-pass and low-pass filters and the row-direction and column-direction filtering transforms operate in parallel in one compact hardware structure, which reduces memory, saves hardware resources, and increases hardware utilization and processing speed;
3) using the "delay register" technique, pipelining is realized, which optimizes the structure and improves processing speed;
4) using the unified "symmetric boundary extension" technique, the filter structure is made general.
The shift-and-add technique is as follows: multiplying a real number by the n-th power of 2 is equivalent to shifting that number left by n bits, and any wavelet filter coefficient can be expressed as the sum of a finite number of integer powers of 2. The product of a real number and a wavelet filter coefficient can therefore be expressed as the sum of a finite number of shifted copies of that real number.
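The decomposition described above can be sketched in a few lines of Python. This is only an illustration, not the patented circuit: the function names `shift_terms` and `shift_add_multiply`, the default of 13 fractional bits, and the restriction to coefficients with magnitude below 2 are assumptions made for the example.

```python
def shift_terms(coeff, frac_bits=13):
    """Return (sign, shifts) such that |coeff| ~= sum(2**-s for s in shifts).

    Assumes |coeff| < 2 and truncates the coefficient to frac_bits
    fractional bits (an illustrative choice).
    """
    sign = -1 if coeff < 0 else 1
    q = int(abs(coeff) * (1 << frac_bits))  # truncated fixed-point magnitude
    # Bit b of q has weight 2**(b - frac_bits), i.e. a right shift by frac_bits - b.
    shifts = [frac_bits - b for b in range(frac_bits, -1, -1) if (q >> b) & 1]
    return sign, shifts


def shift_add_multiply(x, coeff, frac_bits=13):
    """Multiply the integer x by coeff using only negation, shifts, and adds."""
    sign, shifts = shift_terms(coeff, frac_bits)
    if sign < 0:
        x = -x
    return sum(x >> s for s in shifts)


if __name__ == "__main__":
    print(shift_terms(-1.586134342))
    print(shift_add_multiply(8192, -1.586134342))
```

Each entry in `shifts` corresponds to one shifted copy of the input, so the number of adders needed grows with the number of nonzero bits in the coefficient.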
The "line buffer" technique is as follows: the intermediate results of row-direction (or column-direction) filtering of the wavelet transform are stored in a row (or column) buffer array (Line Buffer Array). Once a certain number of rows (or columns) has been buffered (no more than 9 rows in the embodiment), the column-direction (or row-direction) filter can read data from the row (or column) buffer and perform the column-direction (or row-direction) filtering transform. In this way, column (or row) filtering can begin shortly after the row (or column) transform has started. Row and column filtering operations can take place within the same clock cycle, so the row-direction and column-direction filtering transforms run in parallel. The size of the line buffer array is determined by the lifting algorithm and the number of symmetric extension points.
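The buffering behaviour can be pictured with a small software model. The sketch below is an assumption-laden illustration rather than the hardware itself: `LineBufferArray`, `simulate`, and the stand-in row contents are hypothetical, and the depth of 11 and the threshold of 5 rows are borrowed from the embodiment described in this document.

```python
from collections import deque


class LineBufferArray:
    """Toy model of a line buffer array holding the most recent filtered rows."""

    def __init__(self, depth):
        # A bounded deque drops the oldest row once `depth` rows are stored.
        self.rows = deque(maxlen=depth)

    def push(self, row):
        self.rows.append(list(row))

    def ready(self, needed):
        """True once enough rows are buffered for column filtering to start."""
        return len(self.rows) >= needed


def simulate(num_rows, needed, depth):
    """Feed rows one by one and record when column filtering may run."""
    buf = LineBufferArray(depth)
    log = []
    for r in range(num_rows):
        buf.push([r])  # stand-in for one row-filtered line
        log.append((r, buf.ready(needed)))
    return log


if __name__ == "__main__":
    for row, ok in simulate(num_rows=8, needed=5, depth=11):
        print(row, "column filter may run" if ok else "waiting")
```

The point of the model is that the column stage never waits for the whole image: it becomes active after a handful of rows, and from then on both stages work on every clock.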
The "delay register" technique is as follows: when designing the row and column filters of the two-dimensional discrete wavelet transform, pipelining is adopted to increase hardware processing speed. Delay registers buffer the input data and the intermediate data of the computation so that the filters can process and output data continuously, and the hardware resources are fully used.
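One way to picture what a delay register does is to model it as a variable that holds the previous sample for one clock. The sketch below uses that idea to split a sample stream into even/odd pairs, which is what the row filter needs as input; the function name `split_even_odd` and the one-sample-per-clock timing are assumptions for illustration only.

```python
def split_even_odd(samples):
    """Yield (even, odd) sample pairs using a single one-sample delay register.

    One sample arrives per clock. The register holds the previous sample,
    so on every odd clock the even sample (delayed) and the odd sample
    (current) are available together.
    """
    delay = None  # models one D flip-flop stage
    for clock, x in enumerate(samples):
        if clock % 2 == 1:
            yield delay, x
        delay = x


if __name__ == "__main__":
    print(list(split_even_odd([10, 11, 12, 13, 14])))
```

Because the register is refreshed on every clock, the stream never stalls; a trailing unpaired sample simply stays in the register.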
The unified "symmetric boundary extension" technique is as follows: in the lifting-based wavelet transform, the boundaries must be extended so that the image can be reconstructed accurately. In the JPEG2000 standard, the lifting algorithm uses different numbers of extension points for wavelet filters of different lengths. Following the symmetric extension principle, the extension is made general, so that the 5/3 and 9/7 filters recommended by the JPEG2000 standard can use one unified extension and the filter structure becomes general.
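A minimal sketch of whole-sample symmetric extension follows, in which the boundary sample itself is not repeated. The function name `symmetric_extend` and the choice of 4 left and 3 right extension points in the usage example are assumptions for illustration; the patent text does not spell out the indexing.

```python
def symmetric_extend(x, left, right):
    """Extend the sequence x by mirror reflection about its first and last samples."""
    n = len(x)

    def reflect(i):
        if n == 1:
            return 0
        period = 2 * (n - 1)
        i %= period  # Python's % is non-negative for a positive modulus
        return i if i < n else period - i

    return [x[reflect(i)] for i in range(-left, n + right)]


if __name__ == "__main__":
    print(symmetric_extend([0, 1, 2, 3, 4, 5], left=4, right=3))
```

Using the same reflection rule for every filter, and only varying the extension counts, is what lets one structure serve both the 5/3 and the 9/7 filters.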
The invention is an efficient hardware implementation scheme for the 2-D DWT with several advantages. Through optimized design of the row filtering and column filtering of the discrete wavelet transform, hardware utilization reaches 100% and the row and column filtering transforms run in parallel. Replacing multiplications with shift-and-add operations significantly reduces the amount of computation without increasing hardware overhead. The resulting hardware structure is also very simple and well suited to VLSI implementation.
Four, description of drawings
Fig. 1 is a structural diagram showing how a multiplication is converted into shift-and-add operations in the embodiment of the invention.
Fig. 2 is the architecture diagram of the lifting-based VLSI implementation of the two-dimensional discrete wavelet transform in the embodiment of the invention.
Fig. 3 is a block diagram of the row and column filtering transforms in the embodiment of the invention.
Fig. 4 is a block diagram of row-direction filtering in the embodiment of the invention.
Fig. 5 is a block diagram of column-direction filtering in the embodiment of the invention.
Five, embodiment
The invention is described in more detail below with reference to the drawings and an embodiment, but the invention is not limited to this embodiment.
The inventors provide an embodiment according to the technical solution of the invention. The embodiment uses one of the biorthogonal wavelet filters in the JPEG2000 standard, the Daubechies 9/7 biorthogonal wavelet.
In this embodiment, the multiplications in the filters are first implemented with shift registers and adders using the general "shift-and-add" technique. Fixed-point arithmetic is used throughout: the low 13 bits are the fractional part and the data width is 24 bits.
Using the "shift-and-add" technique, this embodiment converts multiplications into shift-and-add operations. Because the filter coefficients of the two-dimensional discrete wavelet transform are fixed, this makes the design more efficient. The core of the lifting-based two-dimensional discrete wavelet transform is the prediction of the odd samples and the update of the even samples. Taking the D9/7 lifting algorithm as the example, step 1 is shown here as an illustration:
step1: Y(2n+1) = X_ext(2n+1) + α × (X_ext(2n) + X_ext(2n+2)), for i_0 - 3 ≤ 2n+1 < i_1 + 3
where i_0 and (i_1 - 1) denote the first and last indices, respectively, of one input row (or column) of data.
The filter coefficient is expressed in binary as follows:
α = -1.586134342 ≈ -1.1001011000001 (binary, truncated to 13 fractional bits)
  = -(1 + 1/2 + 1/16 + 1/64 + 1/128 + 1/8192);
Fig. 1 shows how this is done in the embodiment. Because α is negative, Num is first negated by taking its two's complement:
Num = ~Num + 1; then
α × Num = Num + (Num>>1) + (Num>>4) + (Num>>6) + (Num>>7) + (Num>>13), that is, the multiplication is reduced to shift-and-add operations. The other coefficients are handled in the same way. Negative numbers are represented in two's complement in this embodiment.
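The α multiplication can be checked numerically with a short Python model of the 24-bit, 13-fractional-bit arithmetic. This is a sketch under stated assumptions: the helper names `to_fixed`, `wrap`, and `mul_alpha` are hypothetical, wrap-around on overflow is assumed (the text does not say how overflow is handled), and Python's `>>` on negative integers is used as an arithmetic shift.

```python
FRAC_BITS = 13  # fractional bits, as stated for the embodiment
WIDTH = 24      # data width in bits, as stated for the embodiment


def to_fixed(v):
    """Convert a float to the fixed-point integer representation."""
    return int(round(v * (1 << FRAC_BITS)))


def wrap(v):
    """Wrap an integer to WIDTH-bit two's complement (assumed overflow behaviour)."""
    v &= (1 << WIDTH) - 1
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


def mul_alpha(num):
    """Multiply a fixed-point value by alpha ~= -1.586 using shifts and adds."""
    num = wrap(~num + 1)  # two's-complement negation, since alpha is negative
    # Parentheses are required: '+' binds tighter than '>>' in Python (and in C).
    return wrap(num + (num >> 1) + (num >> 4) + (num >> 6) + (num >> 7) + (num >> 13))


if __name__ == "__main__":
    y = mul_alpha(to_fixed(10.0))
    print(y, y / (1 << FRAC_BITS), 10.0 * -1.586134342)
```

The small gap between the fixed-point and floating-point results comes from truncating α to 13 fractional bits.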
Fig. 2 shows the overall architecture of the lifting-based VLSI implementation of this embodiment. The control module decides, from the current decomposition level and the required number of wavelet decomposition levels, whether to perform the next level of decomposition, and controls the switching of the input data. The row extension module extends the input, and the row transform module filters the extended data. The line buffer module (Line Buffer Array) stores a certain number of row transform results so that the row and column transforms can run in parallel. The column transform module reads data from the line buffer and performs column filtering. The address generation unit produces and assigns addresses according to the column filtering outputs and writes the coefficients of the current wavelet decomposition level to the memory module. With this architecture, high-pass and low-pass filtering run in parallel within both the row and column directions, and the row-direction and column-direction filtering transforms also run in parallel with each other.
The inherent parallelism referred to here means that true parallel processing is achieved without increasing hardware cost. It covers high-pass and low-pass filtering within the row direction and the column direction, as well as simultaneous filtering in the row and column directions. Two outputs are produced in each working clock cycle.
Fig. 3 shows the integrated row-direction and column-direction filter of this embodiment, which allows the row-direction and column-direction filtering transforms to be carried out in parallel.
Fig. 4 shows the row-direction filtering structure of this embodiment. Delay registers split the input into even samples and odd samples, which enter the α module. There the two neighbouring even samples are added, the sum is multiplied by the filter coefficient (implemented as shift-and-add), and the product is added to the odd sample; pipeline registers provide the pipelining. The output of each module is then split into odd and even samples in the same way as in the α module and passes through the β, γ, and δ modules in turn. As data keep arriving, the row filter processes them continuously. This makes full use of the hardware, so that its utilization reaches 100%. Control is also very simple: a few delay registers (DFF) are enough to keep the filtering running continuously. The results of row filtering are stored in the Line Buffer array. This embodiment uses 11 Line Buffers, each 24 bits wide, with a length equal to the image width, 256. Eleven buffers are used so that the row transform and the column transform can run in parallel. This greatly reduces the memory needed to store the intermediate results of the row transform, makes the hardware more compact, and increases speed while reducing area, which is in keeping with good chip design practice. As the structure shows, the row filtering transform of the D9/7 biorthogonal wavelet filter needs only 8 pipeline registers, 6 delay registers, 8 adders, and 4 shift registers. Because the shifts can be implemented by wiring alone, the shift registers add essentially no hardware overhead. Compared with conventional methods, this structure greatly reduces hardware overhead and structural complexity.
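The α, β, γ, δ chain can be mirrored in software as four in-place lifting passes over one symmetrically extended row. The floating-point sketch below is for illustration only. The text gives only α, so the values of β, γ, and δ are taken from the JPEG2000 9/7 filter definition. The final scaling by K is omitted, and the function name `lift_97` and the fixed padding of 4 samples on each side are assumptions.

```python
ALPHA = -1.586134342  # given in the text
BETA = -0.052980118   # JPEG2000 9/7 value, not stated in the text
GAMMA = 0.882911075   # JPEG2000 9/7 value, not stated in the text
DELTA = 0.443506852   # JPEG2000 9/7 value, not stated in the text


def lift_97(x):
    """Return (low, high) for one row after the four lifting steps (no K scaling)."""
    n = len(x)
    pad = 4
    period = 2 * (n - 1)

    def reflect(i):
        i %= period
        return i if i < n else period - i

    # y[k] holds the extended sample with index k - pad.
    y = [float(x[reflect(i)]) for i in range(-pad, n + pad)]
    steps = ((ALPHA, 1, 1), (BETA, 0, 2), (GAMMA, 1, 3), (DELTA, 0, 4))
    for coeff, parity, margin in steps:
        # Each pass updates samples of one parity from neighbours of the other,
        # so updating in place is safe.
        for k in range(margin, len(y) - margin):
            if (k - pad) % 2 == parity:
                y[k] += coeff * (y[k - 1] + y[k + 1])
    core = y[pad:pad + n]
    return core[0::2], core[1::2]


if __name__ == "__main__":
    low, high = lift_97([1.0] * 8)
    print(low, high)
```

For a constant input the high-pass outputs come out at essentially zero, which is a quick sanity check that the four coefficients are consistent with each other.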
Fig. 5 shows the structure of the column filtering stage of this embodiment. The column filter is designed differently from the row filter, because column filtering can start only after a certain number of rows have been row-filtered (no more than 9 rows in this embodiment). The data processed by the column filter are the results of row filtering, so column filtering can start as soon as the number of data items required by the column transform is available (this number follows from the lifting algorithm and the number of symmetric extension points). In this embodiment, column filtering can start from the fifth row (symmetric extension is applied first, giving 9 rows after extension). To speed up column filtering and keep data from accumulating in the Line Buffer Array, 9 data items are read from the Line Buffer Array in one clock cycle for the column transform, while the row transform results continue to be written into the Line Buffers; reads and writes of the Line Buffer array take place at the same time. The 9 data items pass through the 4 modules in turn, and two data items are output per clock cycle: either LL and LH, or HL and HH. Addresses are generated according to the output data, and the data are written to memory. As the structure diagram shows, column filtering needs 8 pipeline registers, 1 delay register, 19 adders, and 10 shift-and-add operations. Reading 9 data items at once greatly increases processing speed, and in this way the row transform and the column transform are processed in parallel.
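The statement that column filtering can start from the fifth row can be reproduced by listing which buffered rows feed each pair of column outputs. The sketch below assumes a 9-sample window centred on the even output index 2n with whole-sample symmetric reflection at the edges; the names `column_window` and `rows_needed` are hypothetical.

```python
def column_window(n, height, taps=9):
    """Row indices read for the n-th pair of column outputs (window centred on row 2n)."""
    half = taps // 2
    period = 2 * (height - 1)
    rows = []
    for k in range(2 * n - half, 2 * n + half + 1):
        k %= period  # fold out-of-range indices back by mirror reflection
        rows.append(k if k < height else period - k)
    return rows


def rows_needed(n, height):
    """Number of row-filtered lines that must exist before output pair n can be computed."""
    return max(column_window(n, height)) + 1


if __name__ == "__main__":
    print(column_window(0, 256))
    print(rows_needed(0, 256))
```

For the first output pair the window reflects about row 0, so only rows 0 to 4 are actually required, which matches the five-row start-up described above.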
In this embodiment, the unified "symmetric extension" technique makes the extension general. The number of symmetric extension points depends on the filter type and on the parity of the first and last indices of one row (or column) of data. In the design, the first index i_0 of each row (or column) is uniformly set to zero; then, following the extension principle, the number of extension points is unified to the largest symmetric extension count among all filter types.
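The unification rule can be written down as a small lookup. The counts used below are an assumption derived from the index ranges of the lifting steps (a reach of 2 samples for the 5/3 filter and 4 for the 9/7 filter, reduced by one depending on index parity), with i_1 taken as one past the last index; they should be checked against the JPEG2000 standard before being relied on. The function names are hypothetical.

```python
REACH = {"5/3": 2, "9/7": 4}  # assumed maximum reach of each filter, in samples


def extension_counts(filter_name, i0, i1):
    """Return (left, right) extension counts for samples i0 .. i1 - 1 (assumed rule)."""
    reach = REACH[filter_name]
    left = reach if i0 % 2 == 0 else reach - 1
    right = reach if i1 % 2 == 1 else reach - 1
    return left, right


def unified_extension(filters=("5/3", "9/7")):
    """Largest count over all filter types and end-index parities, with i0 fixed at 0."""
    return max(
        max(extension_counts(f, 0, i1))
        for f in filters
        for i1 in (256, 255)  # one even and one odd end index
    )


if __name__ == "__main__":
    print(extension_counts("9/7", 0, 256), unified_extension())
```

Extending by the maximum count everywhere wastes a few samples for the shorter filter but removes all parity and filter-type cases from the control logic.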

Claims (1)

1. A VLSI architecture design method for the inherently parallel two-dimensional discrete wavelet transform, characterized in that it comprises the following:
1) using the "shift-and-add" technique, the multiplications in the filters are converted into shift register and adder operations;
2) using the "line buffer" technique, the row-direction and column-direction filtering transforms are carried out in parallel in one compact hardware structure;
3) using the "delay register" technique, a pipelined structure is realized;
4) using the unified "symmetric boundary extension" technique, the filter structure is made general;
The "shift-and-add" technique is:
multiplying a real number by the n-th power of 2 is equivalent to shifting that number left by n bits, so the product of a real number and a wavelet filter coefficient is expressed as the sum of a finite number of shifted copies of the real number;
The "line buffer" technique is:
the intermediate results of row-direction filtering of the wavelet transform are stored in a row buffer array; once a certain number of rows has been reached, the column-direction filter can read data from the row buffer and perform the column-direction filtering transform, so that row and column filtering operations can be carried out within the same clock cycle, realizing parallel operation; or the intermediate results of column-direction filtering of the wavelet transform are stored in a column buffer array; once a certain number of columns has been reached, the row-direction filter can read data from the column buffer and perform the row-direction filtering transform, so that column and row filtering operations can be carried out within the same clock cycle, realizing parallel operation;
The "delay register" technique is:
the row and column filters of the two-dimensional discrete wavelet transform are designed as pipelines, and delay registers buffer the input data and the intermediate data of the computation so that the filters process data continuously;
The unified "symmetric boundary extension" technique is:
in the lifting-based wavelet transform, the boundaries are extended so that the image can be reconstructed accurately; in the design, the first index of each row or column is uniformly set to zero, and then, following the extension principle, the number of extension points is unified to the largest symmetric extension count among all filter types.
CNB031146023A 2003-04-07 2003-04-07 Design method of built-in parallel two-dimensional discrete wavelet conversion VLSI structure Expired - Fee Related CN1187698C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031146023A CN1187698C (en) 2003-04-07 2003-04-07 Design method of built-in parallel two-dimensional discrete wavelet conversion VLSI structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB031146023A CN1187698C (en) 2003-04-07 2003-04-07 Design method of built-in parallel two-dimensional discrete wavelet conversion VLSI structure

Publications (2)

Publication Number Publication Date
CN1448871A CN1448871A (en) 2003-10-15
CN1187698C true CN1187698C (en) 2005-02-02

Family

ID=28684128

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031146023A Expired - Fee Related CN1187698C (en) 2003-04-07 2003-04-07 Design method of built-in parallel two-dimensional discrete wavelet conversion VLSI structure

Country Status (1)

Country Link
CN (1) CN1187698C (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1317897C (en) * 2004-09-28 2007-05-23 华中科技大学 Parallel two-dimension discrete small wave transform circuit
JP4182447B2 (en) * 2006-07-14 2008-11-19 ソニー株式会社 Information processing apparatus and method, program, and recording medium
JP4182446B2 (en) * 2006-07-14 2008-11-19 ソニー株式会社 Information processing apparatus and method, program, and recording medium
CN101452572B (en) * 2007-12-07 2010-08-25 华中科技大学 Image rotating VLSI structure based on cubic translation algorithm
CN101534439A (en) * 2008-03-13 2009-09-16 中国科学院声学研究所 Low power consumption parallel wavelet transforming VLSI structure
CN102170276B (en) * 2011-03-01 2013-08-21 深圳市蓝韵实业有限公司 Up-sampling filtering method for ultrasonic signal processing
CN102521793B (en) * 2011-12-01 2014-06-18 福州瑞芯微电子有限公司 Storage control device and method for realizing saving of image storage space
CN103164551A (en) * 2011-12-09 2013-06-19 天津工业大学 Method of constructing low temperature subarray on reconfigurable very large scale integration (VLSI) array
US9769356B2 (en) 2015-04-23 2017-09-19 Google Inc. Two dimensional shift array for image processor
CN105611115B (en) * 2015-12-15 2019-03-05 路博超 A kind of time division multiplexing two-dimensional wavelet transformation system based on Zynq Series FPGA

Also Published As

Publication number Publication date
CN1448871A (en) 2003-10-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050202

Termination date: 20120407