WO2000055757A1

WO2000055757A1 - A fast multiplierless transform

Info

Publication number: WO2000055757A1
Application number: PCT/US2000/006941
Authority: WO
Inventors: Trac D. Tran
Original assignee: The Johns Hopkins University
Priority date: 1999-03-17
Filing date: 2000-03-17
Publication date: 2000-09-21
Also published as: EP1311975A4; EP1311975A1; AU3751900A

Abstract

A preferred embodiment of the invention provides an M x M multiplierless perfect reconstruction block transform (i.e., perfect reconstruction is realized if the invention is utilized for both analysis and synthesis, otherwise, if the invention is used on only one end, near-perfect reconstruction results) with symmetric/antisymmetric basis functions, i.e., linear phase filter. In a preferred embodiment, a cascade of ±1 butterfly stages (12) is provided followed by two invertible matrices. Each invertible matrix is made up of a cascade of lifting steps (16) and scaling factors (αi). The scaling factors can be folded into the quantization stepsizes. A forward binDCT (e.g., for signal analysis), can be implemented using a cascade of ±1 butterflies (12) and dyadic lifting steps (16). Furthermore, an inverse binDCT (e.g., for signal synthesis can be easily realized using lifting steps (16) of reverse order and inverted polarity, cascaded with ±1 butterflies (12). In accordance with an embodiment of the invention, all of the lifting coefficients have been chosen to be dyadic rational numbers. Each dyadic lifting step can be constructed by a simple combination of shift-and-add operations.

Description

A FAST MULTIPLIERLESS TRANSFORM

This application claims the benefit of U.S. Provisional Patent Application No.

60/124,746, filed on March 17, 1999, which is incorporated herein by reference in

its entirety.

BACKGROUND OF THE INVENTION

Mathematical transforms (e.g., the fourier transform, the discrete cosine

transform, the wavelet transform, etc.) have found application in signal processing

computer algorithms, particularly the compression and coding of speech, audio,

image and video data. Such transforms are used in the currendy popular image

compression standard referred to as "JPEG" developed by the Joint Photographic

Experts Group, as well as in high-performance video coding standards such as MPEG

and H.263. These operate by dividing up an image or video into sections or blocks,

performing transforms on the data in the blocks, and compressing or coding the

coefficients resulting from the transform.

The JPEG standard, for example, operates by dividing an image into 8x8

blocks of pixels (i.e., each block is an array of 64 pixels: 8 pixels in height and 8 pixels

in width). A mathematical transform known as the "discrete cosine transform

(DCT)" is calculated for each block in the image to produce a data set of coefficients. A variable length code-based compression technique is then used to compress the

data set and store the compressed data stream in an output file. The output file can

then be decompressed to recover the DCT coefficients, the inverse DCT transform

calculated, and the resulting blocks of pixels displayed to represent the original form

of the image.

Mathematically, the DCT transforms the blocks of pixels from the spatial

representation to the frequency representation. In the frequency representation, the

image or video information can be divided into frequency components, some of

which are more important than others. The compression algorithm selectively

quantizes or discards the frequency components that do not adversely affect the

reconstructed picture contents. In this manner, compression is achieved.

From a statistical signal processing standpoint, the DCT is a robust

approximation to the optimal discrete-time Karhunen-Loeve transform (KLT) of a

first-order Gauss-Markov process with a positive correlation coefficient p when p —»

1. The KLT is optimal in the sense that it provides the most accurate representation

of a signal by producing transform coefficients that are the eigen values of the input's

autocorrelation matrix, and as known in the art, the eigen values are the optimum

achievable result. In other words, the KLT is optimal in the energy compaction

sense, i.e., among unitary transforms, the KLT packs signal energy into the fewest

number of coefficients. However, the KLT is signal-dependent, therefore, it is computationally complex and expensive to implement. The DCT has proven to be a

much better alternative in practice in that it is signal independent, has linear phase,

has real coefficients, and can be implemented using fast algorithms.

The DCT coefficients, X[0] through X[M-1], of an M x M transform (i.e., a

transform having M² individual signal samples at its input to be processed row by row

as M basis functions) and the input image pixels, x[0] through x[M-l], are related by

the following equations:

M-\

Tim

X[m] = a ^ x[n] cos (2« + 1) 0 < m < M - 1, n=0 2M

with

It has a closed-form expression and it is orthogonal. Orthogonality is desirable in a

transform inasmuch as there is no redundancy among the M transform bases (e.g., they are orthogonal to each other). In the matrix format, we can represent the

transformation of an input, x, to a transform coefficient, X, as X = Px, and the

reconstruction of the transform coefficient back into the input x as x = P^rX, where P

is the DCT transform matrix and P^τ is its inverse transform matrix. Exploiting the

symmetry of the coefficients cos [(πm/2M) (2n + 1)] in the equation above, the

functions performed by the DCT transform matrix P can be represented by a series of

±1 butterflies 12 and rotation angles 14, as illustrated in Fig. 1. The ±1 butterflies

are a series of crossings which process one pair of inputs at a time (i.e., x[0] through

x[7]) by adding and subtracting the inputs in pairs. For example, the first butterfly

encountered by the data set x[0] through x[7] shown in Fig. 1 operates on the data

pair x[0], x[7]. In the top branch of the first butterfly, the term x[0] is added to the

term x[7], while in the bottom branch, the term x[7] is subtracted from x[0].

Similarly, in the second butterfly, the term x[l] is added to the term x[6], while in

the bottom branch the term x[6] is subtracted from x[l]. In an 8 x 8 transform, the

DCT initially performs four such butterflies (i.e., crossings) for a total of four discrete

addition operations and four discrete subtraction operations. The DCT rotation

angles are a series of multiplication operations carried out on the results achieved by

the ±1 butterflies. As depicted in Fig. 1, "C π/4" signifies a multiplication operation

(i.e., multiplying an input by cosine π/4). Similarly, "S π/4" signifies a multiplication

operation (i.e., multiplying an input by sine π/4). This process results in a fast DCT implementation. Using this method, eight DCT coefficients N[0] - X[7] can be

computed using 13 multiplication operations and 29 addition operations.

Despite all of the above advantages of the DCT, there is still room for

improvement. For example, the DCT is a floating-point transform, i.e., the cos and

sin multipliers in the 5 rotation angles {π/4, 3π/8, π/4, 7π/16, 3π/16} are real

numbers with infinite precision. Hence, the DCT cannot map integers to integers

losslessly inasmuch as the product of the inputs, as processed by the ±1 butterflies,

and the DCT rotation angles result in floating-point outputs that cannot be

represented in binary form without truncation, thereby resulting in the loss of input

data. More importantly, floating-point implementations are relatively slow, require

too much hardware space, and consume too much power.

To combat these problems, most implementations of the DCT are based on

integer arithmetic by scaling up the floating-point multipliers by very large factors

(e.g., multiplying by a factor of 1000 or greater to rid the calculation of floating

decimal points). Unfortunately, however, this ad-hoc approximation method still has

problems; for instance, it does not reduce the complexity of the circuit and it still has

truncation errors. Moreover, if more computational accuracy is needed, the scaling

factors must be very large, thereby adding more complexity to the implementation. SUMMARY OF THE INVENTION

A preferred embodiment of the invention provides an M x M multiplierless

perfect reconstruction block transform (i.e., perfect reconstruction is realized if the

invention is utilized for both analysis and synthesis, otherwise, if the invention is used

on only one end, near-perfect reconstruction results) with symmetric/antisymmetric

basis functions, i.e., linear phase filters. In a preferred embodiment, a cascade of ±1

butterfly stages is provided followed by two invertible matrices. Each invertible

matrix is made up of a cascade of lifting steps and scaling factors (α_t). The scaling

factors can be folded into the quantization stepsizes. A forward binDCT (e.g., for

signal analysis), can be implemented using a cascade of ±1 butterflies and dyadic

lifting steps. Furthermore, an inverse binDCT (e.g., for signal synthesis) can be easily

realized using lifting steps of reverse order and inverted polarity, cascaded with ±1

butterflies.

In accordance with an embodiment of the invention, all of the lifting

coefficients have been chosen to be dyadic rational numbers. Each dyadic lifting step

can be constructed by a simple combination of shift-and-add operations. BRIEF DESCRIPTION OF THE DRAWINGS

These and other advantages, features, and applications of the invention will be

apparent from the following detailed description of the invention which is provided

in connection with the accompanying drawings in which:

Fig. 1 is a signal flow graph which depicts a conventional fast implementation

of an 8 x 8 floating-point DCT;

Figs. 2(a)-2(c) are signal flow graphs which depict variations on lifting steps

used in the analysis and synthesis of data samples;

Fig. 3 is a signal flow graph which depicts a general solution in accordance

with the invention for a perfect reconstruction block transform whose basis functions

have linear phase;

Fig. 4 is a signal flow graph which depicts how an invertible matrix U can be

decomposed into a cascade of lifting steps and scaling factors (α._j) in accordance with

a preferred embodiment of the invention;

Figs. 5-11 depict a forward binDCT in accordance with preferred

embodiments of the invention; Fig. 12 depicts frequency responses of a binDCT in accordance with a

preferred embodiment of the invention;

Fig. 13 depicts a transform and inverse transform matrix of the first

embodiment of a binDCT;

Fig. 14 depicts peak signal-to-noise ratios (PSNR) realized for two

embodiments of the binDCT; and

Fig. 15 depicts a simplified block diagram of a transmission system

incorporating analysis and synthesis filters which employ a binDCT block transform

in accordance with preferred embodiments of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Preferred embodiments and applications of the invention will now be

described with reference to Figs. 2-15. Other embodiments may be realized and

structural or logical changes may be made to the disclosed embodiments without

departing from the spirit or scope of the invention. Although the invention is

particularly described as applied to the analysis and synthesis of image and video data

samples, it should be readily apparent that the invention may be applied to any other

data samples or sets having the same or similar problems.

Instead of factoring the DCT into a cascade of rotation angles, as in

conventional practice in the art, the invention utilizes a series of lifting steps (also

known as "shear" operations or "ladder" structures). The basic principle behind a

lifting step is demonstrated in the signal flow graphs depicted in Figs. 2(a)-2(c). In

accordance with a preferred embodiment of the invention, the denominator of the

lifting coefficient /, is deliberately chosen to be dyadic in that it is a power of 2 (i.e., a

rational number that can be written in the form of l/2^m, k, m e -Z) for easy binary

implementation. Fig. 2(b) shows that the lifting step can achieve perfect

reconstruction of a signal even with the presence of a nonlinear truncation operator

such as a "floor" (which rounds down to the nearest integer), a "round" (which

rounds either up or down to the nearest integer), or a ceiling" (which rounds up to the nearest integer). Fig. 2(c) illustrates how this property is exploited to implement

the lifting step very efficiently using a combination of addition operations and bit-

shift operations. The method is always feasible as long as the lifting coefficient is

dyadic. For example, a lifting coefficient of 3/8 signifies "multiplying" an input

value by 3 followed by a "division" by 8 implemented by a right-shift of 3 binary

places (i.e., 2³ = 8 in the denominator). It should be noted that the term

"multiplying" is actually inaccurate since a series of bit-shift and addition operations

are performed upon the signal to achieve the same result without performing the

actual multiplication operation.

As known in the art, binary multiplication may be simplified through a series

of bit-shift and addition operations. Generally, in order to implement a

multiplication operation, a multiplicand (i.e., a number that is to be multiplied by

another) is multiplied by a multiplier (i.e., a number by which a multiplicand is to be

multiplied). When multiplying binary numbers, bit-shift and addition operations

may be implemented to effectively perform a multiplication without the same level of

complexity required for multiplication.

A general description of binary bit-shift and addition operations, as substitutes

for multiplication is found in U.S. Patent No. 3,691,359 to Dell et al., which is

incorporated herein by reference. For example, the numerator 3 of the lifting coefficient 3/8 signifies

performing the following steps: a) a left-shift of 1 binary place upon the input signal

to the lifting step; and b) after a left-shift of one place is performed (e.g., the decimal

value 5 is represented as 101 in binary; after a left-shift, 101 becomes 1010), the

result is then added to the original value of the input signal (e.g., 101), resulting in a

lifting step output of 1111 binary (or decimal value 15). A multiplication by 3 has

just been performed with just one bit-shift and one addition operation. In

accordance with a preferred embodiment of the invention, each of the DCT's

rotation angles (e.g., 14 of Fig. 1) are replaced with a minimal number of dyadic

lifting steps, yet the resulting transform still retains most of the DCT's superior

performance.

The DCT and the novel transformation referenced herein as "binDCT" are

both M x M block transforms. For purposes of illustration only, an 8 x 8 block will

be described herein. At the forward transform in the encoder, 8 input samples x[0] -

x[7] are processed at a time, yielding 8 transform coefficients X[0] - X[7] that can

then be quantized and encoded (e.g., entropy encoded) in a manner known in the

art. In the decoder, the 8 encoded transform coefficients are entropy decoded,

inverse quantized, and then fed to the inverse transform module. For ID signals

such as speech and audio, the input stream can be divided into 8 -sample segments,

each is then processed independently. For 2D and 3D signals (images and videos

respectively), the input can be segmented into non-overlapped 8 x 8 blocks or 8 x 8 x 8 cubes. Each block or cube can be processed independently in separable fashion.

For example, the ID transformation can be applied to every row, then to every

column, to obtain a 2D transformation.

As depicted in Figs. 5-8, each of which depicts a preferred embodiment of the

invention, the 8 input samples labeled from Λ^;[0] to x[7] are first processed in four

pairs of two by the ±1 butterflies 12. Each of the four ±1 butterflies 12 takes a pair of

data samples and computes their sum as well as their difference.

As depicted in Fig. 3, each of the resulting sums may be arranged in the 4

upper branches Ul, U2, U3, U4, which will be processed by invertible matrix U to

produce the 4 even-indexed transform coefficients X[0], X[2], X[4], X[6] associated

with the transform's 4 symmetric basis functions. Likewise, all of the resulting

differences are fed into the 4 lower branches VI, V2, V3, V4, that will be further

transformed by invertible matrix V to produce the 4 odd-indexed transform

coefficients X[l], X[3], X[5], X[7] associated with the transform's 4 antisymmetric

basis functions.

In accordance with an embodiment of the invention, the outputs of the

butterflies are processed significantly different by the binDCT, as compared with the

DCT. For instance, whereas the DCT relies on rotation angles (e.g., 14 of Fig. 1)

which are depicted below in matrix form as cos#, - sin#, cos θ, sin θ_t or sinfi cos# sin#, - cos#

the binDCT, in accordance with a preferred embodiment of the invention, relies on

lifting steps, or cascades of shears (e.g., 16 of Figs. 5-8), a general matrix

representation of which (to be described more fully below) is depicted below as

1 /, 0 1 /, ₊ ι 1

Since a rotation angle is orthogonal, its inverse is simply its transpose, e.g.,

cos#, - sin6> T cos#, - sin6> cos θ, sin θ, cos θ, - sin6ϊ 1 0 sin#, cos6> sin θ, cos θ, - sin#, cos#, sin θ, cos θ, 0 1

Inverting a lifting step is achieved simply by inverting its polarity, i.e.

This is always true as long as the diagonal elements of the matrix are unity with the

same polarity (both +1 or -1). In the case that the diagonal elements have opposite

polarity, the inverse of the lifting step is simply itself, i.e.,

Therefore, transforms constructed from series of lifting steps are still perfectly

invertible. Since the multiplication operations associated with the DCT rotation

angles require floating-point arithmetic, and since it is very difficult to choose

rotation angles that yield integer multiplications, in accordance with a preferred

embodiment of the invention, the orthogonality associated with the DCT may be

sacrificed in the binDCT in favor of lower complexity. Specifically, lifting steps are

used to replace the DCT's rotation angles, preferably having the denominator of the

lifting coefficients /, chosen to be dyadic, both the forward and the inverse transform

can be implemented without using any multiplication (or division) arithmetic

operations, particularly floating point multiplication. Fig. 9 depicts an embodiment of an inverse binDCT which employs inverted lifting steps having dyadic lifting

coefficients.

In accordance with another aspect of the invention, specific choices of lifting

coefficients /, were made to result in high-performance transforms.

In each of the 4 upper branches of the binDCTs depicted within Figs. 5-8, the

DCT's 2 butterflies (12a as in Fig. 1) are retained; however, the two upper branch

DCT rotation angles of π/4 and 3π/8 can be replaced with one of the following sets

of lifting steps depicted below in matrix form.

For example, the upper branch DCT rotation angle of π/4 (14a as seen in

Fig. 1) is replaced with a pair of lifting steps 16a (of Figs. 5-8). For illustrative

purposes, consider an input A on the upper branch of the first lifting step of 16a and

an input of B on the lower branch of the first lifting step of 16a. Consequently, the

output on the upper branch of the first lifting step is A+B and the output on the

lower branch of the first lifting step is B. Similarly, the input on the lower branch of

the second lifting step is B and the input for the upper branch of the second lifting

step is A+B. Consequently, the output for the upper branch of the second lifting

step is A+B and the output for the lower branch of the second lifting step is V₂(A+B)-

B. In accordance with a preferred embodiment of the invention, these lifting steps can be implemented in software and/or hardware using a series of binary shifts and

adds on the input data to produce the appropriate transform coefficients (e.g.,

X[0]...X[7]) representative of the original image data samples.

While any one of the above 3 matrix sets (or other equivalent sets apparent to

those of ordinary skill in the art) may be employed to obtain these output results

(albeit, with differing levels of accuracy), the first matrix set set forth above will be

derived, by way of example, as a replacement for the upper branch π/4 rotation angle

in Fig. 1.

One way in which respective outputs of A+B and B for the upper and lower

branches of the first lifting step may be obtained is to multiply the input vector

A B

by a matrix which represents the first lifting step of 16a:

1 1

0 1 Performing matrix multiplication results in the respective upper and lower branch

outputs of A+B and B in the first lifting step.

One way in which respective outputs of A+B and V₂(A+B)-B for the upper and

lower branches of the second lifting step of 16a may be obtained is to multiply the

vector input to the second lifting step

A + B B

by a matrix which represents the second lifting step of 16a:

1 0

1/2 -1

Using a method similar to that described above for the upper branch DCT

rotation angle of π/4 in the graph of Fig. 1, the upper branch DCT rotation angle of

3π/8 (14b in Fig. 1) may be replaced with the following sets of lifting steps (or

equivalent lifting steps apparent to those skilled in the art) depicted below in matrix

form: 3τv_ 8

In each of the 4 lower branches of the binDCTs, depicted within Figs. 5-8,

which yield the odd-indexed transform coefficients (i.e., X[l], X[3], X[5], X[7]),

three DCT rotation angles (i.e., π/4 (14d), 7π/16 (14e), and 3π/16 (14f, as shown

in Fig. 1 ) are also replaced with lifting steps, particularly those lifting steps having

dyadic denominators in their lifting coefficients. It should be noted that special care

must be taken in choosing a lifting coefficient to replace rotation angle π/4 (14d) in

the lower branches since it precedes the remaining angles in the structure. The

following choices of lifting steps (depicted in matrix form) are preferable to yield

better coding performances although other equivalent lifting steps may be apparent

to those skilled in the art:

^π . -1 5/8 1 0 -1 3/8 1 0 1 -7/16 -1 7/16 1 0 1 -7/16 4 ^' 0 1 3/8 1 0 1 11/16 1 0 1 0 1 3/4 1 0 1

Next, the DCT rotation angle 7π/16 (14e shown in Fig. 1) may be replaced with

one of the following sets of lifting steps (or equivalent steps) depicted below in

matrix form:

^"1 -1/8^" 1 0^" ^"l - 3/16^" 1 0 1 1 1 0

, or

16 ^' 0 1 3/16 1 _ _0 1 ^■ 13/16 1 0 1 ^■ 13/16 1

Finally, the DCT rotation angle 3π/16 (14f shown in Fig. 1) may be replaced with

matrix form:

π _t 16

Note that each embodiment of the binDCT still employs the following ±1 butterfly

(of Fig. 1) which may be represented in matrix form as:

1 1

1 - 1 and whose inverse is itself scaled down by a factor of two. To recover xfi] exactly on

the synthesis end, the outputs of the inverse binDCTs have to be attenuated by Vt,

which is preferably implemented in software and/or hardware as a right-shift of the

data two binary places. It should be noted that in lossless coding, the ±1 butterfly is

not very efficient since it amplifies the signal by 2. To compensate for this effect, in

accordance with another aspect of the invention one or more ±1 butterflies (e.g., 12

in Figs. 5-8) may be replaced with either of the following sets of lifting steps (or

equivalent steps) as depicted below in matrix form:

As an example, Figs. 10 and 11 are respectively provided to illustrate lossless

versions of the embodiment of the binDCT shown in Fig. 5 and its inverse binDCT

in which the ±1 butterflies (e.g., 12 of Fig. 5-8) are replaced with lifting steps having

dyadic denominators in their lifting coefficients. Similar modifications may be made

to each embodiment (e.g., as respectively depicted in Figs. 5-8) without departing

from the spirit or scope of the invention. Fig. 12 depicts a graphical representation of frequency response for a forward

and inverse binDCT, respectively, in accordance with a preferred embodiment of the

invention.

Fig. 13 illustrates a table of coefficients for a transform matrix which may be

used to produce the same or substantially the same coefficients produced by binDCT,

in accordance with embodiments of the invention (e.g., as in Fig. 5). The top

portion of the chart 130 depicts transform coefficients used for signal analysis. The

bottom portion of the chart 132 depicts transform coefficients used for signal

synthesis. Each portion of the table 130, 132 is obtained by first converting each

±1 butterfly and each lifting step within binDCT, e.g., the preferred embodiment of

the invention shown in Fig. 5, into matrix form. Next, multiplying each of the

matrices within a basis function (e.g., x[0]) to respectively produce results in the 8

rows representative of the 8 output coefficients produced using binDCT. That is, the

Fig. 13 table provides a matrix of transform coefficients, h₀ through h₇, which are M

basis functions of an M x M transform consistent with a preferred embodiment of the

invention.

Fig. 14 depicts a table showing empirical data of peak signal-to-noise ratios

(PSNR) between reconstructed and original images. The table depicts results for

three separate test images represented in 512 x 512 8-bit grayscale; "Lena,"

"Goldhill" and "Barbara." The table depicts the PSNR for the DCT in the left column of each test image and offers a comparison with two preferred embodiments

of the invention (of Figs. 5 and 7, respectively). As can be seen in Fig. 14, the

performance of the binDCT is very similar to that of the DCT.

Referring now to Fig. 15, a transmission system is depicted in which each

embodiment of the binDCT may be employed within a respective analysis block 150,

152, and each inverted binDCT may be employed within a respective synthesis block

154, 156. At the analysis side, an input signal Y(z) is processed by a pair of analysis

block 150, 152, each employing an embodiment of the binDCT block transform in

accordance with the invention. The resulting sub -band signals are then decimated by

a factor of two. At the synthesis side, the signals Y₀(z), Yi(z) are interpolated, and

transformed by analysis blocks 154, 156, each of which employs an embodiment of

the inverse binDCT block transform in accordance with the invention. The

synthesized signals are then added to produce the output signal Y(z)'. As should be

readily apparent, the application of the invention is not limited to a transmission

system and may be employed for other applications including, but not limited to,

storing the coded data and reproduction of the data using any given medium.

Furthermore, although an 8 x 8 transform has been depicted, the implementation of

the invention is not limited to 8 x 8 transforms and, accordingly, may be used to

process matrices of any dimension without departing from the spirit or scope of the

invention. The invention provides a simplified block transform that can be implemented

using only bit-shift and add operations. The transform replaces DCT rotation angles

with a series of lifting steps preferably having dyadic denominators in their lifting

coefficients specifically chosen to allow the binDCT to operate with substantially the

same degree of accuracy as the DCT, but without the DCT's level of complexity.

The result is a simpler, faster, transform process which retains all of the practical

benefits of the DCT.

The novel block transform of the invention, "binDCT," offers at least the

following advantages:

• The binDCT has a fast, elegant implementation utilizing only shift-and-

add operations. No multiplication is needed. For example, eight

transform coefficients can be computed using only 14 bit shift operations

and 31 addition operations (13 floating point-multiplication operations

and 29 addition operations are required for the DCT);

• The binDCT can map integers to integers with exact reconstruction. It

also allows a unifying lossy/lossless coding framework; • In software implementation, the binDCT is at least 3 - 4 times faster than

the floating-point DCT. In hardware implementations, the binDCT may

be approximately 10 times or more faster than the floating-point DCT;

• The multiplierless property of the binDCT allows efficient very large scale

integrated circuit (VLSI) implementations in terms of both chip area and

power consumption;

• The binDCT approximates the DCT very closely. Perceptual quantization

matrices and coding strategies designed specifically for the DCT can be

applied to the binDCT immediately without any complicated

modification; and

• The cascade of a well-tuned forward binDCT followed by an inverse DCT

(or a forward DCT followed by an inverse binDCT) produces a reasonable

near-perfect-reconstruction system.

The binDCT provides very high coding performance. For instance, the

coding gain (a popular measure of a transform's energy compaction property) of

several binDCT versions may range in certain examples from 8.77 to 8.82 dB,

whereas the DCT has a coding gain of 8.83 dB. In image coding experiments, the

binDCT may be approximately 0.1 - 1.0 dB below the DCT in the signal-to-noise ratios of the reconstructed images in certain examples. The binDCT may be found

to offer higher visual reconstructed quality than the DCT.

While preferred embodiments of the invention have been described and

illustrated, it should be apparent that many modifications can be made to the

invention and the invention's application without departing from its spirit or scope.

Accordingly, the invention is not limited by the foregoing description or drawings,

but is only limited by the scope of the appended claims.

Claims

What is claimed is:

1. A method of processing digital samples, the method comprising:

processing the digital samples with a plurality of lifting steps, said act of

processing being multiplierless.

2. The method as in claim 1 further comprising:

receiving at an input of a transform a plurality of sample segments of discrete

time signals; and

generating a plurality of transform coefficients.

3. The method as in claim 2 further comprising processing the plurality of

sample segments in pairs of two, said act of processing generating a plurality of sums

and differences of said sample segments.

4. The method as in claim 1, wherein a lifting coefficient of the lifting

steps is dyadic.

5. The method as in claim 3, wherein said act of processing is achieved

with at least one ±1 butterfly.

6. The method as in claim 5, wherein said act of processing is achieved

with a lifting step which, in matrix form, can be expressed as either:

7. The method as in claim 1 further comprising:

receiving at an input of a transform a plurality of generated transform

coefficients; and

generating a plurality of sample segments of discrete time signals.

8. The method as in claim 7 further comprising processing a plurality of

sums and differences generated from said act of processing the digital samples with a

plurality of ±1 butterflies.

9. The method as in claim 8, wherein said act of processing a plurality of

sums and differences is achieved with a plurality of lifting steps having dyadic lifting

coefficients.

10. The method as in claim 2, wherein said act of receiving further

comprises receiving at an input of an M x M transform M sample segments of

discrete time signals.

11. The method as in claim 10, wherein said act of generating further

comprises generating M transform coefficients.

12. The method as in claim 7, wherein said act of receiving further

comprises receiving at an input of an inverse M x M transform M generated

transform coefficients.

13. The method as in claim 12, wherein said act of generating further

comprises generating M sample segments of discrete time signals.

14. The method as in claim 4, wherein one of the following sets of dyadic

lifting coefficients, as represented in matrix form, is employed:

^"

15. The method as in claim 4, wherein one of the following sets of dyadic

lifting coefficients, as represented in matrix form, is employed:

1 0

- 5/8 1

16. The method as in claim 4, wherein one of the following sets of dyadic

lifting coefficients, as represented in matrix form, is employed:

-1 5/8 1 -1 3/8 1 0 1 - 7/16 -1 7/16 1 0 ^■ 7/16 0 1 3/8

0 1 11/16 1 0 1 0 1 3/4 1 1

- 1 1/2 1 0 1 - 1/2 -1 3/8 1 0^" 1 - 3/8^' 1 1 or 0 1 3/4 1 0 1 0 1 3/4 1 0 1 1 -1

17. The method as in claim 4, wherein one of the following sets of dyadic

lifting coefficients, as represented in matrix form, is employed:

18. The method as in claim 4, wherein one of the following sets of dyadic

lifting coefficients, as represented in matrix form, is employed:

1 0 1 7/8 1 0 1 1/2 1 0 1 0 1 1/2 1 0 or

-1/2 1 0 1 - 1/8 1 0 1 - 3/4 1 - 1/4 1 0 1 -1/4 1

19. An analysis filter, the filter comprising:

a receiver for receiving at an input a plurality of sample segments of discrete

time signals; and

means for processing the plurality of sample segments of discrete time signals

with a plurality of lifting steps, the processing being multiplierless, each of the

plurality of lifting steps having a dyadic lifting coefficient, and the means for

processing also being capable of generating a plurality of transform coefficients.

20. A synthesis filter, the filter comprising:

a receiver for receiving at an input a plurality of generated transform

coefficients; and

means for processing the plurality of generated transform coefficients through

a plurality of lifting steps, the processing being multiplierless, each of the plurality of

lifting steps having a dyadic lifting coefficient, and said means for processing also

being capable of generating a plurality of sample segments of discrete time signals.

21. A computer readable storage medium containing a computer readable

code for operating a computer to process digital samples, said code causing said

computer to:

process the digital samples with a plurality of lifting steps, said act of

processing being multiplierless.

22. The computer readable storage medium of claim 21, wherein said code

further causes said computer to:

receive at an input of a transform a plurality of sample segments of discrete

time signals; and

generate a plurality of transform coefficients.

23. The computer readable storage medium method of claim 22, wherein

said code further causes said computer to:

process the plurality of sample segments in pairs of two, thereby generating a

plurality of sums and differences of said sample segments.

24. The computer readable storage medium of claim 21, wherein a lifting

coefficient of the lifting steps is dyadic.

25. The computer readable storage medium of claim 23, wherein said code

causes said act of processing to be achieved with at least one ±1 butterfly.

26. The computer readable storage medium of claim 25, wherein said act

of processing is achieved with a lifting step which, in matrix form, can be expressed as

either:

27. The computer readable storage medium of claim 21, wherein said code

further causes said computer to:

receive at an input of a transform a plurality of generated transform

coefficients; and

generate a plurality of sample segments of discrete time signals.

28. The computer readable storage medium of claim 27, wherein said code

further causes said computer to process a plurality of sums and differences generated

from said act of processing the digital samples with a plurality of ±1 butterflies.

29. The computer readable storage medium of claim 28, wherein said code

causes said act of processing a plurality of sums and differences to be achieved with a

plurality of lifting steps having dyadic lifting coefficients.

30. The computer readable storage medium of claim 22, wherein said code

further causes said computer to receive at an input of an M x M transform M sample

segments of discrete time signals.

31. The computer readable storage medium of claim 30, wherein said code

further causes said computer to generate M transform coefficients.

32. The computer readable storage medium of claim 27, wherein said code

further causes said computer to receive at an input of an inverse M x M transform M

generated transform coefficients.

33. The computer readable storage medium of claim 32, wherein said code

further causes said computer to generate M sample segments of discrete time signals.

34. The computer readable storage medium of claim 24, wherein said code causes

one of the following lifting coefficients, as represented in matrix form, to be

employed:

^"l - 1/2^" ^"l o^" ^"1 0^" ^"1 l^"

L^u lj |_1 -lj |_l/2 -lj [θ lj

35. The computer readable storage medium of claim 24, wherein said code

causes one of the following lifting coefficients, as represented in matrix form, to be

employed:

1 0 ^■ 3/8 1 0 1 1 1 0 or 3/8 1 1 - 5/8 1 0 1 - 5/8 1

36. The computer readable storage medium of claim 24, wherein said code

employed:

^"- ] I 5/8^" ^"1 0^" - 1 3/8] ^"1 0^" ^"1 - 7/16^" ^"- 1 7/16^" ^"1 0] - 7/16

0 1 3/8 1_ 0 lj 11/16 1_ _.0 lj _0 1_ 3/4 lj 1

37. The computer readable storage medium of claim 24, wherein said code

employed:

38. The computer readable storage medium of claim 24, wherein said code

employed:

1 0 1 7/8 1 0 1 1/2 1 0 1 0 1 1/2 1 0 or

-1/2 1 0 1 - 1/8 1 0 1 - 3/4 1 -1/4 1 0 1 -1/4 1