The Implementation of the 2D-IDCT


This page explains how the inverse discrete cosine transform(IDCT) is implemented. It follows the proposals of Arai, Agiu and Nakajima* as well as of Tseng and Miller** completed by the extensions made by Feig***.

*Y.Arai,T.Agui,M.Nakajima, A Fast DCT-SQ Scheme for Images, Trans. of the IEICE.E 71(11):1095(Nov.1988)
**B.D.Tseng,W.C.Miller, On Computing the Discrete Cosine Transform, IEEE Trans. on Computers, C-27(10):966-8(Oct. 1978)
***E.Feig,E.Linzer, Discrete Cosine Transform Algorithms for Image Data Compression, Proceedings Electronic Imaging '90 East, pp.84-7, Boston, MA(Oct.29 - Nov.1,1990)
also published in:W.B.Pennebaker,J.L.Mitchell,JPEG Still Image Compression Standard, Van Nostrand Reinhold, 1993

The starting point is the formula for the 1 dimensional DCT(1D-DCT):

1D-DCT.gif

The suffix 8 means that the DCT was built by 8 coefficients.

First the 1D-DCT is applied to all rows of the DCT matrix:

1D-DCT-ROW.gif

Then the 1D-DCT is applied to the columns of the the result of this operation:

1D-DCT-COLUMN.gif

By substitution we get the formula of the 2D-DCT:

2D-DCT-COLUMN.gif

which proofs: The 2D-DCT can be regarded as a double application of the 1D-DCT. Therefore we first regard the 1D-DCT.

By definition of:

subst1.gif

we can rewrite the formula:

transform1.gif

For some reason we compute HcosB.gif:

transform2.gif

Therefore the following is valid:

result1.gif

If we constitute a sequence of elements f(k), k = 0...15 with:

newseq.gif

we can rewrite the formula:

transform3.gif

imagin.gif

Because the 16 point discrete Fourier transform is defined by

DFT.gif

the following relation is valid:

DFT-DCT.gif

That is the basic idea: Instead of performing an inverse DCT an inverse DFT is performed. To obtain the DFT values we first multiply the DCT values with 2cosB.gif. It seems as this were very inefficient. But bear in mind that the last operation before the IDCT is the quantization. That means every value is to be multiplied with a certain constant depending on it's position in DCT-matrix. To obtain the DFT values we simply multiply the quantization constants with 2cosB.gif. As a result the quantization and DCT-DFT transformation is performed in one step. The IDFT actually computes 16 values. But we can simply ignore the higher 8 values (They are equal to the first 8 values but in reverse order).

To obtain the IDFT equations we first bring the DFT equations in matrix form. If we define a line of the original values by:

orig_val.gif

a line of the scaled transform values

trans_val.gif

and a function: Pfunc.gif we can establish a Matrix TM:

Tmatrix.gif

which transforms the values according to equation (5):

Mequ1.gif

Because:cos_law.gif
the cos computes only 0, 1, -1 an the positive and negative values of the following 3 constants:

constK.gif

Therefore the matrix can be rewritten:

Tmatrix1.gif

To get the matrix equation for IDFT the matrix TM must be inverted:

Mequ2.gif

The inversion gives:

Lmatrix.gif

with:

constC.gif

The matrix L can be factored:

Mequ3.gif

with:

Mgroup1.gif

So the IDFT can be performed by 5 steps:

glsys1.gif

In part M a simplification can be applied:

glsys2.gif

If you count the operations you'll get 29 additions and 5 multiplications.

The ordinary way is to exploit the horizontal-vertical properties of the DCT (equations (1) -(4)):

trans1D.gif

First the matrix L is applied to all rows. Then the result is rotated by 90° to the right and the L is applied once again. Because the 1D-IDFT is applied 16 times this results in 29 * 16 = 464 additions and 16 * 5 = 80 multiplications. The known libjpeg.a uses the horizontal-vertical properties of the DCT wereby some special cases are regarded.

But Feig*** describes an 2D-algorithm with less operations. The original matrix is re-packed into a one-line-matrix FF by concatenating all lines:

recons.gif

Then a matrix KK must be found which transforms FF in a one-line-matrix XX containing all retransformed lines in one line:

Mequ4.gif

The Matrix KK is the tensor or Kronecker product of L:

Mequ5.gif

To explain what it means regard the matrices V1 and V2:

kron1.gif

The tensor or Kronecker product is:

kron2.gif

By means of the following substitution:

subst2.gif

the matrix KK can be represented as:

Mequ6.gif

Because of some distributive properties of the tensor product this is the same as:

Mequ7.gif

Now we can decide how to compute the 3 tensor products: according to equation (7) or by exploiting the horizontal-vertical-properties of the tensor product (equations (1)-(4)).

According to Feig*** the current application uses the horizontal-vertical-property for the 1st and last tensor product. This requires 16 * 26 = 416 additions. Note that the rotation by 90° isn't a real operation! It is simply a permutation of the indexes.

The middle tensor product is computed according to equation (7):

Ten_Mmatrix.gif

Note that every 0 represents a 8x8 zero matrix!

The lines 1,2,4 and 8 are multiplications with the original Matrix M. We already know this can be performed by 3 additions and 5 multiplications per line:

Equ_Mmatrix.gif

The lines 3 and 6 are treated the same way extended by a multiplication with C4:

Equ_C4matrix.gif

This requires 7 multiplications, 3 additions and 2 shifts to the left.

To transform the values of lines 5 and 7, we build a matrix J only containing the lines 5 and 7 without the elements 5 and 7:

Jmatrix.gif

a submatrix M' of M by discarding both, lines 5 and 7 as well as columns 5 and 7:

Msmatrix.gif

The matrix N contains only the values which are both, in lines and columns 5 and 7 of matrix M:

Nmatrix.gif

The resulting matrix O:

Omatrix.gif

can be computed by:

Mequ10.gif

This gives equations like:

rotations.gif

They can be computed similarly to equations(6) by 3 additions and 3 multiplications.

To compute the 5th and the 7the value of lines 5 and 7 the values are to be multiplied with the tensor product of N:

Mequ8.gif

If we compute the matrix we get:

Nmatrix2.gif

But because of:

subst3.gif

this results in:

Mequ12.gif

which can be factored:

Mequ9.gif

Using the relation:subst4.gif we get:

Mequ11.gif

This implies 2 multiplications 10 additions and 2 shifts to the left.

If you count the operations for the tensor product of M you'll get:

linesadditions
per line
multiplications
per line
shifts to the left
per line
additions
total
multiplications
total
shifts to the left
total
1,2,4,835012200
3,63726144
5 and 76*3+106*3+2228202
total46546

If we add the 416 additions for the first and the last tensor product we get: 462 additions, 54 multiplications and 6 shifts to the left.


NEWSee also An experimental IDCT study by René Stöckel (including a patch to mpeg_play-3.3, which changes the IDCT to IDFT)
upBack to Java-inline-MPEG-Player