This page explains how the inverse discrete cosine transform(IDCT) is implemented. It follows the proposals of Arai, Agiu and Nakajima* as well as of Tseng and Miller** completed by the extensions made by Feig***.
*Y.Arai,T.Agui,M.Nakajima, A Fast DCT-SQ Scheme for Images,
Trans. of the IEICE.E 71(11):1095(Nov.1988)
**B.D.Tseng,W.C.Miller, On Computing the Discrete Cosine Transform,
IEEE Trans. on Computers, C-27(10):966-8(Oct. 1978)
***E.Feig,E.Linzer, Discrete Cosine Transform Algorithms for
Image Data Compression, Proceedings Electronic Imaging '90 East, pp.84-7, Boston,
MA(Oct.29 - Nov.1,1990)
also published in:W.B.Pennebaker,J.L.Mitchell,JPEG Still Image Compression Standard,
Van Nostrand Reinhold, 1993
The starting point is the formula for the 1 dimensional DCT(1D-DCT):
The suffix 8 means that the DCT was built by 8 coefficients.
First the 1D-DCT is applied to all rows of the DCT matrix:
Then the 1D-DCT is applied to the columns of the the result of this operation:
By substitution we get the formula of the 2D-DCT:
which proofs: The 2D-DCT can be regarded as a double application of the 1D-DCT. Therefore we first regard the 1D-DCT.
By definition of:
we can rewrite the formula:
For some reason we compute
:
Therefore the following is valid:
If we constitute a sequence of elements f(k), k = 0...15 with:
we can rewrite the formula:

Because the 16 point discrete Fourier transform is defined by
the following relation is valid:
That is the basic idea: Instead of performing an inverse DCT an inverse DFT is performed.
To obtain the DFT values we first multiply the DCT values with
.
It seems as this were very inefficient. But bear in mind that the last operation before
the IDCT is the quantization. That means every value is to be multiplied
with a certain constant depending on it's position in DCT-matrix. To obtain the DFT values
we simply multiply the quantization constants with
.
As a result the quantization and DCT-DFT transformation is performed in one step. The IDFT
actually computes 16 values. But we can simply ignore the higher 8 values (They are
equal to the first 8 values but in reverse order).
To obtain the IDFT equations we first bring the DFT equations in matrix form. If we define a line of the original values by:
a line of the scaled transform values
and a function:
we can establish a Matrix TM:
which transforms the values according to equation (5):
Because:
the cos computes only 0, 1, -1 an the positive and
negative values of the following 3 constants:
Therefore the matrix can be rewritten:
To get the matrix equation for IDFT the matrix TM must
be inverted:
The inversion gives:
with:
The matrix L can be factored:
with:
So the IDFT can be performed by 5 steps:
In part M a simplification can be applied:
If you count the operations you'll get 29 additions and 5 multiplications.
The ordinary way is to exploit the horizontal-vertical properties of the DCT (equations (1)
-(4)):
First the matrix L is applied to all rows. Then the result is rotated by 90° to
the right and the L is applied once again. Because the 1D-IDFT is applied 16 times
this results in 29 * 16 = 464 additions and 16 * 5 = 80
multiplications. The known libjpeg.a uses the horizontal-vertical properties of the DCT wereby some
special cases are regarded.
But Feig*** describes an 2D-algorithm with less operations.
The original matrix is re-packed into a one-line-matrix FF by
concatenating all lines:
Then a matrix KK must be found which transforms FF in a
one-line-matrix XX containing all retransformed lines in one line:
The Matrix KK is the tensor or Kronecker product of
L:
To explain what it means regard the matrices V1 and V2:
The tensor or Kronecker product is:
By means of the following substitution:
the matrix KK can be represented as:
Because of some distributive properties of the tensor product this is the same as:
Now we can decide how to compute the 3 tensor products: according to equation (7) or
by exploiting the horizontal-vertical-properties of the tensor product (equations (1)-(4)).
According to Feig*** the current application uses the horizontal-vertical-property for the 1st and last tensor
product. This requires 16 * 26 = 416 additions.
Note that the rotation by 90° isn't a real operation! It is simply a permutation of the indexes.
The middle tensor product is computed according to equation (7):
Note that every 0 represents a 8x8 zero matrix!
The lines 1,2,4 and 8 are multiplications with the original Matrix M.
We already know this can be performed by 3 additions and 5 multiplications per line:
The lines 3 and 6 are treated the same way extended by a multiplication with C4:
This requires 7 multiplications, 3 additions and 2 shifts to the left.
To transform the values of lines 5 and 7, we build a matrix J only
containing the lines 5 and 7 without the elements 5 and 7:
a submatrix M' of M by discarding both, lines 5 and
7 as well as columns 5 and 7:
The matrix N contains only the values which are both, in lines and columns
5 and 7 of matrix M:
The resulting matrix O:
can be computed by:
This gives equations like:
They can be computed similarly to equations(6) by 3 additions
and 3 multiplications.
To compute the 5th and the 7the value of lines 5 and 7 the values
are to be multiplied with the tensor product of N:
If we compute the matrix we get:
But because of:
this results in:
which can be factored:
Using the relation:
we get:
This implies 2 multiplications 10 additions and 2 shifts to the left.
If you count the operations for the tensor product of M you'll get:
| lines | additions per line | multiplications per line | shifts to the left per line | additions total | multiplications total | shifts to the left total |
|---|---|---|---|---|---|---|
| 1,2,4,8 | 3 | 5 | 0 | 12 | 20 | 0 |
| 3,6 | 3 | 7 | 2 | 6 | 14 | 4 |
| 5 and 7 | 6*3+10 | 6*3+2 | 2 | 28 | 20 | 2 |
| total | 46 | 54 | 6 | |||
If we add the 416 additions for the first and the last tensor product we get: 462 additions, 54 multiplications and 6 shifts to the left.
Java-inline-MPEG-Player