Abstract

This paper presents a high-throughput, cost-effective implementation of six different integer transforms used in H.264/AVC High Profile codecs, namely the 4 × 4 forward, 4 × 4 inverse, forward Hadamard, inverse Hadamard, 8 × 8 forward, and 8 × 8 inverse transforms, all integrated into a single shared hardware unit. The 4 × 4 transform matrices are regularized through permutation, partitioned into 2 × 2 submatrices, and factored to maximize hardware sharing. By exploiting the two types of 4 × 4 transform matrices embedded in an 8 × 8 transform matrix, both 8 × 8 transforms are expressed as three steps and unified with only minor modifications. To improve throughput, the two independent 4 × 4 transform blocks within the 8 × 8 transform unit operate in parallel in 4 × 4 mode, while a two-stage pipelined architecture is used in 8 × 8 mode. Implemented in 0.18-μm CMOS technology, the proposed multitransform architecture reaches a maximum operating frequency of 200 MHz and achieves a throughput of 4.1 Gpixels/s at a hardware cost of 63,618 gates. Compared with existing designs, the proposed design delivers at least 54% higher throughput and a 38% higher throughput-to-area ratio in Adaptive Block-size Transform (ABT) mode.
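To illustrate the general idea of sharing arithmetic between transforms, the following is a minimal sketch assuming the standard H.264/AVC 4 × 4 forward core transform matrix and the 4 × 4 Hadamard matrix. It shows that both can reuse the same first add/subtract (butterfly) stage and differ only in whether the second stage applies a shift by one. This is not the paper's permutation and 2 × 2 partitioning factorization, and it omits the normative scaling, rounding, and quantization steps. The names `butterfly_1d`, `transform_2d`, and `reference_2d` are hypothetical helpers introduced only for this example.

```python
# Forward 4x4 core transform matrix of H.264/AVC (integer DCT approximation).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# 4x4 Hadamard matrix (used for luma DC coefficients); scaling omitted here.
HD = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def butterfly_1d(x, mode):
    """Hypothetical 1-D 4-point transform; the first stage is shared by both modes."""
    x0, x1, x2, x3 = x
    # Shared first stage: sums and differences of mirrored inputs.
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    if mode == "forward":
        # Core transform: odd outputs use a shift by one (multiply by 2).
        return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]
    if mode == "hadamard":
        # Hadamard: same adder structure with the shifts bypassed.
        return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]
    raise ValueError(f"unknown mode: {mode}")


def transform_2d(block, mode):
    """Compute Y = M X M^T as row transforms followed by column transforms."""
    rows = [butterfly_1d(r, mode) for r in block]             # X M^T
    cols = [butterfly_1d(list(c), mode) for c in zip(*rows)]  # (M X M^T)^T
    return [list(r) for r in zip(*cols)]                      # M X M^T


def matmul(a, b):
    """Plain integer matrix product used as a reference."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def reference_2d(block, matrix):
    """Direct matrix form M X M^T, for checking the butterfly version."""
    transposed = [list(r) for r in zip(*matrix)]
    return matmul(matmul(matrix, block), transposed)


if __name__ == "__main__":
    X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    assert transform_2d(X, "forward") == reference_2d(X, CF)
    assert transform_2d(X, "hadamard") == reference_2d(X, HD)
    print(transform_2d(X, "forward"))
```

In hardware terms, the mode signal selects only the shifter path after a common adder stage, so one datapath can serve both transforms. This is the kind of reuse the multitransform design generalizes across all six transforms.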

  • Publication date: April 2010