Abstract

This study develops a novel, cost-effective, and hardware-efficient hardware-sharing architecture, based on a fast algorithm, for the 4 x 4, 8 x 8, 16 x 16, and 32 x 32 inverse core transforms in High Efficiency Video Coding (HEVC). The symmetry of the elements in the inverse core transform matrices is used to factorize each core transform matrix into several submatrices. Based on the symmetry and similarity between these submatrices, the hardware of the (N/2) x (N/2) inverse core transform is shared with that of the N x N inverse core transform for N = 32, 16, and 8. Compared with implementing each transform separately without hardware sharing, the proposed multiplierless transform architecture reduces the hardware overhead of adders and shifters by 32% and 36%, respectively. The hardware efficiency of the proposed architecture is up to 166% higher than that of several previous transform designs for HEVC, and up to 141% higher than that of field-programmable gate array (FPGA)-based 16-point transform designs. Implemented in 90-nm complementary metal-oxide-semiconductor (CMOS) technology from the Taiwan Semiconductor Manufacturing Company (TSMC), the proposed 1-D hardware-sharing scheme requires 115.7 K gates to achieve an operating frequency of up to 200 MHz, and it can decode 4K x 2K (4096 x 2048 pixels) and 8K UHDTV (7680 x 4320 pixels) video in real time at up to 127 and 32 frames per second, respectively.
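The sharing idea described above can be illustrated with a minimal software sketch for N = 8. It is not the paper's architecture: it only shows the well-known even-odd structure of the HEVC core transform matrix, in which the even-indexed rows of the 8-point matrix contain the 4-point matrix, so a 4-point inverse transform can be reused inside the 8-point one. The function names (`inverse4`, `inverse8_shared`, `mul83`, `mul36`) are hypothetical, the scaling and rounding stages of a real decoder are omitted, and the shift-add forms are one possible multiplierless decomposition rather than the one used in the proposed design.

```python
# 8-point HEVC core transform matrix (integer approximation of the DCT).
C8 = [
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
]

# Even rows (left half) reproduce the 4-point matrix; odd rows form the
# part that is specific to the 8-point transform.
C4 = [row[:4] for row in C8[0::2]]
O4 = [row[:4] for row in C8[1::2]]


def transpose_times(matrix, coeffs):
    """Compute matrix^T * coeffs, i.e. an unscaled 1-D inverse transform."""
    return [
        sum(matrix[k][j] * coeffs[k] for k in range(len(matrix)))
        for j in range(len(matrix[0]))
    ]


def inverse4(coeffs):
    """Unscaled 4-point inverse core transform (the shareable unit)."""
    return transpose_times(C4, coeffs)


def inverse8_direct(coeffs):
    """Unscaled 8-point inverse transform as a full matrix product."""
    return transpose_times(C8, coeffs)


def inverse8_shared(coeffs):
    """8-point inverse transform that reuses the 4-point unit.

    Even rows are symmetric and odd rows antisymmetric, so the outputs are
    sums and differences of an even part and an odd part.
    """
    even = inverse4(coeffs[0::2])            # reused 4-point transform
    odd = transpose_times(O4, coeffs[1::2])  # 8-point-specific part
    first_half = [even[j] + odd[j] for j in range(4)]
    second_half = [even[3 - j] - odd[3 - j] for j in range(4)]
    return first_half + second_half


def mul83(x):
    """83 * x using shifts and adds only (83 = 64 + 16 + 2 + 1)."""
    return (x << 6) + (x << 4) + (x << 1) + x


def mul36(x):
    """36 * x using shifts and adds only (36 = 32 + 4)."""
    return (x << 5) + (x << 2)


sample = [10, -3, 7, 0, 2, -5, 1, 4]
print(inverse8_shared(sample) == inverse8_direct(sample))
```

In this sketch the butterfly in `inverse8_shared` needs only the 4-point unit plus a 4 x 4 odd part, which is the same kind of reuse the abstract describes between the (N/2)-point and N-point transforms.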

  • Publication date: 2016-1