Memory-Hierarchical and Mode-Adaptive HEVC Intra Prediction Architecture for Quad Full HD Video Decoding

Authors: Huang Chao Tsung*; Tikekar Mehul; Chandrakasan Anantha P
Source: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2014, 22(7): 1515-1525.
DOI:10.1109/TVLSI.2013.2275571

Abstract

This paper presents a high-throughput and area-efficient VLSI architecture for intra prediction in the emerging High Efficiency Video Coding (HEVC) standard. Three design techniques are proposed to address the complexity systematically: 1) a hierarchical memory deployment that stores neighboring samples in 4.9 Kb of static RAM (SRAM) instead of 43.2-k gates of registers and increases throughput by processing reference samples in registers; 2) a mode-adaptive scheduling scheme for all prediction units, which provides a throughput of at least 2 samples/cycle while using low-throughput SRAM and achieves 2.46 samples/cycle on average in the experimental results; and 3) resource sharing for multipliers and the readout circuits of reference sample registers, which saves 2.5-k gates. These techniques reduce area by 40% but increase power consumption because of additional signal transitions. Signal-gating circuits are therefore applied to reduce SRAM power by 69% and logic power by 32%, at a cost of only 1.0-k gates. When synthesized at 200 MHz in a 40-nm process, the proposed architecture needs only 27.0-k gates and 4.9 Kb of single-port SRAM. The layout core area is 0.036 mm², and the power consumption is 2.11 mW in postlayout simulation. This performance supports quad full high-definition (HD) (3840 × 2160) video decoding at 30 frames/s.
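
As a rough sanity check on the throughput figures above, the following sketch estimates how many samples per cycle a decoder must sustain for 3840 × 2160 video at 30 frames/s with a 200 MHz clock. It assumes 4:2:0 chroma subsampling (1.5 samples per pixel), which the abstract does not state; the function name and the `chroma_factor` parameter are hypothetical and only illustrate the arithmetic.

```python
def required_samples_per_cycle(width, height, fps, clock_hz, chroma_factor=1.5):
    """Average samples that must be produced per clock cycle.

    chroma_factor=1.5 assumes 4:2:0 video (one luma sample plus
    half a chroma sample per pixel); this is an assumption, not a
    figure taken from the abstract.
    """
    samples_per_second = width * height * chroma_factor * fps
    return samples_per_second / clock_hz


# Quad full HD at 30 frames/s with the 200 MHz clock quoted in the abstract.
need = required_samples_per_cycle(3840, 2160, 30, 200e6)
print(f"required throughput: {need:.3f} samples/cycle")
```

Under this assumption the requirement comes out slightly below 2 samples/cycle, which is consistent with the guaranteed minimum throughput reported for the mode-adaptive scheduling scheme.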