Abstract

Motion estimation (ME) is the most critical component of a video coding system, and it accounts for most of the computational complexity and memory bandwidth. This paper presents a novel full-search block-matching hardware architecture for H.264/AVC integer motion estimation (IME) that is efficient in both memory access and computation. With the highest level of on-chip data reuse, each off-chip reference pixel is accessed only once, which minimizes the off-chip memory bandwidth. Distributed data caching and virtual connection of reference picture boundaries make the data traffic scheduling simple, regular, and efficient. The computation engine employs a two-dimensional (2-D) systolic processor array to calculate the absolute differences in a single-instruction multiple-data (SIMD) manner, and 2-D adder trees to sum up the absolute differences, all with 100% utilization. The proposed architecture fully supports the variable block-size matching of H.264/AVC and can produce 41 sums of absolute differences (SADs) for one search point every cycle, with no pipeline bubbles. The architecture is described as a parameterized design, and an implementation for standard-definition digital TV encoding applications is presented. Theoretical analysis and experimental results show that the proposed architecture achieves the minimum off-chip memory bandwidth and the maximum computational performance.
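
To make the count of 41 SADs per search point concrete, the following is a minimal software sketch, not the hardware architecture itself. It assumes the common arrangement in which the sixteen 4x4 SADs of a 16x16 macroblock are computed first and the larger H.264/AVC partitions are obtained by adding them together, which is the role an adder tree would play. The function names `sad_4x4_grid` and `variable_block_sads` are hypothetical, and block sizes are written as width x height.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock.

    cur and ref are 16x16 lists of pixel values (current and candidate block).
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def variable_block_sads(cur, ref):
    """Build all 41 variable block-size SADs by reusing the 4x4 SADs.

    Assumed merging hierarchy (illustrative only):
    4x4 -> 8x4 and 4x8 -> 8x8 -> 16x8 and 8x16 -> 16x16.
    """
    g = sad_4x4_grid(cur, ref)
    sads = {}
    # 16 SADs for the 4x4 blocks, in raster order
    sads["4x4"] = [g[r][c] for r in range(4) for c in range(4)]
    # 8 SADs: two horizontally adjacent 4x4 blocks (8 wide, 4 high)
    sads["8x4"] = [g[r][c] + g[r][c + 1] for r in range(4) for c in (0, 2)]
    # 8 SADs: two vertically adjacent 4x4 blocks (4 wide, 8 high)
    sads["4x8"] = [g[r][c] + g[r + 1][c] for r in (0, 2) for c in range(4)]
    # 4 SADs: one per 8x8 quadrant
    b8 = [[g[r][c] + g[r][c + 1] + g[r + 1][c] + g[r + 1][c + 1]
           for c in (0, 2)] for r in (0, 2)]
    sads["8x8"] = [b8[0][0], b8[0][1], b8[1][0], b8[1][1]]
    # 2 SADs: top and bottom halves (16 wide, 8 high)
    sads["16x8"] = [b8[0][0] + b8[0][1], b8[1][0] + b8[1][1]]
    # 2 SADs: left and right halves (8 wide, 16 high)
    sads["8x16"] = [b8[0][0] + b8[1][0], b8[0][1] + b8[1][1]]
    # 1 SAD for the whole macroblock
    sads["16x16"] = [sum(sads["8x8"])]
    return sads
```

Calling `variable_block_sads(cur, ref)` for one candidate position returns 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 values. A full-search engine would evaluate this once for every search point in the search window, whereas the proposed hardware produces all 41 values for one search point in a single cycle.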