Abstract

Convolution filtering applications range from image recognition to video surveillance. Two observations drive the design of a new buffering architecture for convolution filters. First, convolution operations are inherently local: every pixel of an output feature map is calculated from the neighboring pixels of the input feature maps. Even though the operation is simple, convolution filtering is both computation-intensive and memory-intensive, and real-time applications require large amounts of on-chip memory to support massively parallel processing architectures. Second, to avoid accessing external memory directly, data already stored in on-chip memory should be reused as many times as possible. Based on these two observations, we show that, for a given throughput rate and off-chip memory bandwidth, a rotation-based data buffering architecture provides the optimum area-utilization results for a particular design point that is commonly used in recognition applications.

Full Text