Abstract

In this paper, we propose a Coarse-Grained Spatial Architecture (CGSA) that processes different kinds of convolution with high performance and low energy consumption. Its 16 coarse-grained parallel processing units reach a peak of 152 GOPS at 500 MHz by exploiting local reuse of image data, feature map data, and filter weights. On the convolutional layers of the AlexNet benchmark, it achieves 99 frames/s while consuming 264 mW at 500 MHz and 1 V. We evaluate the architecture by comparing it with several recent CNN accelerators. The results show that the proposed architecture achieves 3x the energy efficiency and 3.5x the area efficiency of existing work by Chen that uses a similar architecture and technology.
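
As a quick sanity check on the headline figures, the following minimal sketch derives a few quantities implied by the numbers quoted above: operations per clock cycle, operations per cycle per processing unit, and energy per AlexNet frame on the convolutional layers. The constant and function names are illustrative and do not come from the paper. The even split of peak throughput across the 16 units is an assumption made here for illustration.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 152.0      # peak throughput, giga-operations per second
CLOCK_MHZ = 500.0      # operating frequency
NUM_UNITS = 16         # coarse-grained parallel processing units
FRAMES_PER_S = 99.0    # AlexNet convolutional layers
POWER_MW = 264.0       # power at 500 MHz and 1 V


def derived_figures():
    """Return (ops/cycle, ops/cycle/unit, mJ/frame) implied by the figures above."""
    ops_per_cycle = (PEAK_GOPS * 1e9) / (CLOCK_MHZ * 1e6)
    # Assumption: peak throughput is divided evenly across all units.
    ops_per_cycle_per_unit = ops_per_cycle / NUM_UNITS
    # mW divided by frames/s gives mJ per frame.
    energy_per_frame_mj = POWER_MW / FRAMES_PER_S
    return ops_per_cycle, ops_per_cycle_per_unit, energy_per_frame_mj


if __name__ == "__main__":
    total, per_unit, energy = derived_figures()
    print(f"{total:.0f} ops/cycle, {per_unit:.0f} ops/cycle/unit, "
          f"{energy:.2f} mJ/frame")
    # prints: 304 ops/cycle, 19 ops/cycle/unit, 2.67 mJ/frame
```

Under these assumptions, the peak rate corresponds to 304 operations per cycle, or 19 per processing unit, and each AlexNet frame costs roughly 2.67 mJ on the convolutional layers.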