Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory

Authors: Yin, Shouyi*; Tang, Shibin; Lin, Xinhan; Ouyang, Peng; Tu, Fengbin; Liu, Leibo; Zhao, Jishen; Xu, Cong; Li, Shuangcheng; Xie, Yuan; Wei, ShaoJun
Source: IEEE Transactions on Parallel and Distributed Systems, 2019, 30(1): 146-160.
DOI: 10.1109/TPDS.2018.2858230

Abstract

Recent advances in deep learning (DL) have stimulated increasing interest in neural networks (NN). From the perspective of operation type and network architecture, deep neural networks can be categorized as full convolution-based neural networks (ConvNet), recurrent neural networks (RNN), and fully-connected neural networks (FCNet). Different types of neural networks are usually cascaded and combined into a hybrid neural network (Hybrid-NN) to complete real-life cognitive tasks. Such a hybrid-NN implementation is memory-intensive, with a large number of memory accesses, so its performance is often limited by insufficient memory bandwidth. A "3D + 2.5D" integration system, which integrates a high-bandwidth 3D stacked DRAM side-by-side with a highly parallel neural processing unit (NPU) on a silicon interposer, overcomes the bandwidth bottleneck in hybrid-NN acceleration. However, the intensive concurrent 3D DRAM accesses produced by the NPU lead to a serious thermal problem in the 3D DRAM. In this paper, we propose a neural processor called Parana for hybrid-NN acceleration that takes the thermal problem of 3D DRAM into account. Parana solves the thermal problem of 3D memory by optimizing both the total number of memory accesses and the memory accessing behavior. For memory accessing behavior, Parana balances the memory bandwidth by mapping the hybrid-NN onto computing resources with spatial division, which avoids issuing a large number of memory accesses within a short time period. To reduce the total number of memory accesses, we design a new NPU architecture and propose a memory-oriented tiling and scheduling mechanism that maximizes utilization of the on-chip buffer. Experimental results show that, compared with state-of-the-art accelerators with 3D memory, Parana reduces the peak temperature by up to 54.72 °C and the steady temperature by up to 32.27 °C without performance degradation.
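The bandwidth-balancing idea in the abstract can be illustrated with a toy model. The sketch below is not Parana's mapping algorithm. It only compares the peak memory bandwidth demand of running two sub-networks one after another on all processing elements (PEs) against running them side by side on a proportional split of the PEs. The `SubNet` class, both function names, and all workload numbers are hypothetical and chosen only for illustration. The model assumes each PE performs one operation per time unit and that memory traffic is spread evenly over a sub-network's runtime.

```python
from dataclasses import dataclass


@dataclass
class SubNet:
    name: str
    ops: float          # total operations (arbitrary units, hypothetical)
    bytes_moved: float  # total off-chip memory traffic (arbitrary units, hypothetical)


def temporal_peak_bandwidth(nets, num_pes):
    """Each sub-network runs alone on all PEs, one after another.

    The peak demand is that of the most memory-bound phase.
    """
    return max(n.bytes_moved / (n.ops / num_pes) for n in nets)


def spatial_peak_bandwidth(nets, num_pes):
    """PEs are split in proportion to each sub-network's operation count.

    All sub-networks then run concurrently and finish together, so the
    memory traffic is spread over the whole runtime.
    """
    runtime = sum(n.ops for n in nets) / num_pes
    return sum(n.bytes_moved for n in nets) / runtime


# Hypothetical workload: a compute-bound ConvNet part and a memory-bound FCNet part.
nets = [
    SubNet("conv", ops=900, bytes_moved=100),
    SubNet("fc", ops=100, bytes_moved=900),
]

print(temporal_peak_bandwidth(nets, num_pes=10))  # 90.0
print(spatial_peak_bandwidth(nets, num_pes=10))   # 10.0
```

In this toy setting both schedules take 100 time units in total, but the spatial split lowers the peak bandwidth demand from 90 to 10 units per time step, because the burst of accesses from the memory-bound part is spread over the full runtime. The model says nothing about temperature itself; it only shows why concurrent mapping can smooth the access rate without lengthening execution.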