Embedded Streaming Deep Neural Networks Accelerator With Applications

Dundar Aysegul*; Jin Jonghoon; Martini Berin; Culurciello Eugenio

doi:10.1109/TNNLS.2016.2545298

Abstract

Deep convolutional neural networks (DCNNs) have become a very powerful tool for visual perception. DCNNs have applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput in the feed-forward evaluation phase and power efficiency are important. Because of this growing use, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for a DCNN hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of a DCNN into operation codes that execute the application on the hardware accelerator. The proposed method makes maximal use of the available computational resources through a novel scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully exploits the weight-level and node-level parallelism of DCNNs and achieves a peak performance of 247 G-ops while consuming less than 4 W of power. We test our system with object classification and object detection applications in real-world scenarios. Our results show high performance efficiency, outperforming all other platforms presented while running these applications.
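
As a rough illustration of what the headline figures imply, the following sketch derives a lower bound on energy efficiency from the stated 247 G-ops peak and the sub-4 W power draw, plus an idealized best-case latency for a single convolutional layer. It assumes that "G-ops" means billions of operations per second and that one multiply-accumulate counts as two operations; the helper names (`conv_ops`, `min_efficiency_gops_per_watt`, `best_case_latency_s`) and the layer sizes in the usage note are hypothetical and not taken from the paper.

```python
# Back-of-the-envelope figures derived from the abstract's numbers.
# Assumption: "G-ops" = 1e9 operations per second.
PEAK_GOPS = 247.0   # reported peak throughput
MAX_POWER_W = 4.0   # reported upper bound on power ("less than 4 W")


def conv_ops(in_ch: int, out_ch: int, out_h: int, out_w: int, k: int) -> int:
    """Operation count of a k x k convolution layer (hypothetical helper).

    Assumption: each multiply-accumulate is counted as 2 operations
    (one multiply plus one add); bias and activation are ignored.
    """
    macs = in_ch * out_ch * out_h * out_w * k * k
    return 2 * macs


def min_efficiency_gops_per_watt() -> float:
    """Lower bound on efficiency: power is below 4 W, so G-ops/W exceeds this."""
    return PEAK_GOPS / MAX_POWER_W


def best_case_latency_s(ops: int) -> float:
    """Idealized time for `ops` operations at full peak throughput."""
    return ops / (PEAK_GOPS * 1e9)


if __name__ == "__main__":
    ops = conv_ops(3, 16, 32, 32, 3)
    print(f"efficiency > {min_efficiency_gops_per_watt():.2f} G-ops/W")
    print(f"{ops} ops, >= {best_case_latency_s(ops) * 1e6:.2f} us at peak")
```

For a hypothetical 3x3 convolution with 3 input and 16 output channels on a 32x32 output map, this convention gives 884,736 operations, or about 3.6 µs at peak throughput. That figure is an idealized bound that assumes the accelerator sustains full utilization, which the abstract does not claim for every layer.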

Publication date: 2017-7
