Abstract

Despite their remarkable performance in various machine intelligence tasks, Convolutional Neural Networks (CNNs) have seen limited adoption in resource-constrained embedded and IoT systems because of their computational intensity. To address this problem, we present a framework for the synthesis of efficient CNN inference software targeting mobile SoC platforms. We argue that thread granularity can substantially impact the performance and energy dissipation of the synthesized inference software, and demonstrate that launching the maximum number of logical threads, often promoted as a guiding principle by GPGPU practitioners, does not yield an efficient implementation on mobile SoCs. We hypothesize that the runtime of a CNN layer on a particular SoC platform can be accurately estimated as a linear function of its computational complexity. This may seem counter-intuitive, as modern mobile SoCs employ a plethora of heterogeneous architectural features and dynamic resource management policies. Building on this hypothesis, we develop a principled approach and a data-driven analytical model to optimize thread granularity during CNN software synthesis. Experimental results with several modern CNNs mapped to a commodity Android smartphone with a Snapdragon SoC show up to 2.37X speedup in application runtime and up to 1.9X improvement in energy dissipation compared to existing approaches.
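The abstract's central modeling assumption is that a layer's runtime scales linearly with its computational complexity. As a minimal, hypothetical illustration of fitting such a data-driven linear model (not the paper's actual implementation), the sketch below uses ordinary least squares over profiled layers; the choice of MAC count as the complexity measure, all variable names, and all numbers are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch: fit a per-layer linear runtime model T = a * C + b,
# where C is a layer's computational complexity (here, MAC count) and T is
# its measured runtime on the target SoC. All data values are placeholders,
# not results from the paper.

macs = np.array([1.2e8, 3.4e8, 9.0e8, 1.8e9])    # MACs per profiled layer
runtime_ms = np.array([4.1, 10.9, 28.5, 56.2])   # measured runtimes (ms)

# Ordinary least-squares fit of the linear model's slope and intercept.
a, b = np.polyfit(macs, runtime_ms, deg=1)

def predict_runtime_ms(layer_macs: float) -> float:
    """Estimate a layer's runtime on this SoC from its MAC count."""
    return a * layer_macs + b

# Example: predicted runtime for a layer with 5e8 MACs.
print(predict_runtime_ms(5.0e8))
```

Such a fitted model could then be queried during synthesis to compare candidate thread-granularity configurations without exhaustively benchmarking each one.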

  • Publication date: 2017-10