Abstract

Despite their remarkable performance in various machine intelligence tasks, Convolutional Neural Networks (CNNs) have seen limited adoption in resource-constrained embedded and IoT systems because of their computational intensity. To address this problem, we present a framework for the synthesis of efficient CNN inference software targeting mobile SoC platforms. We argue that thread granularity can substantially impact the performance and energy dissipation of the synthesized inference software, and demonstrate that launching the maximum number of logical threads, often promoted as a guiding principle by GPGPU practitioners, does not yield an efficient implementation on mobile SoCs. We hypothesize that the runtime of a CNN layer on a particular SoC platform can be accurately estimated as a linear function of its computational complexity. This may seem counter-intuitive, as modern mobile SoCs employ a plethora of heterogeneous architectural features and dynamic resource management policies. Building on this hypothesis, we develop a principled approach and a data-driven analytical model to optimize thread granularity during CNN software synthesis. Experimental results with several modern CNNs mapped to a commodity Android smartphone with a Snapdragon SoC show up to 2.37X speedup in application runtime and up to 1.9X improvement in energy dissipation compared to existing approaches.
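The abstract's central modeling assumption is that a layer's runtime scales linearly with its computational complexity. As a minimal, hypothetical illustration of fitting such a data-driven linear model (not the paper's actual implementation), the sketch below uses ordinary least squares over profiled layers; the choice of MAC count as the complexity measure, all variable names, and all numbers are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch: fit a per-layer linear runtime model T = a * C + b,
# where C is a layer's computational complexity (here, MAC count) and T is
# its measured runtime on the target SoC. All data values are placeholders,
# not results from the paper.

macs = np.array([1.2e8, 3.4e8, 9.0e8, 1.8e9])    # MACs per profiled layer
runtime_ms = np.array([4.1, 10.9, 28.5, 56.2])   # measured runtimes (ms)

# Ordinary least-squares fit of the linear model's slope and intercept.
a, b = np.polyfit(macs, runtime_ms, deg=1)

def predict_runtime_ms(layer_macs: float) -> float:
    """Estimate a layer's runtime on this SoC from its MAC count."""
    return a * layer_macs + b

# Example: predicted runtime for a layer with 5e8 MACs.
print(predict_runtime_ms(5.0e8))
```

Such a fitted model could then be queried during synthesis to compare candidate thread-granularity configurations without exhaustively benchmarking each one.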

  • Publication date: 2017-10