Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems

Wang Zheng*; Grewe Dominik; O'Boyle Michael F P

doi:10.1145/2677036

Abstract

General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses automatic machine learning to build a predictive model to determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) on Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10x speedups over two state-of-the-art automatic GPU code generators.

Publication date: 2014-12
