Abstract

Developing parallel applications on heterogeneous processors faces the "memory wall" challenge, owing to the limited capacity of local storage, limited memory bandwidth, and long memory-access latency. To address this problem, a parallelization approach with six memory optimization schemes was proposed for CG, four of which target the various forms of the sparse matrix-vector multiplication (SPMV) operation. Evaluated on an IBM QS20, the parallelization approach achieves speedups of up to 21 and 133 times for problem sizes A and B, respectively, compared with a single Power Processor Element. Finally, it is concluded that SPMV can attain the peak memory-access bandwidth of the Cell BE, that simple computation is more efficient on heterogeneous processors, and that loop unrolling can hide local-storage access latency when executing scalar operations on SIMD cores.
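For readers unfamiliar with the SPMV kernel and the loop-unrolling technique mentioned above, the following is a minimal sketch in plain C of a CSR-format SpMV whose inner loop is manually unrolled by four. It only illustrates the general idea of keeping several independent partial sums in flight to hide load latency; the function and array names (spmv_csr, row_ptr, col_idx, vals) are hypothetical, and the sketch does not reproduce the Cell BE SPE code, DMA transfers, or SIMD intrinsics used in the paper.

```c
#include <stddef.h>

/* Illustrative CSR-format sparse matrix-vector multiplication: y = A * x.
 * The inner loop is unrolled by 4 so that four independent partial sums
 * can overlap memory accesses; this is a generic sketch, not the authors'
 * Cell BE implementation. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,   /* n_rows + 1 row offsets */
              const size_t *col_idx,   /* column index of each nonzero */
              const double *vals,      /* nonzero values */
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t j   = row_ptr[i];
        size_t end = row_ptr[i + 1];

        /* Unrolled body: four independent accumulators per iteration. */
        for (; j + 4 <= end; j += 4) {
            s0 += vals[j]     * x[col_idx[j]];
            s1 += vals[j + 1] * x[col_idx[j + 1]];
            s2 += vals[j + 2] * x[col_idx[j + 2]];
            s3 += vals[j + 3] * x[col_idx[j + 3]];
        }
        /* Remainder loop for rows whose nonzero count is not a multiple of 4. */
        for (; j < end; ++j)
            s0 += vals[j] * x[col_idx[j]];

        y[i] = s0 + s1 + s2 + s3;
    }
}
```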

Full text