An optimized matrix multiplication on ARMv7 architecture

Feng Kun*; Xu Cheng; Wang Wei; Yang Zhibang; Tian Zheng

doi:10.4156/jcit.vol7.issue10.5

Abstract

A well-optimized matrix multiplication on embedded systems can speed up data processing in high-performance mobile measuring equipment, since many core mathematical algorithms are built on matrix multiplication. In this paper, we propose a matrix multiplication specifically optimized for the ARMv7 architecture. The simple implementation is blocked with the performance-critical differences between ARMv7 and conventional desktop/server architectures taken into account. The Advanced SIMD (Single Instruction, Multiple Data) engine NEON is additionally exploited to increase arithmetic throughput and reduce memory access latency. Experimental results demonstrate that the proposed scheme is 7-20 times faster than the simple implementation and outperforms a popular algorithm and open-source libraries.
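To make the two ideas in the abstract concrete, the following is a minimal sketch of a blocked (tiled) multiplication with an optional NEON inner loop. It is not the paper's kernel: the function names, the row-major square layout, the tile edge `bs`, and the loop order are all assumptions chosen for illustration, and the abstract does not state the block sizes or register scheme the authors use. The NEON path is compiled only when the compiler defines `__ARM_NEON`; elsewhere the scalar loop runs.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Reference "simple implementation": C = A * B, row-major n x n floats. */
void matmul_naive(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* c[0..len) += a * b[0..len). Uses NEON for groups of four floats when
 * available, then a scalar loop for the remainder (or for everything). */
static void axpy_row(float a, const float *b, float *c, size_t len)
{
    size_t j = 0;
#if defined(__ARM_NEON)
    for (; j + 4 <= len; j += 4) {
        float32x4_t vc = vld1q_f32(c + j);
        float32x4_t vb = vld1q_f32(b + j);
        vst1q_f32(c + j, vmlaq_n_f32(vc, vb, a)); /* vc + vb * a */
    }
#endif
    for (; j < len; j++)
        c[j] += a * b[j];
}

/* Blocked multiplication. bs is a hypothetical tile edge; in practice it
 * would be tuned to the cache sizes of the target core. */
void matmul_blocked(size_t n, size_t bs, const float *A, const float *B,
                    float *C)
{
    assert(bs > 0);

    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                /* Multiply one tile of A by one tile of B and accumulate
                 * the result into the matching tile of C. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++)
                        axpy_row(A[i * n + k], B + k * n + jj,
                                 C + i * n + jj, j_end - jj);
            }
        }
    }
}
```

Calling `matmul_blocked(n, bs, A, B, C)` with any positive `bs` should give the same product as `matmul_naive(n, A, B, C)`, up to floating-point rounding from the different summation order. Working on one tile at a time keeps its data in cache while it is reused, and the NEON loop updates four elements of a row of C per instruction.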

Publication date: 2012


Record updated: 2018-08-03 07:51
