
A well-optimized matrix multiplication on embedded systems can speed up data processing in high-performance mobile measuring equipment, since many core mathematical algorithms are built on matrix multiplication. In this paper, we propose a matrix multiplication scheme optimized specifically for the ARMv7 architecture. Blocking is applied to the simple implementation, taking into account the performance-critical differences between ARMv7 and conventional desktop/server architectures. In addition, the Advanced SIMD (Single Instruction, Multiple Data) engine NEON is exploited to increase arithmetic throughput and reduce memory access latency. Experimental results demonstrate that the proposed scheme is 7 to 20 times faster than the simple implementation and superior to popular algorithms and open-source libraries.
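To make the idea of blocking concrete, the following is a minimal sketch in C of a simple triple-loop multiplication next to a generic cache-blocked variant. It is not the scheme proposed in the paper: the function names `matmul_naive` and `matmul_blocked`, the row-major square layout, and the tile size `BLOCK` are all assumptions chosen for illustration, and a real tile size would have to be tuned to the target's cache.

```c
#include <stddef.h>

/* Hypothetical tile size (an assumption); tune to the target cache. */
#define BLOCK 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Simple implementation: C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Blocked variant: the same arithmetic, reordered so that each
 * BLOCK x BLOCK tile of A, B and C is reused while it is in cache. */
void matmul_blocked(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_size(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_size(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_size(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float a = A[i * n + k];
                        /* Unit-stride inner loop over j. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

The innermost loop walks `B` and `C` with unit stride, which is the access pattern a SIMD unit with 128-bit registers, such as NEON, can process four single-precision values at a time, either through compiler auto-vectorization or hand-written intrinsics.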

  • Publication date: 2012
