Abstract

This paper develops a genetic-algorithm-based autotuning strategy. Autotuning is a platform-independent code optimization process in which the hardware and software parameters of the code being optimized are identified, and the resulting parameter space is explored to arrive at an alternative implementation that optimizes characteristics such as performance and energy consumption. The main advantage of our approach is that it compiles and executes substantially fewer points of the configuration space than exhaustive search does. We demonstrate the approach on the small matrix multiplication routines that underlie spectral element solvers. These solvers are an important class of higher-order methods that are expected to be a computationally intensive portion of the next generation of large-scale CFD simulations. Our experiments were conducted on a variety of existing platforms as well as on the gem5 simulator with different cache configurations. On one existing platform, AMD Fusion, the genetic algorithm obtains a 34% improvement in performance and a 37% reduction in energy consumption over existing versions of the code. Because only a very small fraction of the configuration space needs to be explored, algorithmic exploration can be combined with exploration of cache configurations, resulting in hardware/software co-optimization. We used the micro-architectural simulator gem5 to evaluate the energy and performance trade-offs of different cache configurations for out-of-order x86 cores running small matrix multiplications. Our results show that the genetic-algorithm-based autotuning strategy can find a close-to-optimal variant while analyzing only about 0.25% of the exploration space.
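The search idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. The tuning parameters (`unroll`, `tile_i`, `tile_j`, `loop_order`, `vectorize`), the genetic operators, and the `synthetic_cost` function are all hypothetical stand-ins. In a real autotuner the cost function would compile a code variant, run it, and measure time or energy.

```python
import itertools
import random

# Hypothetical tuning parameters; none of these names come from the paper.
SPACE = {
    "unroll": [1, 2, 4, 8, 16],
    "tile_i": [1, 2, 4, 8, 16],
    "tile_j": [1, 2, 4, 8, 16],
    "loop_order": ["ijk", "ikj", "jik", "jki", "kij", "kji"],
    "vectorize": [False, True],
}
KEYS = list(SPACE)


def synthetic_cost(cfg):
    """Stand-in for 'compile this variant, run it, measure time or energy'."""
    cost = abs(cfg["unroll"] - 4) + abs(cfg["tile_i"] - 8) + abs(cfg["tile_j"] - 8)
    cost += 0 if cfg["loop_order"] == "ikj" else 3
    cost += 0 if cfg["vectorize"] else 5
    return float(cost)


def autotune(cost_fn, generations=15, pop_size=12, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    evaluated = {}  # cache, so each distinct variant is measured only once

    def fitness(genome):
        if genome not in evaluated:
            evaluated[genome] = cost_fn(dict(zip(KEYS, genome)))
        return evaluated[genome]

    def random_genome():
        return tuple(rng.choice(SPACE[k]) for k in KEYS)

    def tournament(pop):
        # Pick the cheapest of three randomly sampled individuals.
        return min(rng.sample(pop, 3), key=fitness)

    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(population, key=fitness)]  # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(population), tournament(population)
            cut = rng.randrange(1, len(KEYS))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i, key in enumerate(KEYS):
                if rng.random() < mutation_rate:  # per-gene mutation
                    child[i] = rng.choice(SPACE[key])
            children.append(tuple(child))
        population = children

    best = min(evaluated, key=evaluated.get)
    return dict(zip(KEYS, best)), evaluated[best], len(evaluated)


SPACE_SIZE = len(list(itertools.product(*SPACE.values())))
best_cfg, best_cost, n_evaluated = autotune(synthetic_cost)
print(best_cfg, best_cost, f"{n_evaluated}/{SPACE_SIZE} variants evaluated")
```

Caching the measured variants keeps the number of simulated compile-and-run steps bounded by the population size times the number of generations, which is how such a search can touch only a small fraction of the full space. The fraction this toy example reaches says nothing about the 0.25% figure reported in the abstract.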

  • Publication date: 2016-6