摘要

Stencil computations form the heart of numerical simulations to solve Partial Differential Equations using Finite Difference, Finite Element, and Finite Volume methods. Geometric Multigrid is an optimal O(N), hierarchical tool employing stencil computations in its chief constituents, namely, smoothing, restriction, and interpolation. When Multigrid is parallelized over distributed-shared memory architectures, traditionally, the domain partitioning creates cubic partitions of the mesh to minimize overall communication. Thus, the orthodox approach considers only load-balancing and communication minimization for completely determining the domain partitioning. In this article, we show that these two factors are not sufficient to obtain optimal partitions for Parallel Geometric Multigrid. To this effect, we develop and validate a high level analytical model to show that close to 2-D partitions for Geometric Multigrid can give higher performance than the partitions returned by the MPI_Dims_create() function which minimizes the communication volume by default. We quantify sub-domain level cache-misses in Parallel Geometric Multigrid and obtain families of optimal domain partitions. We conclude that the sub-domain level cache-misses for the application-specific stencil computational kernel and communicated planes should be taken into account in addition to communication minimization/load-balance to obtain optimal partitions for Parallel Geometric Multigrid.

  • 出版日期2018-5-10