Abstract

This article presents a systematic quantitative performance study of large finite element computations on extreme-scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low-order tetrahedral elements, are compared with respect to their numerical efficiency and their scalability on up to 786,432 parallel threads. An all-at-once multigrid method for the saddle point system using an Uzawa-type smoother provides the best overall performance with respect to memory consumption and time-to-solution. The largest system solved on a Blue Gene/Q system has more than ten trillion (1.1 × 10^13) unknowns and requires about 13 minutes of compute time. Despite the matrix-free and highly optimized implementation, the memory requirement for the solution vector and the auxiliary vectors is about 200 TByte. A generalization of Brandt's notion of "textbook multigrid efficiency" is employed to study the algorithmic performance of the all-at-once multigrid solver at extreme scale. The flexibility of the method is demonstrated by simulating incompressible fluid flow in a pipe filled with spherical obstacles.
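For context, the discrete Stokes problem referenced above leads to a linear system with the standard saddle-point structure; the following is a generic sketch, and the specific block operators, boundary conditions, and scalings used in the paper are not reproduced here:

$$
\begin{pmatrix} A & B^{T} \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
=
\begin{pmatrix} f \\ 0 \end{pmatrix},
$$

where $A$ denotes the discrete (vector-valued) viscous operator acting on the velocity unknowns $u$, $B$ the discrete divergence, and $p$ the pressure. An Uzawa-type smoother relaxes this coupled system blockwise, roughly: a velocity relaxation sweep on $A u \approx f - B^{T} p$ followed by a pressure update driven by the residual of the divergence constraint $B u = 0$.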

  • Publication date: 2016-11