Abstract

This paper discusses performance optimization of the dynamical core of the global numerical weather prediction model in the Global/Regional Assimilation and Prediction System (GRAPES). GRAPES is a new-generation numerical weather prediction system developed and currently used by the China Meteorological Administration. The computational performance of the GRAPES dynamical core depends on the efficient solution of three-dimensional Helmholtz equations, whose discretization in space and time yields large-scale sparse linear systems. We solve these systems with the generalized conjugate residual (GCR) algorithm and propose optimizations for large-scale parallelism in two respects: (i) reducing the number of iterations needed for a solution and (ii) improving the performance of each GCR iteration. The number of iterations is reduced by advanced preconditioning that combines a block incomplete LU factorization with fill level k (ILU(k)) over the seven diagonals of the coefficient matrix with the restricted additive Schwarz method. Each GCR iteration is made faster by refactoring the algorithm to use fewer global communication operations, which lowers the communication overhead on large numbers of cores. Performance evaluation on the Tianhe-1A system shows that the new preconditioning techniques reduce the number of iterations needed to solve the linear systems by almost one-third, that the proposed methods improve performance by 25% on average over the original Helmholtz solver in GRAPES, and that our algorithms reach a speedup of 10 on 2048 cores relative to 256 cores.
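As a rough illustration of the GCR iteration named above, the following is a minimal serial sketch of right-preconditioned GCR in plain Python. It is not the GRAPES solver: the one-dimensional Helmholtz-like test operator tridiag(-1, 2 + sigma, -1), the Jacobi (diagonal) preconditioner standing in for the block ILU(k) and restricted additive Schwarz combination, and the names `gcr`, `matvec`, `apply_a`, and `apply_minv` are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Inner product; in a distributed run each call would need a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def matvec(n, sigma, v):
    """Apply the hypothetical test operator tridiag(-1, 2 + sigma, -1) to v."""
    out = []
    for i in range(n):
        s = (2.0 + sigma) * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(s)
    return out


def gcr(apply_a, apply_minv, b, tol=1e-10, max_iter=200):
    """Right-preconditioned GCR; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    ps, aps = [], []                 # search directions p_j and their images A p_j
    b_norm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = apply_minv(r)            # preconditioned residual
        q = apply_a(z)
        # Compute every orthogonalization coefficient from the same q, so the
        # inner products could be batched into a single reduction.
        betas = [dot(q, ap) / dot(ap, ap) for ap in aps]
        for beta, p, ap in zip(betas, ps, aps):
            z = [zi - beta * pi for zi, pi in zip(z, p)]
            q = [qi - beta * api for qi, api in zip(q, ap)]
        alpha = dot(r, q) / dot(q, q)    # step length minimizing the residual norm
        x = [xi + alpha * zi for xi, zi in zip(x, z)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        ps.append(z)
        aps.append(q)
    return x, max_iter


n, sigma = 20, 0.5
b = [1.0] * n
x, iters = gcr(lambda v: matvec(n, sigma, v),
               lambda v: [vi / (2.0 + sigma) for vi in v],
               b)
```

In this sketch the inner products are the only operations that would require communication across all processes, which is why reducing or batching them matters at large core counts. The orthogonalization above computes all coefficients before updating the vectors; whether that matches the refactoring used in the paper is not stated in the abstract.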