Abstract

We present a scheme for parallelizing the quantum Monte Carlo method on graphics processing units (GPUs), focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared-memory persistence and obtain excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPU, including the Fermi GTX560 and M2090 and the Kepler-architecture K20. Special optimizations were developed for the Kepler cards, including placement of data structures in the register space of the GPU, and these Kepler-specific optimizations are discussed.

Program Summary
Program title: QL
Catalogue identifier: AEUP_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEUP_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 40170
No. of bytes in distributed program, including test data, etc.: 1223080
Distribution format: tar.gz
Programming language: CUDA-C, C, Fortran.
Computer: Any computer with a CUDA-enabled GPU.
Operating system: Linux.
RAM: Typical execution uses as much RAM as is available on the GPU, usually between 1 GB and 12 GB. The minimal requirement is 1 MB.
Classification: 4.12, 7.7.
Nature of problem: The QL package performs variational Monte Carlo for liquid helium-4 with the Aziz II interaction potential and a Jastrow pair-product trial wavefunction. Sampling uses a Metropolis scheme applied to single-particle updates. With minimal changes, the package can be applied to other bosonic fluids, given a pairwise interaction potential and a wavefunction in the form of a product of one- and two-body correlation factors.
Solution method: The program is parallelized for execution on Nvidia GPUs. By design, new configurations are generated with shared-memory persistence, and asynchronous execution allows the CPU load to be masked.
Restrictions: The code is limited to variational Monte Carlo. Because of the limited shared memory of the GPU, only systems with fewer than 2000 particles can be treated on Fermi-generation cards, and up to 10000 particles on Kepler cards.
Running time: Because of the statistical nature of Monte Carlo calculations, computations may be chained indefinitely to improve statistical accuracy. For example, using the QL package, the energy of a liquid helium system with 1952 atoms can be computed to within 1 mK per atom in less than 20 min. This corresponds to a relative error of 10^-4; higher accuracy is unlikely to be needed.

  • Publication date: February 2015