Abstract

We present a set of C++ classes that allow the cores of a graphics card processor to be used for quantum ab initio simulations, i.e. for directly solving the time-dependent Schrödinger equation, exploiting the parallel architecture of graphics processing units. The solver uses the Chebyshev polynomial and FFT algorithms and is based on NVIDIA CUDA technology. In test runs of our classes, the speed-up obtained on the graphics card processor can be of the order of 300 compared with runs using a single CPU core. The solver is not limited to the Schrödinger equation: with only small changes it can be used to solve the nonlinear Gross-Pitaevskii equation of BEC dynamics, the heat equation, the diffusion equation or other second-order parabolic partial differential equations.

Program summary
Program title: QnDynCUDA
Catalogue identifier: AELE_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELE_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 101 359
No. of bytes in distributed program, including test data, etc.: 3 165 228
Distribution format: tar.gz
Programming language: C++, C for CUDA
Computer: Graphics card with CUDA technology recommended
Operating system: No limits (tested on 32-bit and 64-bit Windows and 64-bit Linux)
Has the code been vectorized or parallelized?: Yes, number of processors used: one CPU core and all CUDA cores of the selected graphics card processor
RAM: Dependent on the user's parameters, typically between several tens of megabytes and several gigabytes (this also applies to the graphics card's memory)
Supplementary material: Test input and output files (approx. 3.4 gigabytes) are available
Classification: 2.7, 6.5
Nature of problem: Solving the time-dependent Schrödinger equation.
Solution method: FFT and Chebyshev polynomial algorithm, CUDA technology.
Running time: Every test example included in the distribution package takes approximately an hour if the GPU is used and about a day if only the CPU is used.
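
The solution method named in the summary (Chebyshev polynomial expansion of the evolution operator, with the kinetic term applied via FFT) can be illustrated with a minimal CPU-only sketch. This is not the QnDynCUDA code or its API: the function names (`fft`, `applyH`, `besselJ`, `chebyshevStep`), the restriction to one dimension with a periodic grid, the units hbar = m = 1, the series truncation rule and the hand-written radix-2 FFT are all assumptions made for illustration. A GPU version would presumably replace the FFT and the element-wise loops with CUDA kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;
using Vec = std::vector<cd>;
const double PI = std::acos(-1.0);

// In-place recursive radix-2 FFT (unnormalized); inverse=true flips the exponent sign.
void fft(Vec& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0 && "grid size must be a power of two");
    if (n == 1) return;
    Vec even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) { even[i] = a[2 * i]; odd[i] = a[2 * i + 1]; }
    fft(even, inverse);
    fft(odd, inverse);
    const double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd w = std::polar(1.0, sign * 2.0 * PI * double(k) / double(n));
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

// H psi = -(1/2) psi'' + V psi on a periodic grid of length L (assumed units hbar = m = 1).
Vec applyH(const Vec& psi, const std::vector<double>& V, double L) {
    const std::size_t n = psi.size();
    Vec phi = psi;
    fft(phi, false);
    for (std::size_t j = 0; j < n; ++j) {
        const double m = (j < n / 2) ? double(j) : double(j) - double(n);  // signed mode index
        const double k = 2.0 * PI * m / L;
        phi[j] *= 0.5 * k * k / double(n);  // kinetic energy, including the 1/n FFT normalization
    }
    fft(phi, true);
    for (std::size_t j = 0; j < n; ++j) phi[j] += V[j] * psi[j];
    return phi;
}

// Bessel function J_n(x) from its integral representation, evaluated with the
// periodic trapezoidal rule (the point count M is chosen well above n + |x|).
double besselJ(int n, double x) {
    const int M = 2 * n + 2 * int(std::ceil(std::fabs(x))) + 64;
    double sum = 0.0;
    for (int j = 0; j < M; ++j) {
        const double th = 2.0 * PI * j / M;
        sum += std::cos(n * th - x * std::sin(th));
    }
    return sum / M;
}

// One Chebyshev step: returns an approximation of exp(-i H t) psi.
Vec chebyshevStep(const Vec& psi, const std::vector<double>& V, double L, double t) {
    const std::size_t n = psi.size();
    const double kmax = PI * double(n) / L;  // largest wavenumber on the grid
    const double eMin = *std::min_element(V.begin(), V.end());
    const double eMax = *std::max_element(V.begin(), V.end()) + 0.5 * kmax * kmax;
    const double dE = eMax - eMin;
    const double alpha = 0.5 * dE * t;
    const int terms = int(std::ceil(1.5 * std::fabs(alpha))) + 30;  // heuristic cutoff (assumption)
    const cd I(0.0, 1.0);

    // Hamiltonian rescaled so that its spectrum lies in [-1, 1].
    auto Hn = [&](const Vec& f) {
        Vec g = applyH(f, V, L);
        for (std::size_t j = 0; j < n; ++j) g[j] = (2.0 * g[j] - (eMax + eMin) * f[j]) / dE;
        return g;
    };

    Vec prev = psi;      // phi_0 = psi
    Vec curr = Hn(psi);  // phi_1 = -i Hn psi
    for (auto& c : curr) c *= -I;
    Vec out(n);
    const double a0 = besselJ(0, alpha), a1 = 2.0 * besselJ(1, alpha);
    for (std::size_t j = 0; j < n; ++j) out[j] = a0 * prev[j] + a1 * curr[j];
    for (int k = 2; k <= terms; ++k) {
        Vec next = Hn(curr);  // phi_{k} = -2i Hn phi_{k-1} + phi_{k-2}
        for (std::size_t j = 0; j < n; ++j) next[j] = -2.0 * I * next[j] + prev[j];
        const double ak = 2.0 * besselJ(k, alpha);
        for (std::size_t j = 0; j < n; ++j) out[j] += ak * next[j];
        prev = std::move(curr);
        curr = std::move(next);
    }
    const cd phase = std::exp(-I * ((eMin + 0.5 * dE) * t));  // undo the energy shift
    for (auto& c : out) c *= phase;
    return out;
}
```

A simple check of the sketch is to propagate a plane wave with V = 0: the result should equal the input multiplied by the phase exp(-i k^2 t / 2). The grid size must be a power of two because of the simple FFT used here.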

  • Publication date: 2012-3