摘要

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The approximate string matching (ASM) for two strings X and Y of length m and n is a task to find a substring of Y most similar to X. The main contribution of this paper is to show an optimal parallel algorithm for the approximate string matching on the HMM and implement it on GeForce GTX 580 GPU. Our algorithm runs in O(n/w + mn/dw + nL/p + mnl/p) time units on the HMM with p threads, d streaming processors, memory band width w, global memory access latency L, and shared memory access latency l. We also show that the lower bound of the computing time is Omega(n/w + mn/dw + nL/p + mnl/p) time units. Thus, our algorithm for the approximate string matching is time optimal. Further, we implemented our algorithm on GeForce GTX 580 GPU and evaluated the performance. The experimental results show that the ASM of two strings of 1024 and 4M(= 2(22)) characters can be done in 419.6ms, while the sequential algorithm can compute it in 27720ms. Thus, our implementation on the GPU attains a speedup factor of 66.1 over the single CPU implementation.

  • 出版日期2014-12