A scalable parallelization of the gene duplication problem

作者:Wehe Andre; Chang Wen Chieh; Eulenstein Oliver; Aluru Srinivas*
来源:Journal of Parallel and Distributed Computing, 2010, 70(3): 237-244.
DOI:10.1016/j.jpdc.2009.09.010

摘要

Phylogenetics is a branch of Computational and evolutionary biology dealing with the inference of trees depicting evolutionary relationships among species and/or sequences. An important problem in phylogenetics is to find a species tree that is most parsimonious with a given set of gene trees, which are derived from sequencing multiple gene families from various subsets of species. The gene duplication problem is to compute a species tree that requires the minimum number of gene duplication events to reconciliate with the given set of gene trees. The best known heuristic algorithm for this NP-hard problem is a local optimization technique that runs in O(n(2) + kmn) time per search step, where k is the number of gene trees, n is the size of the species tree, and in is the maximum size of a gene tree. In this paper, we present a parallel algorithm for the gene duplication problem that runs in O (n(2)+kmn/p) time for up top = O(nk/log k) processors. Our algorithm exploits multiple levels of parallelism by parallelizing both the exploration of the search neighborhood and reconciliating of the gene trees with species trees in the neighborhood. Due to the wide variance in the sizes of the gene trees, it is difficult to completely characterize the behavior of the algorithm analytically. We present experimental results On the Blue Gene/L to study both levels of parallelism and how best they Should be combined to achieve overall minimum execution time. On a large problem that took about 62.5 h on a 3 GHz Pentium 4, our parallel algorithm ran in 7.7 min on a 1024 node Blue Gene/L.

  • 出版日期2010-3

全文