Abstract

Motivation: Population trees represent past population divergence histories, and inferring them is useful for studying population evolution. As data sizes increase in large-scale population genetic projects such as the 1000 Genomes Project, ancestral population inference, including population tree inference, faces new computational challenges. Existing methods for population tree inference are mainly designed for unlinked genetic variants (e.g. single nucleotide polymorphisms, or SNPs). Ignoring haplotypes potentially loses information.

Results: In this article, we propose a new population tree inference method, called STELLSH, based on coalescent likelihood. The likelihood is defined for haplotypes over multiple SNPs within a non-recombining region rather than for unlinked variants. Unlike many existing ancestral inference methods, STELLSH does not use Monte Carlo approaches when computing the likelihood. For efficient computation, the likelihood model is approximated but still retains much information about population divergence history. STELLSH finds the maximum likelihood population tree under this approximate likelihood. We show on simulated data and the 1000 Genomes Project data that STELLSH gives reasonably accurate inference results. STELLSH is reasonably efficient for data of current interest and can scale to whole-genome data.

  • Publication date: 2015-3-1