Abstract

The amount of missing data in many contemporary phylogenetic analyses has substantially increased relative to previous norms, particularly in supermatrix studies that compile characters from multiple previous analyses. In such cases the missing data are non-randomly distributed and usually present in all partitions (i.e. groups of characters) sampled. Parametric methods often provide greater resolution and support than parsimony in such cases, yet this may be caused by extrapolation of branch lengths from one partition to another. In this study I use contrived and simulated examples to demonstrate that likelihood, even when applied to simple matrices with little or no homoplasy, homogeneous evolution across groups of characters, perfect model fit, and hundreds or thousands of variable characters, can provide strong support for incorrect topologies when the missing data are non-randomly distributed across all partitions. I do so using a systematic exploration of alternative seven-taxon tree topologies and distributions of missing data in two partitions to demonstrate that these likelihood-based artefacts may occur frequently and are not shared by parsimony. I also demonstrate that Bayesian Markov chain Monte Carlo analysis is more robust to these artefacts than is likelihood.

  • Publication date: 2012-4