Abstract

Contrived and simulated examples were used to quantify the range of conditions under which maximum likelihood and Bayesian MCMC methods are biased in favor of the phylogenetic signal in globally sampled characters over that in conflicting, locally sampled characters (those with missing data). The bias affects both the optimal tree identified and branch support, even when more locally sampled characters support the conflicting topology. It can lead to high bootstrap and SH-like aLRT support (up to 100%) and high posterior probabilities for the conflicting clades. The bias can occur even when only a single terminal has missing data. It is not limited to likelihood methods that only ever present a single, fully resolved optimal tree (as in PhyML and RAxML); it can also occur in branch-and-bound searches in PAUP*. The bias persists despite sampling numerous characters, and it is consistently unidirectional. It may occur with incongruence between gene trees as well as within a single gene whose terminals have different sequence lengths, caused either by differences in DNA amplification or by gaps from indels. This bias is another example of commonly implemented parametric phylogenetic methods interpreting ambiguity as support. In contrast, parsimony is robust to the bias.
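As an illustration only, the sketch below uses a hypothetical five-taxon toy matrix, not one of the paper's contrived or simulated data sets. It scores two trees with Fitch parsimony, treating `?` as missing data. A missing entry is compatible with any state, so it adds no steps, and a locally sampled character contributes to tree length through its observed states just as a globally sampled character does. The taxa, the matrix, and the names `fitch` and `tree_length` are assumptions for this example. The sketch does not model the likelihood or Bayesian analyses discussed above.

```python
# Hypothetical toy example: Fitch parsimony with '?' treated as missing data.
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]            # a taxon name or a (left, right) pair
ALL_STATES = frozenset("01")        # binary characters assumed

def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (state set, step count) for a subtree under Fitch parsimony."""
    if isinstance(tree, str):
        s = states[tree]
        # Missing data ('?') allows any state, so it never forces a step.
        return (ALL_STATES if s == "?" else frozenset(s)), 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:                      # children agree: no extra step
        return common, lc + rc
    return ls | rs, lc + rc + 1     # children conflict: one extra step

def tree_length(tree: Tree, matrix: List[Dict[str, str]]) -> int:
    """Sum the Fitch step counts over all characters in the matrix."""
    return sum(fitch(tree, char)[1] for char in matrix)

# Globally sampled character grouping A with B (all taxa scored).
GLOBAL = {"A": "1", "B": "1", "C": "0", "D": "0", "E": "0"}
# Locally sampled character grouping A with C (taxon E missing).
LOCAL = {"A": "1", "B": "0", "C": "1", "D": "0", "E": "?"}
matrix = [GLOBAL] * 3 + [LOCAL] * 5

T1 = ((("A", "B"), "C"), ("D", "E"))   # supported by the global characters
T2 = ((("A", "C"), "B"), ("D", "E"))   # supported by the local characters

print(tree_length(T1, matrix), tree_length(T2, matrix))
```

Running it prints `13 11`: each global character costs one step on T1 and two on T2, each local character costs two steps on T1 and one on T2, so the tree favored by the larger number of characters (T2) is shorter despite the missing entries.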

Publication date: 2014-11