Abstract

How to quantify the phylogenetic information content of a data set is a longstanding question in phylogenetics, influencing both the assessment of data quality in completed studies and the planning of future phylogenetic projects. Recently, a method has been developed that profiles the phylogenetic informativeness (PI) of a data set through time by linking its site-specific rates of change to its power to resolve relationships at different timescales. Here, we evaluate the performance of this method in the case of 2 standard genetic markers for phylogenetic reconstruction, 28S ribosomal RNA and cytochrome oxidase subunit 1 (CO1) mitochondrial DNA, with maximum parsimony, maximum likelihood, and Bayesian analyses of relationships within a group of parasitoid wasps (Hymenoptera: Ichneumonidae, Diplazontinae). Retrieving PI profiles of the 2 genes from our own and from 3 additional data sets, we find that the method repeatedly overestimates the performance of the more quickly evolving CO1 compared with 28S. We explore possible reasons for this bias, including phylogenetic uncertainty, violation of the molecular clock assumption, model misspecification, and nonstationary nucleotide composition. As none of these provides a sufficient explanation of the observed discrepancy, we use simulated data sets, based on an idealized setting, to show that the optimum evolutionary rate decreases with increasing number of taxa. We suggest that this relationship could explain why the formula derived from the 4-taxon case overrates the performance of higher versus lower rates of evolution in our case and that caution should be taken when the method is applied to data sets including more than 4 taxa.

  • Publication date: 2010-3