Abstract

The neutral theory of molecular evolution (Kimura 1985) is the basis for most current statistical tests for detecting selection. These tests rely mainly on polymorphism data within species, divergence data between species, and/or genomic structure such as linkage disequilibrium (Wang et al. 2006). In most cases, informative tests can be constructed only when these data show ample variation, and many common tests are difficult to formulate when identity-by-descent is unclear, for example in gene families or repetitive elements. Given current progress in whole-genome sequencing and re-sequencing, and in protein sequencing via tandem mass spectrometry where genomic sequence is lacking, we felt it necessary to revisit possible methods for rapid screening and detection of evolutionary outliers. These outliers may be of interest for other research, such as candidate gene association studies, genome annotation, drug- and disease-target searches, and functional studies. We focused on methods that work on both protein and nucleotide data, can be applied to large gene or protein domain families, and can be computed quickly enough for "first pass" annotation of large-scale data. For these reasons, we chose properties of trees generated routinely in molecular phylogenetic studies: genetic distance, tree shape and balance, and internal node statistics (Heard 1992). Our current research on protein domain family data and phylogenetic trees from PFAM (Finn et al. 2008) suggests that this approach to detecting evolutionary outliers is feasible, but additional work will be necessary to determine which parameters indicate positive or negative selection in specific gene families. This is particularly true when other factors, such as rapid duplication and deletion of genes containing these domains, are at play, and we suggest that phylogenetic statistics may be useful in combination with existing methodologies for detailed studies of gene family data.
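
As a rough illustration of what a tree balance statistic looks like in practice, the sketch below computes the Colless imbalance index, one widely used measure of tree shape. It is not taken from the paper: the nested-tuple tree encoding, the function names, and the two toy trees are assumptions made for this example, and the abstract does not state which balance statistic the authors actually used.

```python
# Illustrative sketch only: the tree encoding and function names are hypothetical.
# A rooted binary tree is a nested 2-tuple; any non-tuple value is a leaf.

def colless(tree):
    """Return (leaf_count, imbalance_sum) for a rooted binary tree."""
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf contributes one tip and no imbalance
    left, right = tree
    n_left, i_left = colless(left)
    n_right, i_right = colless(right)
    # Each internal node adds the absolute difference in tip counts
    # between its two daughter subtrees.
    return n_left + n_right, i_left + i_right + abs(n_left - n_right)


def normalized_colless(tree):
    """Scale the Colless sum to [0, 1] by its maximum, (n - 1)(n - 2) / 2."""
    n, total = colless(tree)
    if n < 3:
        return 0.0  # trees with fewer than three tips have only one shape
    return 2.0 * total / ((n - 1) * (n - 2))


balanced = (("A", "B"), ("C", "D"))   # fully symmetric four-tip tree
ladder = ((("A", "B"), "C"), "D")     # fully pectinate four-tip tree

print(normalized_colless(balanced))
print(normalized_colless(ladder))
```

A value near 0 indicates a symmetric tree and a value near 1 a ladder-like one. In a screening setting, such a statistic would be computed for each family tree and compared across families to flag unusual shapes.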

  • Publication date: 2011-5
