摘要

Phylogeny is both a fundamental tool in biology and a rich source of fascinating modeling and algorithmic problems. Today's wealth of sequenced genomes makes it increasingly important to understand evolutionary events such as duplications, losses, transpositions, inversions, lateral transfers, and domain shuffling. We focus on the gene duplication event, that constitutes a major force in the creation of genes with new function [Ohno 1970; Lynch and Force 2000] and, thereby also, of biodiversity. We introduce the probabilistic gene evolution model, which describes how a gene tree evolves within a given species tree with respect to speciation, gene duplication, and gene loss. The actual relation between gene tree and species tree is captured by a reconciliation, a concept which we generalize for more expressiveness. The model is a canonical generalization of the classical linear birth-death process, obtained by replacing the interval where the process takes place by a tree. For the gene evolution model, we derive efficient algorithms for some associated probability distributions: the probability of a reconciled tree, the probability of a gene tree, the maximum probability reconciliation, the posterior probability of a reconciliation, and sampling reconciliations with respect to the posterior probability. These algorithms provides the basis for several applications, including species tree construction, reconciliation analysis, orthology analysis, biogeography, and host-parasite co-evolution.

  • 出版日期2009-4