Accurate annotation of protein-coding genes in mitochondrial genomes

Authors: Al Arab Marwa; zu Siederdissen Christian Hoener; Tout Kifah; Sahyoun Abdullah H; Stadler Peter F; Bernt Matthias*
Source: Molecular Phylogenetics and Evolution, 2017, 106: 209-216.
DOI: 10.1016/j.ympev.2016.09.024

Abstract

Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. Therefore, we present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). Using an approximation of the phylogeny, the pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences and the corresponding HMMs. The MSAs are enhanced by fixing unannotated frameshifts, purging wrong sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were not previously known. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group has been conducted.
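
One of the enhancement steps named in the abstract, removal of non-conserved columns from both ends of an MSA, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the conservation score (share of rows carrying the most common non-gap residue), the 0.5 threshold, and the function names `column_conservation` and `trim_alignment_ends` are all assumptions chosen for illustration.

```python
from collections import Counter


def column_conservation(column):
    """Share of all rows that carry the most common non-gap residue.

    Hypothetical score for illustration; the paper may use another measure.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column)


def trim_alignment_ends(msa, threshold=0.5):
    """Drop columns from both ends until a column reaches the threshold.

    `msa` is a list of equal-length aligned strings with '-' as the gap
    symbol. Interior columns are never removed, only the two flanks.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned rows must have the same length")

    scores = [column_conservation([row[i] for row in msa]) for i in range(length)]

    # Walk inwards from the left end past poorly conserved columns.
    start = 0
    while start < length and scores[start] < threshold:
        start += 1

    # Walk inwards from the right end, never crossing the left boundary.
    end = length
    while end > start and scores[end - 1] < threshold:
        end -= 1

    return [row[start:end] for row in msa]


# Toy alignment: only one row extends into the first and last columns.
msa = [
    "-MKLAT-",
    "AMKLATG",
    "-MKIAT-",
    "--KLAT-",
]
trimmed = trim_alignment_ends(msa, threshold=0.5)
for row in trimmed:
    print(row)
```

In the toy alignment the first and last columns are each supported by a single row, so both are trimmed, while the partly gapped second column is kept because three of four rows agree on it.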

  • Publication date: 2017-1
  • Affiliation: Shanghai Center for Bioinformation Technology