A pipeline for the systematic identification of non-redundant full-ORF cDNAs for polymorphic and evolutionary divergent genomes: Application to the ascidian Ciona intestinalis

作者:Gilchrist Michael J*; Sobral Daniel; Khoueiry Pierre; Daian Fabrice; Laporte Batiste; Patrushev Ilya; Matsumoto Jun; Dewar Ken; Hastings Kenneth E M; Satou Yutaka; Lemaire Patrick; Rothbaecher Ute
来源:Developmental Biology, 2015, 404(2): 149-163.
DOI:10.1016/j.ydbio.2015.05.014

摘要

Genome-wide resources, such as collections of cDNA clones encoding for complete proteins (full-ORF clones), are crucial tools for studying the evolution of gene function and genetic interactions. Non-model organisms, in particular marine organisms, provide a rich source of functional diversity. Marine organism genomes are, however, frequently highly polymorphic and encode proteins that diverge significantly from those of well-annotated model genomes. The construction of full-ORF clone collections from non-model organisms is hindered by the difficulty of predicting accurately the N-terminal ends of proteins, and distinguishing recent paralogs from highly polymorphic alleles. We report a computational strategy that overcomes these difficulties, and allows for accurate gene level clustering of transcript data followed by the automated identification of full-ORFs with correct 5'- and 3'-ends. It is robust to polymorphism, includes paralog calling and does not require evolutionary proximity to well annotated model organisms. We developed this pipeline for the ascidian Ciona intestinalis, a highly polymorphic member of the divergent sister group of the vertebrates, emerging as a powerful model organism to study chordate gene function, Gene Regulatory Networks and molecular mechanisms underlying human pathologies. Using this pipeline we have generated the first full-ORF collection for a highly polymorphic marine invertebrate. It contains 19,163 full-ORF cDNA clones covering 60% of Ciona coding genes, and full-ORF orthologs for approximately half of curated human disease-associated genes.

  • 出版日期2015-8-15
  • 单位McGill