Abstract

Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers in nonmodel organisms. Transcriptome-based exon capture utilizes transcript sequences to design capture probes, typically using a reference genome to identify intron-exon boundaries to exclude shorter exons (<200 bp). Here, we test directly using transcript sequences for probe design, which are often composed of multiple exons of varying lengths. Using 1260 orthologous transcripts, we conducted sequence captures across multiple phylogenetic scales for frogs, including outgroups ~100 Myr divergent from the ingroup. We recovered a large phylogenomic data set consisting of sequence alignments for 1047 of the 1260 transcriptome-based loci (~561,000 bp) and a large quantity of highly variable regions flanking the exons in transcripts (~70,000 bp), the latter improving substantially by only including ingroup species (~797,000 bp). We recovered both shorter (<100 bp) and longer exons (>200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and nontarget depletion during captures, and differences in PCR duplication rates resulting from the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on a priori knowledge of nuclear pairwise differences among samples. We provide recommendations for transcriptome-based exon capture design based on our results, cost estimates and offer multiple pipelines for data assembly and analysis.

  • Publication date: 2016-9