A resource of single-nucleotide polymorphisms for rainbow trout generated by restriction-site associated DNA sequencing of doubled haploids

Authors: Palti Yniv*; Gao Guangtu; Miller Michael R; Vallejo Roger L; Wheeler Paul A; Quillet Edwige; Yao Jianbo; Thorgaard Gary H; Salem Mohamed; Rexroad Caird E III
Source: Molecular Ecology Resources, 2014, 14(3): 588-596.
DOI: 10.1111/1755-0998.12204

Abstract

Salmonid genomes are considered to be in a pseudo-tetraploid state as a result of a genome duplication event that occurred between 25 and 100 Ma. This situation complicates single-nucleotide polymorphism (SNP) discovery in rainbow trout, as many putative SNPs are actually paralogous sequence variants (PSVs) and not simple allelic variants. To differentiate PSVs from simple allelic variants, we used 19 homozygous doubled haploid (DH) lines that represent a wide geographical range of rainbow trout populations. In the first phase of the study, we analysed SbfI restriction-site associated DNA (RAD) sequence data from all 19 lines and selected 11 lines for an extended SNP discovery. In the second phase, we conducted the extended SNP discovery using PstI RAD sequence data from the selected 11 lines. The complete data set is composed of 145,168 high-quality putative SNPs that were genotyped in at least nine of the 11 lines, of which 71,446 (49%) had minor allele frequencies (MAF) of at least 18% (i.e. the minor allele was present in at least two of the 11 lines). Approximately 14% of the RAD SNPs in this data set are from expressed or coding rainbow trout sequences. Our comparison of the current data set with previous SNP discovery data sets revealed that 99% of our SNPs are novel. In the support files for this resource, we annotate the positions of the SNPs in the working draft of the rainbow trout reference genome, provide the genotypes of each sample in the discovery panel and identify SNPs that are likely to be in coding sequences.
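
The two thresholds quoted in the abstract (genotyped in at least nine of the 11 lines, and a minor allele present in at least two lines) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the one-allele-per-line representation (reasonable for homozygous DH lines) and the choice to report MAF over all 11 lines are assumptions made for illustration only.

```python
from collections import Counter

N_LINES = 11  # lines in the extended discovery panel (from the abstract)


def minor_allele_count(calls):
    """Count lines carrying the rarer allele.

    `calls` holds one allele per DH line, or None where the line was
    not genotyped (hypothetical representation for this sketch).
    """
    counts = Counter(c for c in calls if c is not None)
    if len(counts) != 2:  # monomorphic or not biallelic
        return 0
    return min(counts.values())


def passes_call_rate(calls, min_called=9):
    """True if the site was genotyped in at least `min_called` lines."""
    return sum(c is not None for c in calls) >= min_called


def in_high_maf_subset(calls, min_minor_lines=2):
    """True if the site meets both thresholds quoted in the abstract."""
    return passes_call_rate(calls) and minor_allele_count(calls) >= min_minor_lines


# Hypothetical site: 'G' is seen in two of the 11 lines.
example = ["A", "A", "G", "A", "A", "A", "G", "A", "A", "A", "A"]
maf = minor_allele_count(example) / N_LINES  # 2 / 11, about 18%

# Share of the full data set in the high-MAF subset (numbers from the abstract).
high_maf_share = 71446 / 145168
```

Two lines out of 11 gives 2/11, which is where the 18% cut-off comes from, and 71,446 of 145,168 corresponds to the reported 49%.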

  • Publication date: 2014-5