Abstract

Pooled genomic DNA has been proposed as a cost-effective approach in genomewide association studies (GWAS). However, algorithms for genotype calling of biallelic SNP are not adequate for pooled DNA samples because they assume the presence of 2 fluorescent signals, 1 for each allele, and operate under the expectation that at most 2 copies of the variant allele can be found for any given SNP and DNA sample. We adapt analytical methodology from 2-channel gene expression microarray technology to SNP genotyping of pooled DNA samples. Using 5 datasets from beef cattle and broiler chicken that vary in complexity of design and in phenotype (continuous and dichotomous), we show that both differential hybridization (M = green minus red intensity signal) and abundance (A = average of red and green intensities) provide useful information in the prediction of SNP allele frequencies. This is particularly true when making inference about extreme SNP that are either nearly fixed or highly polymorphic. We propose the use of model-based clustering via mixtures of bivariate normal distributions as an optimal framework to capture the relationship between hybridization intensity and allele frequency from pooled DNA samples. The range of M and A values observed here is in agreement with the ranges reported for gene expression microarrays and for SNP array data analyzed to identify copy number variants. In particular, we confirm that highly polymorphic SNP yield a strong signal from both channels (red and green), whereas lowly polymorphic or nonpolymorphic SNP yield a strong signal from 1 channel only. We further confirm that when the SNP allele frequencies are known, because either the individuals in the pools or individuals from a closely related population have themselves been genotyped, a multiple regression model with linear and quadratic components can be developed with high prediction accuracy.
We conclude that when these approaches are applied to the estimation of allele frequencies, the resulting estimates allow for the development of cost-effective and reliable GWAS.
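
As a rough illustration of the quantities named above, the following sketch computes M and A from 2-channel intensities and fits a multiple regression with linear and quadratic terms in M and A to known allele frequencies. It is a minimal sketch and not the authors' implementation. The function names, the simulated intensities, the assumption that the green channel tracks the allele whose frequency is predicted, the use of raw (untransformed) intensities, and the clipping of predictions to [0, 1] are all assumptions made here for illustration; the abstract does not specify them.

```python
import numpy as np


def ma_transform(green, red):
    """Return (M, A) as defined in the abstract:
    M = green minus red intensity, A = average of red and green intensities."""
    green = np.asarray(green, dtype=float)
    red = np.asarray(red, dtype=float)
    return green - red, (green + red) / 2.0


def design_matrix(m, a):
    """Columns: intercept, M, A, M^2, A^2 (linear and quadratic components).
    The exact set of terms is an assumption; the abstract does not list them."""
    return np.column_stack([np.ones_like(m), m, a, m ** 2, a ** 2])


def fit_quadratic(m, a, freq):
    """Ordinary least-squares fit of known allele frequencies on M and A."""
    coef, *_ = np.linalg.lstsq(design_matrix(m, a), freq, rcond=None)
    return coef


def predict(coef, m, a):
    """Predict allele frequencies; clipping to [0, 1] is an added convenience."""
    return np.clip(design_matrix(m, a) @ coef, 0.0, 1.0)


# Simulated data for illustration only (not from the study):
# green intensity tracks the allele frequency, red tracks its complement.
rng = np.random.default_rng(0)
n_snp = 500
true_freq = rng.uniform(0.0, 1.0, n_snp)
green = true_freq + rng.normal(0.0, 0.02, n_snp)
red = (1.0 - true_freq) + rng.normal(0.0, 0.02, n_snp)

m, a = ma_transform(green, red)
coef = fit_quadratic(m, a, true_freq)
pred = predict(coef, m, a)
rmse = float(np.sqrt(np.mean((pred - true_freq) ** 2)))
print(f"in-sample RMSE on simulated data: {rmse:.4f}")
```

In practice the coefficients would be estimated on SNP with known frequencies (from genotyped individuals) and then applied to pools without genotypes; the in-sample error printed here says nothing about accuracy on real arrays.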

  • Publication date: 2014-5
  • Affiliation: CSIRO