A comparison of tagging methods and their tagging space

Authors: Ke XY*; Miretti MM; Broxholme J; Hunt S; Beck S; Bentley DR; Deloukas P; Cardon LR
Source: Human Molecular Genetics, 2005, 14(18): 2757-2767.
DOI: 10.1093/hmg/ddi309

Abstract

Single-nucleotide polymorphism (SNP) tagging is widely used to reduce genotyping costs in association studies. A number of tagging methods have been developed to reduce the number of markers to be genotyped while maintaining power to detect effects at non-assayed SNPs. How the different methods perform in different settings, the degree to which they overlap and share common tags, and how they differ are important questions. We investigated these questions by comparing three widely used tagging methods/algorithms: one haplotype r²-based method, one pair-wise r²-based method and one method based on haplotype diversity but focused on major haplotypes. Tagging efficiency was defined as the number of genotyped markers divided by the number of tagging SNPs. Tagging effectiveness was defined as the proportion of un-genotyped or 'hidden' SNPs that were detected (having a pair-wise or haplotype r² with a set of tagging SNPs over a threshold, e.g. haplotype r² ≥ 0.80). The ENCODE regions genotyped on the HapMap CEPH individuals were examined in this study. Tagging effectiveness was generally poorer for rare SNPs than for common SNPs, for all three tagging methods. Including rare SNPs in the initial HapMap scheme could enhance the performance of tags on rare hidden SNPs, at the expense of increased genotyping cost. At a moderate tagging efficiency, more than 90% of hidden SNPs detected by tagging SNPs selected by one method were also detected by tagging SNPs selected by another method, and this figure could be increased to 100% if tagging efficiency was allowed to drop. These results indicate that the tagging space is highly concordant between different tagging methods, even though the methods often select different sets of tagging SNPs.
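
The two measures defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the pair-wise r² case (not haplotype r²), assumes phased biallelic haplotypes coded as 0/1, and uses made-up toy data. The function names `pairwise_r2`, `tagging_efficiency` and `tagging_effectiveness` are hypothetical, chosen only for this example.

```python
def pairwise_r2(a, b):
    """Pair-wise r^2 between two biallelic SNPs.

    a, b: equal-length lists of 0/1 alleles, one entry per phased haplotype
    (an assumption of this sketch).
    """
    n = len(a)
    p_a = sum(a) / n                               # allele frequency at SNP a
    p_b = sum(b) / n                               # allele frequency at SNP b
    p_ab = sum(x * y for x, y in zip(a, b)) / n    # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                 # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b                           # linkage disequilibrium coefficient D
    return d * d / denom


def tagging_efficiency(n_genotyped, n_tags):
    """Number of genotyped markers divided by the number of tagging SNPs."""
    return n_genotyped / n_tags


def tagging_effectiveness(hidden, tags, threshold=0.80):
    """Proportion of hidden SNPs whose r^2 with at least one tag reaches the threshold."""
    if not hidden:
        return 0.0
    detected = sum(
        1 for h in hidden
        if any(pairwise_r2(h, t) >= threshold for t in tags)
    )
    return detected / len(hidden)


# Toy data (fabricated for illustration only): 8 haplotypes.
tags = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
hidden = [
    [0, 0, 0, 0, 1, 1, 1, 1],   # common SNP, identical to the first tag
    [1, 0, 1, 0, 1, 0, 1, 0],   # common SNP, complement of the second tag
    [0, 0, 0, 0, 0, 0, 0, 1],   # rare SNP, weakly correlated with both tags
]

efficiency = tagging_efficiency(n_genotyped=4, n_tags=len(tags))
effectiveness = tagging_effectiveness(hidden, tags)
print(efficiency, effectiveness)
```

In this toy set the rare hidden SNP falls below the 0.80 threshold for both tags, which mirrors, in a very simplified way, the abstract's observation that rare SNPs are harder to tag than common ones.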

  • Publication date: 2005-9-15