Hybridization modeling of oligonucleotide SNP arrays for accurate DNA copy number estimation

作者:Wan Lin; Sun Kelian; Ding Qi; Cui Yuehua; Li Ming; Wen Yalu; Elston Robert C; Qian Minping; Fu Wenjiang J*
来源:Nucleic Acids Research, 2009, 37(17): e117.
DOI:10.1093/nar/gkp559

摘要

Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in these fields, most studies have paid little attention to the modeling of hybridization of probes to off-target allele sequences, which can affect the accuracy greatly. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding through binding free energy and then binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the estimation of ACs at a given SNP through statistical regression. Furthermore, we demonstrate with cell-line data of known true copy numbers that the PICR model can achieve reasonable accuracy in copy number estimation at a single SNP locus, by using the ratio of the estimated AC of each sample to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even across array platforms.