Abstract

Missing genotypes are common in practical genetic studies. Most existing statistical methods, including those for haplotype analysis, assume that genotypes are missing at random; that is, at a given marker, different genotypes and different alleles are missing with the same probabilities. In our previous work, we demonstrated that violating this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We proposed a general missing data model to simultaneously characterize missing data patterns across a set of two or more biallelic markers. We proved that, under the general missing data model, haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers. In this study, we extend our work to multi-allelic markers and observe a similar finding. Simulation studies on the analysis of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias in haplotype frequency estimates caused by incorrect assumptions about the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set from a study of scleroderma.
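
To make the missing-at-random assumption concrete, the following minimal sketch computes the expected complete-case estimate of an allele frequency at a single biallelic marker when each genotype has its own missingness probability. It is only an illustration and not the general missing data model described above: the function name `naive_allele_freq`, the genotype labels, the use of Hardy-Weinberg proportions, and all numeric values are assumptions introduced here.

```python
def naive_allele_freq(p, miss):
    """Expected complete-case estimate of the frequency of allele A.

    p    -- true frequency of allele A (hypothetical value)
    miss -- dict mapping each genotype to its probability of being missing
    Assumes Hardy-Weinberg proportions, an illustrative simplification.
    """
    # True genotype frequencies under Hardy-Weinberg proportions.
    geno = {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}
    # Probability that a genotype is both present and observed.
    observed = {g: f * (1 - miss[g]) for g, f in geno.items()}
    total = sum(observed.values())
    # Allele counting among observed genotypes only.
    return (observed["AA"] + 0.5 * observed["Aa"]) / total


true_p = 0.3
# Same missingness for every genotype: the estimate recovers true_p.
equal = naive_allele_freq(true_p, {"AA": 0.2, "Aa": 0.2, "aa": 0.2})
# AA genotypes go missing more often: the estimate is pulled below true_p.
unequal = naive_allele_freq(true_p, {"AA": 0.4, "Aa": 0.1, "aa": 0.1})
print(equal, unequal)
```

When the three missingness probabilities are equal they cancel in the ratio, so ignoring missing genotypes is harmless; when they differ, the complete-case estimate is biased, which is the kind of distortion a genotype-dependent missing data model is meant to address.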

  • Publication date: 2009