Abstract

Discovering and analyzing genetic variants in a gene or across the exome with high-throughput sequencing technology is increasingly feasible. Although many well-known association tests have been proposed in the literature for testing whether a group of variants in a target region is associated with a disease of interest, the analytic challenges remain profound. The power of these tests generally depends on the sample size, the numbers of causal and neutral variants, variant frequency, and the size and direction of effects. Some of these factors are not easily controlled in practical applications. Further complications arise from missing genotypes, population stratification, or misspecification of the working model. Previous studies showed that many model-based tests may produce false positives or lose power in the presence of population stratification, or when missing genotypes are handled by simple imputation. Here, we demonstrate by simulation that the type I error rates of well-known model-based tests are also often inflated, even when the working model deviates only slightly from the true model. We propose a model-free test and show that it is almost uniformly most powerful among the competing tests under very general simulation conditions with covariates. The test does not require complete genotype data, so difficult imputation can be avoided. We also discuss how to adjust for population stratification using principal components, and we use the Shanghai Breast Cancer Study to demonstrate the application of the new test.

Full text