Abstract

For years, geneticists have understood how genome-wide association studies behave when they rely on common genomic variants. Recently, however, focus has shifted to the analysis of rare variants. This presents potential problems for researchers, as rare variants do not always behave the way common variants do, sometimes rendering decades of solid intuition moot. In this paper, we present examples of the differences between common and rare variants. We show why one must be significantly more careful about the origin of rare variants, and how failing to do so can lead to a highly inflated type I error rate. We then explain how best to avoid such concerns through careful understanding and study design. Additionally, we demonstrate that a seemingly low error rate in next-generation sequencing can dramatically impact the false-positive rate for rare variants. This is because rare variants are, by definition, seen infrequently, making it hard to distinguish errors from real variants. Compounding this problem, the proportion of errors is likely to get worse, not better, with increasing sample size, so one cannot simply scale up the sample size to solve it. Understanding these potential pitfalls is a key step in successfully identifying true associations between rare variants and diseases.
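To make the sample-size argument concrete, the following is a minimal sketch and not the model used in the paper. It assumes that each genotype call at a truly invariant site independently produces a false variant with probability `error_rate`, and that the number of real variant sites follows the standard neutral expectation (Watterson's formula, which grows with the harmonic number of the chromosome count). The function names and the values of `n_sites`, `error_rate`, and `theta` are hypothetical, chosen only for illustration.

```python
def expected_error_sites(n_samples, n_sites, error_rate):
    """Expected number of invariant sites with at least one false variant call.

    Assumes independent errors per sample per site (an illustrative assumption).
    """
    return n_sites * (1.0 - (1.0 - error_rate) ** n_samples)


def expected_true_sites(n_samples, theta):
    """Expected number of real segregating sites under the standard neutral model.

    Uses Watterson's formula: theta * sum_{i=1}^{k-1} 1/i for k chromosomes,
    with k = 2 * n_samples for diploid individuals.
    """
    n_chromosomes = 2 * n_samples
    harmonic = sum(1.0 / i for i in range(1, n_chromosomes))
    return theta * harmonic


def error_fraction(n_samples, n_sites=1_000_000, error_rate=1e-5, theta=1000.0):
    """Fraction of all observed variant sites that are expected to be errors.

    All default parameter values are hypothetical.
    """
    errors = expected_error_sites(n_samples, n_sites, error_rate)
    true_sites = expected_true_sites(n_samples, theta)
    return errors / (errors + true_sites)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n = {n:>6}: expected error fraction = {error_fraction(n):.3f}")
```

Under these assumptions, error-induced sites grow roughly linearly with the number of samples while real variant sites grow only logarithmically, so the expected share of errors among observed variants rises as the sample grows.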

  • Publication date: 2015-03