Abstract

Background: Despite striking advances in identifying gene variants that influence common diseases, most of the heritable risk for many of these diseases remains unidentified. One possible reason for this missing heritability is that genome-wide association study (GWAS) approaches have focused on common rather than rare single nucleotide variants (SNVs). Consequently, there is currently a great deal of interest in developing methods that can interrogate rare variants for association with disease.

Methods: We propose a two-step method (termed rPLS) to reveal possible genetic effects related to rare as well as common variants. The procedure starts by removing irrelevant variants using penalized regression (regularization), followed by partial least squares (PLS) on the surviving SNVs to find an optimal linear combination of rare and common SNVs within a genomic region. This combination is then tested for association with the trait of interest.

Results: Simulations based on the 1000 Genomes sequencing data and designed to reflect realistic settings showed that rPLS performs well compared with existing methods, especially when a large number of noncausal variants (both rare and common) are present in the gene and when causal SNVs differ in effect size and direction.
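The two-step procedure described under Methods can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not state which penalty, how many PLS components, or which test is used, so the lasso penalty, the single PLS component, the squared-correlation statistic, and the permutation p-value below are all assumptions. Every function name (`standardize`, `soft`, `lasso_select`, `pls_statistic`, `rpls_test`), the tuning value `lam=0.2`, and the simulated genotypes are hypothetical and chosen only for illustration.

```python
import math
import random

def standardize(col):
    """Center a genotype column and scale to unit variance (zeros if monomorphic)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]

def soft(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def lasso_select(X, y, lam, n_iter=30):
    """Step 1 (assumed lasso): indices of variants with nonzero coefficients.

    X is a list of standardized columns, y a centered trait. Cyclic coordinate
    descent on (1/2n) * ||y - Xb||^2 + lam * ||b||_1.
    """
    n, p = len(y), len(X)
    scales = [sum(v * v for v in col) / n for col in X]  # 0 marks a monomorphic SNV
    beta, resid = [0.0] * p, list(y)
    for _ in range(n_iter):
        for j in range(p):
            if scales[j] == 0.0:
                continue
            rho = sum(x * r for x, r in zip(X[j], resid)) / n + scales[j] * beta[j]
            new = soft(rho, lam) / scales[j]
            if new != beta[j]:
                resid = [r - x * (new - beta[j]) for x, r in zip(X[j], resid)]
                beta[j] = new
    return [j for j in range(p) if beta[j] != 0.0]

def pls_statistic(X, y, keep):
    """Step 2: squared correlation between y and the first PLS component."""
    if not keep:
        return 0.0
    # First PLS weight vector is proportional to X^T y on the surviving SNVs.
    w = [sum(x * v for x, v in zip(X[j], y)) for j in keep]
    t = [sum(wj * X[j][i] for wj, j in zip(w, keep)) for i in range(len(y))]
    tt, yy = sum(v * v for v in t), sum(v * v for v in y)
    ty = sum(a * b for a, b in zip(t, y))
    return ty * ty / (tt * yy) if tt > 0 and yy > 0 else 0.0

def rpls_test(X, y, lam=0.2, n_perm=99, seed=0):
    """Permutation test (an assumption) that repeats BOTH steps on each permuted trait."""
    rng = random.Random(seed)
    keep = lasso_select(X, y, lam)
    observed = pls_statistic(X, y, keep)
    y_perm, hits = list(y), 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        stat = pls_statistic(X, y_perm, lasso_select(X, y_perm, lam))
        hits += stat >= observed
    return keep, observed, (hits + 1) / (n_perm + 1)

# Illustrative simulated region: SNV 0 is common and protective, SNV 1 is rare
# and deleterious, the remaining ten SNVs are noncausal.
rng = random.Random(1)
n, mafs = 150, [0.3, 0.02] + [0.2] * 4 + [0.02] * 6
geno = [[(rng.random() < f) + (rng.random() < f) for _ in range(n)] for f in mafs]
trait = [-1.0 * geno[0][i] + 2.0 * geno[1][i] + rng.gauss(0, 1) for i in range(n)]
X = [standardize(col) for col in geno]
mean_y = sum(trait) / n
y = [v - mean_y for v in trait]
selected, stat, p_value = rpls_test(X, y)
print(selected, round(stat, 3), p_value)
```

Repeating the selection step inside each permutation keeps the p-value from being biased by the variant filtering. In practice one would likely replace the hand-written steps with library routines such as scikit-learn's `Lasso` and `PLSRegression`, and choose the penalty strength by cross-validation rather than fixing it.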

  • Publication date: 2012