Abstract

The search for susceptibility single-nucleotide polymorphisms (SNPs) underlying genetic diseases is difficult for two main reasons: findings are hard to confirm, and the interactions between potential susceptibility SNPs are unclear. Many association studies report results with significance levels but do not demonstrate the stability of those results, so their performance in practice may be uncertain. In this paper, we develop a novel method based on mutual information theory and linkage disequilibrium, applied with case-control grouping. Mutual information (MI) is used to test combinations of multiple SNPs against disease status; the SNPs contributing the maximum MI are selected as potential susceptibility SNPs. Linkage disequilibrium (LD) analysis is then used to extend the MI-detected result so that both direct and indirect factors are included in the final result. The purpose of case-control grouping is to generate a number of data groups by random sampling from the target samples. Each group is assumed to contain almost the same number of individuals (cases and controls), and overlap among the groups is allowed. We apply the method to each data group and then compare and intersect the results obtained from the groups to give the final result. We implement the method by continuing to group until the final result reaches a stable state and a high significance level. Experimental results indicate that our method is highly successful at detecting susceptibility SNPs in both simulated and real data sets.
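The MI selection and case-control grouping steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes genotypes coded as 0, 1 or 2, a plug-in (frequency-based) MI estimate, an exhaustive search over SNP combinations of a fixed size `k`, and a fixed number of groups in place of the "group until stable" stopping rule. The LD extension step is omitted. All function names, parameters and the toy data are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def best_snp_set(genotypes, status, k):
    """Return the k-SNP combination with maximum MI against disease status.

    genotypes: one list of genotype codes (0, 1 or 2) per individual.
    status:    one 0 (control) or 1 (case) per individual.
    """
    n_snps = len(genotypes[0])
    best, best_mi = None, -1.0
    for combo in itertools.combinations(range(n_snps), k):
        # Treat the joint genotype at the chosen SNPs as a single variable.
        joint_genotype = [tuple(row[j] for j in combo) for row in genotypes]
        mi = mutual_information(joint_genotype, status)
        if mi > best_mi:
            best, best_mi = combo, mi
    return set(best), best_mi


def grouped_detection(genotypes, status, k, n_groups, n_per_class, rng):
    """Run the MI search on random case-control groups and intersect results.

    Assumption: a fixed n_groups (at least 1) stands in for the stopping rule.
    Groups may overlap because each one is drawn from the full sample.
    """
    cases = [i for i, s in enumerate(status) if s == 1]
    controls = [i for i, s in enumerate(status) if s == 0]
    stable = None
    for _ in range(n_groups):
        idx = rng.sample(cases, n_per_class) + rng.sample(controls, n_per_class)
        selected, _mi = best_snp_set(
            [genotypes[i] for i in idx], [status[i] for i in idx], k
        )
        stable = selected if stable is None else stable & selected
    return stable


# Toy data (hypothetical): 400 individuals, 5 SNPs; disease status depends
# only on an interaction between SNP 0 and SNP 2.
rng = random.Random(0)
genotypes = [[rng.randint(0, 2) for _ in range(5)] for _ in range(400)]
status = [int((g[0] > 0) != (g[2] > 0)) for g in genotypes]

result = grouped_detection(
    genotypes, status, k=2, n_groups=10, n_per_class=80, rng=rng
)
print(result)
```

Intersecting the per-group selections keeps only SNPs that are chosen in every group, which is one simple way to express the stability requirement. The exhaustive search grows combinatorially with the number of SNPs, so a realistic data set would need a pruned or heuristic search.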

Full text