Abstract

Complex diseases usually involve complex interactions between multiple loci. Artificial intelligence algorithms are a plausible strategy for avoiding the combinatorial explosion that arises when searching for such interactions. However, the randomness of the solutions these algorithms produce decreases the confidence of biological researchers in them. Meanwhile, the lack of an efficient and effective measure for profiling the distribution of cases and controls impedes the discovery of pathogenic epistasis. Here we present an efficient method, called maximum dissimilarity-minimum entropy (MDME), for analyzing breast cancer single-nucleotide polymorphism (SNP) data. The method searches for risky barcodes, which increase the odds ratio and relative risk of breast cancer. The method is based on the hypothesis that if a specific barcode is associated with a disease, then the barcode permits the distinction of cases from controls and, more importantly, shows a relatively consistent pattern in cases. An analysis based on a simulated dataset explains the necessity of the minimum entropy criterion. Experimental results show that our method can find the most risky barcode contributing to breast cancer susceptibility. Our method may also mine several pathogenic barcodes that condition different subtypes of cancer.
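
As a rough illustration of the quantities named above, the following minimal Python sketch computes the odds ratio and relative risk from a 2x2 table of barcode carriers, and the Shannon entropy of the patterns observed in a group. It is not the MDME implementation: the function names, the table layout, and the use of Shannon entropy as the consistency measure are assumptions made for illustration only.

```python
import math
from collections import Counter


def odds_ratio_and_relative_risk(cases_with, cases_without,
                                 controls_with, controls_without):
    """Return (odds ratio, relative risk) for a 2x2 table.

    Hypothetical layout: counts of cases and controls that do or do not
    carry a given barcode. All four counts are assumed to be non-zero.
    """
    odds_ratio = (cases_with * controls_without) / (cases_without * controls_with)
    # Naive risk estimates: fraction of cases among carriers and non-carriers.
    risk_carriers = cases_with / (cases_with + controls_with)
    risk_noncarriers = cases_without / (cases_without + controls_without)
    return odds_ratio, risk_carriers / risk_noncarriers


def shannon_entropy(patterns):
    """Shannon entropy (in bits) of the observed genotype patterns.

    Assumed stand-in for the consistency measure: a lower value means
    the group shares a more consistent pattern.
    """
    counts = Counter(patterns)
    total = len(patterns)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under these assumptions, a barcode of interest would combine a high odds ratio with a low entropy among cases, in line with the hypothesis stated in the abstract.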