摘要

Background: Susceptible barcode recognition plays an important role in the diagnosis and treatment of complex diseases. Numerous approaches have been proposed to identify risky barcodes involved in the progress of complex diseases. However, some methods only consider differences in barcode frequencies between the control and disease groups; as such, these methods may be partial or even wrong. For example, some barcodes with a high risk ratio yield a low frequency on cases or exhibit a high frequency on controls, which may unreasonable from a statistical point. @@@ Results: In our study, a stricter criteria, maximum discrepancy and maximum constituency, is designed to evaluate each barcode and ant colony algorithm is used to search combination space of epistasis. For complex diseases with multi-subtypes, our method can list several potential barcodes contributing to different subtypes of complex diseases. Another contribution of this work is to introduce a method for determining the length of barcodes and excluding noisy barcodes whose frequencies are abnormal. In addition, common pathogenic genes shared by different risky barcodes are also recognized, which may provide key clue for further study, such as gene function analysis. @@@ Conclusions: Experimental results reveal that our method can find multiple risky barcodes whose risk ratio and odds ratio are >1. These barcodes could be related to different subtypes of complex diseases.