A conformal Bayesian network for classification of Mycobacterium tuberculosis complex lineages

Authors: Aminian Minoo*; Shabbeer Amina; Bennett Kristin P
Source: BMC Bioinformatics, 2010, 11(S3): S4.
DOI: 10.1186/1471-2105-11-S3-S4

Abstract

Background: We present a novel conformal Bayesian network (CBN) to classify strains of Mycobacterium tuberculosis complex (MTBC) into six major genetic lineages based on two high-throughput biomarkers: mycobacterial interspersed repetitive units (MIRU) and spacer oligonucleotide typing (spoligotyping). MTBC is the causative agent of tuberculosis (TB), which remains one of the leading causes of disease and morbidity worldwide. DNA fingerprinting methods such as MIRU and spoligotyping are key components in the control and tracking of modern TB.

Results: CBN is designed to exploit background knowledge about MTBC biomarkers. It can be trained on large historical TB databases in which isolates are genotyped with various subsets of MTBC biomarkers. Because not all biomarkers may be available during TB control efforts, CBN is designed to predict the major lineage of isolates genotyped by any combination of the PCR-based typing methods, spoligotyping and MIRU typing. CBN achieves high accuracy on three large MTBC collections consisting of over 34,737 isolates genotyped by different combinations of spoligotypes, 12 loci of MIRU, and 24 loci of MIRU. CBN captures distinct MIRU and spoligotype signatures associated with each lineage, which explains its excellent performance. Visualization of the MIRU and spoligotype signatures yields insight into both how the model works and the genetic diversity of MTBC.

Conclusions: CBN conforms to the available PCR-based biological markers and achieves high performance in identifying major lineages of MTBC. The method can be readily extended as new biomarkers are introduced for TB tracking and control. An online tool (http://www.cs.rpi.edu/~bennek/tbinsight/tblineage) makes the CBN model available for TB control and research efforts.

  • Publication date: 2010