Abstract

Background: In complex diseases, alterations of multiple molecular and cellular components in response to perturbations are indicative of disease physiology. While gene expression levels measured by high-throughput analysis can vary among patients, the common path of disease progression suggests that the underlying cellular sub-processes involving the associated genes follow similar fates. Motivated by the interconnected nature of these sub-processes, we have developed an automated methodology that combines ideas from biological networks, statistical models, and game theory to probe connected cellular processes. The core concept of our approach is the probability of change (POC), which indicates the probability that a gene's expression level has changed between two conditions. POC facilitates the definition of change at the neighborhood, pathway, and network levels and enables evaluation of the influence of diseases on expression. The 'connected' disease-related genes (DRG) identified in this way display coherent and concomitant differential expression levels along paths.

Results: RNA-Seq and microarray breast cancer subtyping expression data sets were used to identify DRG between subtypes. A machine-learning algorithm was trained for subtype discrimination using the DRG, and the training yielded a set of biomarkers. The discriminative power of the biomarkers was tested on an unseen data set. The identified biomarkers overlap with genes identified as disease-specific, and we were able to classify disease subtypes with 100% and 80% agreement with PAM50 for the microarray and RNA-Seq data sets, respectively.

Conclusions: We present an automated probabilistic approach that offers unbiased and reproducible results, thus complementing existing methods for DRG and biomarker discovery in complex diseases.

  • Publication date: 2015-12-1