摘要

Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks, however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the- art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and, identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization.

  • 出版日期2017-2-28