Abstract

Cancer is a heterogeneous disease, so a central problem is how to dissect its complex phenotypes into their biological building blocks. Computationally, this amounts to representing and interpreting high-dimensional observations through a structural and conceptual abstraction that captures the most influential underlying determinants. The working hypothesis of this report is that gene interactions are largely responsible for the manifestation of complex cancer phenotypes, and that the representation should therefore be formulated at the level of these interactions. Here, we report a representation learning strategy combined with regularization, in which gene expression is described as a regularized product of meta-genes and their expression levels. The meta-genes are constrained by gene interactions and thus retain their original topological context. The expression levels are supervised by the conditional dependencies among the observations, which provides a cluster-specific constraint. We obtain both of these structural constraints using a node-based graphical model. Our representation allows the selection of the more influential modules, thereby implicating their possible roles in neoplastic transformation. We validate the representation strategy by its robust recognition of various cancer phenotypes compared with several classical methods. The modules discovered are either shared across or specific to different types or stages of human cancer, all of which are consistent with the literature and with biology.