Abstract

In this article, we describe our work on graph mining applied to the cellular signaling pathways in the Signal Transduction Knowledge Environment (STKE). We present new algorithms and a graphical tool that help biologists discover relationships between pathways by examining structural overlaps within the database. We address the problem of determining pathway relationships with two data mining approaches: clustering and storytelling. In the first approach, our tool groups similar pathways into the same cluster; in the second, it identifies intermediate, overlapping pathways that can lead biologists to new hypotheses and experiments regarding relationships between pathways. We formulate the discovery of pathway relationships as a subgraph discovery problem and propose a new technique, Subgraph-Extension Generation (SEG), which outperforms the traditional Frequent Subgraph Discovery (FSG) approach by orders of magnitude. Our tool provides an interface for comparing the two approaches across a variety of similarity measures and clustering techniques, as well as on computational performance measures such as runtime and memory consumption.
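As a rough illustration of the storytelling idea, the sketch below makes several assumptions not stated in the abstract: each pathway is reduced to a set of directed interaction edges, "structural overlap" between two pathways is measured by the Jaccard similarity of their edge sets, and two pathways are linked when that similarity reaches a threshold. The names `edge_jaccard`, `story`, `threshold`, and the toy pathways `P1` to `P4` are hypothetical. The sketch finds a shortest chain of intermediate pathways by breadth-first search; it does not implement SEG or FSG, which mine shared subgraphs rather than shared edges.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a pathway is a set of directed edges (source, target).
Edges = Set[Tuple[str, str]]


def edge_jaccard(a: Edges, b: Edges) -> float:
    """Assumed overlap measure: Jaccard similarity of two pathways' edge sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def story(pathways: Dict[str, Edges], start: str, end: str,
          threshold: float = 0.1) -> Optional[List[str]]:
    """Return a shortest chain of pathway names from start to end in which each
    consecutive pair overlaps by at least `threshold`, or None if no chain exists."""
    prev: Dict[str, Optional[str]] = {start: None}  # BFS parent pointers
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == end:
            # Walk the parent pointers back to reconstruct the chain.
            chain = []
            node: Optional[str] = cur
            while node is not None:
                chain.append(node)
                node = prev[node]
            return chain[::-1]
        for name, edges in pathways.items():
            if name not in prev and edge_jaccard(pathways[cur], edges) >= threshold:
                prev[name] = cur
                queue.append(name)
    return None


# Toy pathways (hypothetical data, not from STKE).
example: Dict[str, Edges] = {
    "P1": {("A", "B"), ("B", "C")},
    "P2": {("B", "C"), ("C", "D")},
    "P3": {("C", "D"), ("D", "E")},
    "P4": {("X", "Y")},
}
```

Here `story(example, "P1", "P3")` links P1 and P3, which share no edges, through the intermediate pathway P2, while P4 overlaps with nothing and cannot be reached. Breadth-first search is used only because it yields the shortest such chain; a real tool would likely use a richer similarity measure.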

  • Publication date: 2012-9
  • Affiliation: Virginia Tech, USA