Abstract

In many biological applications, such as the analysis of high-dimensional metabolic data, the data consist of continuous measurements on subjects or tissues across multiple attributes or metabolites. The measured values are arranged in a matrix with subjects in rows and attributes in columns. Analyzing such data requires grouping subjects and attributes as a preliminary guide to data modeling. A common approach is to group subjects and attributes separately and construct a two-dimensional dendrogram, with one tree on the rows and another on the columns. This simple approach visualizes the grouping through two separate trees, which are difficult to interpret jointly. When a joint grouping of rows and columns is of interest, it is more natural to partition the data matrix directly. We propose building a dendrogram on the matrix itself, thus generalizing the two-dimensional dendrogram tree to a three-dimensional forest. The contribution of this research to the statistical analysis of metabolic data is threefold. First, a novel spike-and-slab model in various hierarchies is proposed to identify discriminant rows and columns. Second, an agglomerative approach is suggested to organize joint clusters. Third, a new visualization tool is introduced to display the collection of joint clusters. The new method is motivated by gas chromatography mass spectrometry (GCMS) metabolic data, but it can be applied to other continuous measurements that have a spike at zero.

  • Publication date: 2016-3
  • Affiliation: MIT