Abstract

The statistical analysis of tree-structured data is a new topic in statistics with wide application areas. Some Principal Component Analysis (PCA) ideas have previously been developed for binary tree spaces. These ideas are extended to the more general space of rooted and ordered trees. Concepts such as tree-line and forward principal component tree-line are redefined for this more general space, and the optimal algorithm that finds them is generalized. An analog of the classical dimension reduction technique in PCA is developed for tree spaces. To do this, backward principal components, the components that carry the least amount of information on the tree data set, are defined, and an optimal algorithm to find them is presented. Furthermore, the relationship of these to the forward principal components is investigated, and a path-independence property between the forward and backward techniques is proven. These methods are applied to a brain artery data set of 98 subjects. Using these techniques, the effects of aging on the brain artery structure of males and females are investigated. A second data set, the organization structure of a large US company, is also analyzed, and the structural differences across different types of departments within the company are explored.

  • Publication date: 2014-6