Abstract

Many methods of cluster analysis do not explicitly account for correlation between attributes. In this paper we explicitly model any correlation using a single factor within each cluster; i.e., the correlation of attributes within each cluster is adequately described by a single component axis. However, the use of a factor is not required in every cluster. Using a Minimum Message Length criterion, we can determine the number of clusters and also whether the model of any cluster is improved by introducing a factor. The technique allows us to seek clusters which reflect directional changes rather than imposing a zonation constrained by spatial (and implicitly temporal) position. Minimum message length is a means of applying Occam's Razor in inductive analysis. The 'best' model is that which allows the most compression of the data, which results in a minimal message length for the description. Fit to the data is not a sufficient criterion for choosing models because more complicated models will almost always fit better. Minimum message length combines fit to the data with an encoding of the model and provides a Bayesian probability criterion as a means of choosing between models (and classes of model). Applying the analysis to a pollen diagram from southern Chile, we find that the introduction of factors does not improve the overall quality of the mixture model. The solution without axes in any cluster provides the most parsimonious solution. Examining the cluster with the best case for a factor to be incorporated in its description shows that the attributes highly loaded on the axis represent a contrast between herbaceous vegetation and dominant forest types. This contrast is also found when fitting the entire population, and in this case the factor solution is the preferred model. Overall, the cluster solution without factors is much preferred. Thus, in this case classification is preferred to ordination, although more data are desirable to confirm such a conclusion.
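The trade-off between fit and model cost described above can be illustrated with a minimal sketch. It is not the estimator used in the paper: the model cost here is a crude (k/2) log n term per model, and a full covariance matrix stands in for the single factor. The function names, the synthetic data and the parameter `full_cov` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(x, mean, cov):
    """Negative log-likelihood (in nats) of the rows of x under N(mean, cov)."""
    n, d = x.shape
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Sum over all rows of the quadratic form diff_i^T cov^{-1} diff_i.
    quad = np.einsum("ij,jk,ik->", diff, np.linalg.inv(cov), diff)
    return 0.5 * (n * d * np.log(2 * np.pi) + n * logdet + quad)

def two_part_length(x, full_cov):
    """Crude two-part length: cost of stating the model plus cost of the data.

    ASSUMPTION: the model cost is approximated by (k / 2) * log(n), a
    BIC-style stand-in for a proper Minimum Message Length encoding.
    """
    n, d = x.shape
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False, bias=True)  # maximum-likelihood estimate
    if not full_cov:
        cov = np.diag(np.diag(cov))  # ignore correlation between attributes
    # Parameters: d means plus either d variances or d(d+1)/2 covariance terms.
    k = d + (d * (d + 1) // 2 if full_cov else d)
    return 0.5 * k * np.log(n) + gaussian_nll(x, mean, cov)

rng = np.random.default_rng(0)
# Synthetic data (hypothetical): two uncorrelated attributes ...
independent_data = rng.normal(size=(200, 2))
# ... and two attributes driven by one shared underlying variable.
shared = rng.normal(size=(200, 1))
correlated_data = shared @ np.array([[1.0, 1.0]]) + 0.3 * rng.normal(size=(200, 2))

for name, data in [("independent", independent_data), ("correlated", correlated_data)]:
    plain = two_part_length(data, full_cov=False)
    richer = two_part_length(data, full_cov=True)
    preferred = "with correlation" if richer < plain else "without correlation"
    print(f"{name}: {plain:.1f} vs {richer:.1f} nats, preferred model: {preferred}")
```

The richer model always fits at least as well, so it is preferred only when the improvement in fit outweighs the cost of stating its extra parameter. That is the sense in which a factor may be rejected for a cluster even though it improves the fit.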

  • Publication date: 2010-6