A classification tool for N-way array based on SIMCA methodology

Authors: Durante Caterina; Bro Rasmus; Cocchi Marina*
Source: Chemometrics and Intelligent Laboratory Systems, 2011, 106(1): 73-85.
DOI:10.1016/j.chemolab.2010.09.004

Abstract

In the literature there are only a few papers concerned with classification methods for multi-way arrays. By far the most common procedure is to unfold the multi-way data array into an ordinary matrix and then apply the traditional multivariate tools for classification. As an alternative to unfolding, several possibilities exist for building classification models based more directly on the multi-way structure of the data. For example, multi-way partial least squares discriminant analysis has been used as a supervised classification method. Another alternative that has been investigated is to perform classification using Fisher's LDA or SIMCA on the score matrix from, e.g., a PARAFAC or Tucker model. Despite a few attempts at applying such multi-way classification approaches, no one has looked into how such models are best built and implemented. In this work, the SIMCA method is extended to three-way arrays. The work also provides actual code that handles general multi-way arrays, not just three-way arrays. In analogy with two-way SIMCA, a separate decomposition model is built for the multi-way data of each class, using a multi-way decomposition method such as PARAFAC or Tucker3. To choose the best class dimensionality, i.e. the number of latent factors, both the cross-validation results and, mainly, the sensitivity/specificity values are evaluated. To estimate the limits of each class model, orthogonal and score distances are considered, and different statistics are implemented and tested to set confidence limits for these two parameters. Classification performance is compared across different definitions of class boundaries and classification rules, including the use of cross-validated residuals and scores. Besides being tested on simulated data sets of varying dimensionality, the proposed N-SIMCA methodology and code have been tested on two case studies concerning food authentication tasks for typical food products.
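As a rough illustration of the class-modelling idea summarized above, the following Python/NumPy sketch does three things. It fits a PARAFAC model by plain alternating least squares to the three-way array (samples × J × K) of a single class. It then projects samples onto that class model. Finally, it computes for each sample an orthogonal distance (residual sum of squares) and a Mahalanobis-type score distance. This is not the authors' N-SIMCA code. The names `parafac_als` and `NSimcaClass` are hypothetical, and three further choices are assumptions made only for illustration: centring across the sample mode, using empirical 95% quantiles of the training distances as class limits, and the rule "accept if inside both limits". The paper itself compares several statistics for these limits, also considers Tucker3 models, and uses cross-validated residuals and scores.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R); row index is j * K + k."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def parafac_als(X, rank, n_iter=1000, tol=1e-12, seed=0):
    """Plain alternating least squares PARAFAC for a three-way array X (I x J x K)."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding
    prev = np.inf
    for _ in range(n_iter):
        # Each update is a least-squares solve with the other two loadings fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        err = np.sum((X1 - A @ khatri_rao(B, C).T) ** 2)
        if prev - err < tol * max(err, 1.0):     # stop when the fit no longer improves
            break
        prev = err
    return A, B, C


class NSimcaClass:
    """Hypothetical one-class PARAFAC model in the spirit of N-SIMCA (illustrative only)."""

    def __init__(self, rank, quantile=0.95):
        self.rank, self.quantile = rank, quantile

    def _scores(self, X):
        # Centre with the class mean, then project each sample onto the fixed B, C loadings.
        Xc = (X - self.mean).reshape(X.shape[0], -1)
        T = Xc @ self.Z @ np.linalg.pinv(self.Z.T @ self.Z)
        return T, Xc

    def distances(self, X):
        T, Xc = self._scores(X)
        od = np.sum((Xc - T @ self.Z.T) ** 2, axis=1)          # orthogonal distance
        d = T - self.t_mean
        sd = np.einsum("ij,jk,ik->i", d, self.t_cov_inv, d)    # Mahalanobis score distance
        return od, sd

    def fit(self, X):
        self.mean = X.mean(axis=0)                             # centre across the sample mode
        _, B, C = parafac_als(X - self.mean, self.rank)
        self.Z = khatri_rao(B, C)
        T, _ = self._scores(X)
        self.t_mean = T.mean(axis=0)
        self.t_cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(T, rowvar=False)))
        od, sd = self.distances(X)
        # Assumed limits: empirical quantiles of the training distances.
        self.od_lim = np.quantile(od, self.quantile)
        self.sd_lim = np.quantile(sd, self.quantile)
        return self

    def accepts(self, X):
        od, sd = self.distances(X)
        return (od <= self.od_lim) & (sd <= self.sd_lim)
```

In use, one `NSimcaClass` would be fitted per class. A new sample is then assigned to every class whose model accepts it, or to none of them, which is what makes SIMCA a class-modelling method rather than a purely discriminant one.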

  • Publication date: 2011-3-15