Abstract

Functional data are becoming increasingly common in many fields of application. Although much research has been done on functional regression and clustering approaches for chemometric data, few classification methods exist so far. This paper introduces an ensemble method for classification that inherently provides automatic and interpretable feature selection. It is designed for single as well as multiple functional (and non-functional) covariates. The ensemble members are posterior probability estimates based on a k-nearest-neighbor approach. The ensemble enables feature selection by including members computed from various semi-metrics in the k-nearest-neighbor approach, where each semi-metric represents a specific curve feature. Each ensemble member, and thus each curve feature, is weighted by an unknown coefficient. These coefficients are estimated using a proper scoring rule with an implicit Lasso-type penalty, so that some coefficients can be estimated as exactly zero. Thus, the ensemble automatically provides feature selection and, in the case of multiple functional (and non-functional) covariates, also variable selection. The selection performance and the interpretability of the coefficients are investigated in simulation studies. Data from a cell chip used in water quality monitoring experiments are examined; they illustrate in particular the relevance of the ensemble's feature selection.
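The abstract gives no formulas, so the following is only a minimal Python sketch of the idea under stated assumptions, not the authors' implementation. Each semi-metric yields a leave-one-out k-NN posterior estimate P_m, and the ensemble posterior is sum_m c_m P_m. Assumptions (all hypothetical): the two semi-metrics (L2 distance between curves and between their first derivatives), the helper names, k = 5, and the simulated data. The Brier score stands in for the proper scoring rule. The weights are assumed to be constrained to the simplex (c_m >= 0, sum c_m = 1). On that set the L1 norm of the weights is fixed at one, which is one way an implicit Lasso-type penalty can arise and lets a weight land exactly at zero.

```python
import numpy as np


def pairwise_l2(X, t):
    """L2 semi-metric between discretized curves (rectangle-rule approximation)."""
    dt = t[1] - t[0]                                # assumes an equidistant grid
    diff = X[:, None, :] - X[None, :, :]            # shape (n, n, len(t))
    return np.sqrt((diff ** 2).sum(axis=2) * dt)


def semi_metrics(X, t):
    """Hypothetical curve features: the raw curve and its first derivative."""
    dX = np.gradient(X, t, axis=1)
    return {"curve_L2": pairwise_l2(X, t), "deriv_L2": pairwise_l2(dX, t)}


def knn_posteriors(D, y, k, n_classes):
    """Leave-one-out k-NN posterior estimates from a precomputed distance matrix."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)                     # a curve is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]               # indices of the k nearest curves
    return np.stack([(y[nn] == g).mean(axis=1) for g in range(n_classes)], axis=1)


def project_to_simplex(v):
    """Euclidean projection onto {c : c >= 0, sum(c) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_ensemble_weights(P_list, y, n_iter=2000, lr=0.1):
    """Minimize the Brier score of sum_m c_m P_m over the simplex (projected gradient)."""
    P = np.stack(P_list, axis=0)                    # shape (M, n, G)
    Y = np.eye(P.shape[2])[y]                       # one-hot class indicators, (n, G)
    c = np.full(P.shape[0], 1.0 / P.shape[0])       # start from equal weights
    for _ in range(n_iter):
        resid = np.tensordot(c, P, axes=1) - Y      # ensemble posterior minus truth
        grad = 2.0 * np.einsum("mng,ng->m", P, resid) / y.size
        c = project_to_simplex(c - lr * grad)
    return c


def simulate_curves(n_per_class=40, seed=0):
    """Toy data: classes differ only in slope; large random shifts hide the level."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 50)
    y = np.repeat([0, 1], n_per_class)
    slope = np.where(y == 0, 1.0, 2.0) + rng.normal(0.0, 0.1, y.size)
    shift = rng.normal(0.0, 3.0, y.size)
    X = shift[:, None] + slope[:, None] * t[None, :]
    return X, y, t


if __name__ == "__main__":
    X, y, t = simulate_curves()
    D = semi_metrics(X, t)
    P_list = [knn_posteriors(D[name], y, k=5, n_classes=2) for name in D]
    weights = fit_ensemble_weights(P_list, y)
    for name, w in zip(D, weights):
        print(f"{name}: {w:.3f}")
```

In this toy setting the classes differ only in slope, so the derivative-based member should receive nearly all the weight. The fitted weights can then be read as the relevance of each curve feature, which is the interpretability the abstract refers to.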

  • Publication date: 2015-8-15