A tutorial on the Lasso approach to sparse modeling

Authors: Rasmussen Morten Arendt*; Bro Rasmus
Source: Chemometrics and Intelligent Laboratory Systems, 2012, 119: 21-31.
DOI: 10.1016/j.chemolab.2012.10.003

Abstract

In applied research, data are often collected from sources with high-dimensional multivariate output. Analysis of such data typically involves extracting and characterizing underlying patterns, often with the aim of finding a small subset of significant variables or features. Variable and feature selection is well established in regression, whereas it appears more difficult for other types of models. Penalizing the L-1 norm provides an interesting avenue for this problem, as it produces a sparse solution and hence embeds variable selection. This paper gives a brief introduction to the mathematical properties of using the L-1 norm as a penalty and presents examples of models extended with L-1 norm penalties/constraints. The examples include PCA modeling with sparse loadings, which enhance the interpretability of single components; sparse inverse covariance matrix estimation, used to unravel which variables affect each other; and a modified PCA for modeling data with (piecewise) constant responses, e.g. in process monitoring. All examples are demonstrated on real or synthetic data. The results indicate that sparse solutions, when appropriate, can enhance model interpretability.
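
The claim that an L-1 penalty produces a sparse solution can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the simplest setting, an orthonormal design, in which each penalized coefficient is obtained by soft-thresholding the corresponding least-squares coefficient. The coefficient values, the penalty weight `lam`, and the helper name `soft_threshold` are all hypothetical choices made for illustration.

```python
def soft_threshold(z, lam):
    """Return the minimizer of 0.5*(b - z)**2 + lam*abs(b) over b."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # coefficients with |z| <= lam are set exactly to zero


# Hypothetical least-squares coefficients for five variables (illustrative only).
ols_coefs = [3.0, -0.4, 0.1, -2.5, 0.6]
lam = 1.0  # penalty weight, chosen arbitrarily for this sketch

# L-1 penalty: small coefficients become exactly zero (embedded variable selection).
lasso_coefs = [soft_threshold(z, lam) for z in ols_coefs]

# For comparison, an L-2 penalty 0.5*lam*b**2 only rescales every coefficient.
ridge_coefs = [z / (1.0 + lam) for z in ols_coefs]

print(lasso_coefs)
print(ridge_coefs)
```

Under these assumptions, the L-1 solution keeps only the two largest coefficients, while the L-2 solution shrinks all five without removing any. This is the sense in which the L-1 penalty embeds variable selection.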

  • Publication date: 2012-10-1