Weight randomization test for the selection of the number of components in PLS models

Authors: Thanh Tran*; Szymanska Ewa; Gerretzen Jan; Buydens Lutgarde; Afanador Nelson Lee; Blanchet Lionel
Source: Journal of Chemometrics, 2017, 31(5): e2887.
DOI: 10.1002/cem.2887

Abstract

The selection of the optimal number of components remains a difficult but essential task in partial least squares (PLS). Randomization tests have the advantage of being automatic, and they make use of the entire dataset, in contrast to the widely used cross-validation approaches. Partial least squares modeling may include one or more components that carry a large amount of irrelevant data variation, and this may affect the model, depending on the assigned y-loading (which is the regression coefficient in the latent domain). We recently showed this in the basic sequence framework, with respect to the underlying theory of the PLS algorithm, and presented it to the chemometrics community. In this work, we show that this irrelevant data variation is the root cause of the difficulty that current methods have in selecting the optimal number of components. For randomization tests, PLS models with nonsignificant components may produce false positive tests because of the incorrect assumption that "the components enter the model in a natural order". We introduce a new randomization test, the weight randomization test, for the selection of the optimal number of components in PLS, in light of the underlying theory of the PLS algorithm. In the proposed method, the null distribution is well characterized and efficiently determined, taking into account a newly defined model quality metric: the number of consecutive nonsignificant components (CNC). We illustrate the effectiveness of the weight randomization test in the optimization of preprocessing as well as in classification models, where, for the latter, results are compared with the double cross-validation procedure. This is an important step towards the full automation of PLS model development and routine updates.

  • Publication date: 2017-5
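
The general idea of a randomization test for choosing the number of PLS components, combined with a stop rule based on consecutive nonsignificant components, can be illustrated with a minimal sketch. This is not the weight randomization test of the paper, whose details the abstract does not give. It swaps in a generic y-permutation test on a single-response (PLS1) model, and the function names, the test statistic (the score-response covariance t'y), the significance level `alpha`, and the stop threshold `max_cnc` are all assumptions made for illustration.

```python
import numpy as np


def component_stat(X, y):
    """Return the statistic t'y for the next PLS1 component and its weight vector."""
    w = X.T @ y
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return 0.0, np.zeros(X.shape[1])
    w = w / norm          # unit-length weight vector
    t = X @ w             # scores of the candidate component
    return float(t @ y), w


def select_components(X, y, max_comp=10, n_perm=199, alpha=0.05, max_cnc=2, seed=0):
    """Generic y-permutation test per component (illustrative, not the paper's method).

    Stops once `max_cnc` consecutive nonsignificant components are seen
    (an assumed stop rule) and returns the index of the last significant
    component together with the p-values computed so far.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)            # column-center X
    y = y - y.mean()                  # center y
    p_values, selected, cnc = [], 0, 0
    for a in range(1, max_comp + 1):
        stat_obs, w = component_stat(X, y)
        if stat_obs == 0.0:           # nothing left to model
            break
        # Null distribution: same statistic after breaking the X-y link.
        null = np.array([component_stat(X, rng.permutation(y))[0]
                         for _ in range(n_perm)])
        p = (np.sum(null >= stat_obs) + 1) / (n_perm + 1)
        p_values.append(float(p))
        if p <= alpha:
            selected, cnc = a, 0
        else:
            cnc += 1
            if cnc >= max_cnc:
                break
        # Deflate X and y with the fitted component before testing the next one.
        t = X @ w
        tt = t @ t
        X = X - np.outer(t, X.T @ t) / tt
        y = y - t * (t @ y) / tt
    return selected, p_values


# Example call on synthetic data (hypothetical, for illustration only).
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(60, 10))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] + 0.1 * rng_demo.normal(size=60)
n_selected, p_vals = select_components(X_demo, y_demo, max_comp=5)
print(n_selected, p_vals)
```

Because the whole dataset is used to build both the observed statistic and its null distribution, no split into training and validation folds is needed, which is the advantage over cross-validation that the abstract points out.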