Abstract

Partial least squares (PLS) regression is a versatile modeling approach for high-dimensional data analysis. Recently, PLS-based variable selection has attracted considerable attention because it reduces high-throughput data and improves model interpretability. In this paper, we propose a class of variable selection methods for PLS that use marginal screening to select relevant variables. The proposed methods select variables in two steps: first, a solution path over all predictors is generated by sorting the marginal correlations between each predictor and the response; second, variable selection is carried out by screening the solution path with PLS. We provide three marginal screening methods for PLS, namely sure independence screening (SIS), profiled independence screening (PIS), and high-dimensional ordinary least-squares projection (HOLP). The performance of our methods is illustrated on three near-infrared (NIR) spectral data sets. Compared with SIS and PIS, HOLP for PLS is better suited to selecting important wavelengths and improves prediction accuracy on the NIR spectral data.
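The first of the two steps can be illustrated with a minimal sketch, assuming SIS-style ranking by absolute Pearson correlation. The function names (`marginal_correlations`, `solution_path`, `nested_subsets`) and the toy data are hypothetical and are not taken from the paper. The second step, fitting PLS on each candidate subset and comparing prediction error, is omitted here.

```python
import math

def marginal_correlations(X, y):
    """Absolute Pearson correlation between each column of X and y.

    X is a list of rows (n samples by p predictors); y is a list of n responses.
    """
    n = len(y)
    p = len(X[0])
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_c = [v - c_mean for v in col]
        c_norm = math.sqrt(sum(v * v for v in c_c))
        if c_norm == 0.0 or y_norm == 0.0:
            # A constant column carries no marginal signal.
            scores.append(0.0)
        else:
            dot = sum(a * b for a, b in zip(c_c, y_c))
            scores.append(abs(dot) / (c_norm * y_norm))
    return scores

def solution_path(scores):
    """Predictor indices sorted from strongest to weakest marginal score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nested_subsets(path, sizes):
    """Candidate subsets: the top-k predictors of the path for each k in sizes."""
    return [path[:k] for k in sizes]

# Toy data (assumed for illustration): 5 samples, 3 predictors.
X = [
    [1.0, 0.3, 5.0],
    [2.0, 0.1, 3.0],
    [3.0, 0.4, 4.0],
    [4.0, 0.2, 1.0],
    [5.0, 0.5, 2.0],
]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

scores = marginal_correlations(X, y)
path = solution_path(scores)
candidates = nested_subsets(path, [1, 2, 3])
```

In a full pipeline, each subset in `candidates` would be passed to a PLS fit, and the subset with the lowest validation error would be retained. Swapping in a different screening statistic, such as one based on PIS or HOLP, would change only how `scores` is computed.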