A New Wavelength Selection Algorithm Based on the Fusion of Multiple Models

Authors: Hong Ming jian*; Wen Zhi yu
Source: Spectroscopy and Spectral Analysis, 2010, 30(8): 2088-2092.
DOI: 10.3964/j.issn.1000-0593(2010)08-2088-05

Abstract

NIR spectra typically comprise a large number of wavelengths measured on a much smaller set of samples. Some of these wavelengths contribute no information to the model. Worse, they may carry irrelevant information such as noise and background, which can make the model more complex and/or degrade its predictive ability. It is therefore important to study in depth how to eliminate these wavelengths and improve the quality of the final model. This paper first summarizes the variable selection methods based on a single PLS regression model and concludes that (1) cross-validation can be used to select an optimal model with good predictive ability, but the resulting model may not be suitable for selecting variables; (2) selecting variables based on a single regression model is inaccurate and unstable, because a single vector of regression coefficients may not measure the importance of the variables correctly and may vary with models of different complexity. On the basis of this analysis, the paper proposes a new variable selection method based on the fusion of multiple PLS models. The method fuses the regression coefficients of multiple PLS models into a single vector and then determines a threshold; variables whose corresponding element in the fused vector falls below the threshold are eliminated. Finally, the method is verified on three well-known NIR datasets and compared with the UVE-PLS and GA-PLS algorithms. The experiments show that the method may yield a model with lower complexity and/or better predictive ability. Moreover, the proposed method is elegant and efficient and can therefore be put to practical use.
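
The fuse-then-threshold idea can be illustrated with a minimal sketch. The abstract does not state the fusion rule or how the threshold is determined, so the choices below are assumptions made only for illustration: the fused vector is the mean of the max-normalized absolute coefficients over PLS1 models with 1 to A components, and the threshold is a user-chosen fraction of the largest fused value. The function names (`pls1_coefficients`, `fuse_coefficients`, `select_variables`) and the toy data are hypothetical and are not taken from the paper.

```python
import math

def pls1_coefficients(X, y, max_components):
    """Return the regression vectors of PLS1 models with 1..A components (NIPALS)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    loadings, rotations, models = [], [], []
    b = [0.0] * p
    for _ in range(max_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-10:  # nothing left to explain, stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Rotation r maps the original centred X directly to the score t.
        r = w[:]
        for p_prev, r_prev in zip(loadings, rotations):
            c = sum(pv * wv for pv, wv in zip(p_prev, w))
            r = [rv - c * rp for rv, rp in zip(r, r_prev)]
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        loadings.append(load)
        rotations.append(r)
        b = [bj + q * rj for bj, rj in zip(b, r)]
        models.append(b)
    return models

def fuse_coefficients(models):
    """Assumed fusion rule: mean of max-normalized absolute coefficients."""
    p = len(models[0])
    fused = [0.0] * p
    for b in models:
        scale = max(abs(v) for v in b)
        for j in range(p):
            fused[j] += abs(b[j]) / scale / len(models)
    return fused

def select_variables(fused, fraction=0.5):
    """Assumed threshold rule: keep variables at or above a fraction of the maximum."""
    threshold = fraction * max(fused)
    return [j for j, v in enumerate(fused) if v >= threshold]

# Toy data (hypothetical): six orthogonal sine "wavelengths", only the first two
# of which carry information about y.
n, amplitudes = 40, [1, 2, 1, 1, 1, 1]
X = [[a * math.sin(2 * math.pi * (j + 1) * i / n) for j, a in enumerate(amplitudes)]
     for i in range(n)]
y = [2 * row[0] - 2 * row[1] for row in X]

models = pls1_coefficients(X, y, max_components=2)
fused = fuse_coefficients(models)
selected = select_variables(fused, fraction=0.5)
print(selected)
```

On this toy example the one-component model gives the first variable only a quarter of the weight of the second, while the two-component model weights both equally. A selection based on the first model alone would therefore discard variable 0 at the 0.5 threshold, whereas the fused vector retains both informative variables and drops the four uninformative ones. Real NIR data would need a cross-validated choice of the number of models and of the threshold, which this sketch does not attempt.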