Abstract

Random forest (RF) and neural networks have received significant interest in statistical data analysis because of their good predictive performance and attractive analytical properties. When developing an RF regression model for spectral analysis, informative wavelengths should be selected to reduce dimensionality effectively and improve interpretability. A neural network, in turn, has the merit of restoring high signals in the data. This paper proposes a chemometric strategy that combines the RF algorithm with a back propagation (BP) network. The RF-selected informative wavelengths were further refined by a moderate 3-layer BP network, in which the number of hidden nodes was tunable and was finally determined by searching for the minimum output error. The BP network was trained in combination with RF to generate a new comprehensive variable, so that a renewed informative-plus-net variable group could be produced. This renewed group of variables (or the selected group of variables) was used in a multiple linear regression model to evaluate the predictive ability of the spectral analysis in quantitatively determining the content of the target analyte. The application case was based on a Fourier transform near-infrared dataset of soil samples, with the aim of chemometrically determining the content of nutritional organic carbon. The prediction results indicated that the proposed strategy of combining RF and the BP network can improve prediction accuracy and enhance model interpretability in comparison with the general RF method and the conventional benchmark, partial least squares regression. The methodology presented here is of practical significance and is widely applicable to rapid nutrient determination in the development of precision agriculture.
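
The workflow summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the soil spectra, ranks wavelengths by the impurity-based RF importance, and picks the hidden-node count by validation mean squared error. The function name `rf_bp_mlr`, the parameters `n_keep` and `hidden_grid`, and all numeric settings are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def rf_bp_mlr(X, y, n_keep=10, hidden_grid=(2, 4, 8), seed=0):
    """RF wavelength selection -> BP-derived variable -> multiple linear regression."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

    # Step 1 (assumed criterion): keep the n_keep wavelengths with highest RF importance.
    rf = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    keep = np.argsort(rf.feature_importances_)[::-1][:n_keep]

    # Standardize the selected wavelengths before feeding them to the network.
    scaler = StandardScaler().fit(X_tr[:, keep])
    S_tr, S_te = scaler.transform(X_tr[:, keep]), scaler.transform(X_te[:, keep])

    def make_net(h):
        # Input layer, one hidden layer with h nodes, output layer: a 3-layer network.
        return MLPRegressor(hidden_layer_sizes=(h,), solver="lbfgs",
                            max_iter=2000, random_state=seed)

    # Step 2: choose the hidden-node count with the smallest validation error.
    S_fit, S_val, y_fit, y_val = train_test_split(S_tr, y_tr, test_size=0.25,
                                                  random_state=seed)
    errors = {h: np.mean((make_net(h).fit(S_fit, y_fit).predict(S_val) - y_val) ** 2)
              for h in hidden_grid}
    hidden = min(errors, key=errors.get)

    # Step 3: the network output serves as the new comprehensive variable,
    # appended to the selected wavelengths ("informative-plus-net" group).
    net = make_net(hidden).fit(S_tr, y_tr)
    Z_tr = np.column_stack([S_tr, net.predict(S_tr)])
    Z_te = np.column_stack([S_te, net.predict(S_te)])

    # Step 4: multiple linear regression on the renewed variable group.
    mlr = LinearRegression().fit(Z_tr, y_tr)
    rmse = float(np.sqrt(np.mean((mlr.predict(Z_te) - y_te) ** 2)))
    return keep, hidden, rmse


# Synthetic stand-in for spectra: 120 samples, 50 "wavelengths", two informative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 50))
y_demo = 2.0 * X_demo[:, 3] - X_demo[:, 17] + 0.1 * rng.normal(size=120)
keep, hidden, rmse = rf_bp_mlr(X_demo, y_demo)
print("selected columns:", keep, "hidden nodes:", hidden, "test RMSE:", rmse)
```

In this sketch the network output is computed on the same samples used to train it, so the test-set RMSE, not the training fit, is the figure to compare against an RF-only or partial least squares baseline.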