Abstract

The internal qualities of navel oranges largely determine their market value and are of major concern to customers. Unlike traditional subjective quality assessment, techniques based on near-infrared (NIR) spectroscopy are quantitative, convenient, and non-destructive. Various machine learning methods have been applied to NIR spectra analysis to determine fruit quality. Because NIR spectra are usually of very high dimension, explicit or implicit variable selection is essential to ensure prediction performance. Least angle regression (LAR) is a relatively new and efficient machine learning algorithm for regression analysis that is well suited to variable selection. We investigate the potential of the LAR algorithm for NIR spectra analysis to determine the internal qualities of navel oranges. A total of 1535 navel orange samples from 15 origins were prepared for NIR spectra collection and measurement of quality parameters. Each spectrum has 1500 dimensions, with wavelengths ranging from 1000 nm to 2499 nm. LAR was compared with the most widely used linear and nonlinear methods in three aspects: prediction accuracy, computational efficiency, and model interpretability. The results showed that the prediction performance of LAR was better than that of partial least squares (PLS) but slightly inferior to that of least squares support vector machines (LS-SVM). LAR was computationally more efficient than both PLS and LS-SVM. Because it concentrates on the most important predictors, LAR revealed the most relevant predictors much more easily than PLS, whereas LS-SVM was hardly interpretable because of its nonlinear kernel.