Abstract

In this work, we aim to build a model for classifying hard and soft blueberries from near-infrared (NIR) data using fewer wavelengths and fewer labeled samples. To this end, random frog selection is applied in the spectral space and active learning in the sample queue. Random frog was used to reduce the number of spectral variables by selecting wavelengths informative of hardness. The prediction model based on the 22 selected wavelengths gave slightly better results than the model based on the full spectra. For selection in the sample space, query by committee proved suitable for blueberry hardness classification, reaching an accuracy, precision and recall of 78%, 74% and 98%, respectively, with only 25 sample queries. The standard deviations of its performance metrics also remained low (around 0.05) and fluctuated steadily, outperforming those of the other four active learning strategies and of random sampling. In summary, applying random frog to the NIR spectral vector and query by committee to the sample queue showed considerable potential for establishing a simple but robust classifier for hard and soft blueberries at very low labeling cost.
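As an illustration of the query-by-committee step, the following minimal sketch picks the next sample to label as the one on which a committee of classifiers disagrees most. The abstract does not say which disagreement measure was used; vote entropy is assumed here, and the function names and the example votes are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Vote entropy of one sample, given one predicted label per committee member."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_query(committee_votes):
    """Return the index of the unlabeled sample with the highest vote entropy.

    committee_votes[i] holds the labels predicted for sample i,
    one label per committee member.
    """
    scores = [vote_entropy(v) for v in committee_votes]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical votes from a 4-member committee on 3 unlabeled berries.
votes = [
    ["hard", "hard", "hard", "hard"],  # full agreement
    ["hard", "soft", "hard", "soft"],  # evenly split committee
    ["soft", "soft", "soft", "hard"],  # mild disagreement
]
query_index = select_query(votes)
```

In a full active learning loop, the selected sample would be labeled, added to the training set, and the committee retrained before the next query; the 25-query budget reported above would bound the number of such iterations.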