An active learning representative subset selection method using net analyte signal

Authors: He, Zhonghai*; Ma, Zhenhe; Luan, Jingmin; Cai, Xi
Source: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2018, 196: 311-316.
DOI:10.1016/j.saa.2018.02.038

Abstract

Accurate predictions from a calibration model for spectroscopic measurements require representative calibration samples. However, whether a sample is representative is generally not known until its concentration has been measured, and that reference measurement is both time-consuming and expensive. This paper presents a method for deciding whether a sample should be added to a calibration set before its concentration is measured. The selection is based on the difference in the Euclidean norm of the net analyte signal (NAS) vector between the candidate sample and the existing samples. First, the concentrations and spectra of an initial group of samples are used to compute the projection matrix, the NAS vectors, and their scalar values. Next, the NAS vector of each candidate sample is computed by multiplying the projection matrix by the sample's spectrum, and the scalar NAS value is obtained as the norm of that vector. The distance between each candidate and the selected set is then computed, and the samples with the largest distance are added to the selected set sequentially. Finally, the analyte concentration of each selected sample is measured so that the sample can be used for calibration. A validation test shows that the presented method is more efficient than random selection. As a result, the time and money spent on reference measurements are greatly reduced.
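The steps in the abstract can be sketched in Python with NumPy. This is a minimal illustration under assumptions the abstract does not state, not the authors' implementation: the interferent space is estimated by removing the concentration-correlated part of the spectra and keeping a chosen number of principal directions, and the "distance" of a candidate to the selected set is taken as the smallest absolute difference between scalar NAS values. The function names and the `n_components` parameter are hypothetical.

```python
import numpy as np


def nas_projection(X, y, n_components):
    """Projection matrix onto the complement of an estimated interferent space.

    X: (n_samples, n_wavelengths) spectra of samples with known concentration.
    y: (n_samples,) analyte concentrations.
    n_components: assumed rank of the interferent space (a tuning choice).
    """
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Assumption: remove the part of the spectra that varies with y,
    # leaving variation attributed to interferents.
    X_int = X - y @ np.linalg.pinv(y) @ X
    # Leading right singular vectors span the estimated interferent space.
    _, _, Vt = np.linalg.svd(X_int, full_matrices=False)
    V = Vt[:n_components].T
    return np.eye(X.shape[1]) - V @ V.T


def nas_scalars(P, X):
    """Euclidean norm of the NAS vector P @ x for every spectrum (row) in X."""
    # P is symmetric, so the rows of X @ P are the NAS vectors.
    return np.linalg.norm(X @ P, axis=1)


def select_by_nas(s_selected, s_candidates, n_select):
    """Greedily pick candidates whose scalar NAS is farthest from the selected set.

    Assumption: distance to the set is the minimum absolute difference
    between the candidate's scalar NAS and those already selected.
    Returns indices into s_candidates in the order they were chosen.
    """
    selected = list(s_selected)
    remaining = list(range(len(s_candidates)))
    chosen = []
    for _ in range(min(n_select, len(remaining))):
        dists = [min(abs(s_candidates[i] - s) for s in selected)
                 for i in remaining]
        best = remaining[int(np.argmax(dists))]
        chosen.append(best)
        selected.append(s_candidates[best])  # chosen sample joins the set
        remaining.remove(best)
    return chosen
```

In use, `nas_projection` would be fitted on the initial samples with known concentrations, `nas_scalars` applied to both the initial and the candidate spectra, and only the candidates returned by `select_by_nas` would be sent for reference measurement.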