摘要

A novel descriptor selection scheme for Support Vector Machine (SVM) classification method has been proposed and its utility demonstrated using a skin sensitisation dataset as an example. A backward elimination procedure, guided by mean accuracy (the average of specificity and sensitivity) of a leave-one-out cross validation, is devised for the SVM. Subsets of descriptors were first selected using a sequential t-test filter or a Random Forest filter, before backward elimination was applied. Different kernels for SVM were compared using this descriptor selection scheme. The Radial Basis RBF) kernel worked best when a sequential t-test filter was adopted. The highest mean accuracy, 84.9%, was obtained using SVM with 23 descriptors. The sensitivity and the specificity were as high as 93.1% and 76.6%, respectively. A linear kernel was found to be optimal when a Random Forest filter was used. The performance using 24 descriptors was comparable with a RBF kernel with a sequential t-test filter. As a comparison, Fisher's linear discriminant analysis (LDA) under the same descriptor selection scheme was carried out. SVM was shown to outperform the LDA.

  • 出版日期2007