A new validity index of feature subset for evaluating the dimensionality reduction algorithms

Liu, Chuan; Wang, Wenyong*; Konan, Martin; Wang, Siyang; Huang, Lisheng; Tang, Yong; Zhang, Xiang

doi:10.1016/j.knosys.2017.01.017

Abstract

A critical aspect of dimensionality reduction is properly assessing the quality of the selected (or produced) feature subsets. In machine learning, a feature subset is typically assessed by splitting the data, represented by that subset, into a training set, which is used to estimate the parameters of a classification model, and a test set, which is used to estimate the model's predictive performance. Averaging the results over multiple splits (i.e., Cross-Validation, CV) is then commonly used to reduce the variance of the estimator. In practice, however, the CV scheme is computationally expensive. In this paper, we propose a new statistical index, called the LW-index, for evaluating feature subsets and, more generally, dimensionality reduction algorithms. The proposed method is a "classical statistics" approach that uses the feature subset itself to compute an empirical estimate of the subset's quality. Extensive performance comparisons with the machine learning approach, conducted on fourteen benchmark collections, show that the proposed LW-index is highly correlated with the external indices (i.e., Macro-F1 and Micro-F1) of SVM and a Centroid-Based Classifier (CBC) trained under a five-fold CV scheme. Furthermore, the experimental results indicate that the LW-index performs as well as the traditional CV scheme for evaluating dimensionality reduction algorithms while being more efficient. Therefore, one contribution of this paper is an alternative methodology, based on an internal index of the kind typically used in unsupervised learning, that is computationally cheaper than the traditional CV methodology. Another contribution is a new internal index that behaves better than other similar indices widely used in clustering and correlates highly with the results obtained by the traditional methodology.
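The contrast drawn above, between the traditional five-fold CV baseline and a training-free internal index, can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' code: the centroid-based classifier is assumed to use cosine similarity, folds are shuffled without stratification, SVM is omitted, and all function names (`cbc_fit`, `five_fold_cv_cbc`, `ch_style_index`, `make_toy_data`) are hypothetical. In particular, `ch_style_index` is a Calinski-Harabasz-style between/within scatter ratio computed with the class labels; it is an illustrative stand-in for "an internal index used in clustering" and is NOT the LW-index, whose definition is not given in this abstract.

```python
import math
import random
from collections import defaultdict


def _unit(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cbc_fit(X, y):
    """Centroid-based classifier (assumed cosine variant): one unit-norm mean per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] += 1
    return {c: _unit([s / counts[c] for s in sums[c]]) for c in sums}


def cbc_predict(centroids, x):
    """Pick the class whose centroid has the highest cosine similarity to x."""
    xu = _unit(x)
    return max(centroids, key=lambda c: sum(a * b for a, b in zip(xu, centroids[c])))


def macro_micro_f1(y_true, y_pred):
    """Macro-F1 (mean of per-class F1) and Micro-F1 (F1 of pooled counts)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(a, b, c):  # a = true positives, b = false positives, c = false negatives
        return 2 * a / (2 * a + b + c) if a + b + c else 0.0

    labels = set(y_true) | set(y_pred)
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


def five_fold_cv_cbc(X, y, k=5, seed=0):
    """Traditional baseline: train and test CBC k times, average (Macro-F1, Micro-F1)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # unstratified folds (an assumption)
    totals = [0.0, 0.0]
    for fold in folds:
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        model = cbc_fit([X[j] for j in train], [y[j] for j in train])
        preds = [cbc_predict(model, X[j]) for j in fold]
        macro, micro = macro_micro_f1([y[j] for j in fold], preds)
        totals[0] += macro
        totals[1] += micro
    return totals[0] / k, totals[1] / k


def ch_style_index(X, y):
    """Training-free internal index: Calinski-Harabasz ratio using class labels.

    Illustrative stand-in only; this is NOT the LW-index.
    """
    n, d = len(X), len(X[0])
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    k = len(groups)
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    between = within = 0.0
    for members in groups.values():
        cent = [sum(x[j] for x in members) / len(members) for j in range(d)]
        between += len(members) * sum((cent[j] - mean[j]) ** 2 for j in range(d))
        within += sum((x[j] - cent[j]) ** 2 for x in members for j in range(d))
    return (between / (k - 1)) / (within / (n - k)) if within > 0 else math.inf


def make_toy_data(seed=42, per_class=50):
    """Hypothetical toy data: features 0-1 separate the classes, feature 2 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in (("a", (5.0, 0.0)), ("b", (0.0, 5.0))):
        for _ in range(per_class):
            X.append([cx + rng.gauss(0, 1), cy + rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for name, cols in (("informative", slice(0, 2)), ("noise", slice(2, 3))):
        subset = [x[cols] for x in X]
        print(name, ch_style_index(subset, y), five_fold_cv_cbc(subset, y))
```

The cost difference the abstract emphasizes shows up in the loop structure: the CV baseline refits the classifier once per fold, while the internal index needs a single pass over the data and no training. On this toy data both scores should rank the informative subset above the noise feature; the paper's claim is that LW-index rankings track CV rankings closely on real benchmark collections.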

Publication date: 2017-4-1
Affiliation: University of Electronic Science and Technology of China
