A combination of fuzzy similarity measures and fuzzy entropy measures for supervised feature selection

Authors: Lohrmann Christoph*; Luukka Pasi; Jablonska Sabuka Matylda; Kauranne Tuomo
Source: Expert Systems with Applications, 2018, 110: 216-236.
DOI:10.1016/j.eswa.2018.06.002

Abstract

In many machine learning applications, large amounts of information and numerous features are available or easily obtainable. However, their quality is potentially low, and greater volumes of information are not always beneficial for machine learning, for instance when not all available features in a data set are relevant for the classification task and for understanding the studied phenomenon. Feature selection aims at determining a subset of features that represents the data well, gives accurate classification results and reduces the impact of noise on the classification performance. In this paper, we propose a filter feature ranking method for feature selection based on fuzzy similarity and entropy measures (FSAE), which adapts the idea used for the wrapper function by Luukka (2011) and adds a scaling factor. The scaling factor, applied to the feature- and class-specific entropy values, accounts for the distance between the ideal vectors of the classes. Moreover, a wrapper version of the FSAE with a similarity classifier is presented. The feature selection method is tested on five medical data sets: dermatology, chronic kidney disease, breast cancer, diabetic retinopathy and horse colic. The wrapper version of FSAE is compared to the wrapper introduced by Luukka (2011) and shows at least as accurate results with often considerably fewer features. In the comparison with ReliefF, Laplacian score, Fisher score and the filter version of Luukka (2011), the FSAE filter in general achieves competitive mean accuracies and, for one medical data set, the breast cancer Wisconsin data set, achieves together with the Laplacian score the best results over all possible feature removals.
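
To make the ranking idea concrete, the following is a minimal sketch of a similarity-and-entropy filter with a distance-based scaling factor. It is not the authors' implementation. The abstract does not give the formulas, so several choices are assumptions: the ideal vector of a class is taken as the class mean, similarity is computed as 1 - |x - v| on features already scaled to [0, 1], the entropy is the De Luca-Termini fuzzy entropy, and the scaling factor is one minus the mean absolute distance between ideal vectors. The names `fuzzy_entropy` and `rank_features` are hypothetical.

```python
import math

def fuzzy_entropy(memberships, eps=1e-12):
    """De Luca-Termini fuzzy entropy of membership values in [0, 1]."""
    total = 0.0
    for mu in memberships:
        mu = min(max(mu, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)
    return total

def rank_features(X, y):
    """Rank features by scaled entropy (lowest first).

    X: list of samples, each a list of feature values scaled to [0, 1].
    y: list of class labels; at least two classes are assumed.
    """
    classes = sorted(set(y))
    n_features = len(X[0])
    # Assumption: the ideal vector of a class is its per-feature mean.
    ideals = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        ideals[c] = [sum(r[d] for r in rows) / len(rows) for d in range(n_features)]
    scores = [0.0] * n_features
    for c in classes:
        others = [k for k in classes if k != c]
        for d in range(n_features):
            # Similarity of every sample to the ideal vector of class c in feature d.
            sims = [1.0 - abs(x[d] - ideals[c][d]) for x in X]
            # Assumed scaling factor: small when the ideal vectors lie far apart,
            # so features that separate the classes receive a lower score.
            gap = sum(abs(ideals[c][d] - ideals[k][d]) for k in others) / len(others)
            scores[d] += (1.0 - gap) * fuzzy_entropy(sims)
    order = sorted(range(n_features), key=lambda d: scores[d])
    return order, scores

# Toy data: feature 0 separates the classes, feature 1 carries no class information.
X = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 1.0],
     [1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
order, scores = rank_features(X, y)
print(order)  # most informative feature first
```

In this sketch a low score marks a feature as informative, so a filter would remove features from the end of `order`. A wrapper variant would instead re-evaluate a classifier after each candidate removal.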

  • Publication date: 2018-11-15