A New Feature Selection Approach to Naive Bayes Text Classifiers

Authors: Zhang, Lungan; Jiang, Liangxiao*; Li, Chaoqun
Source: International Journal of Pattern Recognition and Artificial Intelligence, 2016, 30(2): 1650003.
DOI: 10.1142/S0218001416500038

Abstract

Handling text data is a challenge for machine learning because text data is often high dimensional. Feature selection has proven to be an effective approach to handling high-dimensional data. Feature selection approaches can be broadly divided into two categories: filter approaches and wrapper approaches. Generally, wrapper approaches achieve higher accuracy than filter approaches, but filter approaches run faster. To combine the advantages of both, we propose a gain ratio-based hybrid feature selection approach to naive Bayes text classifiers. Like wrapper approaches, the hybrid approach uses base classifiers to evaluate feature subsets, but it does not need to repeatedly search feature subsets and build base classifiers. Experimental results on a large suite of benchmark text datasets show that the proposed hybrid feature selection approach significantly improves the classification accuracy of the original naive Bayes text classifiers without incurring the high time complexity that characterizes wrapper approaches.
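
The abstract does not spell out how gain ratio is computed or how the hybrid procedure uses it. For orientation only, the following is a minimal sketch of the standard gain ratio definition (information gain divided by split information) applied to a single discrete feature, such as the presence or absence of a word in a document. The function names `entropy` and `gain_ratio` and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Standard gain ratio of one discrete feature with respect to class labels.

    Illustrative helper (hypothetical name): information gain divided by the
    entropy of the feature itself (split information).
    """
    n = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy of the class given the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no information; return 0.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example: 1 means the word occurs in the document, 0 means it does not.
labels = ["sports", "sports", "politics", "politics"]
informative_word = [1, 1, 0, 0]    # occurs only in "sports" documents
uninformative_word = [1, 0, 1, 0]  # occurs equally in both classes
print(gain_ratio(informative_word, labels))
print(gain_ratio(uninformative_word, labels))
```

A filter-style use of such a score would rank words by gain ratio and keep the highest-ranked ones. How the proposed hybrid approach combines this ranking with base-classifier evaluation is described in the full paper, not in this abstract.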