A NOVEL EMBEDDED FEATURE SELECTION METHOD: A COMPARATIVE STUDY IN THE APPLICATION OF TEXT CATEGORIZATION

Authors: Imani Maryam Bahojb; Keyvanpour Mohammad Reza*; Azmi Reza
Source: Applied Artificial Intelligence, 2013, 27(5): 408-427.
DOI: 10.1080/08839514.2013.774211

Abstract

In text classification based on a vector space model, the high dimensionality of the feature space can cause problems, both in computational cost and in overfitting. Feature selection is an important preprocessing step in text classification: it reduces the size of the vector space, limits computation time, and maintains or improves classification performance. In this study, we use an embedded approach to feature selection in which the Chi-square (CHI) feature selector serves as the filter step and discards the less discriminative features. For the wrapper step, we propose a novel algorithm that combines the fast global search ability of the genetic algorithm (GA) with the positive feedback mechanism of ant colony optimization (ACO). To validate our approach, we carried out a series of experiments on the Reuters-21578 corpus and compared the results with those of several well-known techniques. In the majority of cases, our method performed better than the other methods.
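
The CHI filter step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard one-versus-rest chi-square statistic computed from document frequencies, which may differ from the exact variant used in the paper. The function names `chi_square_scores` and `select_top_k` and the toy documents are hypothetical, introduced only for illustration. The GA/ACO wrapper step is not sketched because the abstract gives no details of it.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term against one target class (illustrative only).

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    target : the class treated as "positive" (one-versus-rest)
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the target class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == target else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in class, contains term
        b = df_neg[term]   # not in class, contains term
        c = n_pos - a      # in class, lacks term
        d = n_neg - b      # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy data, not taken from Reuters-21578.
    docs = [["wheat", "price"], ["wheat", "farm"],
            ["oil", "price"], ["oil", "barrel"]]
    labels = ["grain", "grain", "crude", "crude"]
    scores = chi_square_scores(docs, labels, "grain")
    print(select_top_k(scores, 2))
```

In this toy example, a term such as "price" that appears equally often in both classes scores zero and would be discarded by the filter, while terms concentrated in one class score highest. A wrapper step would then search over subsets of the surviving terms.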

  • Publication date: 2013-05-28
