Abstract

Feature selection methods play a significant role in the classification of data with high-dimensional feature spaces: they select the most relevant subset of features that describes the data appropriately. Mutual information (MI), grounded in information theory, is one of the metrics used for measuring the relevance of features. This paper analyses various feature selection methods with respect to (1) the reduction in the number of features and (2) the performance of a Naive Bayes classification model trained on the reduced feature set. Two research gaps are identified: (1) MI is computed over the whole sample space rather than over the subspace of still-unclassified samples; (2) existing methods consider only the relevance of features, or a tradeoff between relevance and redundancy, while ignoring class-conditional interaction among features. In this paper, we propose a general MI-based evaluation function for feature selection and implement it using MI values computed dynamically from the unclassified instances. The effectiveness of the proposed feature selection method is evaluated empirically by comparing classification results on the KDD 1999 benchmark intrusion detection dataset. The results indicate the practicability and effectiveness of the proposed method for applications that demand high accuracy and stability of predictions.
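To illustrate the idea the abstract describes, the Python sketch below (not from the paper) computes empirical MI between discrete features and the class, and greedily selects features while recomputing MI only over the instances that the already-selected features do not yet separate by class. This is one plausible reading of the "unclassified sample subspace" notion; the function names, the separability criterion, and the synthetic data are all assumptions made for illustration, not the paper's actual evaluation function.

    import numpy as np
    from collections import Counter

    def mutual_information(x, y):
        # Empirical MI I(X; Y) in nats for two discrete 1-D arrays.
        n = len(x)
        px, py = Counter(x), Counter(y)
        pxy = Counter(zip(x, y))
        return sum((c / n) * np.log(c * n / (px[xv] * py[yv]))
                   for (xv, yv), c in pxy.items())

    def dynamic_mi_selection(X, y, k):
        # Greedy selection: at each step pick the feature with maximal MI
        # measured only on the still-"unclassified" instances, then drop
        # instances that the chosen feature already separates by class.
        # (Hypothetical reconstruction; the paper's criterion may differ.)
        remaining = np.arange(len(y))
        selected, candidates = [], list(range(X.shape[1]))
        while len(selected) < k and remaining.size > 0 and candidates:
            scores = {j: mutual_information(X[remaining, j], y[remaining])
                      for j in candidates}
            best = max(scores, key=scores.get)
            selected.append(best)
            candidates.remove(best)
            vals = X[remaining, best]
            keep = []
            for v in np.unique(vals):
                idx = remaining[vals == v]
                if len(set(y[idx])) > 1:  # still ambiguous -> keep for next round
                    keep.extend(idx)
            remaining = np.array(keep, dtype=int)
        return selected

    # Toy usage: the class depends on features 0 and 2 only,
    # so those should be ranked early.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 6))
    y = (X[:, 0] + X[:, 2]) % 2
    print(dynamic_mi_selection(X, y, k=3))

Recomputing MI on the shrinking set of ambiguous instances, rather than once on the whole sample space, is what distinguishes this dynamic scheme from a static one-shot MI ranking, which matches the first research gap noted above.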

  • Publication date: 2012-2