Abstract

Feature selection has a direct impact on text categorization performance. Most existing feature selection algorithms operate at the document level and do not consider how term frequency affects categorization. To address this, this paper proposes FSATD, a feature selection approach based on term distributions. The proposed algorithm jointly considers three critical factors: the frequency of each term, its inter-class distribution, and its intra-class distribution. Experiments with a kNN classifier on the 20NewsGroup and SougouCS corpora show that FSATD outperforms the DF and t-Test algorithms.
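The abstract names FSATD's three factors but does not give its scoring formula. The sketch below is therefore a hypothetical illustration of how those factors could be measured from a labeled corpus. It is not the paper's method. The function name `term_distribution_scores` and the combination rule (total frequency × inter-class variance / (intra-class variance + 1)) are assumptions chosen only to show the intuition. A term scores highly when it is frequent, its average frequency differs sharply between classes, and its frequency is consistent across documents within each class.

```python
from collections import Counter
from statistics import mean, pvariance


def term_distribution_scores(docs, labels):
    """Hypothetical term score combining frequency, inter-class and intra-class
    distribution. NOT the FSATD formula; an illustrative assumption only.

    docs:   list of token lists, one per document
    labels: list of class labels, aligned with docs
    """
    classes = sorted(set(labels))
    doc_tf = [Counter(d) for d in docs]  # per-document term frequencies
    class_docs = {c: [tf for tf, lab in zip(doc_tf, labels) if lab == c] for c in classes}
    vocab = {t for d in docs for t in d}

    scores = {}
    for t in vocab:
        # Inter-class distribution: spread of the term's mean frequency across classes.
        class_means = [mean(tf[t] for tf in class_docs[c]) for c in classes]
        inter = pvariance(class_means)
        # Intra-class distribution: average within-class variance across documents.
        intra = mean(pvariance([tf[t] for tf in class_docs[c]]) for c in classes)
        # Term frequency: total occurrences in the corpus.
        total_tf = sum(tf[t] for tf in doc_tf)
        # Hypothetical combination; +1 smooths terms with zero intra-class variance.
        scores[t] = total_tf * inter / (intra + 1.0)
    return scores


if __name__ == "__main__":
    docs = [["the", "ball", "game", "ball"], ["the", "ball", "team"],
            ["the", "vote", "party"], ["the", "vote", "party", "vote"]]
    labels = ["sport", "sport", "politics", "politics"]
    for term, s in sorted(term_distribution_scores(docs, labels).items(), key=lambda kv: -kv[1]):
        print(f"{term:6s} {s:.4f}")
```

In this toy corpus, "the" appears once in every document, so its inter-class variance is zero and it scores 0. Class-specific, repeated terms such as "ball" and "vote" rank highest. A document-level measure like DF, by contrast, would rate "the" as the most common term.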