An enhanced text categorization method based on improved text frequency approach and mutual information algorithm

Pei Zhili; Shi Xiaohu; Marchese Maurizio; Liang Yanchun*

Abstract

Text categorization plays an important role in data mining, and feature selection is its most important step. Focusing on feature selection, we present an improved text frequency method that filters out low-frequency features during data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf-idf method for evaluating feature weights. The proposed method is applied to the Reuters-21578 Top10 benchmark test set to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
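For orientation, the conventional building blocks that the abstract names can be sketched as follows. This is a minimal illustration of the textbook versions only (document frequency counting, pointwise mutual information between a term and a category, plain tf-idf weighting, and the F1 measure); it is not the improved variants proposed in the paper, whose details the abstract does not give. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def mutual_information(docs, labels, term, category):
    """Textbook pointwise MI: log(P(t, c) / (P(t) * P(c))).

    Assumption: this is the conventional definition, not the paper's
    improved algorithm.
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log((n_tc * n) / (n_t * n_c))


def tf_idf(tokens, df, n_docs):
    """Plain tf-idf weights for one document: tf * log(N / df)."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t] > 0}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy corpus (hypothetical data, for illustration only).
docs = [["oil", "price"], ["oil", "export"], ["wheat", "crop"], ["wheat", "price"]]
labels = ["crude", "crude", "grain", "grain"]

df = document_frequency(docs)
mi_oil = mutual_information(docs, labels, "oil", "crude")
mi_price = mutual_information(docs, labels, "price", "crude")
weights = tf_idf(docs[0], df, len(docs))
```

In this toy corpus "oil" appears only in "crude" documents, so its score is positive, while "price" is spread evenly across both categories and scores zero. A feature selector built on this score would keep "oil" and drop "price" for the "crude" category.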

Publication date: 2007-12
Affiliations: Jilin University; Inner Mongolia University for Nationalities


Last updated: 2018-08-02 11:46
