An approach to hierarchical email categorization based on ME

Authors: Li Peifeng*; Li Jinhui; Zhu Qiaoming
Source: 12th International Conference on Applications of Natural Language to Information Systems, 2007-06-27 to 2007-06-29.

Abstract

This paper proposes a hierarchical approach to categorizing emails with the Maximum Entropy (ME) model, based on their contents and attributes. The approach works in two phases. First, it divides emails into two sets, a legitimate set and a spam set; then it categorizes the emails within each set, using a different feature selection method for each. The paper also describes the pre-processing, the feature construction, and the ME model adapted to email categorization that are used to build the categorizer. Experimental results show that the hierarchical approach is more efficient than existing approaches and that feature selection is an important factor affecting the precision of email categorization.
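
The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes plain bag-of-words features, a small conditional ME classifier (equivalent to multinomial logistic regression) trained by simple gradient ascent, and invented toy emails and category names. The names `features`, `MaxEnt`, and `HierarchicalCategorizer` are hypothetical, and the sketch uses the same feature function in both phases, whereas the paper applies a different feature selection method to each set.

```python
import math
from collections import defaultdict


def features(text):
    """Binary bag-of-words features (an assumption; the paper's features are richer)."""
    return set(text.lower().split())


class MaxEnt:
    """Tiny conditional maximum entropy classifier trained by gradient ascent."""

    def __init__(self, epochs=200, lr=0.5):
        self.epochs, self.lr = epochs, lr
        self.w = defaultdict(float)  # one weight per (feature, label) pair
        self.labels = []

    def _probs(self, feats):
        # p(y | x) is proportional to exp(sum of weights of active features for y)
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        data = [(features(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for feats, gold in data:
                p = self._probs(feats)
                for y in self.labels:
                    # Log-likelihood gradient: observed count minus expected count
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.w[(f, y)] += self.lr * grad
        return self

    def predict(self, text):
        p = self._probs(features(text))
        return max(p, key=p.get)


class HierarchicalCategorizer:
    """Phase 1: legitimate vs. spam. Phase 2: a separate model inside each set."""

    def fit(self, texts, top_labels, sub_labels):
        self.top = MaxEnt().fit(texts, top_labels)
        self.sub = {}
        for group in set(top_labels):
            idx = [i for i, g in enumerate(top_labels) if g == group]
            self.sub[group] = MaxEnt().fit(
                [texts[i] for i in idx], [sub_labels[i] for i in idx]
            )
        return self

    def predict(self, text):
        group = self.top.predict(text)
        return group, self.sub[group].predict(text)


# Toy training data, invented purely for illustration.
texts = [
    "cheap pills buy now",
    "win free money now",
    "free casino bonus win",
    "buy cheap watches online",
    "project meeting at noon",
    "quarterly report attached for review",
    "family dinner on sunday",
    "happy birthday from mom",
]
top_labels = ["spam"] * 4 + ["legitimate"] * 4
sub_labels = ["ads", "scam", "scam", "ads", "work", "work", "personal", "personal"]

model = HierarchicalCategorizer().fit(texts, top_labels, sub_labels)
print(model.predict("free money win"))
print(model.predict("project report for review"))
```

Training one second-phase model per set keeps each model's label space small and makes it possible to swap in a set-specific feature function, which is the point at which the paper's distinct feature selection methods would plug in.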