Abstract

The main objective is to propose a text classification method based on feature selection and preprocessing, thereby reducing the dimensionality of the feature vector and increasing classification accuracy. Text classification is the process of assigning a document to one or more target categories based on its contents. In the proposed method, machine learning techniques for text classification are used: several text preprocessing methods are applied to different datasets, and feature vectors are then extracted for each new document using various feature weighting methods to improve classification accuracy. The classifier is then trained with the Naive Bayesian (NB) and K-nearest neighbor (KNN) algorithms; with KNN, the prediction is made according to the category distribution among the k nearest neighbors. Experimental results show that the methods compare favorably in effectiveness and efficiency with other classifiers such as SVM.
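The abstract does not name the exact preprocessing steps or weighting schemes, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions: a toy regex tokenizer with a hypothetical stopword list, TF-IDF as the feature weighting method (one common choice, assumed here), a KNN classifier that takes a majority vote over the categories of the k most cosine-similar training documents, and a multinomial Naive Bayes classifier with Laplace smoothing. All function names and the four-document corpus are hypothetical illustrations, not the authors' code or data.

```python
import math
import re
from collections import Counter

# Hypothetical tiny stopword list (assumption); real systems use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def idf_weights(docs):
    """Smoothed inverse document frequency per training term (one IDF variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_vecs, labels, query_vec, k=3):
    """Majority vote over the categories of the k most similar documents."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def nb_train(docs, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, in log space."""
    vocab = {t for doc in docs for t in doc}
    priors, likelihoods = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        likelihoods[c] = {t: math.log((counts[t] + 1) / (total + len(vocab)))
                          for t in vocab}
    return priors, likelihoods


def nb_predict(model, tokens):
    """Return the class with the highest log-posterior score."""
    priors, likelihoods = model

    def score(c):
        # Terms outside the training vocabulary are skipped.
        return priors[c] + sum(likelihoods[c][t] for t in tokens
                               if t in likelihoods[c])

    return max(priors, key=score)


# Hypothetical toy corpus for illustration only.
train_texts = [
    "The match ended with a late goal",
    "The team won the football match",
    "Stocks fell on the market today",
    "Investors sold shares as the market dropped",
]
train_labels = ["sports", "sports", "finance", "finance"]

train_tokens = [preprocess(t) for t in train_texts]
idf = idf_weights(train_tokens)
train_vecs = [tfidf(doc, idf) for doc in train_tokens]

query = preprocess("A goal decided the football match")
query_vec = tfidf(query, idf)
nb_model = nb_train(train_tokens, train_labels)
print(knn_predict(train_vecs, train_labels, query_vec, k=3))  # sports
print(nb_predict(nb_model, query))  # sports
```

Working with sparse dicts keeps the sketch dependency-free; dropping stopwords and out-of-vocabulary terms is one simple way preprocessing shrinks the feature vector before weighting.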

  • Publication date: 2012
