Automatic Arabic text categorization: A comprehensive comparative study

Hmeidi Ismail*; Al Ayyoub Mahmoud; Abdulla Nawaf A; Almodawar Abdalrahman A; Abooraig Raddad; Mahyoub Nizar A

doi:10.1177/0165551514558172

Abstract

Text categorization or classification (TC) is concerned with assigning text documents to their proper categories according to their contents. Given the many applications of TC and the large volume of text documents uploaded to the Internet every day, performing this process manually is difficult and tedious, which creates the need for an automated method. TC is useful across many fields and needs. For instance, the ability to automatically classify an article or an email into its correct class (Arts, Economics, Politics, Sports, etc.) would be valued by individual users as well as companies. This paper is concerned with TC of Arabic articles. It compares the five best-known TC algorithms. It also studies the effect of different Arabic stemmers (light and root-based) on the effectiveness of these classifiers. Furthermore, it compares two data mining software tools, Weka and RapidMiner. The results show that the SVM classifier achieves good accuracy, especially when combined with the light10 stemmer. This outcome can serve as a baseline for future comparisons with other, as yet unexplored, classifiers and Arabic stemmers.
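A light stemmer removes a small set of frequent prefixes and suffixes without reducing a word to its triliteral root, as a root-based stemmer would. Many inflected forms then collapse into one feature before classification. The sketch below is a simplified illustration under stated assumptions and not the paper's implementation. The affix lists are an assumption loosely modeled on published descriptions of light10. `light_stem` and `bag_of_stems` are hypothetical names.

```python
from collections import Counter

# ASSUMPTION: affix lists loosely modeled on descriptions of light10,
# not taken from this paper.
ARTICLE_PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip a leading conjunction, one article-like prefix, then suffixes."""
    # Remove the conjunction "و" (and) only if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Remove at most one article-like prefix, keeping at least 2 characters.
    for prefix in ARTICLE_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # Pass through the suffix list once, in order, removing each suffix
    # found at the end of the word if at least 2 characters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def bag_of_stems(text: str) -> Counter:
    """Count stemmed whitespace tokens, a simple feature vector for TC."""
    return Counter(light_stem(token) for token in text.split())


if __name__ == "__main__":
    # Three surface forms of "book" collapse into a single feature.
    print(bag_of_stems("الكتاب والكتاب كتاب"))
```

Merging variants this way shrinks the feature space a classifier such as SVM has to learn from. A root-based stemmer would merge even more aggressively, at the cost of conflating unrelated words that share a root.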

Publication date: February 2015
