A novel framework for termset selection and weighting in binary text classification

Badawi Dima; Altincay Hakan*

doi:10.1016/j.engappai.2014.06.012

Abstract

This study presents a new framework for termset selection and weighting, based on the joint occurrence statistics of pairs of terms. More specifically, each termset is evaluated by taking into account both the simultaneous and the individual occurrences of its terms. Based on the idea that the occurrence of one term but not the other may also convey valuable information for discrimination, conventional term selection schemes are adapted for termset selection. Similarly, the weight of a selected termset is computed as a function of the terms that occur in the document under consideration: a termset is assigned a nonzero weight if either or both of its terms appear in the document. This weight estimation scheme allows the individual occurrences of the terms and their co-occurrences to be evaluated separately when computing the document-specific weight of each termset. The proposed termset-based representation is concatenated with the bag-of-words representation to construct the document vectors. Experiments conducted on three widely used datasets have verified the effectiveness of the proposed framework.
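The representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting function, so the per-state weights below are made-up placeholders, and the names `termset_state`, `document_vector`, and `state_weights` are hypothetical. The sketch only shows the structure: a two-term termset gets a nonzero, state-dependent weight when either or both of its terms occur, and the termset features are appended to a bag-of-words vector.

```python
from typing import Dict, List, Sequence, Tuple

Termset = Tuple[str, str]


def termset_state(present: set, termset: Termset) -> str:
    """Classify how a two-term termset occurs in a document."""
    first, second = termset
    has_first, has_second = first in present, second in present
    if has_first and has_second:
        return "both"
    if has_first:
        return "first_only"
    if has_second:
        return "second_only"
    return "neither"


def document_vector(
    tokens: Sequence[str],
    vocabulary: Sequence[str],
    termsets: Sequence[Termset],
    state_weights: Dict[Termset, Dict[str, float]],
) -> List[float]:
    """Concatenate a bag-of-words vector with termset features.

    state_weights holds one weight per occurrence state for each termset.
    These are assumed inputs; the abstract does not say how they are computed.
    """
    present = set(tokens)
    # Bag-of-words part: raw term counts (an assumption; any term weighting could be used).
    bow = [float(tokens.count(term)) for term in vocabulary]
    # Termset part: zero only when neither term occurs in the document.
    ts = []
    for termset in termsets:
        state = termset_state(present, termset)
        ts.append(0.0 if state == "neither" else state_weights[termset][state])
    return bow + ts


# Toy example with placeholder weights.
vocabulary = ["oil", "price", "bank"]
termsets = [("oil", "price"), ("bank", "price")]
state_weights = {
    ("oil", "price"): {"both": 1.0, "first_only": 0.4, "second_only": 0.2},
    ("bank", "price"): {"both": 0.8, "first_only": 0.5, "second_only": 0.1},
}
vec = document_vector(["oil", "price", "oil"], vocabulary, termsets, state_weights)
```

In this toy document, the first termset is in the "both" state and the second is in the "second_only" state, so both termset features are nonzero even though "bank" never occurs.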

Publication date: 2014-10

