A paper-text perspective: Studies on the influence of feature granularity for Chinese short-text-classification in the Big Data era

Wang, Hao; Deng, Sanhong<sup>*</sup>

doi:10.1108/EL-09-2016-0192

Abstract

Purpose - In the era of Big Data, digital network resources are growing rapidly, and short-text resources such as tweets, comments and messages are especially abundant. This study aims to compare the category discriminative capacity (CDC) of Chinese language fragments of different granularities, and to explore and verify the feasibility, rationality and effectiveness of low-granularity features, such as Chinese characters, in Chinese short-text classification (CSTC).

Design/methodology/approach - This study uses the discipline classification of journal articles from CSSCI as a simulation environment. After sorting out the distribution rules of classification features of various granularities (keywords, terms and characters), the classification effects, as assessed with the SVM algorithm, are compared and evaluated from three angles: using the same experiment samples, testing before and after feature optimization, and introducing external data.

Findings - The granularity of a classification feature has an important impact on CSTC. In general, the larger the granularity, the better the classification result, and vice versa. However, a low-granularity feature is also feasible, and its CDC can be improved by reasonable weight setting. It can even exceed a high-granularity feature when classification precision, computational complexity and text coverage are considered together.

Originality/value - This is the first study to propose that Chinese characters are more suitable as descriptive features in CSTC than terms and keywords, and to demonstrate that the CDC of Chinese character features can be strengthened by combining frequency and position as the weight.
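The idea of weighting character features by both frequency and position can be illustrated with a minimal sketch. The abstract does not give the actual weighting scheme, so everything here is an assumption made for illustration: the function names `char_features` and `weighted_char_vector`, the split of a record into a title and a body, and the position weights 2.0 and 1.0. The resulting vector is the kind of input a classifier such as an SVM would take; the classifier itself is not shown.

```python
from collections import Counter


def char_features(text):
    """Split a string into single-character features, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def weighted_char_vector(title, body, title_weight=2.0, body_weight=1.0):
    """Build a character-level feature vector for one short text.

    Each occurrence of a character adds a weight that depends on where it
    appears, so the final value mixes frequency and position. The default
    weights are hypothetical, not values reported by the study.
    """
    weights = Counter()
    for ch in char_features(title):
        weights[ch] += title_weight   # characters in the title count more
    for ch in char_features(body):
        weights[ch] += body_weight    # characters in the body count less
    return dict(weights)


# Hypothetical record: a short title plus a short body fragment.
vec = weighted_char_vector("文本分类", "短文本分类方法")
for ch, w in sorted(vec.items(), key=lambda kv: -kv[1]):
    print(ch, w)
```

Characters that appear in both fields accumulate both weights, which is one simple way to let a low-granularity feature carry positional information. This sketch does not filter punctuation or stop characters, which a real pipeline would probably do.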

Publication date: 2017
Affiliation: Nanjing University

