Addressing the class imbalance problem in Twitter spam detection using ensemble learning

Liu Shigang; Wang Yu*; Zhang Jun; Chen Chao; Xiang Yang

doi:10.1016/j.cose.2016.12.004

Abstract

In recent years, microblogging sites like Twitter have become an important and popular channel for real-time information and news dissemination, and they have inevitably become a prime target for spammers. A series of incidents have shown that the security threats posed by Twitter spam can reach far beyond the social media platform and affect the real world. To mitigate the threat, many recent studies have applied machine learning techniques to classify Twitter spam and have reported promising results. However, most of these studies overlook the class imbalance problem in real-world Twitter data. In this paper, we experimentally demonstrate that the unequal distribution between the spam and non-spam classes has a great impact on the spam detection rate. To address the problem, we propose FOS, a fuzzy-based oversampling method that generates synthetic data samples from a limited number of observed samples, based on the idea of fuzzy-based information decomposition. Moreover, we develop an ensemble learning approach that learns more accurate classifiers from imbalanced data in three steps. In the first step, the class distribution of the imbalanced data set is adjusted using several strategies: random oversampling, random undersampling, and FOS. In the second step, a classification model is built on each of the redistributed data sets. In the final step, a majority voting scheme combines the predictions of all the classification models. We evaluate the approach through experiments on real-world Twitter data. The results indicate that the proposed learning approach can significantly improve the spam detection rate on data sets with an imbalanced class distribution.
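The three-step ensemble described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. FOS's fuzzy-based information decomposition is not reproduced here, so a hypothetical interpolation oversampler (`interpolation_oversample`, SMOTE-like) stands in for it. The base learner is a 3-nearest-neighbour classifier chosen only for brevity, labels are binary with spam as the minority class, and "detection rate" is read as recall on the spam class. All function names are hypothetical.

```python
import random
from collections import Counter

# Each resampler takes features X (list of float lists), binary labels y and the
# minority (spam) label, and returns a class-balanced copy of the data.
# Assumption: the minority class really is the smaller one.

def random_oversample(X, y, minority, rng):
    """Duplicate randomly chosen minority samples until both classes are equal."""
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    gap = (len(y) - len(min_idx)) - len(min_idx)
    idx = list(range(len(y))) + [rng.choice(min_idx) for _ in range(gap)]
    return [X[i] for i in idx], [y[i] for i in idx]

def random_undersample(X, y, minority, rng):
    """Keep every minority sample plus an equal-sized random subset of the majority."""
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]

def interpolation_oversample(X, y, minority, rng):
    """HYPOTHETICAL stand-in for FOS (not the paper's fuzzy method): synthesize
    minority points on the segment between two randomly chosen minority samples."""
    mins = [X[i] for i, lab in enumerate(y) if lab == minority]
    gap = (len(y) - len(mins)) - len(mins)
    new = []
    for _ in range(gap):
        a, b, t = rng.choice(mins), rng.choice(mins), rng.random()
        new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return X + new, y + [minority] * gap

def knn_predict(model, x, k=3):
    """Assumed base learner: k-nearest neighbours, squared Euclidean distance."""
    Xt, yt = model
    order = sorted(range(len(Xt)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, Xt[i])))
    return Counter(yt[i] for i in order[:k]).most_common(1)[0][0]

def train_ensemble(X, y, minority, seed=0):
    """Steps 1 and 2: rebalance with each strategy, then build one model per data set."""
    rng = random.Random(seed)
    strategies = (random_oversample, random_undersample, interpolation_oversample)
    # k-NN "training" simply stores the rebalanced data set.
    return [s(X, y, minority, rng) for s in strategies]

def predict_vote(models, x):
    """Step 3: majority vote over the per-strategy predictions."""
    return Counter(knn_predict(m, x) for m in models).most_common(1)[0][0]

def detection_rate(models, X, y, spam):
    """Share of true spam samples that the ensemble flags as spam (recall on spam)."""
    spam_x = [x for x, lab in zip(X, y) if lab == spam]
    return sum(predict_vote(models, x) == spam for x in spam_x) / len(spam_x)

if __name__ == "__main__":
    # Toy data: 20 non-spam points (label 0) near the origin, 3 spam points (label 1).
    X = [[i * 0.1, i * 0.1] for i in range(20)] + [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    y = [0] * 20 + [1] * 3
    models = train_ensemble(X, y, minority=1)
    print(predict_vote(models, [5.0, 5.0]), predict_vote(models, [0.0, 0.0]))  # prints: 1 0
```

With three voters and two classes a vote can never tie, which keeps the majority rule in `predict_vote` unambiguous. Swapping in a different base learner or a faithful FOS implementation would only change the resampler list and `knn_predict`.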

Publication date: August 2017
