A study on real-time low-quality content detection on Twitter from the users' perspective

Authors: Chen Weiling*; Yeo Chai Kiat; Lau Chiew Tong; Lee Bu Sung
Source: PLoS ONE, 2017, 12(8): e0182487.
DOI: 10.1371/journal.pone.0182487

Abstract

Techniques for detecting malicious content such as spam and phishing on Online Social Networks (OSNs) are common, but little attention has been paid to other types of low-quality content, which actually affect users' content browsing experience the most. The aim of our work is to detect low-quality content from the users' perspective in real time. To define low-quality content comprehensively, the Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users' opinions on the different categories of low-quality content. Both direct and indirect features, including newly proposed ones, are identified to characterize all types of low-quality content. We then combine word-level analysis with the identified features and build a keyword blacklist dictionary to improve detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the significant features identified and on word-level analysis. The results show that our method, based on a random forest classifier, achieves a high accuracy of 0.9711 and a good F1 of 0.8379 with real-time performance in detecting low-quality content in tweets. Our work therefore has a positive impact on improving the user experience of browsing social media content.
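
The abstract reports both accuracy (0.9711) and F1 (0.8379). As a reading aid, the minimal sketch below shows how the two metrics are computed from confusion-matrix counts and why accuracy can sit well above F1 when the positive class (here, low-quality tweets) is a minority. The function name and all counts are hypothetical illustrations and are not taken from the paper.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts (NOT from the paper): 1,000 tweets, 100 of them low-quality.
acc, f1 = accuracy_and_f1(tp=80, fp=10, fn=20, tn=890)
print(f"accuracy={acc:.4f}  F1={f1:.4f}")
```

With these assumed counts the many correctly classified normal tweets lift accuracy, while F1 depends only on how the low-quality class itself is handled, so it is the more informative of the two figures for a minority class.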

  • Publication date: 2017-8-9
  • Affiliation: Nanyang Technological University