Spam query detection using stream clustering

Authors: Shakiba Tahere; Zarifzadeh Sajjad*; Derhami Vali
Source: World Wide Web-Internet and Web Information Systems, 2018, 21(2): 557-572.
DOI: 10.1007/s11280-017-0471-z

Abstract

Search engines now serve as the gateway through which users reach the information they need on the Web. However, malicious users can also exploit them to facilitate attacks by submitting excessive amounts of bot-generated queries, called spam queries. In this paper, we propose a novel semi-supervised method that detects spam queries effectively and practically. We first train a model to characterize normal and malicious users, using the linguistic properties of queries as well as the behavioral characteristics of users and IP addresses. We then use the trained model to predict the label of arriving requests with a fast and efficient algorithm based on stream clustering. Our evaluation on the real log of a local search engine shows that the proposed algorithm achieves an accuracy of about 94%, while incurring low response-time and memory overhead.
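The abstract does not give the algorithm itself, so the following is only a minimal sketch of how label prediction over a request stream could work with labeled micro-clusters. It is not the authors' method. The names `MicroCluster` and `StreamLabeler`, the two-dimensional feature vectors, the `radius` threshold, and the nearest-centroid rule are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical labeled micro-cluster: a running centroid plus a label."""
    centroid: list[float]
    label: str
    count: int = 1

    def absorb(self, point):
        # Incremental mean update, so no past points need to be stored.
        self.count += 1
        self.centroid = [c + (x - c) / self.count
                         for c, x in zip(self.centroid, point)]


class StreamLabeler:
    """Assumed scheme: seed clusters from labeled data, then label arrivals."""

    def __init__(self, seeds, radius):
        # seeds: iterable of (feature_vector, label) pairs from the training phase.
        self.clusters = [MicroCluster(list(p), lab) for p, lab in seeds]
        self.radius = radius

    def predict(self, point):
        # Find the micro-cluster whose centroid is closest to the new request.
        nearest = min(self.clusters,
                      key=lambda mc: math.dist(mc.centroid, point))
        if math.dist(nearest.centroid, point) <= self.radius:
            # Close enough: update that cluster in place.
            nearest.absorb(point)
        else:
            # Too far: open a new micro-cluster that inherits the nearest label
            # (an assumption; a real system might flag it for review instead).
            self.clusters.append(MicroCluster(list(point), nearest.label))
        return nearest.label


# Features here are made up, e.g. (normalized request rate, query oddness score).
labeler = StreamLabeler(
    seeds=[([0.1, 0.2], "normal"), ([0.9, 0.8], "spam")],
    radius=0.3,
)
print(labeler.predict([0.85, 0.9]))
```

Each prediction touches only the current centroids, so the cost per request grows with the number of micro-clusters rather than with the length of the log, which is the general appeal of stream clustering for this kind of task.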

  • Publication date: 2018-3