An automatical moderating system for FML using hashing regression

Authors: Zhang, Peichao; Guo, Minyi
Source: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013, Hangzhou, Zhejiang, China, 2013-12-14 to 2013-12-16.
DOI: 10.1007/978-3-642-53917-6_13

Abstract

In this paper we propose a novel machine learning application for a funny-story-sharing website: automatic moderation of newly submitted posts based on their content and metadata. The task is challenging because a machine has limited ability to understand a joke and because each post is quite short. We collect all the posts on the website using a web crawler and then extract features from the posts with the help of natural language processing (NLP) tools. Finally, we use a regression model based on approximate nearest neighbor (ANN) search to predict the number of votes a given post will receive, which serves as a measure of its quality. Hashing techniques are used to address the curse of dimensionality and for their fast query speed and low storage cost. Experiments show that our system achieves satisfactory performance with various hashing methods.
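
The idea of hashing-based ANN regression described in the abstract can be illustrated with a minimal sketch. The abstract does not say which hashing methods the authors use, so this example assumes random-hyperplane (sign) hashing as a stand-in; the function names, the feature vectors, and the choice of averaging the votes of the k nearest codes are all hypothetical and not taken from the paper.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """Map a feature vector to a binary code: one sign bit per hyperplane."""
    code = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def predict_votes(query, train_codes, train_votes, planes, k=3):
    """Average the vote counts of the k training posts whose codes are
    closest to the query's code in Hamming distance."""
    q = hash_code(query, planes)
    ranked = sorted(range(len(train_codes)),
                    key=lambda i: hamming(q, train_codes[i]))
    nearest = ranked[:k]
    return sum(train_votes[i] for i in nearest) / len(nearest)


# Hypothetical 4-dimensional post features and their vote counts.
train_features = [
    [0.9, 0.1, 0.0, 0.3],
    [0.8, 0.2, 0.1, 0.4],
    [-0.7, 0.9, -0.5, 0.0],
    [-0.6, 0.8, -0.4, 0.1],
]
train_votes = [120.0, 100.0, 15.0, 25.0]

planes = make_hyperplanes(dim=4, n_bits=16, seed=42)
train_codes = [hash_code(f, planes) for f in train_features]

new_post = [0.85, 0.15, 0.05, 0.35]
estimate = predict_votes(new_post, train_codes, train_votes, planes, k=2)
print(estimate)
```

Only the compact integer codes are stored and compared at query time, which is where the fast lookup and low storage cost mentioned in the abstract would come from. A real system would replace the linear scan with a hash-table or multi-index lookup.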

Full text