Abstract

The results returned by a search engine are not always reliable, both because of spam pages and because user preferences are diverse. The Hidden Markov Model (HMM) is one of the most effective methods for using user logs to predict user preferences and filter spam pages. However, an HMM requires parameter optimization for each user in order to retrieve personalized search results, which is time-consuming. In this paper, we propose an effective personalized spam page detection method that gathers similar users into clusters with the K-means algorithm, performs the HMM parameter optimization once per cluster rather than once per user, and predicts user preferences with the HMM. In the experiments, we compare our method with the naive HMM-based method and a typical PageRank-based method. The experimental results show that our method outperforms the PageRank-based method, and that it is more efficient than the naive HMM-based method while achieving almost the same accuracy.
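
The cluster-then-optimize idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the two-dimensional user feature vectors, the function names `kmeans` and `parameters_per_cluster`, and the stand-in `optimize_stub` (which merely averages its inputs in place of a real HMM parameter optimization) are all assumptions made for illustration. The point it shows is only that the costly optimization step runs once per cluster rather than once per user.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def parameters_per_cluster(users, labels, optimize):
    """Call `optimize` once per cluster and share the result among its members."""
    clusters = {}
    for user, label in zip(users, labels):
        clusters.setdefault(label, []).append(user)
    fitted = {label: optimize(members) for label, members in clusters.items()}
    return [fitted[label] for label in labels]


# Hypothetical user features, e.g. the share of clicks on two topics.
users = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
]

calls = []  # records how often the expensive step is invoked


def optimize_stub(members):
    """Placeholder for HMM parameter optimization (assumption, not a real HMM)."""
    calls.append(len(members))
    return [sum(dim) / len(members) for dim in zip(*members)]


centroids, labels = kmeans(users, k=2)
shared = parameters_per_cluster(users, labels, optimize_stub)
print("optimizations run:", len(calls), "for", len(users), "users")
```

With six users in two clusters, the stand-in optimizer is invoked twice instead of six times, and every user in a cluster receives the same parameters. This mirrors the efficiency argument of the abstract; whether accuracy is preserved depends on how similar the clustered users really are.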

Full text