Abstract

The cold-start problem for new users and new items is a major challenge facing most collaborative filtering (CF) systems. Existing CF methods emphasize scaling to large, sparse datasets but lack a scalable approach to handling new data. In this paper, we propose a novel method for alleviating this problem by incorporating content-based information about users and items, namely tags and keywords. User-item ratings imply the relevance of users' tags to items' keywords, so we convert direct prediction on the user-item rating matrix into indirect prediction on a tag-keyword relation matrix that adapts to the emergence of new data. We first propose a novel neighborhood approach for building the tag-keyword relation matrix from the statistics of tag-keyword pairs in the ratings. Then, using the relation matrix, we propose a 3-factor matrix factorization model over the rating matrix that learns each user's interest vector over selected tags and each item's correlation vector over extracted keywords. Finally, we integrate the relation matrix with these two kinds of vectors to make recommendations. Experiments on real-world data demonstrate that our method not only outperforms other state-of-the-art CF algorithms on historical data but also scales well to new data.
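
To make the pipeline concrete, the following is a minimal sketch under stated assumptions, not the model as specified in the paper. The abstract does not give the exact statistic used for the relation matrix or the exact form of the factorization, so the sketch assumes (i) each tag-keyword relation is the mean rating over all observed ratings in which the tag and the keyword co-occur, and (ii) a rating is predicted by the bilinear form p_u^T M q_i, where p_u is a user's tag-interest vector, q_i is an item's keyword-correlation vector, and M is the relation matrix. The names `build_relation_matrix` and `predict` are hypothetical.

```python
import numpy as np

def build_relation_matrix(ratings, user_tags, item_keywords, n_tags, n_keywords):
    """Assumed statistic: mean rating over co-occurring (tag, keyword) pairs."""
    total = np.zeros((n_tags, n_keywords))
    count = np.zeros((n_tags, n_keywords))
    for user, item, rating in ratings:
        for t in user_tags[user]:
            for k in item_keywords[item]:
                total[t, k] += rating
                count[t, k] += 1
    # Pairs that never co-occur in any rating keep a relation of 0.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

def predict(p_u, M, q_i):
    """Assumed 3-factor prediction: user tag vector x relation matrix x item keyword vector."""
    return float(p_u @ M @ q_i)

# Toy data: one user with tags {0, 1}; item 0 has keyword {0}, item 1 has keywords {0, 1}.
ratings = [(0, 0, 4.0), (0, 1, 2.0)]
user_tags = {0: [0, 1]}
item_keywords = {0: [0], 1: [0, 1]}
M = build_relation_matrix(ratings, user_tags, item_keywords, n_tags=2, n_keywords=2)

# A new user with no ratings can still be scored from tags alone.
p_new = np.array([0.5, 0.5])   # illustrative interest over tags 0 and 1
q_item = np.array([1.0, 0.0])  # illustrative item described by keyword 0 only
score = predict(p_new, M, q_item)
```

Because the prediction depends on a new user or item only through its tag or keyword vector, no entry in the rating matrix is needed to score it, which is the property the indirect formulation relies on. In the full method the vectors p_u and q_i would be learned from the rating matrix rather than set by hand as they are here.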