摘要

We consider the problem of collaborative filtering from a channel coding perspective. We model the underlying rating matrix as a finite alphabet matrix with block constant structure. The observations are obtained from this underlying matrix through a discrete memoryless channel with a noisy part representing noisy user behavior and an erasure part representing missing data. Moreover, the clusters over which the underlying matrix is constant are unknown. We establish a threshold result for this model: if the largest cluster size is smaller than C(1) log(mn) (where the rating matrix is of size m x n), then the underlying matrix cannot be recovered with any estimator, but if the smallest cluster size is larger than C(2) log(mn), then we show a polynomial time estimator with asymptotically vanishing probability of error. In the case of uniform cluster size, not only the order of the threshold, but also the constant is identified.

  • 出版日期2011-4