Abstract

Non-negative Matrix Factorization (NMF) plays an important role in many data mining applications for low-rank representation and analysis. In many high-dimensional scenarios, e.g., social networks or recommender systems, missing information makes the data sparse, so NMF cannot mine an accurate representation from the explicit information alone. Manifold learning can incorporate the intrinsic geometry of the data by exploiting the implicit information carried by each point's neighborhood. Thus, manifold-regularized NMF (MNMF) can realize a more compact representation for sparse data. However, MNMF suffers from (a) the construction of large-scale Laplacian matrices, (b) frequent large-scale matrix manipulation, and (c) the involvement of K-nearest-neighbor points, which results in an overwriting problem in parallelization. To address these issues, a single-thread-based MNMF model is proposed for two types of divergence, i.e., Euclidean distance and Kullback-Leibler (KL) divergence. The model depends only on multiplication and summation over the involved feature tuples and thus avoids large-scale matrix manipulation. Furthermore, it removes the dependence among the feature vectors and is therefore inherently suited to fine-grained parallelization. On that basis, a CUDA-parallelized MNMF (CUMNMF) is presented for GPU computing. In the experiments, CUMNMF achieves a 20× speedup over MNMF, together with lower time complexity and a lower space requirement.
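
For orientation, the graph-regularized objective commonly used in the manifold NMF literature can be written as follows for the Euclidean case. This is a sketch of the standard formulation and not necessarily the exact model proposed here. The notation is assumed for illustration: X is the m × n data matrix, U (m × r) and V (n × r) are the non-negative factors, v_i is the i-th row of V, W is a symmetric K-nearest-neighbor weight matrix, and λ ≥ 0 is the regularization weight.

```latex
% Standard graph-regularized NMF objective (Euclidean case); notation assumed.
\min_{U \ge 0,\; V \ge 0} \;
  \lVert X - U V^{\top} \rVert_F^{2}
  + \lambda \, \mathrm{Tr}\!\left( V^{\top} L V \right),
\qquad L = D - W, \quad D_{ii} = \sum_{j} W_{ij}

% For symmetric W, the regularizer expands into a sum over neighbor pairs:
\mathrm{Tr}\!\left( V^{\top} L V \right)
  = \frac{1}{2} \sum_{i,j} W_{ij} \, \lVert v_i - v_j \rVert_2^{2}
```

The second identity shows that the manifold term reduces to sums over neighboring feature vectors, so in principle it can be evaluated without forming L explicitly.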