Abstract

Data fusion integrates features from heterogeneous data sources into a consistent and accurate representation for a given learning task. As an effective data fusion technique, unsupervised multimodal feature representation aims to learn discriminative features, which in turn improve the classification and clustering performance of learning algorithms. This remains challenging, however, because different modalities favor different structural learning. In this paper, we propose an efficient feature learning method that casts the representation of multimodal images as a sparse multigraph structure embedding problem. First, an effective algorithm is proposed to construct a sparse multigraph from multimodal data, where each modality corresponds to one regularized graph structure. Second, incorporating the learned multigraph structure, the feature learning problem for multimodal images is formulated as a matrix factorization. An efficient algorithm is developed to optimize this problem, and its convergence is proved. Finally, the proposed method is compared with several state-of-the-art single-modal and multimodal feature learning techniques on eight publicly available face image datasets. Comprehensive experimental results demonstrate that the proposed method outperforms the existing methods in clustering performance on all tested datasets.
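To make the factorization step concrete, a graph-regularized matrix factorization of this kind is commonly written as the objective below. This is a minimal sketch under assumed notation: the data matrix X, basis U, shared embedding V, per-modality graph Laplacians L_m, and weights λ and α_m are illustrative symbols, not the paper's own formulation.

\[
\min_{U,\,V} \; \lVert X - U V^{\top} \rVert_F^2 \;+\; \lambda \sum_{m=1}^{M} \alpha_m \,\operatorname{Tr}\!\left( V^{\top} L_m V \right),
\]

where L_m = D_m - W_m is the Laplacian of the learned sparse graph for modality m, so each trace term encourages the shared embedding V to respect the structure of that modality's graph.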