Abstract

This paper addresses the problem of modeling Internet images and their associated texts for cross-modal retrieval tasks such as text-to-image and image-to-text retrieval. We start from deep canonical correlation analysis (DCCA), a deep approach for mapping text and image pairs into a common latent space. We first propose a novel progressive framework and embed DCCA in it. In this framework, a linear projection loss layer is inserted before the nonlinear hidden layers of a deep network, and the linear projection is trained jointly with the nonlinear layers so that the projection is well matched to the nonlinear processing stages and good representations of the raw input are learned at the network output. We then introduce a hypergraph semantic embedding (HSE) method, which extracts latent semantics from texts, into DCCA to regularize the latent space learned from the image and text views. In addition, we propose a search-based similarity measure to score the relevance of image-text pairs. Combining these ideas, we obtain a model for cross-modal retrieval called DCCA-PHS. Experiments on three publicly available datasets show that DCCA-PHS is effective and efficient, achieving state-of-the-art performance in the unsupervised scenario.
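As background for the starting point above, the following is a minimal sketch of the standard DCCA training objective from the literature (Andrew et al., 2013): the total correlation between the two projected views, computed as the trace norm of the whitened cross-covariance matrix. It illustrates only the generic DCCA objective, not the DCCA-PHS model itself; the function name, the regularization constant `eps`, and the NumPy formulation are illustrative choices of ours.

```python
import numpy as np

def dcca_total_correlation(H1, H2, eps=1e-8):
    """Total correlation between two projected views (generic DCCA objective).

    H1, H2: (n_samples, d) arrays, outputs of the image and text branches.
    Returns the sum of canonical correlations, i.e. the trace norm of the
    whitened cross-covariance matrix T.
    """
    n = H1.shape[0]
    # Center each view.
    H1c = H1 - H1.mean(axis=0, keepdims=True)
    H2c = H2 - H2.mean(axis=0, keepdims=True)
    # Cross- and within-view covariance estimates, with a small ridge
    # term (an assumed regularizer) to keep the within-view covariances
    # invertible.
    S12 = H1c.T @ H2c / (n - 1)
    S11 = H1c.T @ H1c / (n - 1) + eps * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + eps * np.eye(H2.shape[1])

    def inv_sqrt(S):
        # Inverse matrix square root via the eigendecomposition of a
        # symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    # The singular values of T are the canonical correlations; their sum
    # is the quantity DCCA maximizes.
    return np.linalg.svd(T, compute_uv=False).sum()
```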