Abstract

Learning a function that measures the similarity or relevance between objects is an important machine learning task, referred to as similarity learning. Conventional methods are often insufficient for capturing complex patterns, while more sophisticated methods produce results that rest on parameters and mathematical operations that are hard to interpret. To improve both model robustness and interpretability, we propose a novel attention-driven multi-modal algorithm, which learns a distributed similarity score over different relation modalities and employs an interaction-oriented dynamic attention mechanism to selectively focus on salient patches of the objects of interest. Neural networks are used to generate a set of high-level representation vectors for both the entire object and its segmented patches. Multi-view local neighborhood structures among objects are encoded into the high-level object representations through an unsupervised pre-training procedure. Because the relation embeddings are initialized with object cluster centers, each relation modality admits a natural interpretation as a semantic topic. A layer-wise training scheme that mixes unsupervised and supervised training is proposed to improve generalization. The effectiveness of the proposed method and its superior performance relative to state-of-the-art algorithms are demonstrated through evaluations on several image retrieval tasks.
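To make the core scoring idea concrete, the following is a minimal sketch of an attention-weighted multi-modal similarity score, assuming one relation embedding per modality initialized from object cluster centers; the pair context, the diagonal bilinear score, and all function and variable names are illustrative assumptions rather than the paper's exact architecture.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def multimodal_similarity(x, y, relations, temperature=1.0):
    """Attention-weighted mixture of per-modality similarity scores.

    x, y      : (d,) high-level representation vectors of the two objects
    relations : (K, d) relation embeddings, one per relation modality
                (initialized, e.g., with object cluster centers)
    """
    context = 0.5 * (x + y)                            # pair context (hypothetical choice)
    attn = softmax(relations @ context / temperature)  # dynamic attention over modalities
    scores = relations @ (x * y)                       # diagonal-bilinear score per modality
    return float(attn @ scores)                        # distributed similarity score

# Toy usage: random vectors standing in for learned representations.
rng = np.random.default_rng(0)
d, K = 8, 4
x, y = rng.normal(size=d), rng.normal(size=d)
relations = rng.normal(size=(K, d))                    # in practice: k-means centers
print(multimodal_similarity(x, y, relations))
```

In this sketch the softmax over relation embeddings plays the role of the dynamic attention, letting each object pair distribute its similarity mass across the relation modalities, i.e., across semantic topics.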